dplyr across() in R: Apply the Same Function to Multiple Columns at Once
across() lets you apply the same function, or several functions, to many columns at once inside mutate(), summarise(), and (via if_any()/if_all()) filter(). It replaces the old _at, _if, and _all scoped verbs with one unified, tidyselect-aware tool.
How does across() round or scale many columns in one line?
The fastest way to feel why across() exists is to round every numeric column of a data frame in one line, instead of typing each column name. The block below loads dplyr, then rounds the four numeric iris columns to one decimal in a single mutate(across(...)) call. The first three rows print so you can see the result.
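Here is a minimal sketch of that block. It assumes R 4.1 or later (for the native pipe |> and the \(x) lambda shorthand) and an installed dplyr. The object name iris_rounded is our own choice.

```r
library(dplyr)

# Round every numeric column to one decimal place.
# where(is.numeric) selects the four measurement columns and skips Species.
iris_rounded <- iris |>
  mutate(across(where(is.numeric), \(x) round(x, 1)))

# Show the first three rows of the result
head(iris_rounded, 3)
```

The iris measurements are already recorded to one decimal place, so the printed numbers look unchanged. What matters here is the shape of the call.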
All four numeric columns were rounded in a single call. The Species column was left untouched because where(is.numeric) skipped it: it is a factor, not a number. That is the whole idea behind across(): pick the columns with one expression, apply the function with another.
The two arguments you care about are .cols (which columns) and .fns (what function). .cols accepts the same selectors select() understands: names, helpers like starts_with(), or predicates like where(is.numeric). .fns accepts a function (mean), an anonymous function (\(x) round(x, 1)), or a named list of functions for multi-output summaries.
Every across() example in the wild is a variation on this same shape, no matter how baroque it looks.
Try it: Use across() to double every numeric column of mtcars, then show the first three rows. Save the result to ex_doubled.
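One possible solution, written as a sketch that assumes R 4.1+ and dplyr:

```r
library(dplyr)

# Multiply every numeric column by 2; in mtcars that is all 11 columns.
ex_doubled <- mtcars |>
  mutate(across(where(is.numeric), \(x) x * 2))

head(ex_doubled, 3)
```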
Explanation: where(is.numeric) matches all columns (every mtcars column happens to be numeric); the lambda \(x) x * 2 runs against each in turn.
How do you pick which columns across() touches?
Choosing columns is half the job. across() accepts every selector you already know from select(): bare names, prefix helpers, type predicates, and exclusions. Picking the right selector is what turns a brittle script into one that survives schema changes.
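Below is a sketch of the four selector styles inside the same summarise() shell. The particular columns are illustrative choices on mtcars, not necessarily the ones the original example used: mpg, hp, and wt by name; the "d" prefix; and everything except vs and am.

```r
library(dplyr)

# 1. Bare names: exactly these three columns
by_name   <- mtcars |> summarise(across(c(mpg, hp, wt), mean))

# 2. Type predicate: every numeric column (all 11 in mtcars)
by_type   <- mtcars |> summarise(across(where(is.numeric), mean))

# 3. Prefix helper: columns whose names start with "d" (disp, drat)
by_prefix <- mtcars |> summarise(across(starts_with("d"), mean))

# 4. Exclusion: every column except vs and am
by_excl   <- mtcars |> summarise(across(-c(vs, am), mean))

names(by_prefix)
```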
Four selectors, four different column sets, same summarise() shell. c(mpg, hp, wt) is fine for an ad-hoc one-off. where(is.numeric) is the workhorse: it adapts when columns are added, removed, or renamed. Prefix helpers (starts_with, ends_with, contains, matches) are best when your columns share a naming convention. Exclusion (-c(...)) is the inverse.
Prefer where(is.numeric) over hard-coded names. A pipeline that selects by type silently picks up new numeric columns the day they appear; one that lists names breaks loudly when a column gets renamed.
Try it: Compute the mean of every iris column whose name starts with "Petal". Save the result to ex_petal.
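One possible solution, written as a sketch that assumes R 4.1+ and dplyr:

```r
library(dplyr)

# starts_with("Petal") picks Petal.Length and Petal.Width
ex_petal <- iris |>
  summarise(across(starts_with("Petal"), mean))

ex_petal
```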
Explanation: starts_with("Petal") matches Petal.Length and Petal.Width, the two iris columns whose names begin with that prefix.
How do you apply several functions and name the output columns?
Often you want more than one summary per column: a mean and a standard deviation, or min, max, and median together. Pass a named list of functions to .fns and across() produces one output column per (input column, function) pair. Use .names to control how those output columns are named.
Two input columns (mpg, hp) times two functions (avg, sd) yields four output columns: mpg_avg, mpg_sd, hp_avg, hp_sd. The glue spec "{.col}_{.fn}" joins each column name with each function name. Flip it to "{.fn}_{.col}" and you get avg_mpg, sd_mpg, and so on: the same numbers under different names.
Reach for .names when you want a different naming scheme. With a single function, the default output name is just the column name; when .fns is a list of functions, the default is "{.col}_{.fn}". Set .names only when those defaults do not match what you want.
Try it: Summarise mtcars with the min and max of mpg and wt together. Use .names = "{.fn}_{.col}" so the output columns are min_mpg, max_mpg, min_wt, max_wt. Save to ex_minmax.
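Here is a sketch of a two-column, two-function summary, assuming R 4.1+ and dplyr. The list names avg and sd become the {.fn} part of each output name.

```r
library(dplyr)

# Two columns x two named functions = four output columns
mtcars_stats <- mtcars |>
  summarise(across(c(mpg, hp),
                   list(avg = mean, sd = sd),
                   .names = "{.col}_{.fn}"))

mtcars_stats
```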
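One possible solution, written as a sketch that assumes R 4.1+ and dplyr:

```r
library(dplyr)

# Function name first, then column name: min_mpg, max_mpg, min_wt, max_wt
ex_minmax <- mtcars |>
  summarise(across(c(mpg, wt),
                   list(min = min, max = max),
                   .names = "{.fn}_{.col}"))

ex_minmax
```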
Explanation: The glue "{.fn}_{.col}" puts the function name first, then the column name. Two columns × two functions = four outputs.
How do you use across() inside mutate() to make new columns?
Inside mutate(), across() replaces the matched columns by default: the originals are overwritten. To keep both the original and the transformed values, pass a .names template that produces new column names. This is the standard feature-engineering pattern.
The originals (mpg, hp, wt) sit next to their z-scored siblings (mpg_z, hp_z, wt_z). The lambda runs once per column, so mean(x) and sd(x) use that column's own values, not a global mean. This is the most common shape you will reach for in real pipelines: take a few columns, transform them, keep both old and new.
Without .names, across() overwrites the originals. If you write mutate(across(c(mpg, hp, wt), scale)), the old mpg, hp, and wt columns are replaced by their scaled versions. Set .names = "{.col}_something" whenever you want both versions side by side.
Try it: Add mpg_log and hp_log columns to mtcars using log(), keeping the originals. Save to ex_log.
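Here is a sketch of the z-score pattern, assuming R 4.1+ and dplyr. The trailing select() is only there to put the original and new columns side by side for display.

```r
library(dplyr)

# Add z-scored copies of mpg, hp, wt; the "_z" suffix keeps the originals.
mtcars_z <- mtcars |>
  mutate(across(c(mpg, hp, wt),
                \(x) (x - mean(x)) / sd(x),
                .names = "{.col}_z")) |>
  select(mpg, hp, wt, ends_with("_z"))

head(mtcars_z, 3)
```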
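One possible solution, written as a sketch that assumes R 4.1+ and dplyr:

```r
library(dplyr)

# Pass log without parentheses; .names adds new *_log columns
ex_log <- mtcars |>
  mutate(across(c(mpg, hp), log, .names = "{.col}_log"))

head(ex_log[, c("mpg", "hp", "mpg_log", "hp_log")], 3)
```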
Explanation: Passing log (no parentheses) tells across() to apply the function as-is. The .names glue keeps both originals and the new *_log columns.
How do you filter rows with across(), using if_any() and if_all()?
across() is not meant to be used inside filter(): filter() needs a single logical vector with one value per row, while across() returns one result per selected column (using across() in filter() is deprecated). The companions if_any() and if_all() collapse those per-column logicals into one row-wise verdict.
if_any keeps a row when the condition holds in at least one selected column, which is useful for "anything weird?" checks like if_any(everything(), is.na). if_all is the strict cousin: every selected column must satisfy the predicate. Same selectors, same lambdas, opposite logic.
if_any is OR across columns; if_all is AND across columns. Once you read the names that way, every filter you write with them becomes self-documenting, with no need to remember which is which.
Try it: Keep mtcars rows where all three of disp, hp, and wt are above their own column means. Save to ex_strong and show the first three rows.
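Here is a sketch contrasting the two, assuming R 4.1+ and dplyr, using the airquality data, which has missing values. Together, the two filters split the rows into incomplete and complete cases.

```r
library(dplyr)

# if_any(): keep rows with a missing value in at least one column (OR)
aq_any_na <- airquality |>
  filter(if_any(everything(), is.na))

# if_all(): keep rows where every column is non-missing (AND)
aq_complete <- airquality |>
  filter(if_all(everything(), \(x) !is.na(x)))

c(incomplete = nrow(aq_any_na), complete = nrow(aq_complete))
```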
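One possible solution, written as a sketch that assumes R 4.1+ and dplyr:

```r
library(dplyr)

# Each column is compared with its own mean; if_all() requires all three to pass
ex_strong <- mtcars |>
  filter(if_all(c(disp, hp, wt), \(x) x > mean(x)))

head(ex_strong, 3)
```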
Explanation: The shared predicate \(x) x > mean(x) runs against each of the three columns. if_all() keeps a row only when every column's value beats its own mean; this kind of uniform predicate is exactly what if_all() is built for.
Practice Exercises
Exercise 1: Mean and median of every numeric column
Summarise airquality with the mean and median of every numeric column, ignoring missing values. Use .names = "{.fn}_{.col}" so the output columns are mean_Ozone, med_Ozone, etc. (use med not median as the function-name alias). Save the result to my_aq_summary.
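One possible solution, written as a sketch that assumes R 4.1+ and dplyr. All six airquality columns are numeric, so the result has twelve columns.

```r
library(dplyr)

# na.rm = TRUE goes inside each lambda; list names become the {.fn} prefix
my_aq_summary <- airquality |>
  summarise(across(where(is.numeric),
                   list(mean = \(x) mean(x, na.rm = TRUE),
                        med  = \(x) median(x, na.rm = TRUE)),
                   .names = "{.fn}_{.col}"))

my_aq_summary
```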
Explanation: The named list list(mean = ..., med = ...) produces two outputs per column. The na.rm = TRUE lives inside the lambda because passing extra arguments through across()'s ... is deprecated in current dplyr; the lambda is the modern way.
Exercise 2: Express columns as a percentage of their max
In mtcars, create new columns mpg_pct, hp_pct, and wt_pct that express each value as a percentage of that column's maximum. Round to one decimal. Use one mutate(across(...)) call with .names. Save to my_pct and show the first three rows side-by-side with the originals.
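One possible solution, written as a sketch that assumes R 4.1+ and dplyr. The final select() only arranges the originals next to the new columns.

```r
library(dplyr)

# max(x) is computed per column, so each *_pct column tops out at 100
my_pct <- mtcars |>
  mutate(across(c(mpg, hp, wt),
                \(x) round(100 * x / max(x), 1),
                .names = "{.col}_pct")) |>
  select(mpg, hp, wt, ends_with("_pct"))

head(my_pct, 3)
```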
Explanation: max(x) runs once per column inside the lambda, producing column-specific percentages. .names = "{.col}_pct" keeps the originals and appends the new columns.
Complete Example
Here is a mini end-to-end pipeline that uses every idea from this tutorial: pick all numeric columns, group by a categorical column, summarise the mean of each numeric column per group, and keep only groups where at least one summary is not missing.
Three steps cooperate. select(-films, ...) strips the list-columns, which cannot be summarised with mean(). summarise(across(where(is.numeric), ...)) aggregates every remaining numeric column with one shared lambda. Because mean(x, na.rm = TRUE) returns NaN when a group has no non-missing values, filter(if_any(where(is.numeric), \(x) !is.nan(x))) then drops species whose summaries are all NaN, using if_any to mean "keep me if any column has a real value".
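Below is a sketch of such a pipeline on dplyr's built-in starwars data, assuming R 4.1+. Where the prose elides the exact columns dropped, this version uses select(!where(is.list)), which removes every list-column (films, vehicles, starships) without naming them. That substitution is our assumption, not necessarily the original code.

```r
library(dplyr)

sw_summary <- starwars |>
  # Drop list-columns such as films
  select(!where(is.list)) |>
  group_by(species) |>
  # Per-species mean of every numeric column (height, mass, birth_year);
  # a group with no non-missing values yields NaN
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE))) |>
  # Keep species with at least one real (non-NaN) summary
  filter(if_any(where(is.numeric), \(x) !is.nan(x)))

head(sw_summary)
```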
Summary
| Pattern | Code |
|---|---|
| Same function on all numerics | mutate(across(where(is.numeric), fn)) |
| By name | across(c(a, b), fn) |
| By prefix or suffix | across(starts_with("x"), fn) |
| Many functions, custom names | across(cols, list(avg = mean, sd = sd), .names = "{.col}_{.fn}") |
| New columns, keep originals | mutate(across(cols, fn, .names = "{.col}_new")) |
| Filter, match ANY column | filter(if_any(cols, \(x) x > 0)) |
| Filter, match ALL columns | filter(if_all(cols, \(x) x > 0)) |
| Replace superseded scoped verbs | mutate_if(is.numeric, fn) → mutate(across(where(is.numeric), fn)) |
Three things to remember: across() is a selector + function pair; .names controls whether you keep or replace the originals; if_any() and if_all() are the supported way to apply the across() pattern inside filter().
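The last table row can be checked directly. Here is a sketch, assuming R 4.1+ and a dplyr version that still exports the superseded mutate_if(), that runs the old and new forms side by side on iris with round as the function.

```r
library(dplyr)

# Old scoped-verb form
old_style <- iris |> mutate_if(is.numeric, round)

# New across() form
new_style <- iris |> mutate(across(where(is.numeric), round))

# Compare the two results
all.equal(old_style, new_style)
```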