dplyr across() in R: Apply Functions to Many Columns
The across() function inside mutate() or summarise() applies a function to many columns at once, replacing the older _at, _if, and _all variants. Pair it with where(), starts_with(), or any tidyselect helper.
mutate(df, across(where(is.numeric), scale)) # all numeric cols
summarise(df, across(c(mpg, hp), mean)) # specific cols
summarise(df, across(where(is.numeric), mean, .names = "avg_{.col}")) # rename
mutate(df, across(starts_with("x"), ~ . * 100)) # by prefix
summarise(df, across(where(is.numeric), list(mean = mean, sd = sd))) # multi fn
mutate(df, across(everything(), as.character)) # all cols
mutate(df, across(c(a, b), ~ replace_na(., 0))) # custom lambda
Need explanation? Read on for examples and pitfalls.
What across() does in one sentence
across() is the bulk-column helper. Inside mutate() or summarise(), you give it a column selector (tidyselect helper) and one function (or a list of functions). dplyr applies each function to every selected column and either replaces the columns (in mutate()) or produces summary columns (in summarise()).
Unlike the deprecated mutate_at(), mutate_if(), mutate_all() family, across() uses the same tidyselect grammar as select(), so the same where(), starts_with(), c(a, b) syntax works everywhere.
Syntax
across() takes a column selector plus a function (or list of functions). The selector can be tidyselect helpers, bare names, or a vector. The function can be a name, a lambda (~ . * 2), or a named list of functions for multi-output.
The full signature is:
across(.cols = everything(), .fns = NULL, ..., .names = NULL, .unpack = FALSE)
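.cols selects columns via tidyselect. .fns is the function or list of functions. .names is a glue template for output column names; the default is {.col} for a single function and {.col}_{.fn} for a list of functions. .unpack (dplyr 1.1+) controls whether data-frame results are expanded into separate columns.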
across() only works INSIDE another dplyr verb. You cannot call across(df, ...) directly. It must be wrapped: mutate(df, across(...)) or summarise(df, across(...)). In filter(), use the companion helpers instead: filter(df, if_any(...)) or filter(df, if_all(...)).
Six common patterns
1. Apply one function to all numeric columns
where(is.numeric) selects columns where the predicate returns TRUE. scale() is applied to each.
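A minimal sketch of this pattern, assuming dplyr 1.0 or later and the built-in iris data (chosen here only for illustration). Because scale() returns a one-column matrix, the sketch wraps it in as.numeric() so each column stays a plain numeric vector; passing scale directly also runs but leaves matrix columns.

```r
library(dplyr)

# iris has four numeric columns and one factor column (Species).
# Standardise every numeric column; Species is not selected, so it is untouched.
scaled <- mutate(iris, across(where(is.numeric), ~ as.numeric(scale(.x))))

head(scaled, 3)
```

After this call each numeric column has mean approximately 0 and standard deviation 1.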
2. Specific columns by name
Pass a vector of bare names to c(...) to target specific columns.
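As a small illustration, assuming dplyr 1.0 or later, the built-in mtcars data can stand in for df:

```r
library(dplyr)

# Summarise only the two named columns; the result is a one-row data frame
# with columns mpg and hp.
by_name <- summarise(mtcars, across(c(mpg, hp), mean))

by_name
```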
3. Multiple functions producing multiple outputs
A named list of functions creates one output column per (input column, function) pair. Default naming is {.col}_{.fn}.
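A short sketch of the multi-function form, again using mtcars as stand-in data and assuming dplyr 1.0 or later:

```r
library(dplyr)

# Two input columns x two functions = four output columns.
multi <- summarise(mtcars, across(c(mpg, hp), list(mean = mean, sd = sd)))

# Output names follow the default {.col}_{.fn} template.
names(multi)
```

The outputs are grouped by input column first, then by function.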
4. Custom lambda with anonymous function
The ~ . * 100 is shorthand for function(x) x * 100. The . placeholder refers to each column's values.
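The equivalence can be checked with a small sketch. The data frame and its column names (x1, x2, y) are made up for illustration, and dplyr 1.0 or later is assumed:

```r
library(dplyr)

# Hypothetical data: two proportion columns with an "x" prefix, plus one other column.
df <- data.frame(x1 = c(0.1, 0.25), x2 = c(0.5, 0.75), y = c(1, 2))

# Formula shorthand: . stands for the current column.
pct_lambda <- mutate(df, across(starts_with("x"), ~ . * 100))

# The same transformation written as an ordinary anonymous function.
pct_function <- mutate(df, across(starts_with("x"), function(x) x * 100))

pct_lambda
```

Both calls should give the same result, and y is left alone because it does not match the selector.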
5. Custom output names with .names
.names is a glue template. {.col} is the input column name; {.fn} is the function name (when using a named list).
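A brief sketch of both placeholders, assuming dplyr 1.0 or later and using mtcars as stand-in data. The templates "avg_{.col}" and "{.fn}.{.col}" are just examples:

```r
library(dplyr)

# Single function: only {.col} is needed.
renamed <- summarise(mtcars, across(c(mpg, hp), mean, .names = "avg_{.col}"))

# Named list of functions: {.fn} picks up the list names.
renamed_multi <- summarise(
  mtcars,
  across(c(mpg, hp), list(mean = mean, sd = sd), .names = "{.fn}.{.col}")
)

names(renamed)
names(renamed_multi)
```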
6. Conditional replacement across columns
The lambda receives each numeric column; non-numeric columns (name) are untouched.
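One way this pattern might look, assuming dplyr 1.0 or later. The data frame, its column names (name, score1, score2), and the rule "treat negative values as missing" are all hypothetical, chosen only to show the shape of the call:

```r
library(dplyr)

# Hypothetical data: one character column and two numeric columns.
df <- data.frame(
  name   = c("a", "b", "c"),
  score1 = c(5, -1, 3),
  score2 = c(-2, 4, 0),
  stringsAsFactors = FALSE
)

# Replace negative values with NA in every numeric column.
# name is character, so where(is.numeric) never passes it to the lambda.
fixed <- mutate(df, across(where(is.numeric), ~ ifelse(.x < 0, NA, .x)))

fixed
```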
across() made mutate_at(), mutate_if(), and mutate_all() obsolete in dplyr 1.0+. The old code mutate_if(df, is.numeric, scale) becomes mutate(df, across(where(is.numeric), scale)). Same result, more composable, fewer functions to remember. If you see _at/_if/_all in tutorials from 2019 or earlier, they still work but are superseded.
across() vs the legacy _at/_if/_all family
across() is the modern unified replacement. The legacy functions still work but are not recommended for new code.
| Task | Modern (across) | Legacy (_at / _if / _all) |
|---|---|---|
| Apply to numeric | mutate(across(where(is.numeric), scale)) | mutate_if(is.numeric, scale) |
| Apply to specific | mutate(across(c(a,b), log)) | mutate_at(vars(a,b), log) |
| Apply to all | mutate(across(everything(), as.character)) | mutate_all(as.character) |
| Multi-function | summarise(across(., list(m=mean, s=sd))) | summarise_at(vars(...), funs(mean, sd)) |
| Rename outputs | summarise(across(., mean, .names="avg_{.col}")) | (awkward) |
When to use which:
- Always use across() in new code.
- The legacy functions still exist in dplyr for backward compatibility. They are superseded rather than removed: they keep working but receive no new features.
Common pitfalls
Pitfall 1: forgetting that across() returns columns, not a single value. Inside summarise(), across(c(mpg, hp), mean) returns TWO columns (mpg, hp), not a vector. The result is a row of summaries, not a single number.
Pitfall 2: trying to use across() outside a verb. across(df, ...) errors. It is a helper that only works inside mutate(), summarise(), filter(if_any(...)), etc.
~ . * 2 uses . as the column placeholder (.x works the same way). A multi-step transform must be wrapped in braces: ~ { x <- as.numeric(.); x * 2 }. The lambda shorthand is fine for one-step expressions; for multi-step logic, a named function or function(x) {...} is usually easier to read.
Pitfall 3: where() predicates are column-level, not value-level. The predicate must return a single TRUE or FALSE per column, so where(is.na), which returns one value per row, raises an error on any data frame with more than one row instead of selecting anything. To select columns containing any NA, use where(~ any(is.na(.))); to select columns that are entirely NA, use where(~ all(is.na(.))).
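A small sketch of column-level NA predicates, assuming dplyr 1.0 or later. The data frame and its column names (p, q, r) are hypothetical:

```r
library(dplyr)

# Hypothetical data: p has one NA, q has none, r is entirely NA.
df <- data.frame(
  p = c(1, NA, 3),
  q = c("x", "y", "z"),
  r = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Each predicate collapses a whole column to a single TRUE or FALSE.
cols_any_na <- select(df, where(~ any(is.na(.x))))
cols_all_na <- select(df, where(~ all(is.na(.x))))

names(cols_any_na)
names(cols_all_na)
```

Wrapping is.na() in any() or all() is what turns a per-value check into the per-column answer that where() needs.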
Try it yourself
Try it: Use across() to compute the mean of every numeric column in mtcars. Save the result to ex_means.
Explanation: across(where(is.numeric), mean) selects every column where is.numeric() returns TRUE, then applies mean() to each. The output is one summary column per input column.
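One possible solution, assuming dplyr 1.0 or later (other selectors, such as everything(), would give the same result here because every mtcars column is numeric):

```r
library(dplyr)

# One-row data frame holding the mean of each numeric column in mtcars.
ex_means <- summarise(mtcars, across(where(is.numeric), mean))

ex_means
```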
Related dplyr functions
After mastering across(), look at:
- where(): column-level predicate selector inside tidyselect
- if_any(), if_all(): row-level predicate combiners (use inside filter())
- pick(): tidyselect inside arrange() and summarise() to select column subsets
- rename_with(): rename columns by applying a function to names
- Legacy _at/_if/_all: avoid in new code, but readable in old codebases
For multi-column transformations that need different functions per column, just write multiple name = expression pairs inside mutate() directly.
FAQ
What is the difference between mutate_if and across in dplyr?
mutate_if(df, is.numeric, scale) is the legacy syntax (dplyr 0.x). mutate(df, across(where(is.numeric), scale)) is the modern equivalent (dplyr 1.0+). Same result, but across() is more composable and uses the unified tidyselect grammar.
How do I use across with multiple functions in dplyr?
Pass a named list: summarise(df, across(where(is.numeric), list(mean = mean, sd = sd, n = ~ sum(!is.na(.))))). The result has one column per (input column, function) pair, named {.col}_{.fn} by default.
Can I use across inside filter?
Not directly. Use if_any() or if_all(): filter(df, if_any(c(a, b), ~ . > 0)) keeps rows where at least one of a or b is positive. if_all() requires all selected columns to satisfy the predicate.
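A quick sketch of the difference, assuming a dplyr version that ships if_any() and if_all() (1.0.4 or later). The data frame and its column names (a, b) are made up for illustration:

```r
library(dplyr)

# Hypothetical data: only the last row is positive in both columns.
df <- data.frame(a = c(1, -1, -2, 3), b = c(-5, 2, -3, 4))

# Keep rows where at least one of a or b is positive.
any_pos <- filter(df, if_any(c(a, b), ~ .x > 0))

# Keep rows where both a and b are positive.
all_pos <- filter(df, if_all(c(a, b), ~ .x > 0))

any_pos
all_pos
```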
Why does my across call return weird column names?
If you pass a named list of functions, output is {.col}_{.fn}. If you pass a single function, output keeps the original column names (in mutate(), this overwrites the input columns). If you want custom names, use the .names argument with a glue template: .names = "z_{.col}".
Does across work with character columns?
Yes, but you need to select them explicitly: across(where(is.character), toupper). A selector such as where(is.numeric) skips character columns, so use where(is.character) or c(name1, name2) to target them.
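For instance, with a hypothetical data frame (columns name, city, and age are made up) and dplyr 1.0 or later:

```r
library(dplyr)

# Hypothetical data: two character columns and one numeric column.
df <- data.frame(
  name = c("ann", "bob"),
  city = c("oslo", "rome"),
  age  = c(30, 41),
  stringsAsFactors = FALSE
)

# Upper-case every character column; age is numeric and is left as it is.
upper <- mutate(df, across(where(is.character), toupper))

upper
```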