dplyr c_across() in R: Combine Columns Within rowwise()
The c_across() function in dplyr concatenates values from multiple columns into a single vector, ROW BY ROW, when used inside rowwise(). It is the row-wise counterpart of across().
df |> rowwise() |> mutate(total = sum(c_across(x:z)))
df |> rowwise() |> mutate(avg = mean(c_across(starts_with("score"))))
df |> rowwise() |> mutate(n_na = sum(is.na(c_across(everything()))))
df |> rowwise() |> mutate(min_v = min(c_across(where(is.numeric))))
df |> mutate(total = rowSums(across(x:z))) # often faster
df |> mutate(total = pmap_dbl(across(x:z), sum)) # purrr alternative
Need an explanation? Read on for examples and pitfalls.
What c_across() does in one sentence
c_across(cols) collects the values from cols of the CURRENT row into a single vector, so you can call sum, mean, paste, etc. on it. It only makes sense inside rowwise(); outside it, the "current row" is the whole data frame (or the current group), so c_across() concatenates the selected columns in full.
c_across() is the row-wise sister of across(). Where across() applies a function to many COLUMNS, c_across() collects values from many columns into ONE row-wise vector.
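The contrast is easiest to see side by side. The following is a minimal sketch, assuming a small made-up tibble with numeric columns x, y and z (the data and the names col_wise and row_wise are illustrative, not from the original article):

```r
library(dplyr)

# Hypothetical example data; column names x, y, z are assumptions.
df <- tibble(x = c(1, 4), y = c(2, 5), z = c(3, 6))

# across(): the function is applied to each COLUMN separately.
col_wise <- df |>
  mutate(across(x:z, \(v) v * 2))

# c_across(): the values of x, y, z in the current ROW become one vector,
# which sum() then reduces to a single number per row.
row_wise <- df |>
  rowwise() |>
  mutate(total = sum(c_across(x:z))) |>
  ungroup()

row_wise$total  # 6 and 15
```

Here across() keeps the shape of the data and changes every column, while c_across() adds one new value per row.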
Syntax
c_across(cols). The cols argument is a tidyselect specification: helpers such as everything(), starts_with(), where(is.numeric), or a range like x:z.
rowSums() and rowMeans() are faster than rowwise() + c_across(). Reserve c_across() for non-vectorized operations like paste, min, max over arbitrary column subsets.
Five common patterns
1. Row-wise sum
For pure sums, mutate(total = rowSums(across(x:z))) is faster.
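As a sketch of this pattern, assuming a made-up tibble with an id column and numeric columns x, y, z, both the rowwise() form and the vectorized form give the same totals:

```r
library(dplyr)

# Hypothetical data; id, x, y, z are illustrative names.
df <- tibble(
  id = c("a", "b", "c"),
  x  = c(1, 2, 3),
  y  = c(10, 20, 30),
  z  = c(100, 200, 300)
)

# Row-wise sum with c_across(): one sum() call per row.
sum_rowwise <- df |>
  rowwise() |>
  mutate(total = sum(c_across(x:z))) |>
  ungroup()

# Vectorized equivalent: a single rowSums() call over the selected columns.
sum_vectorized <- df |>
  mutate(total = rowSums(across(x:z)))

sum_rowwise$total  # 111, 222, 333
```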
2. Row-wise mean of selected columns
where(is.numeric) selects numeric columns dynamically.
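A minimal sketch of this pattern, assuming a hypothetical scores table in which the only numeric columns are the three score columns:

```r
library(dplyr)

# Hypothetical data; student and score_* are illustrative names.
scores <- tibble(
  student    = c("Ana", "Ben"),
  score_math = c(90, 70),
  score_art  = c(80, 60),
  score_pe   = c(100, 50)
)

# where(is.numeric) picks up the three score columns and skips `student`.
averages <- scores |>
  rowwise() |>
  mutate(avg = mean(c_across(where(is.numeric)))) |>
  ungroup()

averages$avg  # 90 and 60
```

If the table also held numeric columns that are not scores (an id number, say), starts_with("score") would be the safer selector.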
3. Count NAs per row
A common data-quality check.
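A sketch of the NA count, assuming a made-up survey table whose columns are all numeric. That assumption matters: to my understanding c_across() combines values using vctrs rules, so columns of incompatible types (for example numeric and character) are expected to fail to combine, whereas the rowSums(is.na(...)) form does not have that restriction.

```r
library(dplyr)

# Hypothetical data; q1..q3 are illustrative, all numeric on purpose.
survey <- tibble(
  q1 = c(1, NA, 3),
  q2 = c(NA, NA, 2),
  q3 = c(5, 6, NA)
)

# Row-wise: collect the row's values, flag NAs, count them.
na_rowwise <- survey |>
  rowwise() |>
  mutate(n_na = sum(is.na(c_across(everything())))) |>
  ungroup()

# Vectorized alternative: is.na() on the whole selection, then rowSums().
na_vectorized <- survey |>
  mutate(n_na = rowSums(is.na(across(everything()))))

na_rowwise$n_na  # 1, 2, 1
```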
4. Per-row min or max
Base R has pmin(x, y, z) and pmax(x, y, z) for parallel min and max; c_across() is more general because the columns can be chosen with any tidyselect expression.
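A minimal sketch of per-row min and max, assuming a hypothetical table of daily temperatures (the column names are illustrative), with the base R parallel functions shown for comparison:

```r
library(dplyr)

# Hypothetical data; city, mon, tue, wed are illustrative names.
temps <- tibble(
  city = c("Oslo", "Cairo"),
  mon  = c(3, 30),
  tue  = c(-1, 33),
  wed  = c(5, 29)
)

# Row-wise min and max over a column range.
ranges <- temps |>
  rowwise() |>
  mutate(
    lo = min(c_across(mon:wed)),
    hi = max(c_across(mon:wed))
  ) |>
  ungroup()

# Base R alternative when the columns can be named explicitly.
ranges_base <- temps |>
  mutate(lo = pmin(mon, tue, wed), hi = pmax(mon, tue, wed))

ranges$lo  # -1 and 29
ranges$hi  # 5 and 33
```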
5. Combine string columns row-wise
For text-joining, tidyr::unite() is often cleaner.
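A sketch of the string case, assuming a made-up table of name parts in which every selected column is character:

```r
library(dplyr)

# Hypothetical data; first, middle, last are illustrative names.
people <- tibble(
  first  = c("Ada", "Alan"),
  middle = c("King", "Mathison"),
  last   = c("Lovelace", "Turing")
)

# Collect the row's strings into one vector, then collapse to one string.
named <- people |>
  rowwise() |>
  mutate(full = paste(c_across(first:last), collapse = " ")) |>
  ungroup()

named$full  # "Ada King Lovelace" "Alan Mathison Turing"
```

Note the collapse argument: without it, paste() would return one string per column rather than one string per row.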
c_across() requires rowwise() to make sense; they are a paired idiom. Without rowwise(), c_across() concatenates the selected columns in full, so you get all values across the whole table (or the current group), not one row's values. Depending on the expression, that can produce a silently wrong result or a size error.
c_across() vs across() vs rowSums()
Three approaches to "operate over columns" in dplyr.
| Function | Style | Speed | Best for |
|---|---|---|---|
| c_across(cols) | Row-wise (one vector per row) | Slower | Non-vectorized reductions |
| across(cols, fn) | Column-wise (apply fn to each col) | Fast | Per-column transformations |
| rowSums() / rowMeans() | Built-in row-wise | Fastest | Sum / mean across columns |
When to use which:
- rowSums(across(x:z)) for fast row-wise sum.
- c_across() inside rowwise() for arbitrary row-wise reductions.
- across() for "apply fn to each column" (no rowwise needed).
A practical workflow
Most c_across uses fall into three categories: row sums of subsets, NA counts per row, and string-paste per row. For each, there is a faster specialized tool, but c_across handles the irregular cases:
- Variable subset of columns chosen via tidyselect each call.
- Reductions that aren't sum/mean (e.g., paste, var, median).
- Custom logic per row (e.g., "is at least one column > threshold?").
If you find yourself writing rowwise() + c_across() for sum/mean, switch to rowSums()/rowMeans() with across() for 10-100x speedup on large data.
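The irregular cases listed above can be sketched as follows, assuming hypothetical sensor readings s1 to s3 and a made-up threshold of 0.8; neither comes from the original article:

```r
library(dplyr)

# Hypothetical data and threshold, for illustration only.
readings <- tibble(
  s1 = c(0.2, 0.9, 0.1),
  s2 = c(0.4, 0.3, 0.2),
  s3 = c(0.1, 0.5, 0.95)
)
threshold <- 0.8

flagged <- readings |>
  rowwise() |>
  mutate(
    # Custom logic: is at least one value in this row above the threshold?
    any_high = any(c_across(s1:s3) > threshold),
    # A reduction that has no rowSums()-style shortcut in base R.
    med = median(c_across(s1:s3))
  ) |>
  ungroup()

flagged$any_high  # FALSE TRUE TRUE
flagged$med       # 0.2 0.5 0.2
```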
Common pitfalls
Pitfall 1: forgetting rowwise. df |> mutate(total = sum(c_across(x:z))) returns the SUM OF THE ENTIRE x-to-z block, not per row. Add rowwise().
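A small sketch of Pitfall 1, assuming a made-up two-row tibble, shows the difference between the two calls:

```r
library(dplyr)

# Hypothetical data; x, y, z are illustrative names.
df <- tibble(x = c(1, 2), y = c(3, 4), z = c(5, 6))

# Without rowwise(): c_across() sees the full columns, so sum() adds
# all six values and the same grand total is recycled into every row.
wrong <- df |>
  mutate(total = sum(c_across(x:z)))

# With rowwise(): one sum per row.
right <- df |>
  rowwise() |>
  mutate(total = sum(c_across(x:z))) |>
  ungroup()

wrong$total  # 21 21
right$total  # 9 12
```

No warning is raised in the first case, which is what makes the mistake easy to miss.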
Pitfall 2: forgetting to ungroup. rowwise() is a special grouping. Downstream operations stay rowwise unless you ungroup(). Subtle bugs result from this.
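Pitfall 2 can be illustrated with a sketch, again on made-up data: a summary function applied after a rowwise() step still runs once per row until ungroup() is called.

```r
library(dplyr)

# Hypothetical data; x and y are illustrative names.
df <- tibble(x = c(1, 2, 3), y = c(4, 5, 6))

# The result of this pipeline is still a rowwise data frame.
still_rowwise <- df |>
  rowwise() |>
  mutate(total = sum(c_across(x:y)))

# mean(total) is evaluated per row, so it just echoes each row's total.
leaky <- still_rowwise |>
  mutate(grand_mean = mean(total))

# After ungroup(), mean(total) sees the whole column.
fixed <- still_rowwise |>
  ungroup() |>
  mutate(grand_mean = mean(total))

leaky$grand_mean  # 5 7 9
fixed$grand_mean  # 7 7 7
```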
rowwise() + c_across() is SLOW on large data frames. Each row does a separate function call. For sum/mean over many rows, use rowSums() or rowMeans() (vectorized C code). Reserve c_across for situations where vectorized alternatives don't exist.
Try it yourself
Try it: For each row of a data frame with columns a, b, c, d, compute the maximum value across the four columns. Save to ex_maxes.
Solution:
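One possible solution, sketched on a made-up data frame (the values in ex_df and the column name max_val are assumptions; the exercise only fixes the columns a to d and the result name ex_maxes):

```r
library(dplyr)

# Hypothetical input data for the exercise.
ex_df <- tibble(
  a = c(1, 8, 3),
  b = c(5, 2, 3),
  c = c(4, 9, 1),
  d = c(7, 0, 2)
)

# Per-row maximum across the four columns.
ex_maxes <- ex_df |>
  rowwise() |>
  mutate(max_val = max(c_across(a:d))) |>
  ungroup()

ex_maxes$max_val  # 7 9 3
```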
Explanation: c_across(a:d) collects each row's a, b, c, d values; max() reduces them to a scalar. For a plain row-wise maximum over named columns, base R's pmax(a, b, c, d) is faster.
Related dplyr functions
After mastering c_across, look at:
- across(): per-column counterpart
- rowwise(): required pairing with c_across
- rowSums()/rowMeans(): fast specialized
- pmap()/pmap_dbl(): purrr alternative for row-wise mapping
- pmin()/pmax(): parallel min/max in base R
- tidyr::unite(): text concatenation across columns
For most numeric row reductions, the rowSums/rowMeans family beats c_across on speed and clarity.
FAQ
What is the difference between c_across and across in dplyr?
across() applies a function to each COLUMN; c_across() collects values from many columns into ONE row-wise vector. across is column-wise; c_across is row-wise (and requires rowwise).
Why do I need rowwise() with c_across?
c_across() only makes sense per row. Without rowwise(), it returns the entire concatenated block of columns, not a per-row vector. They must be paired.
Is c_across slow for row sums?
Yes. For pure row sum or mean, prefer rowSums(across(x:z)) or rowMeans(across(x:z)): vectorized C code, much faster than rowwise + c_across.
How do I count NAs per row in dplyr?
df |> rowwise() |> mutate(n_na = sum(is.na(c_across(everything())))) |> ungroup(). Or for speed: mutate(n_na = rowSums(is.na(across(everything())))).
Can I use tidyselect helpers inside c_across?
Yes. c_across(starts_with("x_")), c_across(where(is.numeric)), c_across(everything()) all work. Same syntax as select() and across().