dplyr pick() in R: Select Columns Inside dplyr Verbs
The pick() function in dplyr 1.1 selects columns INSIDE a dplyr verb (mutate, summarise, filter), returning a TIBBLE of the chosen columns for use in expressions. It complements across() for "many columns at once" computations.
df |> mutate(total = rowSums(pick(x:z)))
df |> summarise(n = nrow(pick(everything())))
df |> filter(rowSums(pick(where(is.numeric)) > 0) > 0)
df |> mutate(combo = do.call(paste, c(as.list(pick(a, b)), sep = "-")))
across(cols, fn) # related: applies fn per column
Need explanation? Read on for examples and pitfalls.
What pick() does in one sentence
pick(...) selects columns using tidyselect helpers and returns them as a TIBBLE; it is meant for use INSIDE dplyr verbs (mutate, summarise, filter). Called outside those contexts, it throws an error.
pick was added in dplyr 1.1 as a cleaner replacement for across(.fns = NULL) and the older cur_data() patterns.
Syntax
pick(...) takes one or more tidyselect expressions in ..., such as everything(), where(is.numeric), or x:z.
Use pick() when a function takes a DATA FRAME (or tibble) as input, like rowSums, nrow, n_distinct. Use across() when applying a function PER COLUMN.
Five common patterns
1. Row-wise sum / mean
pick(x:z) returns a tibble; rowSums operates on it.
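A minimal sketch of the row-wise sum and mean pattern, assuming a toy tibble with hypothetical numeric columns x, y, and z:

```r
library(dplyr)

# Hypothetical toy data; x, y, z are illustrative column names.
df <- tibble(x = 1:3, y = 4:6, z = 7:9)

out <- df |>
  mutate(
    total = rowSums(pick(x:z)),   # pick() hands rowSums() a 3-column tibble
    avg   = rowMeans(pick(x:z))   # same block of columns, row-wise mean
  )

out$total  # 12 15 18
out$avg    # 4 5 6
```

Because total and avg are appended after z, the range x:z still covers only the three original columns in the second expression.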
2. Count distinct combinations
n_distinct(pick(x, y)) counts the distinct combinations of x and y, one count per group when the data is grouped.
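As a rough illustration, the sketch below counts distinct (x, y) pairs within each group. The columns g, x, and y and their values are made up for the example; .by is the per-operation grouping argument that arrived alongside pick() in dplyr 1.1.

```r
library(dplyr)

# Hypothetical data: group "a" has two distinct (x, y) pairs, group "b" has one.
df <- tibble(
  g = c("a", "a", "a", "b", "b"),
  x = c(1, 1, 2, 3, 3),
  y = c("p", "p", "q", "r", "r")
)

out <- df |>
  summarise(n_combos = n_distinct(pick(x, y)), .by = g)

out$n_combos  # 2 1
```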
3. Filter by row-wise condition
pick(where(is.numeric)) selects the numeric columns; rowSums(... > 5) counts how many of them exceed 5 in each row, and comparing that count with 0 keeps the rows where at least one does.
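A small sketch of this filter, assuming a hypothetical tibble with one character column (id) and two numeric columns (a, b):

```r
library(dplyr)

# Hypothetical data; only row "r2" has a numeric value above 5.
df <- tibble(
  id = c("r1", "r2", "r3"),
  a  = c(1, 6, 2),
  b  = c(3, 2, 4)
)

out <- df |>
  # pick(...) > 5 is a logical matrix; rowSums() counts TRUEs per row.
  filter(rowSums(pick(where(is.numeric)) > 5) > 0)

out$id  # "r2"
```

The id column is ignored by where(is.numeric), so the comparison only touches a and b.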
4. Pass multiple columns to a custom function
When a custom function such as my_function expects a tibble of inputs, pick() builds that tibble from the selected columns.
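The sketch below uses a made-up my_function that takes a data frame and returns one value per row (the row-wise range). Both the helper and the columns x, y, z are assumptions for illustration, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: expects a data frame of numeric columns and
# returns max minus min for each row.
my_function <- function(d) {
  m <- as.matrix(d)
  apply(m, 1, max) - apply(m, 1, min)
}

df <- tibble(x = c(1, 5), y = c(4, 2), z = c(9, 3))

out <- df |>
  mutate(spread = my_function(pick(x:z)))

out$spread  # 8 3
```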
5. Replace older cur_data() pattern
pick(everything()) is the modern replacement for cur_data().
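A minimal sketch of the replacement, assuming a hypothetical grouped tibble. Where older code would have called nrow(cur_data()), the same expression is written with pick(everything()):

```r
library(dplyr)

# Hypothetical data: group "a" has two rows, group "b" has one.
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = 4:6)

out <- df |>
  group_by(g) |>
  # pick(everything()) is the current group's data, one tibble per group.
  summarise(n_rows = nrow(pick(everything())))

out$n_rows  # 2 1
```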
pick() returns a TIBBLE; across() applies a function per column. They solve different problems: pick when the downstream function takes a data frame; across when it takes a vector and you want to apply per column. Both use tidyselect for column selection.
pick() vs across() vs select()
Three ways to select columns in dplyr, each with a different return shape.
| Function | Returns | Where used | Best for |
|---|---|---|---|
| pick(cols) | Tibble of cols | Inside mutate/summarise/filter | Pass multiple cols to one function |
| across(cols, fn) | Auto-named columns | Inside mutate/summarise | Apply fn per column |
| select(df, cols) | Filtered data frame | Top-level pipeline | Keep only chosen columns |
When to use which:
- select for picking columns to keep in the output.
- across for "apply fn to each of these columns".
- pick for "give me these columns as a tibble for a single computation".
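To make the three roles concrete, here is a short side-by-side sketch on one hypothetical tibble (the columns id, x, and y are illustrative):

```r
library(dplyr)

df <- tibble(id = 1:3, x = c(1, 2, 3), y = c(10, 20, 30))

# select(): top-level, keeps only the chosen columns.
kept <- df |> select(x, y)

# across(): applies a function to each chosen column separately.
scaled <- df |> mutate(across(c(x, y), \(v) v * 2))

# pick(): passes the chosen columns as one tibble to a single function.
summed <- df |> mutate(total = rowSums(pick(x, y)))

names(kept)    # "x" "y"
scaled$y       # 20 40 60
summed$total   # 11 22 33
```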
A practical workflow
The "row-wise computation" pattern is pick's killer use case.
Compute total, count of non-NA, and average across all score_* columns. Without pick, you'd need to list each score column manually or use rowwise (slower).
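A sketch of that workflow, assuming a hypothetical scores tibble whose score columns share the score_ prefix. The data and column names are invented for the example.

```r
library(dplyr)

# Hypothetical data with some missing scores.
scores <- tibble(
  student = c("ann", "bob", "cy"),
  score_1 = c(90, 70, NA),
  score_2 = c(80, NA, 60),
  score_3 = c(70, 50, 90)
)

out <- scores |>
  mutate(
    # Sum of available scores per row.
    total   = rowSums(pick(starts_with("score_")), na.rm = TRUE),
    # Number of non-missing scores per row.
    n_valid = rowSums(!is.na(pick(starts_with("score_")))),
    # Average over the scores that are present.
    average = total / n_valid
  )

out$total    # 240 120 150
out$n_valid  # 3 2 2
out$average  # 80 60 75
```

The starts_with("score_") selection picks up any additional score_* column automatically, which is the point of avoiding a hand-written column list.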
For categorical concatenation:
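A minimal sketch, assuming hypothetical city and state columns. paste() does not combine a tibble row by row on its own, so the picked columns are passed to it as separate arguments through do.call():

```r
library(dplyr)

# Hypothetical categorical columns.
df <- tibble(city = c("Austin", "Boise"), state = c("TX", "ID"))

out <- df |>
  mutate(
    # as.list() turns the picked tibble into a list of column vectors;
    # do.call() then runs paste(city, state, sep = "-").
    combo = do.call(paste, c(as.list(pick(city, state)), sep = "-"))
  )

out$combo  # "Austin-TX" "Boise-ID"
```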
Common pitfalls
Pitfall 1: pick outside dplyr verbs. mtcars |> pick(mpg:wt) errors. pick must be called inside mutate, summarise, or filter.
Pitfall 2: confusing pick with across. pick returns a tibble (one object); across applies a function and returns multiple columns. They are different shapes.
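The first pitfall can be checked directly. In this sketch the top-level call is wrapped in tryCatch() so the script keeps running, and select() is shown as the top-level alternative for the same column range:

```r
library(dplyr)

# pick() at the top level has no surrounding dplyr verb, so it errors.
top_level <- tryCatch(
  mtcars |> pick(mpg:wt),
  error = function(e) "error"
)

# select() is the tool for top-level column selection.
kept <- mtcars |> select(mpg:wt)

top_level   # "error"
ncol(kept)  # 6
```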
pick() is dplyr 1.1+ only. Older dplyr installations don't have it. For older code, the equivalent was select(df, cols) outside the verb, or across(cols) with NULL function (deprecated).
When pick replaced cur_data
Before pick was added in dplyr 1.1, accessing the current group's data inside summarise required cur_data(). That was hard to read because cur_data() always returned every non-grouping column, so the call never showed which columns a computation actually used. pick replaces it with explicit column selection: pick(everything()) is the modern equivalent of cur_data(), and pick(specific, cols) is the explicit subset version. This makes it clear in code review exactly which columns a function call operates on, which reduces the effort of debugging summarise expressions.
Try it yourself
Try it: Compute a row-wise total of mpg, disp, and hp for each car using pick + rowSums. Save to ex_total.
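One possible solution, using the built-in mtcars data. The column name total is an arbitrary choice for this sketch.

```r
library(dplyr)

ex_total <- mtcars |>
  mutate(total = rowSums(pick(mpg, disp, hp)))

# First car (Mazda RX4): 21 + 160 + 110
ex_total$total[1]  # 291
```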
Explanation: pick(mpg, disp, hp) returns a 3-column tibble; rowSums adds them per row.
Related dplyr functions
After mastering pick, look at:
- across(): per-column transformation
- select(): top-level column selection
- cur_data(): deprecated; use pick(everything())
- c_across(): rowwise sister of across
- rowwise(): per-row evaluation context
- n_distinct(): count unique combinations
For "apply fn to each column", across is correct. For "operate on multi-column block", pick is the modern tool.
FAQ
What does pick do in dplyr?
pick(...) selects columns using tidyselect helpers and returns them as a tibble. It is used inside dplyr verbs to pass multiple columns to a function that expects a data frame.
What is the difference between pick and across in dplyr?
pick returns ONE tibble of the selected columns. across applies a function PER column and returns multiple columns. Different return shapes, different use cases.
When was pick introduced in dplyr?
In dplyr 1.1.0 (Jan 2023). Older versions don't have it; use cur_data() or across() workarounds.
Can pick be used outside dplyr verbs?
No. pick errors when called outside a dplyr verb such as mutate, summarise, or filter. For top-level column selection, use select().
How do I count distinct combinations of multiple columns?
n_distinct(pick(col1, col2)) returns the count of unique (col1, col2) tuples. n_distinct(col1, col2) gives the same count on the same data; the pick() form is more explicit and lets you choose the columns with tidyselect helpers.