dplyr where() in R: Select Columns by Predicate
The where() tidyselect helper in dplyr selects columns where a PREDICATE function (like is.numeric or is.character) returns TRUE. It is the workhorse for "all numeric columns" or "all character columns" selection.
df |> select(where(is.numeric))                    # all numeric columns
df |> select(where(is.character))                  # all character columns
df |> mutate(across(where(is.numeric), ~ .x * 2))  # multiply numerics
df |> summarise(across(where(is.numeric), mean))   # mean per numeric col
df |> select(where(~ all(.x > 0)))                 # custom predicate
Need explanation? Read on for examples and pitfalls.
What where() does in one sentence
where(.fn) selects columns where applying .fn to the column returns TRUE. It is a tidyselect helper used inside select, across, pick, rename_with, etc.
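The same where() call works in any verb that takes a tidyselect argument. The sketch below uses the built-in iris data, chosen only for illustration, and shows where() inside select() and inside rename_with()'s .cols argument:

```r
library(dplyr)

# select(): keep only the numeric columns of iris
numeric_iris <- iris |> select(where(is.numeric))

# rename_with(): upper-case the names of the numeric columns only;
# Species (a factor) keeps its original name
renamed_iris <- iris |> rename_with(toupper, where(is.numeric))

names(numeric_iris)
names(renamed_iris)
```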
The most common use: where(is.numeric) to grab all numeric columns. But any predicate function works.
Syntax
where(.fn), where .fn is a predicate: a function, or a purrr-style lambda such as ~ is.numeric(.x), that is applied to each column. Columns for which it returns TRUE are kept.
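.fn can be written in three equivalent styles. The sketch below uses iris for illustration; the variable names are arbitrary.

```r
library(dplyr)

# 1. A named predicate function
by_name <- iris |> select(where(is.numeric))

# 2. A purrr-style formula lambda; .x stands for the column.
#    && short-circuits, so mean() is never called on the factor column
by_formula <- iris |> select(where(~ is.numeric(.x) && mean(.x) > 5))

# 3. A base R anonymous function (the \(x) shorthand needs R >= 4.1)
by_anon <- iris |> select(where(\(x) is.factor(x)))

names(by_formula)
names(by_anon)
```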
Pair where() with across() to apply a function to every column of a given type: mutate(across(where(is.numeric), fn)). This is the standard dplyr idiom for type-based column transformations.
Five common patterns
1. Select numeric columns
2. Select character columns
3. Apply function to all numerics
4. Custom predicate
5. Combine with other tidyselect helpers
& (intersection) and | (union) combine where() with other tidyselect helpers.
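The five patterns above can be sketched on a small hypothetical data frame df, whose column names are made up for illustration. The iris data is used for pattern 1.

```r
library(dplyr)

# Hypothetical example data: one integer, one character, one double column
df <- data.frame(id = 1:3, name = c("a", "b", "c"), score = c(2.5, 3.5, 4.5))

# 1. Select numeric columns
num_cols <- iris |> select(where(is.numeric))

# 2. Select character columns
chr_cols <- df |> select(where(is.character))

# 3. Apply a function to all numeric columns (id and score are doubled)
doubled <- df |> mutate(across(where(is.numeric), ~ .x * 2))

# 4. Custom predicate: numeric columns whose values are all positive.
#    The is.numeric() guard matters: "a" > 0 is a string comparison in R
positive <- df |> select(where(~ is.numeric(.x) && all(.x > 0)))

# 5. Combine with a name-based helper: numeric AND name starts with "s"
num_s <- df |> select(where(is.numeric) & starts_with("s"))
```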
where() accepts ANY function returning TRUE/FALSE. Common predicates are is.numeric, is.character, is.logical, is.factor. For custom logic, use a lambda: where(~ all(.x > 0)) selects columns where every value is positive.
where() vs starts_with() vs everything()
where() is one of several tidyselect helpers, and each one selects columns by a different criterion.
| Helper | Selects by |
|---|---|
| where(.fn) | Predicate function on the column |
| starts_with("x") | Column name prefix |
| ends_with("y") | Column name suffix |
| contains("ab") | Column name substring |
| matches("regex") | Column name regex |
| everything() | All columns |
| x:z | Column range |
When to use which:
- where() for type-based or content-based selection.
- starts_with() / ends_with() / contains() / matches() for name-based selection.
- everything() to grab all columns.
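The helpers side by side, in a minimal sketch on the built-in iris data (chosen only because its column names make the name-based helpers easy to see):

```r
library(dplyr)

by_type   <- iris |> select(where(is.factor))          # by column type
by_prefix <- iris |> select(starts_with("Petal"))      # by name prefix
by_suffix <- iris |> select(ends_with("Width"))        # by name suffix
by_regex  <- iris |> select(matches("^Sepal"))         # by regular expression
all_cols  <- iris |> select(everything())              # every column
by_range  <- iris |> select(Sepal.Length:Petal.Length) # by position range
```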
A practical workflow
The "type-based transformation" pattern is the killer use case for where().
Apply different transformations to different column types in one pipeline.
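Here is a minimal sketch that assumes a small hypothetical data frame, people, with illustrative column names. Numeric columns are rounded and character columns are upper-cased in a single mutate().

```r
library(dplyr)

# Hypothetical example data
people <- data.frame(
  name   = c("ann", "bob"),
  height = c(1.724, 1.813),
  weight = c(61.27, 80.54)
)

cleaned <- people |>
  mutate(
    across(where(is.numeric), ~ round(.x, 1)),  # every numeric column
    across(where(is.character), toupper)        # every character column
  )

cleaned
```

Adding a column later needs no code change, because each across() call picks up new columns by type.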
For audits, count the missing values (NA) in every numeric column in a single call.
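One way to write this audit is sketched below with the built-in airquality data, where every column is numeric and only Ozone and Solar.R have gaps:

```r
library(dplyr)

# One row, one column per numeric column: the number of NAs in it
na_counts <- airquality |>
  summarise(across(where(is.numeric), ~ sum(is.na(.x))))

na_counts
```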
Common pitfalls
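Pitfall 1: forgetting where(). Passing a bare predicate, as in across(is.numeric, fn), is deprecated in tidyselect and triggers a warning asking you to wrap it. across(where(is.numeric), fn) is correct. The same rule applies to select() and every other tidyselect argument.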
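Pitfall 2: the predicate must return a single TRUE or FALSE. where(~ .x > 0) returns a logical vector for each column and errors. Use where(~ all(.x > 0)) to reduce it to a single value. An NA result also errors, so use all(.x > 0, na.rm = TRUE) when columns may contain missing values.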
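A quick way to see Pitfall 2 in action, sketched on mtcars. tryCatch() is used here only so the failing call can be shown without stopping the script.

```r
library(dplyr)

# Vector-valued predicate: one TRUE/FALSE per row, so tidyselect errors
bad <- tryCatch(
  mtcars |> select(where(~ .x > 0)),
  error = function(e) "error"
)

# Reduced to one value per column: works, and drops vs and am (they contain 0)
good <- mtcars |> select(where(~ all(.x > 0)))

bad
names(good)
```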
where() is dplyr / tidyselect-specific. It doesn't work in base R or with [-style indexing. Inside dplyr verbs, use where(); outside, use df[sapply(df, is.numeric)] or Filter(is.numeric, df).
Try it yourself
Try it: From mtcars, select only the columns that have ALL values >= 1. Save to ex_pos.
Explanation: where(~ all(.x >= 1)) checks each column for whether every value is >= 1. vs and am have 0s and are dropped.
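A solution consistent with that explanation:

```r
library(dplyr)

# Keep only the columns in which every value is at least 1
ex_pos <- mtcars |> select(where(~ all(.x >= 1)))

names(ex_pos)
```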
Related tidyselect helpers
After mastering where, look at:
- everything(): all columns
- starts_with() / ends_with() / contains() / matches(): name-based
- num_range(): numeric-suffixed names such as x1, x2, x3
- all_of() / any_of(): literal column lists; all_of() is strict (errors on missing names), any_of() is lax (skips them)
- last_col(): rightmost column
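A minimal sketch of these helpers, assuming a hypothetical data frame, scores, whose column names (id, q1 to q3, note) are made up for illustration:

```r
library(dplyr)

# Hypothetical example data
scores <- data.frame(id = 1:2, q1 = c(3, 4), q2 = c(5, 1), q3 = c(2, 2), note = c("x", "y"))

q12  <- scores |> select(num_range("q", 1:2))             # q1, q2
lax  <- scores |> select(any_of(c("id", "missing_col")))  # id; unknown name skipped
last <- scores |> select(last_col())                      # note

# all_of() is strict: a missing name is an error
strict <- tryCatch(
  scores |> select(all_of(c("id", "missing_col"))),
  error = function(e) "error"
)
```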
For complex selection, combine with &, |, ! for set operations on selections.
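The set operations, sketched on mtcars:

```r
library(dplyr)

# & with !: numeric columns whose names do NOT start with "c" (drops cyl, carb)
not_c <- mtcars |> select(where(is.numeric) & !starts_with("c"))

# |: union of two name-based selections
either <- mtcars |> select(starts_with("d") | ends_with("t"))

names(not_c)
names(either)
```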
Why where simplifies type-aware code
Before tidyselect's where, applying a function to "all numeric columns" required mutate_if(is.numeric, fn) (now superseded) or hand-built loops. where brings type-based selection into the unified tidyselect grammar, so it composes cleanly with name-based selectors. select(where(is.numeric) & starts_with("score_")) reads naturally; building the same selection without where would require multiple steps.
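As an illustration of that migration, the superseded mutate_if() call and its where() + across() replacement produce the same columns on mtcars:

```r
library(dplyr)

# Superseded style: still available in dplyr, but no longer recommended
old_style <- mtcars |> mutate_if(is.numeric, ~ .x * 2)

# Current style: type-based selection via where() inside across()
new_style <- mtcars |> mutate(across(where(is.numeric), ~ .x * 2))
```

The across() form composes with other selectors, e.g. across(where(is.numeric) & starts_with("d"), ...). mutate_if() has no equivalent.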
FAQ
What does where do in dplyr?
where(.fn) is a tidyselect helper that selects columns where .fn(column) returns TRUE. Used inside select, across, pick, etc.
Can I use where with custom predicates?
Yes. where(~ all(.x > 0)) selects columns where every value is positive. The predicate must reduce to a single TRUE/FALSE.
How do I select all numeric columns in dplyr?
select(df, where(is.numeric)). For non-numeric columns, select(df, where(\(x) !is.numeric(x))), or equivalently select(df, !where(is.numeric)).
What is the difference between where and starts_with?
where uses a function (predicate) on the column. starts_with uses a name prefix. They can be combined: where(is.numeric) & starts_with("x_").
Does where work outside dplyr?
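Only where tidyselect is in use: dplyr verbs and other functions whose column arguments use tidyselect. In base R code, use df[sapply(df, predicate)] (sapply() returns one TRUE/FALSE per column for [-indexing) or Filter(predicate, df).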
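The base R equivalents, with no dplyr needed, are sketched on iris. vapply() is used here in place of sapply() only because it guarantees a logical vector. Either works for this case.

```r
# One TRUE/FALSE per column
keep <- vapply(iris, is.numeric, logical(1))

num_bracket <- iris[keep]                # [-indexing with a logical vector
num_filter  <- Filter(is.numeric, iris)  # keeps columns where the predicate is TRUE

# Base R analogue of where(~ is.numeric(.x) && all(.x > 0))
all_pos <- Filter(function(x) is.numeric(x) && all(x > 0), iris)

names(num_filter)
```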