dplyr starts_with() in R: Select Columns by Prefix
The starts_with() helper in dplyr selects columns whose names START WITH a given string. It is the most common tidyselect pattern for column-name-based selection.
df |> select(starts_with("score_")) # all columns starting with "score_"
df |> select(starts_with("X")) # case-insensitive by default: also matches names starting with "x"
df |> select(starts_with("x", ignore.case = FALSE)) # strict, case-sensitive matching
df |> mutate(across(starts_with("score_"), ~ .x * 100))
df |> select(-starts_with("temp_")) # drop columns with that prefix
Need explanation? Read on for examples and pitfalls.
What starts_with() does in one sentence
starts_with(match) selects columns whose names start with the literal string match. It is used inside select(), across(), pick(), and other tidyselect-aware verbs.
Syntax
starts_with(match, ignore.case = TRUE, vars = NULL). The default IS case-insensitive.
starts_with() is CASE-INSENSITIVE by default (unlike base R's startsWith()). Pass ignore.case = FALSE for strict matching.
Five common patterns
1. Select by prefix
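A minimal sketch of prefix selection, assuming dplyr is installed. The data frame df and its column names are hypothetical, made up for illustration.

```r
library(dplyr)

# Hypothetical data: two "score_" columns plus unrelated ones
df <- tibble(
  id         = 1:3,
  score_math = c(0.9, 0.7, 0.8),
  score_art  = c(0.6, 0.95, 0.5),
  temp_flag  = c(TRUE, FALSE, TRUE)
)

# Keep only the columns whose names begin with "score_"
scores <- df |> select(starts_with("score_"))
names(scores)
```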
2. Apply across by prefix
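As a sketch, the same helper can drive across() so that one transformation applies to every matching column. The data below is hypothetical.

```r
library(dplyr)

# Hypothetical data: proportions stored in "score_" columns
df <- tibble(
  id         = 1:2,
  score_math = c(0.9, 0.7),
  score_art  = c(0.6, 0.95)
)

# Multiply every "score_" column by 100; other columns are left untouched
pct <- df |> mutate(across(starts_with("score_"), ~ .x * 100))
pct
```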
3. Drop by prefix
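Dropping works by negating the helper. The sketch below uses hypothetical column names and shows both the minus form and the ! form, which should select the same columns here.

```r
library(dplyr)

# Hypothetical data with two scratch columns prefixed "temp_"
df <- tibble(
  id     = 1:2,
  temp_a = c(1, 2),
  temp_b = c(3, 4),
  value  = c(10, 20)
)

# Both forms remove every column whose name starts with "temp_"
kept     <- df |> select(-starts_with("temp_"))
kept_alt <- df |> select(!starts_with("temp_"))
names(kept)
```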
4. Case-sensitive matching
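To see the effect of ignore.case, the following sketch compares the default with strict matching on a hypothetical data frame whose column names differ only in case.

```r
library(dplyr)

# Hypothetical data: "Xa" and "xb" differ in the case of the first letter
df <- tibble(Xa = 1, xb = 2, y = 3)

# Default (ignore.case = TRUE): both "Xa" and "xb" match
loose <- df |> select(starts_with("x"))

# Strict matching: only the lowercase "xb" matches
strict <- df |> select(starts_with("x", ignore.case = FALSE))

names(loose)
names(strict)
```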
5. Combine with other helpers
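A sketch of combining starts_with() with other tidyselect helpers through the |, & and ! operators. All column names are hypothetical.

```r
library(dplyr)

# Hypothetical data: mixed types and naming schemes
df <- tibble(
  score_math = 1,
  score_art  = 2,
  score_note = "a",
  total_pts  = 3,
  id         = 4L
)

# Union: names starting with "score_" OR ending with "_pts"
either <- df |> select(starts_with("score_") | ends_with("_pts"))

# Intersection: "score_" columns that are also numeric
numeric_scores <- df |> select(starts_with("score_") & where(is.numeric))

# Negation: everything that does not start with "score_"
others <- df |> select(!starts_with("score_"))
```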
starts_with() matches LITERAL strings, not regex. For regex matching, use matches(). For substring (anywhere in name), use contains().
starts_with() vs ends_with() vs contains() vs matches()
Four name-based tidyselect helpers.
| Helper | Matches |
|---|---|
| starts_with("x") | Names starting with "x" |
| ends_with("y") | Names ending with "y" |
| contains("ab") | Names containing "ab" anywhere |
| matches("regex") | Names matching regex |
When to use which:
- starts_with for prefix patterns (most common).
- ends_with for suffix patterns.
- contains for substring.
- matches for regex.
A practical workflow
The "transform all X_ columns" pattern is starts_with's killer use case.
Convert all q columns to factor; scale all score_ columns.
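One way this workflow might look, as a sketch on hypothetical survey data. "Scale" is read here as standardizing with base R's scale(), which is an assumption; substitute whatever rescaling your analysis needs.

```r
library(dplyr)

# Hypothetical survey data: question columns (q1, q2) and score columns
survey <- tibble(
  id         = 1:4,
  q1         = c("yes", "no", "yes", "no"),
  q2         = c("a", "b", "a", "c"),
  score_pre  = c(10, 20, 30, 40),
  score_post = c(15, 25, 35, 45)
)

clean <- survey |>
  mutate(
    # Every column starting with "q" becomes a factor
    across(starts_with("q"), as.factor),
    # Every "score_" column is centered and scaled (assumed meaning of "scale");
    # as.numeric() drops the matrix attributes that scale() returns
    across(starts_with("score_"), ~ as.numeric(scale(.x)))
  )
```

Because the match is case-insensitive by default, starts_with("q") would also pick up any column whose name begins with "Q", so check the names before applying this to wider data.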
Common pitfalls
Pitfall 1: starts_with is case-INSENSITIVE by default. This differs from base R's startsWith which is case-sensitive. If you need strict matching, pass ignore.case = FALSE.
Pitfall 2: not regex. starts_with("a.") matches names starting with the literal "a." (period included), not "a" followed by any character. Use matches("^a.") for regex.
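The literal-versus-regex difference can be checked with a small sketch. The column names are hypothetical and chosen so that only one of them contains an actual period.

```r
library(dplyr)

# Hypothetical data: "a.1" contains a literal period, "ab" and "abc" do not
df <- tibble(a.1 = 1, ab = 2, abc = 3, b = 4)

# Literal prefix "a.": only the name with a real period matches
literal <- df |> select(starts_with("a."))

# Regex "^a.": "a" followed by any single character
regex <- df |> select(matches("^a."))

names(literal)
names(regex)
```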
starts_with() accepts a SINGLE string OR a character vector. Passing a vector matches any of the prefixes: starts_with(c("a_","b_")) selects names starting with "a_" OR "b_".
Try it yourself
Try it: Select all mtcars columns whose names start with "m". Save to ex_m_cols.
Solution
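One possible solution, assuming dplyr is attached and using the built-in mtcars dataset:

```r
library(dplyr)

# Select every mtcars column whose name starts with "m"
ex_m_cols <- mtcars |> select(starts_with("m"))
names(ex_m_cols)
```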
Explanation: Only mpg starts with "m" (case-insensitive default).
Related tidyselect helpers
After mastering starts_with, look at:
- ends_with(): suffix matching
- contains(): substring matching
- matches(): regex matching
- everything(): all remaining
- where(): type / predicate
- all_of() / any_of(): explicit vector
For combining helpers, use & (and), | (or), ! (not).
FAQ
What does starts_with do in dplyr?
starts_with(match) selects columns whose names start with the string match. It is a tidyselect helper for prefix-based selection.
Is starts_with case-sensitive?
No, by default it is case-insensitive. Pass ignore.case = FALSE for strict matching.
Can starts_with take multiple prefixes?
Yes. Pass a character vector: starts_with(c("a_","b_")) matches names starting with either prefix.
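A short sketch of the vector form, using hypothetical column names:

```r
library(dplyr)

# Hypothetical data with three prefixes
df <- tibble(a_1 = 1, b_1 = 2, c_1 = 3, a_2 = 4)

# A character vector of prefixes selects names matching any of them
both <- df |> select(starts_with(c("a_", "b_")))
names(both)
```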
What is the difference between starts_with and matches?
starts_with matches a literal prefix. matches uses regex. starts_with("a.") matches "a." literally; matches("^a.") matches "a" followed by any character.
How do I drop columns by prefix?
select(-starts_with("temp_")). The minus sign inverts the selection.