dplyr matches() in R: Select Columns by Regex
The matches() helper in dplyr selects columns whose names match a regular expression. It is the regex-based tidyselect helper, more flexible than starts_with(), ends_with(), or contains().
df |> select(matches("^score")) # regex prefix
df |> select(matches("\\d+$")) # ends with digits
df |> select(matches("^[A-Z]_\\w+")) # complex pattern
df |> select(matches("score|rating")) # alternation
df |> mutate(across(matches("^q\\d+$"), as.factor))
Need explanation? Read on for examples and pitfalls.
What matches() does in one sentence
matches(match, ignore.case = TRUE, perl = FALSE, vars = NULL) selects columns whose names match the regular expression given in match. It is the most flexible name-based tidyselect helper.
Syntax
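matches(match, ignore.case = TRUE, perl = FALSE, vars = NULL). match is a regular expression, given as a character vector; if it has more than one element, the union of the matches is taken. ignore.case = TRUE (the default) ignores case when matching names. perl = TRUE switches to Perl-compatible regular expressions. vars is a character vector of names to match against; by default the names come from the current selection context.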
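As a small illustration of the perl argument, the sketch below uses a negative lookahead, which needs Perl-compatible regex. The data frame and its column names are made up for this example.

```r
library(dplyr)

# Hypothetical data frame; the column names are assumptions for illustration.
df <- tibble(id = 1:3, score_math = c(90, 85, 70), score_art = c(60, 75, 88))

# perl = TRUE enables Perl-compatible syntax such as the negative lookahead (?!...).
# Here: keep every column whose name does not start with "id".
no_id <- df |> select(matches("^(?!id)", perl = TRUE))

names(no_id)
```

Without perl = TRUE, the default regex engine does not support lookaheads, so this pattern would not be usable.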
Five common patterns
1. Regex prefix
^ anchors the pattern to the start of the column name.
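A minimal sketch of a prefix match, using a made-up data frame (the column names are assumptions for illustration):

```r
library(dplyr)

# Hypothetical columns: two start with "score", one merely contains it.
df <- tibble(score_math = c(90, 85), score_art = c(60, 75), final_score = c(150, 160))

# ^score only matches names that begin with "score".
prefix_cols <- df |> select(matches("^score"))

names(prefix_cols)
```

final_score is left out because "score" is not at the start of its name.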
2. Regex suffix
$ anchors the pattern to the end of the column name.
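A minimal sketch of a suffix match, again with made-up column names:

```r
library(dplyr)

# Hypothetical columns: two end in digits, two do not.
df <- tibble(id = 1:2, sales_2023 = c(10, 20), sales_2024 = c(15, 25), q1_notes = c("a", "b"))

# \\d+$ matches names whose last characters are one or more digits.
digit_suffix <- df |> select(matches("\\d+$"))

names(digit_suffix)
```

q1_notes contains a digit but does not end in one, so it is not selected.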
3. Alternation
| means OR in regex: the name matches if either alternative matches.
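A minimal sketch of alternation, assuming a made-up data frame:

```r
library(dplyr)

# Hypothetical columns for illustration.
df <- tibble(id = 1:2, score_math = c(90, 85), user_rating = c(4, 5), comment = c("ok", "good"))

# Select any column whose name contains "score" or "rating".
score_or_rating <- df |> select(matches("score|rating"))

names(score_or_rating)
```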
4. Character class
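[A-Z] matches one uppercase letter, but only with ignore.case = FALSE, because the default ignore.case = TRUE lets it match lowercase letters too; \\w+ matches one or more word characters (letters, digits, underscore).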
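A minimal sketch of a character-class pattern, with made-up column names. Because matches() ignores case by default, the sketch passes ignore.case = FALSE so that [A-Z] is restricted to uppercase letters.

```r
library(dplyr)

# Hypothetical columns: two with an uppercase one-letter prefix, one lowercase, one plain.
df <- tibble(A_total = 1:2, B_mean = 3:4, a_raw = 5:6, id = 7:8)

# One uppercase letter, an underscore, then one or more word characters.
upper_prefixed <- df |> select(matches("^[A-Z]_\\w+", ignore.case = FALSE))

names(upper_prefixed)
```

With the default ignore.case = TRUE, a_raw would be expected to match as well.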
5. Multi-step transform
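matches() can also be used inside across() to transform every matching column in one step. The sketch below converts survey-style item columns to factors; the data frame and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical survey-style data: q1 and q2 are items, q10_note is free text.
df <- tibble(id = 1:3, q1 = c(1, 2, 1), q2 = c(3, 3, 2), q10_note = c("a", "b", "c"))

# ^q\\d+$ matches "q" followed only by digits, so q10_note is left alone.
out <- df |> mutate(across(matches("^q\\d+$"), as.factor))

sapply(out, class)
```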
matches() is the only tidyselect helper that supports regex. The other name-matching helpers (starts_with(), ends_with(), contains()) use literal strings. Use matches() when the pattern is too complex for the literal helpers.
matches() vs starts_with / ends_with / contains
| Helper | Matches | Best for |
|---|---|---|
| starts_with("x") | Literal prefix | Simple prefixes |
| ends_with("y") | Literal suffix | Simple suffixes |
| contains("ab") | Literal substring | Substring anywhere |
| matches("regex") | Regex | Complex patterns |
When to use which:
- Use literal helpers when possible (faster, clearer).
- Reach for matches only when regex is needed.
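To make the guidance concrete, here is a sketch that selects the same columns both ways. The data frame is made up; the point is only that a simple prefix does not need a regex.

```r
library(dplyr)

# Hypothetical columns for illustration.
df <- tibble(score_math = c(90, 85), score_art = c(60, 75), final_score = c(150, 160))

# Literal helper: reads as plain English.
by_literal <- df |> select(starts_with("score"))

# Regex equivalent: same result, but the reader has to parse the anchor.
by_regex <- df |> select(matches("^score"))

identical(names(by_literal), names(by_regex))
```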
A practical workflow
Use matches() when column names follow a structured pattern.
Survey data is a typical case: items named like q1, q2, and so on can be selected with a single regex instead of being listed one by one.
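A possible workflow, sketched on a made-up survey data frame. The naming scheme (q<number>_pre and q<number>_post) is an assumption for this example, not something prescribed by dplyr.

```r
library(dplyr)

# Hypothetical survey: pre and post items follow a q<number>_<phase> naming scheme.
survey <- tibble(
  respondent = 1:3,
  q1_pre  = c(3, 4, 5), q2_pre  = c(2, 2, 4),
  q1_post = c(4, 5, 5), q2_post = c(3, 4, 4),
  notes   = c("", "late", "")
)

# Step 1: pull out the pre-test items alongside the respondent id.
pre_items <- survey |> select(respondent, matches("^q\\d+_pre$"))

# Step 2: summarise every post-test item in one call.
post_means <- survey |> summarise(across(matches("^q\\d+_post$"), mean))

names(pre_items)
names(post_means)
```

If more items are added later under the same naming scheme, both steps pick them up without any change to the code.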
Common pitfalls
Pitfall 1: regex special characters. matches(".") matches every column (any character). Use matches("\\.") for literal period.
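The built-in iris data set has dotted column names, which makes the difference easy to see in a short sketch:

```r
library(dplyr)

# Unescaped "." means "any character", so every column name matches.
all_cols <- iris |> select(matches("."))

# Escaped "\\." means a literal period, so Species is excluded.
dotted <- iris |> select(matches("\\."))

ncol(all_cols)
names(dotted)
```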
Pitfall 2: case-insensitive default. matches("score") matches "SCORE" and "ScOrE". Pass ignore.case = FALSE for strict.
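A short sketch of the default versus strict behaviour, using made-up column names that differ only in case:

```r
library(dplyr)

# Hypothetical columns with mixed-case spellings of "score".
df <- tibble(score = 1:2, SCORE_raw = 3:4, ScOrE_adj = 5:6, id = 7:8)

# Default: ignore.case = TRUE, so all three spellings match.
loose <- df |> select(matches("score"))

# Strict: only the exact lowercase spelling matches.
strict <- df |> select(matches("score", ignore.case = FALSE))

names(loose)
names(strict)
```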
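Pitfall 3: backslash escaping. Regex \d must be written as the string "\\d". Common bug: writing matches("\d+"), which fails to parse because "\d" is not a valid escape sequence in an R string.
Try it yourself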
Try it: Select all iris columns ending in either "Length" or "Width". Save to ex_dims.
Explanation: (Length|Width)$ matches either word at the end of the name.
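One solution consistent with that explanation might look like this:

```r
library(dplyr)

# Select every iris column whose name ends in "Length" or "Width".
ex_dims <- iris |> select(matches("(Length|Width)$"))

names(ex_dims)
```

Species is dropped because its name ends in neither word.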
Related tidyselect helpers
After mastering matches, look at:
- starts_with()/ends_with()/contains(): literal helpers
- everything(): all remaining
- where(): predicate
- all_of()/any_of(): explicit list
- num_range(): numeric-suffixed names
For most name-based selection, the literal helpers are simpler and faster than matches().
FAQ
What does matches do in dplyr?
matches(pattern) selects columns whose names match the regex pattern. It is the tidyselect helper for regex-based selection.
Is matches case-sensitive?
No. Matching is case-insensitive by default (ignore.case = TRUE). Pass ignore.case = FALSE for strict matching.
What is the difference between matches and contains?
matches uses regex; contains is literal substring. matches("a.b") is "a, any char, b"; contains("a.b") is the literal "a.b".
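A small sketch of that difference, with made-up column names chosen so the two helpers disagree:

```r
library(dplyr)

# Hypothetical columns: a literal "a.b", a lookalike "axb", and "ab".
df <- tibble(`a.b` = 1:2, axb = 3:4, ab = 5:6)

# Regex: "." stands for any single character, so both a.b and axb match.
regex_hits <- df |> select(matches("a.b"))

# Literal: only the name containing the exact text "a.b" matches.
literal_hits <- df |> select(contains("a.b"))

names(regex_hits)
names(literal_hits)
```

ab matches neither call: the regex requires one character between a and b, and the literal requires the period itself.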
How do I anchor matches to start or end?
Use regex anchors: ^ for start (matches("^score")); $ for end (matches("score$")).
Why does my pattern with backslashes error?
In an R string literal, a backslash must itself be escaped with a second backslash. Regex \d is therefore written as the string "\\d". Use matches("\\d+"), not matches("\d+").
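A short sketch of the working form, on a made-up data frame. It also shows a raw string, available in R 4.0 and later, as an alternative way to avoid doubling the backslash.

```r
library(dplyr)

# Hypothetical columns for illustration.
df <- tibble(q1 = 1:2, q2 = 3:4, name = c("a", "b"))

# matches("\d+") would not parse: "\d" is not a recognised escape in an R string.
# Doubling the backslash passes the regex \d+ through to matches().
with_digits <- df |> select(matches("\\d+"))

# Raw string (R >= 4.0): the backslash is taken literally, so no doubling is needed.
with_digits_raw <- df |> select(matches(r"(\d+)"))

names(with_digits)
```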