dplyr matches() in R: Select Columns by Regex

The matches() helper in dplyr selects columns whose names match a regular expression. It is the regex-based tidyselect helper, and it is more flexible than starts_with(), ends_with(), or contains().

⚡ Quick Answer
df |> select(matches("^score"))             # regex prefix
df |> select(matches("\\d+$"))              # ends with digits
df |> select(matches("^[A-Z]_\\w+"))        # complex pattern
df |> select(matches("score|rating"))       # alternation
df |> mutate(across(matches("^q\\d+$"), as.factor))

Need explanation? Read on for examples and pitfalls.

📊 Is matches() the right tool?
  • Regex pattern in the name: matches("regex")
  • Literal prefix: starts_with() (faster, simpler)
  • Literal suffix: ends_with()
  • Literal substring: contains()
  • Exact list of names: all_of()
  • Predicate: where()

What matches() does in one sentence

matches(match, ignore.case = TRUE, perl = FALSE, vars = NULL) selects columns whose names match the regex match. It is the most flexible name-based tidyselect helper.

Syntax

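matches(match, ignore.case = TRUE, perl = FALSE, vars = NULL). match is a regular expression supplied as a character vector; with more than one pattern, the union of the matches is selected. ignore.case = TRUE makes matching case-insensitive by default. perl = TRUE switches to Perl-compatible regex. vars is normally left as NULL, so the names come from the data being selected.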
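The arguments are easier to see on a small example. The sketch below uses a made-up tibble (the column names are assumptions for illustration) to show a vector of patterns and perl = TRUE.

```r
library(dplyr)

# Hypothetical data: column names chosen only for illustration
df <- tibble(q1 = 1, q2 = 2, qa = 3, score = 4, rating = 5)

# A character vector of patterns selects the union of the individual matches
union_cols <- df |> select(matches(c("^score$", "^rating$"))) |> names()
union_cols
#> [1] "score"  "rating"

# perl = TRUE enables Perl-compatible syntax such as negative lookahead:
# names starting with "q" that are not followed by "a"
no_qa <- df |> select(matches("^q(?!a)", perl = TRUE)) |> names()
no_qa
#> [1] "q1" "q2"
```

The lookahead pattern needs perl = TRUE; with the default regex engine that syntax is not supported.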

R: Columns whose names start with 'q' followed by digits
library(dplyr)
df <- tibble(q1 = 1, q2 = 2, q12 = 3, qa = 4, score = 5)
df |> select(matches("^q\\d+$"))
#> q1 q2 q12 (qa dropped, no digit; score dropped, no q)

  
Tip
Reach for matches when literal helpers (starts_with, ends_with, contains) can't express the pattern. For simple cases, the literal helpers are clearer.

Five common patterns

1. Regex prefix

R: Same as starts_with() but with regex
df |> select(matches("^score"))

  

^ anchors to start.

2. Regex suffix

R: Names ending in digits
df |> select(matches("\\d+$"))

  

$ anchors to end.

3. Alternation

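R: Multiple patterns at once
df |> select(matches("score|rating"))

| is OR in regex. The pattern is unanchored, so it selects any name that contains "score" or "rating"; a column such as score_pct is selected as well.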
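Alternation matches anywhere in the name unless the pattern is anchored. A minimal sketch on a made-up tibble (column names are assumptions) contrasts the two forms:

```r
library(dplyr)

# Hypothetical data: "underscore" is included because it contains "score"
df <- tibble(score = 1, score_pct = 2, rating = 3, underscore = 4, id = 5)

# Unanchored: any name containing "score" or "rating"
loose <- df |> select(matches("score|rating")) |> names()
loose
#> [1] "score"      "score_pct"  "rating"     "underscore"

# Anchored and grouped: the whole name must be "score" or "rating"
strict <- df |> select(matches("^(score|rating)$")) |> names()
strict
#> [1] "score"  "rating"
```

The parentheses matter: without them, ^score|rating$ anchors only the first alternative at the start and only the second at the end.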

4. Character class

R: Names with format X_word
df |> select(matches("^[A-Z]_\\w+"))

  

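[A-Z] is one uppercase letter; \\w+ is one or more word characters. Because ignore.case = TRUE by default, [A-Z] also matches lowercase letters here. Pass ignore.case = FALSE to require an uppercase first letter.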
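The interaction between a character class and the case-insensitive default is easy to miss. A small sketch, assuming a tibble with hypothetical column names:

```r
library(dplyr)

# Hypothetical data: one uppercase prefix, one lowercase, one two-letter prefix
df <- tibble(A_total = 1, b_total = 2, AB_total = 3, id = 4)

# Default ignore.case = TRUE: [A-Z] also accepts the lowercase "b"
insensitive <- df |> select(matches("^[A-Z]_\\w+")) |> names()
insensitive
#> [1] "A_total" "b_total"

# ignore.case = FALSE: only a single uppercase letter before the underscore
sensitive <- df |> select(matches("^[A-Z]_\\w+", ignore.case = FALSE)) |> names()
sensitive
#> [1] "A_total"
```

AB_total is dropped in both cases because the pattern allows exactly one letter before the underscore.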

5. Multi-step transform

R: Convert all q1, q2, ... to factor
df |> mutate(across(matches("^q\\d+$"), as.factor))

  
Key Insight
matches() is the only tidyselect helper that supports regex. The other name-based helpers (starts_with, ends_with, contains) match literal strings. Use matches when the pattern is too complex for the literal helpers.

matches() vs starts_with / ends_with / contains

Helper              Matches             Best for
starts_with("x")    Literal prefix      Simple prefixes
ends_with("y")      Literal suffix      Simple suffixes
contains("ab")      Literal substring   Substring anywhere
matches("regex")    Regex               Complex patterns

When to use which:

  • Use literal helpers when possible (faster, clearer).
  • Reach for matches only when regex is needed.
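To make the comparison concrete, the sketch below runs a literal helper and matches() on the same made-up tibble (column names are assumptions). For a plain prefix both give the same result; only the digit pattern needs regex.

```r
library(dplyr)

# Hypothetical data for illustration
df <- tibble(score_math = 1, score_art = 2, total_score = 3, q1 = 4, q2 = 5)

# A plain prefix: the literal helper and the regex agree
prefix_literal <- df |> select(starts_with("score")) |> names()
prefix_regex   <- df |> select(matches("^score")) |> names()
identical(prefix_literal, prefix_regex)
#> [1] TRUE

# "q followed only by digits" has no literal equivalent
q_cols <- df |> select(matches("^q\\d+$")) |> names()
q_cols
#> [1] "q1" "q2"
```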

A practical workflow

Use matches for column names with structured patterns.

R: Structured column names
# Q1, Q2, ..., Q20 columns -> all factors
df |> mutate(across(matches("^Q\\d+$"), as.factor))

# Year-stamped columns 2020-2024
df |> select(matches("_(202[0-4])$"))

  

For survey data with structured names, matches is essential.

Common pitfalls

Pitfall 1: regex special characters. matches(".") matches every column (any character). Use matches("\\.") for literal period.
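A short sketch of this pitfall, using a made-up tibble (column names are assumptions): an unescaped period matches everything, while the escaped form and contains() treat it literally.

```r
library(dplyr)

# Hypothetical data: only one name contains a literal period
df <- tibble(Sepal.Length = 1, Sepal_Width = 2, id = 3)

# "." is "any character", so every non-empty name matches
any_char <- df |> select(matches(".")) |> names()
any_char
#> [1] "Sepal.Length" "Sepal_Width"  "id"

# "\\." is a literal period
literal_dot <- df |> select(matches("\\.")) |> names()
literal_dot
#> [1] "Sepal.Length"

# contains() is literal already, so no escaping is needed
fixed_dot <- df |> select(contains(".")) |> names()
fixed_dot
#> [1] "Sepal.Length"
```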

Pitfall 2: case-insensitive default. matches("score") matches "SCORE" and "ScOrE". Pass ignore.case = FALSE for strict.
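The case-insensitive default can be checked on a tiny example. The tibble below is made up for illustration:

```r
library(dplyr)

# Hypothetical data with mixed-case names
df <- tibble(score = 1, SCORE_raw = 2, Rating = 3)

# Default: case is ignored, so both score columns are selected
case_loose <- df |> select(matches("score")) |> names()
case_loose
#> [1] "score"     "SCORE_raw"

# Strict: only the lowercase name matches
case_strict <- df |> select(matches("score", ignore.case = FALSE)) |> names()
case_strict
#> [1] "score"
```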

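Warning
A backslash must itself be escaped inside an ordinary R string, so regex \d is written "\\d". Common bug: writing matches("\d+"), which fails with an unrecognized-escape error before the pattern is ever used. In R 4.0 or later, a raw string such as r"(\d+)" avoids the doubling.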
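As an alternative to doubling backslashes, R 4.0 and later support raw strings of the form r"(...)". A minimal sketch, assuming a made-up tibble, shows that both spellings produce the same pattern:

```r
library(dplyr)

# Hypothetical data for illustration
df <- tibble(q1 = 1, q2 = 2, notes = 3)

# Ordinary string: the backslash is doubled
escaped <- df |> select(matches("\\d+$")) |> names()

# Raw string (R >= 4.0): the regex is written as-is
raw <- df |> select(matches(r"(\d+$)")) |> names()

identical(escaped, raw)
#> [1] TRUE
raw
#> [1] "q1" "q2"
```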

Try it yourself

Try it: Select all iris columns ending in either "Length" or "Width". Save to ex_dims.

R: Your turn: dimension columns
ex_dims <- iris |>
  # your code here

names(ex_dims)
#> Expected: 4 columns (Sepal/Petal Length/Width)

  
R: Solution
ex_dims <- iris |> select(matches("(Length|Width)$"))
names(ex_dims)
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"

  

Explanation: (Length|Width)$ matches either word at the end of the name.

After mastering matches, look at:

  • starts_with() / ends_with() / contains(): literal helpers
  • everything(): all remaining
  • where(): predicate
  • all_of() / any_of(): explicit list
  • num_range(): numeric-suffixed names
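For numbered columns, num_range() can sometimes replace a regex. The sketch below, on a made-up tibble, selects q1 to q3 both ways; the column names are assumptions for illustration.

```r
library(dplyr)

# Hypothetical survey-style columns
df <- tibble(q1 = 1, q2 = 2, q3 = 3, q10 = 4, qa = 5)

# num_range() builds the names q1, q2, q3 from a prefix and a range
by_range <- df |> select(num_range("q", 1:3)) |> names()
by_range
#> [1] "q1" "q2" "q3"

# The regex equivalent needs anchors so that q10 is not picked up
by_regex <- df |> select(matches("^q[1-3]$")) |> names()
by_regex
#> [1] "q1" "q2" "q3"
```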

For most name-based selection, the literal helpers are simpler and clearer than matches. Keep matches for patterns they cannot express.

FAQ

What does matches do in dplyr?

matches(match) selects columns whose names match the regular expression match. It is the tidyselect helper for regex-based selection.

Is matches case-sensitive?

No. The default is ignore.case = TRUE, so matching is case-insensitive. Pass ignore.case = FALSE for strict matching.

What is the difference between matches and contains?

matches uses regex; contains is literal substring. matches("a.b") is "a, any char, b"; contains("a.b") is the literal "a.b".

How do I anchor matches to start or end?

Use regex anchors: ^ for start (matches("^score")); $ for end (matches("score$")).

Why does my pattern with backslashes error?

A backslash must itself be escaped inside an R string, so regex \d is written as the string "\\d". Use matches("\\d+"), not matches("\d+"), which fails with an unrecognized-escape error.