stringr str_subset() in R: Filter Strings by Pattern

The str_subset() function in stringr keeps elements of a character vector that match a regex pattern. It is the tidyverse replacement for base R grep(pattern, x, value = TRUE).

By Selva Prabhakaran · Published May 15, 2026 · Last updated May 15, 2026

⚡ Quick Answer

str_subset(x, "pattern")                              # regex (default)
str_subset(x, fixed("text"))                          # literal match
str_subset(x, regex("pat", ignore_case = TRUE))       # case-insensitive
str_subset(x, "pattern", negate = TRUE)               # keep non-matches
str_subset(x, "^a")                                   # starts with "a"
str_subset(x, "ing$")                                 # ends with "ing"
str_subset(c("a", NA, "ab"), "a")                     # NA never matches, so it is dropped

Need explanation? Read on for examples and pitfalls.


What str_subset() does in one sentence

str_subset(string, pattern) returns a character vector containing only the elements of string that match pattern. The pattern is a regular expression by default; wrap with fixed() for literal string matching.

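It is the workhorse for filtering character vectors. For a vector without missing values, str_subset(x, p) returns the same result as x[str_detect(x, p)], but it reads better and is more concise in pipelines. The two differ only on NA elements: str_subset() drops them, while indexing with the str_detect() mask returns NA in their place.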
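As a quick check of that relationship, the sketch below filters a small NA-free vector both ways and compares the results. The vector and the variable names (direct, mask, via_mask) are made up for illustration.

```r
library(stringr)

# Illustrative data with no missing values
x <- c("apple", "banana", "cherry", "blueberry")

# One-step filter
direct <- str_subset(x, "rr")

# Two-step version: build a logical mask, then index with it
mask     <- str_detect(x, "rr")
via_mask <- x[mask]

direct
#> [1] "cherry"    "blueberry"
identical(direct, via_mask)
#> [1] TRUE
```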

Syntax

str_subset(string, pattern, negate = FALSE) takes a character vector and a regex pattern, returning the subset that matches.


R: Load stringr and filter a vector

library(stringr)
fruits <- c("apple", "banana", "cherry", "blueberry", "date")
str_subset(fruits, "berry")
#> [1] "blueberry"

Tip

str_subset() is the vector-in, vector-out cousin of str_detect(). Use str_subset() when you want the matching strings themselves. Use str_detect() when you want a logical mask (for example, to combine with other conditions or to feed into dplyr filter()).
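When two conditions have to hold at once, the mask form is usually the easier one to work with. A minimal sketch, using a hypothetical vector of file names:

```r
library(stringr)

# Hypothetical file names, for illustration only
files <- c("data_2023.csv", "data_2024.csv", "notes_2024.txt", "data_2024.xlsx")

# Two independent masks, combined with & before subsetting
is_csv  <- str_detect(files, "\\.csv$")
is_2024 <- str_detect(files, "2024")

files[is_csv & is_2024]
#> [1] "data_2024.csv"
```

A single str_subset() call could do the same with a more involved regex, but separate masks keep each condition readable.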

Five common patterns

1. Basic regex filter

R: Keep strings containing a substring

words <- c("running", "ran", "runs", "runner", "walking")
str_subset(words, "run")
#> [1] "running" "runs"    "runner"

By default the pattern is a regular expression. "run" matches anywhere in each string, so "running", "runs", and "runner" all qualify; "ran" and "walking" are dropped.

2. Literal (non-regex) match with fixed()

R: Match a literal dot

versions <- c("1.5", "1a5", "1.x", "v1.5")
str_subset(versions, fixed("1.5"))
#> [1] "1.5"  "v1.5"

fixed("1.5") treats the pattern as a plain string. Without fixed(), the . is regex for "any character" and would also match "1a5".
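To see the difference side by side, the sketch below runs the same vector through an unescaped regex and an escaped one. The variable names are illustrative.

```r
library(stringr)

versions <- c("1.5", "1a5", "1.x", "v1.5")

# Unescaped: "." matches any single character, so "1a5" slips through
unescaped <- str_subset(versions, "1.5")
unescaped
#> [1] "1.5"  "1a5"  "v1.5"

# Escaped dot: only a literal "." qualifies, same result as fixed("1.5")
escaped <- str_subset(versions, "1\\.5")
escaped
#> [1] "1.5"  "v1.5"

identical(escaped, str_subset(versions, fixed("1.5")))
#> [1] TRUE
```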

3. Case-insensitive filter

R: Keep strings matching 'apple' in any case

items <- c("apple pie", "Apple Tart", "BANANA", "pineapple")
str_subset(items, regex("apple", ignore_case = TRUE))
#> [1] "apple pie"  "Apple Tart" "pineapple"

Wrap the pattern in regex(pattern, ignore_case = TRUE) to ignore case. "BANANA" is excluded; everything else contains "apple" somewhere, regardless of capitalization.
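An alternative not covered above is the inline (?i) flag, which the ICU regex engine behind stringr accepts at the start of a pattern. The sketch below checks that both spellings give the same result on this vector.

```r
library(stringr)

items <- c("apple pie", "Apple Tart", "BANANA", "pineapple")

# Modifier form, as used in the example above
with_modifier <- str_subset(items, regex("apple", ignore_case = TRUE))

# Inline-flag form: (?i) turns on case-insensitive matching inside the pattern
with_flag <- str_subset(items, "(?i)apple")

identical(with_modifier, with_flag)
#> [1] TRUE
```

The regex() form is more explicit; the inline flag is handy when the pattern arrives as a plain string from configuration or user input.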

4. Negate (keep non-matching strings)

R: Drop strings containing digits

labels <- c("alpha", "beta2", "gamma", "delta9", "epsilon")
str_subset(labels, "\\d", negate = TRUE)
#> [1] "alpha"   "gamma"   "epsilon"

negate = TRUE flips the filter to keep elements that do NOT match. Here \\d matches any digit, so labels containing a digit are dropped.
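For comparison, here are two other ways to express the same filter on this NA-free vector: inverting a str_detect() mask, and rewriting the regex so it matches only digit-free strings. The variable names are illustrative.

```r
library(stringr)

labels <- c("alpha", "beta2", "gamma", "delta9", "epsilon")

# negate = TRUE, as in the example above
by_negate <- str_subset(labels, "\\d", negate = TRUE)

# Invert a logical mask by hand
by_mask <- labels[!str_detect(labels, "\\d")]

# Rewrite the pattern: the whole string consists of non-digits
by_pattern <- str_subset(labels, "^\\D*$")

identical(by_negate, by_mask) && identical(by_negate, by_pattern)
#> [1] TRUE
```

negate = TRUE tends to be the most readable of the three, because the pattern still describes what you are looking for rather than its complement.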

5. Anchored patterns for start and end matching

R: Filter by prefix and suffix

files <- c("data.csv", "data.xlsx", "report.csv", "summary.txt")
str_subset(files, "^data")
#> [1] "data.csv"  "data.xlsx"
str_subset(files, "\\.csv$")
#> [1] "data.csv"   "report.csv"

^ anchors the pattern to the start of each string, $ to the end. Combine them (^foo$) to match the whole string exactly.
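A short sketch of the whole-string case, using a made-up vector in which "cat" appears as a full string, a prefix, a suffix, and with a trailing space:

```r
library(stringr)

# Illustrative data; note the trailing space in the last element
words <- c("cat", "catalog", "bobcat", "cat ")

# Unanchored: matches "cat" anywhere, so all four elements qualify
anywhere <- str_subset(words, "cat")
length(anywhere)
#> [1] 4

# Anchored at both ends: the string must be exactly "cat"
exact <- str_subset(words, "^cat$")
exact
#> [1] "cat"
```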

Key Insight

str_subset() = str_detect() + subset, in one step. Conceptually it computes the same logical mask that str_detect() returns and uses it to filter the vector, dropping any NA elements along the way. Knowing this equivalence makes the function easy to reason about and lets you swap to str_detect() the moment you need the mask separately.

str_subset() vs alternatives

Four functions cover almost every "filter a character vector by pattern" job. Pick by return type, not by habit: values, logicals, indices, or base R equivalent.

Function	Returns	Use when
`str_subset(x, p)`	character vector of matches	You want the matching strings themselves
`str_detect(x, p)`	logical vector, same length as x	You need a TRUE/FALSE mask (for filter() or combining with & and |)
`str_which(x, p)`	integer vector of indices	You need positions (for slicing, ordering, joining)
`grep(p, x, value=TRUE)`	character vector of matches	You are writing base R with no dependencies

R: Compare the four approaches

x <- c("apple", "banana", "cherry")
str_subset(x, "an")              # values
#> [1] "banana"
str_detect(x, "an")              # logical
#> [1] FALSE  TRUE FALSE
str_which(x, "an")               # indices
#> [1] 2
grep("an", x, value = TRUE)      # base R equivalent of str_subset
#> [1] "banana"

All four are vectorized over x. Pick the one whose return type fits the next step in your pipeline.

Common pitfalls

Pitfall 1: assuming regex special characters are matched literally. str_subset(x, "1.5") matches "1a5", "125", "1.5", and so on, because . is regex for "any character". Use fixed("1.5") or escape the dot: "1\\.5".

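Pitfall 2: expecting NA elements to survive the filter. str_subset(c("a", NA, "ab"), "a") returns c("a", "ab"). A missing value never matches, so str_subset() drops it. This is the one place where it differs from x[str_detect(x, "a")], which returns c("a", NA, "ab") because indexing with an NA in the mask yields NA. Wrapping the input in na.omit() first is therefore unnecessary; if you need the NA entries kept, build the mask yourself, for example x[is.na(x) | str_detect(x, "a")].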
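The stringr documentation for str_subset() notes that missing values never match. The sketch below shows what that means in practice and how it differs from indexing with a str_detect() mask. The vector and variable names are illustrative.

```r
library(stringr)

x <- c("a", NA, "ab", "b")

# str_subset() drops the NA element
subset_result <- str_subset(x, "a")
subset_result
#> [1] "a"  "ab"

# Indexing with the mask keeps an NA where the mask itself is NA
mask_result <- x[str_detect(x, "a")]
mask_result
#> [1] "a"  NA   "ab"

# which() discards NA positions, bringing the mask version back in line
identical(x[which(str_detect(x, "a"))], subset_result)
#> [1] TRUE
```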

Warning

str_subset() does not filter data frames. It expects a character vector (or something that can be coerced to one), so passing a tibble or data frame will not give you the matching rows. To filter rows of a data frame by a string column, use filter(df, str_detect(col, "pattern")) with dplyr instead, or apply str_subset() to the column itself if you only need the matching values.
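The dplyr call above needs dplyr installed. As a dependency-free sketch of the same idea, the example below filters rows with base R indexing and a str_detect() mask. The data frame and its column names (name, price) are hypothetical.

```r
library(stringr)

# Hypothetical data frame, for illustration only
df <- data.frame(
  name  = c("apple", "banana", "cherry"),
  price = c(1.2, 0.5, 3.0)
)

# Row filter: the logical mask selects rows, the empty column slot keeps all columns
hits <- df[str_detect(df$name, "an"), ]
hits$name
#> [1] "banana"
nrow(hits)
#> [1] 1

# If only the matching values are needed, apply str_subset() to the column
str_subset(df$name, "an")
#> [1] "banana"
```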

Pitfall 3: empty result returns character(0), not NULL. When no strings match, str_subset() returns a zero-length character vector. Check with length(result) > 0 before downstream code that assumes at least one element.
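A minimal sketch of the guard described above, using a pattern chosen so that nothing matches:

```r
library(stringr)

x <- c("apple", "banana")

# No element contains "zzz", so the result is empty
result <- str_subset(x, "zzz")

result
#> character(0)
is.null(result)
#> [1] FALSE
length(result)
#> [1] 0

# Guard downstream code that assumes at least one match
if (length(result) > 0) {
  message("First match: ", result[1])
} else {
  message("No matches")
}
```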

Try it yourself

Try it: Filter the built-in state.name vector to keep only states whose name starts with "New". Save the result to ex_states.

R: Your turn: filter state names

# Try it: keep states starting with "New"
ex_states <- # your code here
ex_states
#> Expected: 4 states


R: Solution

library(stringr)
ex_states <- str_subset(state.name, "^New")
ex_states
#> [1] "New Hampshire" "New Jersey"    "New Mexico"    "New York"

Explanation: The pattern "^New" anchors the match to the start of each state name, so it keeps only states whose name begins with "New". Four states qualify.

Related stringr functions

After mastering str_subset(), look at:

str_detect(): TRUE/FALSE mask version of the same logical test
str_which(): integer index version, for slicing or positional joins
str_extract(): extract the matched substring rather than the whole string
str_count(): count matches within each string
str_replace(): replace the matched pattern with new text

For complex patterns, the regex(), fixed(), coll(), and boundary() modifier functions in stringr give precise control over match behavior. Read the stringr regex reference for the full pattern syntax.

FAQ

How do I filter a vector of strings by a pattern in R?

Use stringr::str_subset(x, "pattern"). It returns a character vector containing only the elements of x that match the pattern. The pattern is a regex by default; wrap with fixed() for a literal match. It is the tidyverse equivalent of grep(pattern, x, value = TRUE).

What is the difference between str_subset and str_detect in R?

str_detect() returns a logical vector the same length as the input (TRUE where the pattern matches). str_subset() returns only the matching elements. For input without NA values, str_subset(x, p) is equivalent to x[str_detect(x, p)]; str_subset() additionally drops NA elements. Use str_subset() when you want the values and str_detect() when you want a mask to combine with other conditions.

Is str_subset the same as grep with value=TRUE?

Yes, for simple cases. str_subset(x, "p") and grep("p", x, value = TRUE) return the same character vector. Differences: str_subset() supports the negate argument directly and handles modifiers like fixed() and regex() consistently. grep() requires zero packages and offers perl = TRUE for PCRE.
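The sketch below compares the two on a small vector, including the inverted filter: negate = TRUE in stringr corresponds to invert = TRUE in grep(). The variable names are illustrative.

```r
library(stringr)

x <- c("apple", "banana", "cherry")

# Keep matches
keep_stringr <- str_subset(x, "an")
keep_base    <- grep("an", x, value = TRUE)
identical(keep_stringr, keep_base)
#> [1] TRUE

# Keep non-matches: negate = TRUE vs invert = TRUE
drop_stringr <- str_subset(x, "an", negate = TRUE)
drop_base    <- grep("an", x, value = TRUE, invert = TRUE)
identical(drop_stringr, drop_base)
#> [1] TRUE
```

The two use different regex engines (ICU in stringr, and TRE or PCRE in base R), so results can diverge on more exotic patterns even though they agree here.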

How do I do a case-insensitive str_subset in R?

Wrap the pattern in regex(pattern, ignore_case = TRUE). Example: str_subset(x, regex("apple", ignore_case = TRUE)) keeps strings matching "apple", "Apple", "APPLE", and so on.

Can str_subset filter rows of a data frame?

No. str_subset() only accepts atomic character vectors. To filter data frame rows by a string column, use dplyr: filter(df, str_detect(col, "pattern")). str_detect() produces the logical mask that filter() needs.
