grep() and grepl() in R: Search Strings With Patterns
The grep() function in base R returns INDEXES of matches; grepl() returns a LOGICAL vector. Both search a character vector for elements matching a regex (or fixed string).
grep("apple", x) # indexes of matches
grepl("apple", x) # TRUE/FALSE vector
grep("apple", x, value = TRUE) # matched values
grep("apple", x, ignore.case = TRUE) # case-insensitive
grep("\\.csv$", files) # regex anchor
grep("apple", x, fixed = TRUE) # literal string (no regex)
grep("apple", x, invert = TRUE) # non-matches

Need explanation? Read on for examples and pitfalls.
What grep() does in one sentence
grep(pattern, x) returns the integer positions of x's elements that match pattern; grepl() returns the same result as a logical vector. Both default to regex; use fixed = TRUE for literal text.
These two functions are the workhorses of base R string search. Their cousins sub() and gsub() do replacement.
Syntax
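grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
Note that value and invert exist only on grep(). Use grepl for filtering with subset() or [; use grep for index-based slicing. Both are vectorized and comparably fast, so pick the one that fits your downstream use.

Five common patterns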
1. Filter a vector
Equivalent to grep("apple", x, value = TRUE).
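A minimal sketch of this pattern, using a made-up vector x (the sample data is an assumption for illustration only):

```r
# Hypothetical sample data
x <- c("apple pie", "banana", "pineapple", "cherry")

# Logical subsetting: keep the elements where grepl() is TRUE
hits <- x[grepl("apple", x)]
hits
# "apple pie" "pineapple"

# The value = TRUE form of grep() gives the same strings
same <- grep("apple", x, value = TRUE)
identical(hits, same)
# TRUE
```

Note that "pineapple" matches too, because the pattern is matched anywhere in the string.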
2. Filter a data frame
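A minimal sketch, assuming a small made-up data frame df with a character column called name (both names are hypothetical):

```r
# Hypothetical data frame for illustration
df <- data.frame(
  name  = c("apple pie", "banana bread", "pineapple cake"),
  price = c(4.5, 3.0, 5.25),
  stringsAsFactors = FALSE
)

# grepl() yields one TRUE/FALSE per row, which is what row subsetting needs
apple_rows <- df[grepl("apple", df$name), ]

# subset() evaluates the condition inside the data frame
apple_sub <- subset(df, grepl("apple", name))

apple_rows$name
# "apple pie" "pineapple cake"
```

grepl() is the natural choice here because the logical vector has the same length as the number of rows.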
3. Case-insensitive search
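A short sketch of case-insensitive matching on a made-up vector (the sample data is an assumption):

```r
# Hypothetical sample data with mixed case
x <- c("Apple", "APPLE juice", "banana", "pineapple")

# Default matching is case-sensitive
strict <- grepl("apple", x)
strict
# FALSE FALSE FALSE TRUE

# ignore.case = TRUE matches regardless of case
loose <- grepl("apple", x, ignore.case = TRUE)
loose
# TRUE TRUE FALSE TRUE
```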
4. Anchored regex (file extensions)
\\. is an escaped literal dot; $ anchors to end of string.
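To see the anchor at work, here is a small sketch with a made-up vector of file names (files and its contents are assumptions):

```r
# Hypothetical file names
files <- c("data.csv", "notes.txt", "archive.csv.bak", "report_csv", "sales.CSV")

# Only names that END in a literal ".csv" match
csv_files <- grep("\\.csv$", files, value = TRUE)
csv_files
# "data.csv"

# Combine with ignore.case = TRUE to also catch upper-case extensions
csv_any_case <- grep("\\.csv$", files, value = TRUE, ignore.case = TRUE)
csv_any_case
# "data.csv" "sales.CSV"
```

"archive.csv.bak" is excluded because ".csv" is not at the end, and "report_csv" is excluded because it has no literal dot.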
5. Inverted match (non-matches)
invert = TRUE returns elements that do NOT match.
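A brief sketch of the inverted forms, again on a made-up vector (the data is an assumption):

```r
# Hypothetical sample data
x <- c("apple pie", "banana", "pineapple", "cherry")

# Positions of the elements that do NOT match
non_match_idx <- grep("apple", x, invert = TRUE)
non_match_idx
# 2 4

# The non-matching strings themselves
non_match_val <- grep("apple", x, invert = TRUE, value = TRUE)
non_match_val
# "banana" "cherry"

# The grepl() equivalent uses logical negation
via_grepl <- x[!grepl("apple", x)]
```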
grep and grepl differ mainly in what they return. They share the pattern-matching arguments and the regex engine; only grep has value and invert. grep returns positions (or values with value = TRUE); grepl returns booleans. Pick whichever feeds into your next step naturally.

grep() vs grepl() vs str_detect() vs str_subset()
Five "is this string a match?" functions across base and stringr.
| Function | Package | Returns | Vectorized |
|---|---|---|---|
| grep() | base | Integer positions | Yes |
| grepl() | base | Logical vector | Yes |
| stringr::str_detect() | stringr | Logical vector | Yes |
| stringr::str_subset() | stringr | Filtered vector | Yes |
| stringr::str_which() | stringr | Integer positions | Yes |
When to use which:
- grep / grepl for base R purity, no dependencies.
- str_detect for tidyverse pipelines (consistent with str_replace, str_extract).
- str_subset as a shortcut for x[grepl(p, x)].
The stringr functions take arguments in a more consistent order: str_detect(string, pattern) vs grepl(pattern, x). The base functions put pattern first; stringr puts string first to fit pipes.
A practical regex search workflow
Most string-filtering tasks combine three steps: detect, subset, validate.
- Detect which rows match (grepl).
- Subset to those rows.
- Validate the regex caught what you intended.
Always sanity-check matches by inspecting a sample. A regex that matches "too much" or "too little" is the most common bug. For complex patterns, build them up incrementally and test each addition.
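The detect, subset, validate steps can be sketched as follows. The orders data frame and its columns are made up for illustration, and the pattern is just one plausible choice:

```r
# Hypothetical data
orders <- data.frame(
  id   = 1:5,
  item = c("apple pie", "Apple tart", "pineapple", "grape", "crab apple"),
  stringsAsFactors = FALSE
)

# 1. Detect: items that START with "apple", in any case
is_apple <- grepl("^apple", orders$item, ignore.case = TRUE)

# 2. Subset: keep only the matching rows
apple_orders <- orders[is_apple, ]

# 3. Validate: look at what matched AND what did not
matched   <- orders$item[is_apple]
unmatched <- orders$item[!is_apple]
matched
# "apple pie" "Apple tart"
unmatched
# "pineapple" "grape" "crab apple"
table(is_apple)
```

Inspecting the unmatched side is what reveals a pattern that is too strict: here "crab apple" is left out, which may or may not be what you intended.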
Common pitfalls
Pitfall 1: regex special characters. grep(".", x) matches every non-empty string (. is regex for any single character). To match a literal dot, use fixed = TRUE or escape it: grep(".", x, fixed = TRUE) or grep("\\.", x).
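A quick sketch that contrasts the three forms on a made-up vector (the sample data is an assumption):

```r
# Hypothetical sample data, including an empty string
x <- c("v1.0", "beta", "report.pdf", "")

# Unescaped dot: matches any string with at least one character
any_char <- grepl(".", x)
any_char
# TRUE TRUE TRUE FALSE

# Literal dot via fixed = TRUE
literal_fixed <- grepl(".", x, fixed = TRUE)
literal_fixed
# TRUE FALSE TRUE FALSE

# Literal dot via escaping
literal_escaped <- grepl("\\.", x)
identical(literal_fixed, literal_escaped)
# TRUE
```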
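Pitfall 2: NA elements. grepl("a", c("apple", NA)) returns c(TRUE, FALSE): base R treats an NA element as a non-match instead of propagating NA (stringr::str_detect() returns NA here). As a result, x[!grepl("a", x)] silently keeps NA elements. If that matters, handle NAs explicitly, for example with !is.na(x).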
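A small sketch of how NA elements behave with base R's matching functions. The vector is made up, and the comments show the results base R gives: an NA element is reported as a non-match:

```r
# Hypothetical sample data with a missing value
x <- c("apple", NA, "banana")

# grepl() reports FALSE for the NA element
has_a <- grepl("a", x)
has_a
# TRUE FALSE TRUE

# grep() simply omits the NA position
idx <- grep("a", x)
idx
# 1 3

# Negating the logical vector therefore KEEPS the NA
kept <- x[!grepl("a", x)]
kept
# NA

# Guard explicitly when NAs should be dropped
kept_clean <- x[!is.na(x) & !grepl("a", x)]
length(kept_clean)
# 0
```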
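grep returns INDEX positions; grepl returns LOGICAL. Mixing them up is a common bug. x[grep(...)] keeps matching elements; x[grepl(...)] does the same. x[!grep(...)] is WRONG: negating an index vector turns every position into FALSE, so nothing is selected. x[-grep(...)] is also risky, because it returns an empty vector when there are no matches. Use grepl and ! together: x[!grepl(...)].

When to use perl = TRUE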
Base R supports two regex engines: a POSIX-style extended engine (the default) and PCRE (Perl-compatible) via perl = TRUE. PCRE is more powerful: it supports lookbehinds (?<=) and lookaheads (?=), which the default engine rejects with an error, and it is the dialect most modern regex tutorials assume when they use non-greedy quantifiers (*?, +?) or inline flags like (?i). When in doubt, set perl = TRUE: it covers nearly all of the default behaviour. The performance cost is negligible for typical inputs.
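As an illustration, here is a sketch of a lookbehind that needs perl = TRUE. The vector and pattern are made up; the tryCatch() around the default-engine call is there because that call is expected to fail rather than return a result:

```r
# Hypothetical sample data
x <- c("price: $10", "cost: 10 EUR", "$25 total")

# Lookbehind: digits that are immediately preceded by a dollar sign
pattern <- "(?<=\\$)[0-9]+"

has_dollar_amount <- grepl(pattern, x, perl = TRUE)
has_dollar_amount
# TRUE FALSE TRUE

# The same pattern without perl = TRUE; capture the error instead of stopping
default_try <- tryCatch(
  grepl(pattern, x),
  error = function(e) e
)
inherits(default_try, "error")
```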
Try it yourself
Try it: Filter the iris species names to keep only those starting with "v" (case-insensitive). Save to ex_v_species.
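One possible solution, sketched with base R only. It assumes the built-in iris dataset and takes the species names from the factor levels:

```r
# Species names as a character vector
species <- levels(iris$Species)

# Keep names starting with "v" or "V" and return the strings themselves
ex_v_species <- grep("^v", species, ignore.case = TRUE, value = TRUE)
ex_v_species
# "versicolor" "virginica"
```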
Explanation: ^v anchors the match to the start of the string. ignore.case = TRUE matches both "v" and "V". value = TRUE returns the matching strings.
Related search functions
After mastering grep, look at:
- sub() and gsub(): replace matches (first / all)
- regmatches(): extract the matched substring
- regexpr() / gregexpr(): find positions of matches
- stringr::str_detect() and friends: tidyverse equivalents
- startsWith() / endsWith(): prefix / suffix matching (faster than regex)
For literal prefix / suffix checks, startsWith and endsWith are faster than regex anchors.
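A short sketch comparing the literal helpers with their anchored-regex equivalents, on a made-up vector of file names (the data is an assumption):

```r
# Hypothetical file names
files <- c("report_2023.csv", "report_2024.csv", "summary.txt")

# Literal prefix and suffix checks; no regex escaping needed
is_report <- startsWith(files, "report")
is_csv    <- endsWith(files, ".csv")
is_report
# TRUE TRUE FALSE
is_csv
# TRUE TRUE FALSE

# The regex equivalents need anchors and an escaped dot
identical(is_report, grepl("^report", files))
# TRUE
identical(is_csv, grepl("\\.csv$", files))
# TRUE
```

Because startsWith() and endsWith() treat their second argument literally, the dot in ".csv" needs no escaping.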
FAQ
What is the difference between grep and grepl in R?
grep returns the INTEGER POSITIONS of matches (or matched strings if value = TRUE). grepl returns a LOGICAL vector with TRUE for matches and FALSE for non-matches. The pattern-matching arguments are the same; value and invert are specific to grep.
How do I do a case-insensitive grep in R?
Pass ignore.case = TRUE: grep("apple", x, ignore.case = TRUE). Or use the inline regex flag: grep("(?i)apple", x, perl = TRUE).
How do I search for a literal string with grep?
Pass fixed = TRUE: grep(".", x, fixed = TRUE) matches the literal . (period) instead of "any character".
How do I count matches with grep?
length(grep(p, x)) or sum(grepl(p, x)). Both give the count.
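A quick sketch on a made-up vector (the data is an assumption). Note that both forms count matching elements, not individual occurrences within an element:

```r
# Hypothetical sample data; the last element contains "apple" twice
x <- c("apple", "pineapple", "banana", "apple apple")

n_by_index   <- length(grep("apple", x))
n_by_logical <- sum(grepl("apple", x))

n_by_index
# 3
n_by_logical
# 3
```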
How do I get the actual matched strings instead of indexes?
Pass value = TRUE: grep("apple", x, value = TRUE). Or use x[grepl("apple", x)]. Both give matching strings.