stringr str_locate() in R: Find Match Positions in Strings

The str_locate() function in stringr returns the start and end character positions of the FIRST regex match in each string, as a two-column integer matrix. Use str_locate_all() for every match per string.

⚡ Quick Answer
str_locate(x, "pattern")                       # first match: matrix start/end
str_locate_all(x, "pattern")                   # all matches: list of matrices
str_locate(x, fixed("."))                      # literal match (skip regex)
str_locate(x, regex("a", ignore_case = TRUE))  # case-insensitive
str_locate(x, "\\d+")                          # first run of digits
str_sub(x, str_locate(x, "\\d+"))              # extract via positions
str_locate(x, "@")[, "start"]                  # column subset

Need explanation? Read on for examples and pitfalls.
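
The fixed() and regex() lines in the quick answer are easy to skim past, so here is a minimal sketch of why they matter. The vectors `files` and `words` are made up for illustration; the calls themselves are standard stringr.

```r
library(stringr)

# Hypothetical inputs for illustration
files <- c("report.csv", "notes")
words <- c("Report", "summary")

# As a regex, "." matches any character, so the first match is character 1.
dot_regex <- str_locate(files, ".")[, "start"]
dot_regex
#> [1] 1 1

# fixed(".") matches a literal dot only; "notes" has none, so it yields NA.
dot_fixed <- str_locate(files, fixed("."))[, "start"]
dot_fixed
#> [1]  7 NA

# regex(..., ignore_case = TRUE) matches "R" as well as "r".
r_any_case <- str_locate(words, regex("r", ignore_case = TRUE))[, "start"]
r_any_case
#> [1] 1 6
```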

📊 Is str_locate() the right tool?
Pick the function by what you need:

  • Start and end position of the first match: str_locate()
  • Positions of all matches per string: str_locate_all()
  • The matched substring directly: str_extract()
  • Whether the pattern exists (TRUE or FALSE): str_detect()
  • Replace matched text: str_replace()
  • Count matches per string: str_count()
  • Split the string at a match position: str_split() or str_sub()

What str_locate() does in one sentence

str_locate(string, pattern) returns an integer matrix with start and end columns giving the character positions of the first match in each input string. Strings with no match return NA in both columns.

The matrix output is what sets str_locate() apart from the other stringr matchers: you can pipe positions straight into str_sub() to slice substrings, or use the start column to find delimiters for parsing.

Syntax

str_locate(string, pattern). Returns a matrix; rows align with input strings, columns are start and end.

R: Load stringr and locate first match
library(stringr)
library(tibble)

x <- c("apple pie", "banana bread", "cherry tart", "no match here")
str_locate(x, "pie|bread|tart")
#>      start end
#> [1,]     7   9
#> [2,]     8  12
#> [3,]     8  11
#> [4,]    NA  NA

  
Tip
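Always think of the result as a matrix, not a vector. Subset columns with [, "start"] or [, "end"]. Subset rows with [i, ]. Single-index subsetting such as pos[i] walks the cells column by column, so it returns one cell rather than a start/end pair, and it does so without any warning.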
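
A short sketch of the subsetting forms mentioned in the tip. The variable name `pos` is only an illustration; the input is a shortened version of the vector used above.

```r
library(stringr)

x <- c("apple pie", "banana bread", "no match here")
pos <- str_locate(x, "pie|bread")

dim(pos)         # one row per string, two columns
#> [1] 3 2

starts <- pos[, "start"]   # column subset: integer vector of start positions
starts
#> [1]  7  8 NA

second <- pos[2, ]         # row subset: start and end for the second string
second
#> start   end 
#>     8    12 

# Vector-style indexing counts cells column by column:
# cell 4 of a 3 x 2 matrix is the END of row 1, not a fourth string.
cell4 <- pos[4]
cell4
#> [1] 9
```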

Five common patterns

1. First match position

R: Find position of first digit run
codes <- c("INV-2024-08", "REF-99", "no number")
str_locate(codes, "\\d+")
#>      start end
#> [1,]     5   8
#> [2,]     5   6
#> [3,]    NA  NA

  

The start column gives the index of the first matching character; end gives the last. NA in both columns means no match.

2. All matches per string

R: Locate every digit run, not just the first
str_locate_all(codes, "\\d+")
#> [[1]]
#>      start end
#> [1,]     5   8
#> [2,]    10  11
#>
#> [[2]]
#>      start end
#> [1,]     5   6
#>
#> [[3]]
#>      start end

  

str_locate_all() returns a LIST of matrices, one per input. Empty matrices (zero rows) signal no match.
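
Because each list element is a matrix with one row per match, the row counts double as match counts. The following sketch reuses the `codes` vector from above; `locs` and `n_matches` are illustrative names.

```r
library(stringr)

codes <- c("INV-2024-08", "REF-99", "no number")
locs <- str_locate_all(codes, "\\d+")

# Row count of each matrix = number of matches in that string.
n_matches <- vapply(locs, nrow, integer(1))
n_matches
#> [1] 2 1 0

# The counts agree with str_count() on the same pattern.
all(n_matches == str_count(codes, "\\d+"))
#> [1] TRUE

# A single matrix can be handed to str_sub() to pull every match from one string.
first_string_matches <- str_sub(codes[1], locs[[1]])
first_string_matches
#> [1] "2024" "08"
```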

3. Slice substrings using positions

R: Pipe positions into str_sub
emails <- c("alice@x.com", "bob@y.org", "no-email")
positions <- str_locate(emails, "@.+")
str_sub(emails, positions)
#> [1] "@x.com" "@y.org" NA

  

str_sub() accepts the position matrix directly. This is the canonical way to extract a match when you want to keep both the substring and the index.
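
Keeping the index pays off when you want to shift the window around the match. A minimal sketch, reusing the `emails` example; the names `domain` and `local_part` are illustrative.

```r
library(stringr)

emails <- c("alice@x.com", "bob@y.org", "no-email")
positions <- str_locate(emails, "@.+")

# Slicing by the position matrix gives the same text as str_extract().
same_as_extract <- all(
  str_sub(emails, positions) == str_extract(emails, "@.+"),
  na.rm = TRUE
)
same_as_extract
#> [1] TRUE

# Shift the start by one to skip the "@".
domain <- str_sub(emails, positions[, "start"] + 1, positions[, "end"])
domain
#> [1] "x.com" "y.org" NA

# Take everything before the match instead.
local_part <- str_sub(emails, 1, positions[, "start"] - 1)
local_part
#> [1] "alice" "bob"   NA
```

Rows without a match stay NA throughout, because arithmetic on an NA position is still NA.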

4. Find a delimiter for parsing

R: Split key=value at the first equals sign
kv <- c("color=red", "size=12", "weight=heavy")
eq <- str_locate(kv, "=")[, "start"]
data.frame(
  key   = str_sub(kv, 1, eq - 1),
  value = str_sub(kv, eq + 1)
)
#>      key value
#> 1  color   red
#> 2   size    12
#> 3 weight heavy

  

Locating a single character (the =) and doing arithmetic on its position is often simpler to read than a regex with capture groups when you only need to split at one point.
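
Real input is rarely as tidy as the example above. This sketch shows one way to handle a value that itself contains "=" and an entry with no delimiter at all. The input vector `kv2` and the fallback rule (treat the whole string as the key) are assumptions made for illustration.

```r
library(stringr)

# Hypothetical messy input: an extra "=" in the value, and a bare flag
kv2 <- c("url=a=b", "flag", "size=12")

# str_locate() reports only the FIRST "=", so later ones stay in the value.
eq <- str_locate(kv2, fixed("="))[, "start"]

# No delimiter means NA; fall back to using the whole string as the key.
key   <- ifelse(is.na(eq), kv2, str_sub(kv2, 1, eq - 1))
value <- str_sub(kv2, eq + 1)

parsed <- data.frame(key, value)
parsed
#>    key value
#> 1  url   a=b
#> 2 flag  <NA>
#> 3 size    12
```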

5. Use inside a data frame

R: Add start, end columns with mutate
library(dplyr)

df <- tibble::tibble(text = c("order #123", "ref #45", "no id"))
df |>
  mutate(
    start = str_locate(text, "#\\d+")[, "start"],
    end   = str_locate(text, "#\\d+")[, "end"]
  )
#> # A tibble: 3 x 3
#>   text       start   end
#>   <chr>      <int> <int>
#> 1 order #123     7    10
#> 2 ref #45        5     7
#> 3 no id         NA    NA

  
Key Insight
Position-based extraction beats pattern-based extraction when you need both the value AND its location in the original string. Use str_extract() for value-only; reach for str_locate() when downstream code needs the index (highlighting, slicing, joining adjacent fields).
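
Highlighting is a case where the index is essential. The sketch below wraps the first match in square brackets; the bracket markers and the input vector are arbitrary choices for illustration.

```r
library(stringr)

text <- c("order #123 shipped", "ref #45", "no id")
pos <- str_locate(text, "#\\d+")
s <- pos[, "start"]
e <- pos[, "end"]

# Rebuild each string as: before + "[" + match + "]" + after.
# Strings without a match (NA positions) are returned unchanged.
highlighted <- ifelse(
  is.na(s),
  text,
  paste0(
    str_sub(text, 1, s - 1), "[",
    str_sub(text, s, e), "]",
    str_sub(text, e + 1)
  )
)
highlighted
#> [1] "order [#123] shipped" "ref [#45]"            "no id"
```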

Common pitfalls

Pitfall 1: forgetting it returns a matrix. str_locate(x, "a") + 1 works (matrix arithmetic), but length(str_locate(x, "a")) returns the cell count, not the row count. Use nrow() or subset to a column first.

Pitfall 2: confusing str_locate() with str_locate_all(). The first returns a matrix; the second returns a list of matrices. Code that loops over the result depends on which you called.
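
Both pitfalls show up when you inspect the return values side by side. A small sketch with a made-up input vector:

```r
library(stringr)

x <- c("banana", "kiwi", "grape")
first  <- str_locate(x, "a")       # matrix
all_m  <- str_locate_all(x, "a")   # list of matrices

# Pitfall 1: length() of a matrix counts cells, not strings.
length(first)
#> [1] 6
nrow(first)
#> [1] 3

# Pitfall 2: the two functions return different container types.
is.matrix(first)
#> [1] TRUE
is.list(all_m)
#> [1] TRUE

# For the list, length() is the number of strings and
# nrow() of an element is the number of matches in that string.
length(all_m)
#> [1] 3
nrow(all_m[[1]])
#> [1] 3
```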

Warning
str_locate() reports CHARACTER positions, not byte positions. A string with multibyte UTF-8 characters (emoji, accented letters) is indexed by character, not by byte. This matches str_sub() and the default nchar(). Base R regexpr() also reports character positions by default, but returns byte positions when called with useBytes = TRUE.
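
To see the difference between character and byte positions, here is a minimal sketch using a string with one accented letter. It assumes the string is UTF-8 encoded (the \u escape guarantees that) and uses base R regexpr() with useBytes = TRUE to obtain the byte position for comparison.

```r
library(stringr)

x <- "caf\u00e9 au lait"   # "é" is one character but two bytes in UTF-8

n_chars <- nchar(x)                  # counts characters
n_bytes <- nchar(x, type = "bytes")  # counts bytes
c(n_chars, n_bytes)
#> [1] 12 13

# stringr counts characters: "au" starts at character 6.
char_start <- str_locate(x, "au")[1, "start"]

# regexpr() with useBytes = TRUE counts bytes: "au" starts at byte 7.
byte_start <- as.integer(regexpr("au", x, fixed = TRUE, useBytes = TRUE))

c(char = unname(char_start), byte = byte_start)
#> char byte 
#>    6    7 
```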

Try it yourself

Try it: Find the start position of the substring "color" in iris$Species (as character). Save the integer vector to ex_pos.

R: Your turn: locate 'color' in species names
species <- as.character(iris$Species)
ex_pos <- NA  # replace NA with your code
ex_pos
#> Expected: 50 NA (setosa), then 50 sixes (versicolor), then 50 NA (virginica)

R: Solution
species <- as.character(iris$Species)
ex_pos <- str_locate(species, "color")[, "start"]
table(ex_pos, useNA = "ifany")
#> ex_pos
#>    6 <NA>
#>   50  100

Explanation: str_locate(species, "color") returns a 150-row matrix; subsetting [, "start"] gives the start positions. Only the 50 "versicolor" rows match (start at character 6, because "versi" occupies characters 1 to 5); the other 100 are NA.

str_locate vs other stringr matchers

Pick the matcher whose return shape matches what your downstream code consumes. The five most common stringr matchers differ only in what they hand back; the pattern engine is identical.

Function           Returns                                        Use when you need
str_locate()       Matrix: start, end of first match              Position of the first match
str_locate_all()   List of matrices, one per string               Positions of every match
str_extract()      Matched substring (length-N character vector)  Just the matched text
str_detect()       Logical vector                                 TRUE/FALSE per string
str_count()        Integer vector                                 Number of matches per string

Reach for str_locate() only when the index matters; otherwise the others are usually a shorter path.
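
Running all five matchers on the same input makes the return shapes concrete. The input vector is made up for illustration.

```r
library(stringr)

x <- c("a1b22", "none")
p <- "\\d+"

loc     <- str_locate(x, p)       # matrix, one row per string
loc_all <- str_locate_all(x, p)   # list, one matrix per string
ext     <- str_extract(x, p)      # character vector
det     <- str_detect(x, p)       # logical vector
cnt     <- str_count(x, p)        # integer vector

loc
#>      start end
#> [1,]     2   2
#> [2,]    NA  NA
ext
#> [1] "1" NA
det
#> [1]  TRUE FALSE
cnt
#> [1] 2 0
```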

After mastering str_locate, look at:

  • str_locate_all(): positions of every match per string
  • str_extract(): pull out the matched substring
  • str_sub(): slice substrings by start and end position
  • str_detect(): check if a pattern exists
  • str_count(): count matches per string
  • regexpr(): base R equivalent (returns positions plus match length attribute)

For full pattern grammar, see the official stringr regular expressions vignette.

FAQ

How do I find the position of a substring in R?

Use stringr::str_locate(string, "substring"). It returns a matrix with start and end columns giving the character positions of the first match. For all matches, use str_locate_all(), which returns a list of matrices, one per input string.

What is the difference between str_locate and str_extract in R?

str_locate() returns the POSITION of the match as a matrix of integers; str_extract() returns the matching SUBSTRING itself. Use locate when you need the index (for slicing, highlighting, joining); use extract when you only need the matched text.

How do I get all match positions, not just the first, in R?

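Use str_locate_all(x, "pattern"). It returns a list of matrices, where each matrix has one row per match. To collapse the list into a single data frame, one option is to convert each matrix to a data frame and bind the pieces, for example with purrr::map_dfr() (superseded since purrr 1.0.0 by map() plus list_rbind()) or do.call(rbind, ...). Add an identifier column before binding if you need to know which string each row came from.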
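
A base R sketch of that collapse, with no purrr dependency. The column name `string` and the choice to carry the original text as the identifier are illustrative assumptions.

```r
library(stringr)

codes <- c("INV-2024-08", "REF-99", "no number")
locs <- str_locate_all(codes, "\\d+")

# Build one small data frame per input string, tagging rows with their source.
rows <- lapply(seq_along(locs), function(i) {
  m <- locs[[i]]
  data.frame(
    string = rep(codes[i], nrow(m)),
    start  = unname(m[, "start"]),
    end    = unname(m[, "end"])
  )
})

# Stack them; strings with no match contribute zero rows.
all_pos <- do.call(rbind, rows)
all_pos
#>        string start end
#> 1 INV-2024-08     5   8
#> 2 INV-2024-08    10  11
#> 3      REF-99     5   6
```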

What does str_locate return when no match is found?

It returns NA in both the start and end columns for that row. The matrix shape is preserved, so you can still subset by row or column without errors.

How do I extract a substring using str_locate in R?

Pass the matrix directly to str_sub(): str_sub(x, str_locate(x, "pattern")). str_sub() accepts a two-column matrix as its start argument, slicing each string by its corresponding row; strings with no match come back as NA.