stringr str_locate() in R: Find Match Positions in Strings
The str_locate() function in stringr returns the start and end character positions of the FIRST regex match in each string, as a two-column integer matrix. Use str_locate_all() for every match per string.
str_locate(x, "pattern")                        # first match: matrix start/end
str_locate_all(x, "pattern")                    # all matches: list of matrices
str_locate(x, fixed("."))                       # literal match (skip regex)
str_locate(x, regex("a", ignore_case = TRUE))   # case-insensitive
str_locate(x, "\\d+")                           # first run of digits
str_sub(x, str_locate(x, "\\d+"))               # extract via positions
str_locate(x, "@")[, "start"]                   # column subset
Need explanation? Read on for examples and pitfalls.
What str_locate() does in one sentence
str_locate(string, pattern) returns an integer matrix with start and end columns giving the character positions of the first match in each input string. Strings with no match return NA in both columns.
The matrix output is what sets str_locate() apart from the other stringr matchers: you can pipe positions straight into str_sub() to slice substrings, or use the start column to find delimiters for parsing.
Syntax
str_locate(string, pattern). Returns a matrix; rows align with input strings, columns are start and end.
Subset columns with [, "start"] or [, "end"]; subset rows with [i, ]. Treating the result as a vector silently misuses the data.
Five common patterns
1. First match position
The start column gives the index of the first matching character; end gives the last. NA in both columns means no match.
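A minimal sketch of the first-match case, assuming the stringr package is installed. The vector `x` is hypothetical sample data made up for illustration; the last element has no digits, so its row is NA.

```r
library(stringr)

# Hypothetical sample data: the third string contains no digits
x <- c("order 42 shipped", "item 7", "no digits here")

# Locate the first run of digits in each string
pos <- str_locate(x, "\\d+")
pos
#      start end
# [1,]     7   8
# [2,]     6   6
# [3,]    NA  NA

# Pull one column out as a plain integer vector
pos[, "start"]
```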
2. All matches per string
str_locate_all() returns a LIST of matrices, one per input. Empty matrices (zero rows) signal no match.
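As an illustration of the list shape, here is a short sketch with made-up input (`x` is hypothetical). The first string has three runs of digits, the second has none.

```r
library(stringr)

# Hypothetical sample data
x <- c("a1b22c333", "none")

all_pos <- str_locate_all(x, "\\d+")

# One matrix per input string; one row per match
all_pos[[1]]
#      start end
# [1,]     2   2
# [2,]     4   5
# [3,]     7   9

# A zero-row matrix signals that nothing matched
nrow(all_pos[[2]])
```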
3. Slice substrings using positions
str_sub() accepts the position matrix directly. This is the canonical way to extract a match when you want to keep both the substring and the index.
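The slicing step might look like the sketch below, again using hypothetical sample strings. The two-column matrix is passed as the `start` argument of str_sub(), and a row of NA positions yields NA.

```r
library(stringr)

# Hypothetical sample data
x <- c("order 42 shipped", "item 7", "no digits here")

# Keep the positions around for later use
pos <- str_locate(x, "\\d+")

# Slice each string by its own row of start/end positions
matched <- str_sub(x, pos)
matched
```

Keeping `pos` in a variable means the same positions can be reused later, for example to highlight the match or to replace it in place.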
4. Find a delimiter for parsing
Locating a single character (the =) and arithmetic on the position is faster and clearer than a regex with capture groups when you only need to split at one point.
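A sketch of the delimiter approach, assuming key=value strings like the hypothetical `kv` vector below. Because str_locate() reports only the first match, a value that itself contains = stays intact.

```r
library(stringr)

# Hypothetical key=value strings; the last value contains a second "="
kv <- c("host=localhost", "port=5432", "mode=read=only")

# Position of the first "=" in each string (literal match, no regex)
eq <- str_locate(kv, fixed("="))[, "start"]

# Everything before the delimiter is the key, everything after is the value
key   <- str_sub(kv, 1, eq - 1)
value <- str_sub(kv, eq + 1)   # end defaults to the last character

key
value
```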
5. Use inside a data frame
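One way this could look with a base R data frame is sketched here. The `df` data and the column names `at_pos`, `user`, and `domain` are hypothetical, chosen only for illustration.

```r
library(stringr)

# Hypothetical data frame; the last row has no "@"
df <- data.frame(
  email = c("ana@example.com", "bob@test.org", "invalid"),
  stringsAsFactors = FALSE
)

# Store the position of "@" as a column, then slice around it
at <- str_locate(df$email, fixed("@"))[, "start"]
df$at_pos <- at
df$user   <- str_sub(df$email, 1, at - 1)
df$domain <- str_sub(df$email, at + 1)

df
```

Rows without a match carry NA through every derived column, which makes them easy to filter afterwards.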
Use str_extract() for value-only results; reach for str_locate() when downstream code needs the index (highlighting, slicing, joining adjacent fields).
Common pitfalls
Pitfall 1: forgetting it returns a matrix. str_locate(x, "a") + 1 works (matrix arithmetic), but length(str_locate(x, "a")) returns the cell count, not the row count. Use nrow() or subset to a column first.
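The difference between cell count and row count can be checked with a small sketch (the vector `x` is hypothetical):

```r
library(stringr)

# Hypothetical sample data: three input strings
x <- c("banana", "cherry", "kiwi")

pos <- str_locate(x, "a")

length(pos)              # 6: counts cells (3 rows x 2 columns)
nrow(pos)                # 3: one row per input string
length(pos[, "start"])   # 3: a column is an ordinary vector
```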
Pitfall 2: confusing str_locate() with str_locate_all(). The first returns a matrix; the second returns a list of matrices. Code that loops over the result depends on which you called.
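To make the two return shapes concrete, the following sketch calls both functions on the same hypothetical input and inspects the results:

```r
library(stringr)

# Hypothetical sample data: two dashes in the first string, none in the second
x <- c("a-b-c", "no dash")

first <- str_locate(x, fixed("-"))
every <- str_locate_all(x, fixed("-"))

is.matrix(first)   # TRUE: index it with first[i, ]
is.list(every)     # TRUE: index it with every[[i]]

# Iterating over the list form, for example to count matches per string
vapply(every, nrow, integer(1))
```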
str_locate() reports CHARACTER positions, not byte positions. A string with multibyte UTF-8 characters (emoji, accented letters) is indexed by character, not by byte. This matches str_sub() and nchar(), but differs from base R regexpr() when it is called with useBytes = TRUE, in which case positions are reported in bytes.
Try it yourself
Try it: Find the start position of the substring "color" in iris$Species (as character). Save the integer vector to ex_pos.
Explanation: str_locate(species, "color") returns a 150-row matrix; subsetting [, "start"] gives the start positions. Only the 50 "versicolor" rows match (start at character 6); the other 100 are NA.
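The solution described above can be sketched as follows. It uses the built-in iris dataset, and `species` is simply the name the explanation uses for the character version of iris$Species.

```r
library(stringr)

# Convert the factor column to character before matching
species <- as.character(iris$Species)

# Start position of "color" in each species name; NA where it does not occur
ex_pos <- str_locate(species, "color")[, "start"]

# Tabulate the positions, keeping NA as its own group
table(ex_pos, useNA = "ifany")
```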
str_locate vs other stringr matchers
Pick the matcher whose return shape matches what your downstream code consumes. The five most common stringr matchers differ only in what they hand back; the pattern engine is identical.
| Function | Returns | Use when you need |
|---|---|---|
| str_locate() | Matrix: start, end of first match | Position of the first match |
| str_locate_all() | List of matrices, one per string | Positions of every match |
| str_extract() | Matched substring (length-N character) | Just the matched text |
| str_detect() | Logical vector | TRUE/FALSE per string |
| str_count() | Integer vector | Number of matches per string |
Reach for str_locate() only when the index matters; otherwise the others are usually a shorter path.
Related stringr functions
After mastering str_locate, look at:
- str_locate_all(): positions of every match per string
- str_extract(): pull out the matched substring
- str_sub(): slice substrings by start and end position
- str_detect(): check if a pattern exists
- str_count(): count matches per string
- regexpr(): base R equivalent (returns positions plus match length attribute)
For full pattern grammar, see the official stringr regular expressions vignette.
FAQ
How do I find the position of a substring in R?
Use stringr::str_locate(string, "substring"). It returns a matrix with start and end columns giving the character positions of the first match. For all matches, use str_locate_all(), which returns a list of matrices, one per input string.
What is the difference between str_locate and str_extract in R?
str_locate() returns the POSITION of the match as a matrix of integers; str_extract() returns the matching SUBSTRING itself. Use locate when you need the index (for slicing, highlighting, joining); use extract when you only need the matched text.
How do I get all match positions, not just the first, in R?
Use str_locate_all(x, "pattern"). It returns a list of matrices, where each matrix has one row per match. To collapse into a single data frame, combine with purrr::map_dfr() or do.call(rbind, ...).
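A minimal sketch of the do.call(rbind, ...) route, using base R only. The input `x` and the `string_id` column are hypothetical; the id records which input string each match came from, which the bare matrices do not carry.

```r
library(stringr)

# Hypothetical sample data: the second string has no digits
x <- c("a1b22", "none", "7 and 88")

all_pos <- str_locate_all(x, "\\d+")

# Turn each matrix into a data frame tagged with the index of its input string
pieces <- Map(function(m, i) {
  data.frame(
    string_id = rep(i, nrow(m)),
    start     = m[, "start"],
    end       = m[, "end"]
  )
}, all_pos, seq_along(all_pos))

# Stack the pieces; strings without a match contribute zero rows
result <- do.call(rbind, pieces)
result
```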
What does str_locate return when no match is found?
It returns NA in both the start and end columns for that row. The matrix shape is preserved, so you can still subset by row or column without errors.
How do I extract a substring using str_locate in R?
Pass the matrix directly to str_sub(): str_sub(x, str_locate(x, "pattern")). str_sub() accepts a two-column matrix as its position argument, slicing each string by its corresponding row.