tidyr extract() in R: Extract Regex Capture Groups Into Cols
The extract() function in tidyr extracts regex capture groups from a string column into multiple new columns. It is similar to separate_wider_regex() but uses traditional capture-group syntax.
df |> extract(col, into = c("year","month"), regex = "(\\d{4})-(\\d{2})")
df |> extract(col, c("a","b"), "([A-Z]+)(\\d+)")
df |> separate_wider_regex(col, ...) # modern alternative
stringr::str_match(df$col, ...) # vector-level extraction; takes a character vector, not a data frame
Need explanation? Read on for examples and pitfalls.
What extract() does in one sentence
extract(data, col, into, regex, remove = TRUE, convert = FALSE) matches regex against col and writes each capture group into the new column named at the same position in into. It is the older API, superseded since tidyr 1.3.0; the newer separate_wider_regex() is preferred.
Syntax
extract(data, col, into, regex = "([[:alnum:]]+)", remove = TRUE, convert = FALSE)
extract() is older and still works; separate_wider_regex() is the modern replacement. Both split a string with regular expressions; the latter names each piece directly in its patterns argument instead of relying on capture-group order.
Five common patterns
1. Date with regex
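A minimal sketch of this pattern, assuming a hypothetical data frame df with a character column col holding year-month strings (the data is illustrative, not from the original article):

```r
library(tidyr)

# Hypothetical input: one character column of "YYYY-MM" strings
df <- data.frame(col = c("2023-01", "2024-11"))

# Each parenthesised group in the regex fills one name from `into`, in order
dates <- df |>
  extract(col, into = c("year", "month"), regex = "(\\d{4})-(\\d{2})")

dates
```

Both new columns are character vectors here, and col is dropped because remove defaults to TRUE.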
2. Letter prefix + number
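A short sketch of splitting a code into its letter prefix and numeric suffix. The sample values, including the non-matching "n/a" row, are assumptions added for illustration:

```r
library(tidyr)

# Hypothetical input: upper-case prefix followed by digits, plus one bad row
df <- data.frame(col = c("AB12", "XYZ7", "n/a"))

parts <- df |>
  extract(col, c("a", "b"), "([A-Z]+)(\\d+)")

parts
```

Rows where the regex does not match come back as NA in every new column rather than raising an error, so it is worth checking for unexpected NA values afterwards.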
3. Convert types
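The following sketch shows what convert = TRUE changes, again on a hypothetical df with a col of year-month strings:

```r
library(tidyr)

# Hypothetical input
df <- data.frame(col = c("2023-01", "2024-11"))

# convert = TRUE runs type conversion on the new columns after extraction
typed <- df |>
  extract(col, c("year", "month"), "(\\d{4})-(\\d{2})", convert = TRUE)

str(typed)
```

Here year and month end up as integers instead of character strings, which is convenient for arithmetic but drops the leading zero of "01".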
4. Modern alternative
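As a rough equivalent of the date example, here is a sketch using separate_wider_regex(), assuming tidyr 1.3.0 or later and the same hypothetical df. Named elements of patterns become columns; unnamed elements are matched and then discarded:

```r
library(tidyr)

# Hypothetical input
df <- data.frame(col = c("2023-01", "2024-11"))

# No capture groups needed: the names carry the column assignment
modern <- df |>
  separate_wider_regex(
    col,
    patterns = c(year = "\\d{4}", "-", month = "\\d{2}")
  )

modern
```

One behavioural difference to keep in mind: the patterns must account for the whole string, and by default a row that does not match raises an error instead of silently producing NA.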
5. Keep original column
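A small sketch of keeping the source column, using the same hypothetical df as an assumed input:

```r
library(tidyr)

# Hypothetical input
df <- data.frame(col = c("2023-01", "2024-11"))

# remove = FALSE keeps `col` next to the extracted columns
kept <- df |>
  extract(col, c("year", "month"), "(\\d{4})-(\\d{2})", remove = FALSE)

names(kept)
```

This is handy while debugging a regex, since the original string stays next to what was pulled out of it.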
extract() is the older regex-based extractor; separate_wider_regex() is the modern version. Both work; for consistency with the separate_wider_ family, prefer the newer one.
extract() vs separate_wider_regex() vs str_match()
| Function | API style | Best for |
|---|---|---|
| extract() | Older, capture-group syntax | Existing code |
| separate_wider_regex() | Modern, named patterns | New code |
| stringr::str_match() | Vector-level extraction, returns a character matrix | Plain character vectors, outside a data frame pipeline |
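To make the third row concrete, here is a sketch of stringr::str_match() on a plain character vector. The input vector x is hypothetical, and the example assumes the stringr package is installed:

```r
# Hypothetical input vector; the second element does not match
x <- c("2023-01", "n/a")

# Returns a character matrix: column 1 is the full match,
# the remaining columns are the capture groups
m <- stringr::str_match(x, "(\\d{4})-(\\d{2})")

m
```

Because the result is a matrix rather than new data frame columns, str_match() fits best when the strings are not already inside a data frame.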
A practical workflow
Both extract and separate_wider_regex work; for new code use separate_wider_regex.
Common pitfalls
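Pitfall 1: regex special characters. The regex argument is always interpreted as a regular expression, so metacharacters such as . match more than you expect. Escape literals: \\. for a period.
Pitfall 2: convert auto-detection. convert = TRUE converts each new column to the type its values look like, which can surprise: "01" becomes the integer 1 and loses its leading zero.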
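A sketch of the escaping pitfall, using made-up version-like strings. The unescaped dot matches any character, so a malformed value slips through:

```r
library(tidyr)

# Hypothetical input: the second value is malformed
df <- data.frame(col = c("1.2", "1x2"))

# Unescaped "." matches any single character, including "x"
loose <- df |> extract(col, c("major", "minor"), "(\\d+).(\\d+)")

# Escaped "\\." matches only a literal period
strict <- df |> extract(col, c("major", "minor"), "(\\d+)\\.(\\d+)")

loose
strict
```

With the escaped pattern the malformed row becomes NA, which makes the bad input visible instead of silently parsing it.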
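The conversion surprise can be sketched with zero-padded identifiers. The data and column names below are hypothetical:

```r
library(tidyr)

# Hypothetical input: zero-padded IDs where the padding is meaningful
df <- data.frame(col = c("A-007", "B-042"))

as_text <- df |> extract(col, c("prefix", "id"), "([A-Z])-(\\d+)")
as_int  <- df |> extract(col, c("prefix", "id"), "([A-Z])-(\\d+)", convert = TRUE)

as_text$id  # padding preserved as character strings
as_int$id   # converted to integers, padding lost
```

If the zeros carry meaning, as they do for identifiers or month codes, leaving convert at its default of FALSE and converting selected columns by hand is the safer route.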
extract() is a superseded function. Existing uses are fine; new code should use separate_wider_regex() for consistency.
Try it yourself
Try it: Extract version major and minor from "v2.5". Save to ex_ver.
Solution
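One possible solution, sketched under the assumption that the string lives in a one-row data frame; the names ver_df and version are placeholders chosen for this example:

```r
library(tidyr)

# Hypothetical one-row input holding the version string
ver_df <- data.frame(version = "v2.5")

# "v" and the escaped period are matched but not captured
ex_ver <- ver_df |>
  extract(version, c("major", "minor"), "v(\\d+)\\.(\\d+)", convert = TRUE)

ex_ver
```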
Explanation: Capture groups extract the digits; convert turns them into integers.
Related tidyr functions
After mastering extract, look at:
- separate_wider_regex(): modern equivalent
- separate_wider_delim(): delimiter-based
- separate_wider_position(): fixed widths
- stringr::str_match(): vector-level extraction
FAQ
What does extract do in tidyr?
extract(data, col, into, regex) extracts regex capture groups into new columns named in into.
What is the difference between extract and separate_wider_regex?
extract() is older and uses capture-group syntax. separate_wider_regex() is newer and uses a named vector of patterns. Both split strings with regular expressions; new code should prefer separate_wider_regex().
Should I use extract or separate_wider_regex in new code?
separate_wider_regex. extract still works but is part of the older API.
What does convert = TRUE do?
It runs type.convert() with as.is = TRUE on each new column, so digit-only strings become integers and decimal strings become doubles. Leading zeros are lost in the process ("01" becomes 1).
Can I keep the original column?
Yes. Pass remove = FALSE to keep the source column alongside the extracted ones.