tidyr separate_wider_regex() in R: Split Column by Regex
The separate_wider_regex() function, added in tidyr 1.3.0, splits a string column into multiple columns based on a sequence of regex patterns. Each named pattern captures one part of the string into a new column.
df |> separate_wider_regex(col, patterns = c(year="\\d{4}", "-", month="\\d{2}", "-", day="\\d{2}"))
df |> separate_wider_regex(col, patterns = c(letter="[A-Z]+", num="\\d+"))
df |> separate_wider_regex(col, patterns = c(name="[a-z]+", "@", domain="\\S+"))
df |> separate_wider_delim(col, delim = "-") # simpler alternative
df |> separate_wider_position(col, widths = c(...)) # for fixed widths
Need explanation? Read on for examples and pitfalls.
What separate_wider_regex() does in one sentence
separate_wider_regex(data, cols, patterns) matches each value of cols against the concatenated sequence of regex patterns; each named pattern becomes a new column. Unnamed patterns must still match, but their text is dropped from the output.
Syntax
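separate_wider_regex(data, cols, patterns, ..., names_sep = NULL, names_repair = "check_unique", too_few = c("error", "debug", "align_start"), cols_remove = TRUE). patterns is a character vector: named elements become output columns, unnamed elements are matched and discarded.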
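A minimal sketch of a complete call may help here. The data frame and the column names (df, id, sample, group, unit) are made up for illustration; only the function and its arguments come from tidyr.

```r
library(tidyr)
library(tibble)

# Hypothetical data: sample IDs made of a one-letter group, a dash, and digits
df <- tibble(id = 1:2, sample = c("m-123", "f-455"))

out <- separate_wider_regex(
  df,
  cols = sample,
  patterns = c(group = "[a-z]", "-", unit = "\\d+"),  # the unnamed "-" is matched, not kept
  too_few = "error",    # default: stop if any row fails to match
  cols_remove = FALSE   # keep the original `sample` column next to the new ones
)

out
```

The new columns are always character; convert them afterwards if you need numbers.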
Five common patterns
1. Letter prefix + number suffix
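As a sketch, assuming a hypothetical codes tibble whose col column holds uppercase letters followed by digits:

```r
library(tidyr)
library(tibble)

# Hypothetical codes: uppercase letters followed by digits
codes <- tibble(col = c("A1", "BC234"))

codes_split <- codes |>
  separate_wider_regex(col, patterns = c(letter = "[A-Z]+", num = "\\d+"))

codes_split$letter  # "A" "BC"
codes_split$num     # "1" "234" (character, not numeric)
```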
2. Email address
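A small sketch with invented addresses. The pattern is deliberately simple and only accepts lowercase letters before the "@"; it is not a general email validator.

```r
library(tidyr)
library(tibble)

# Hypothetical addresses with a lowercase-only local part
emails <- tibble(col = c("alice@example.com", "bob@mail.org"))

emails_split <- emails |>
  separate_wider_regex(col, patterns = c(name = "[a-z]+", "@", domain = "\\S+"))

emails_split$name    # "alice" "bob"
emails_split$domain  # "example.com" "mail.org"
```

An address such as "Alice.Smith@example.com" would not match "[a-z]+" and, with the default too_few = "error", would stop the call; widen the character class if your data needs it.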
3. Date with delimiter
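A sketch using made-up ISO-style date strings; the unnamed "-" elements consume the delimiters without creating columns.

```r
library(tidyr)
library(tibble)

# Hypothetical date strings in YYYY-MM-DD form
dates <- tibble(col = c("2024-01-15", "2023-12-31"))

dates_split <- dates |>
  separate_wider_regex(
    col,
    patterns = c(year = "\\d{4}", "-", month = "\\d{2}", "-", day = "\\d{2}")
  )

dates_split$year   # "2024" "2023"
dates_split$month  # "01" "12"
dates_split$day    # "15" "31"
```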
4. Skip parts of input
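To illustrate skipping, here is a sketch with an invented order-label format. The unnamed "order_" and "_" elements must match but produce no columns.

```r
library(tidyr)
library(tibble)

# Hypothetical labels: a fixed prefix, a numeric id, and a status word
orders <- tibble(col = c("order_0042_paid", "order_0107_open"))

orders_split <- orders |>
  separate_wider_regex(
    col,
    patterns = c("order_", id = "\\d+", "_", status = "[a-z]+")
  )

orders_split$id      # "0042" "0107"
orders_split$status  # "paid" "open"
```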
5. Multi-step regex parse
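One reading of "multi-step" (an assumption, since the original example is not shown) is chaining two calls: a coarse split first, then a finer split of one of the resulting columns. The genomic-style strings and the column names chrom, span, from, and to are hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical region strings: chromosome, colon, then a from-to range
regions <- tibble(col = c("chr1:100-200", "chrX:5-50"))

regions_split <- regions |>
  # Step 1: coarse split into chromosome and the raw range text
  separate_wider_regex(col, patterns = c(chrom = "chr\\w+", ":", span = ".+")) |>
  # Step 2: split the range text into its two endpoints
  separate_wider_regex(span, patterns = c(from = "\\d+", "-", to = "\\d+"))

regions_split
```

Splitting in stages keeps each pattern short, at the cost of one extra pass over the data.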
separate_wider_regex() is the regex-based sibling of separate_wider_delim() and separate_wider_position(). Use it when the pieces are too irregular for those functions (e.g., variable-length parts, alternation). For a plain delimiter or fixed positions, use the simpler functions.
separate_wider_regex() vs str_match() vs separate_wider_delim()
| Function | Output | Best for |
|---|---|---|
| separate_wider_regex() | Multi-column tibble | Structured regex parsing |
| stringr::str_match() | Matrix of capture groups | One-off vector extraction |
| separate_wider_delim() | Multi-column tibble | Simple delimiter |
| separate_wider_position() | Multi-column tibble | Fixed widths |
When to use which:
- regex for complex patterns.
- delim for simple delimiters.
- position for fixed widths.
- str_match for one-time extraction outside dplyr.
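The difference in output shape can be sketched side by side. The input vector x is made up; str_match() needs explicit capture groups and anchors, while separate_wider_regex() builds them from the named patterns.

```r
library(tidyr)
library(tibble)
library(stringr)

x <- c("A1", "BC234")

# str_match() returns a character matrix: column 1 is the full match,
# the remaining columns are the capture groups
m <- str_match(x, "^([A-Z]+)(\\d+)$")

# separate_wider_regex() returns a tibble with one column per named pattern
wide <- tibble(x = x) |>
  separate_wider_regex(x, patterns = c(letter = "[A-Z]+", num = "\\d+"))

dim(m)       # 2 rows, 3 columns
names(wide)  # "letter" "num"
```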
A practical workflow
Use separate_wider_regex() when the input has structure the simpler functions can't capture.
For example, log entries can be parsed into timestamp, level, and message in one step.
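A sketch of such a parse, assuming a hypothetical log format of "YYYY-MM-DD HH:MM:SS [LEVEL] message". Real logs vary, so the patterns would need adjusting.

```r
library(tidyr)
library(tibble)

# Hypothetical log lines in an assumed "timestamp [LEVEL] message" layout
logs <- tibble(entry = c(
  "2024-01-15 10:30:00 [ERROR] Disk full",
  "2024-01-15 10:31:12 [INFO] Retry scheduled"
))

parsed <- logs |>
  separate_wider_regex(
    entry,
    patterns = c(
      timestamp = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}",
      " \\[",            # skip the space and opening bracket
      level = "[A-Z]+",
      "\\] ",            # skip the closing bracket and space
      message = ".*"
    )
  )

parsed
```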
Common pitfalls
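Pitfall 1: too_few = "error" is the default. If any row fails to match the full pattern, the call errors. Pass too_few = "align_start" to keep partial matches (columns that did not match are filled with NA), or too_few = "debug" to add diagnostic columns that show which rows failed.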
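A short sketch of the difference, using an invented column where the second value has an underscore instead of the expected dash. With the default, the call should stop with an error; with too_few = "align_start", the documented behavior is to keep the leading pieces that matched and pad the rest with NA.

```r
library(tidyr)
library(tibble)

# Hypothetical data: the second value does not follow the letter-dash-digits layout
df <- tibble(col = c("a-123", "b_123"))
pats <- c(letter = "[a-z]", "-", num = "\\d+")

# Default too_few = "error": the mismatch stops the call
strict <- tryCatch(
  separate_wider_regex(df, col, patterns = pats),
  error = function(e) "error"
)

# too_few = "align_start": keep what matched, fill the rest with NA
lenient <- separate_wider_regex(df, col, patterns = pats, too_few = "align_start")

lenient
```

Tolerating partial matches hides malformed rows, so it is worth checking the NA count afterwards.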
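Pitfall 2: greedy regex eating too much. A pattern such as ".*" is greedy: it takes as many characters as it can while the overall match still succeeds. Use ".*?" (non-greedy) or a more restrictive pattern, such as a negated character class like "[^-]+".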
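The effect is easiest to see on a value with a repeated delimiter. This sketch uses a made-up "a=b=c" string and hypothetical key/value column names.

```r
library(tidyr)
library(tibble)

kv <- tibble(col = "a=b=c")

# Greedy: `key` grabs as much as it can, so the split lands on the last "="
greedy <- separate_wider_regex(kv, col, patterns = c(key = ".*", "=", value = ".*"))

# Non-greedy: `key` stops at the first "="
lazy <- separate_wider_regex(kv, col, patterns = c(key = ".*?", "=", value = ".*"))

# Negated character class: `key` cannot contain "=" at all
negated <- separate_wider_regex(kv, col, patterns = c(key = "[^=]+", "=", value = ".*"))

greedy$key   # "a=b"
lazy$key     # "a"
negated$key  # "a"
```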
separate_wider_regex() requires the full string to match the concatenated pattern: every character of the input must be consumed by some element of patterns. Use unnamed patterns to "skip" segments you don't need.
Try it yourself
Try it: Parse "v2.5.1" into major, minor, patch integer components. Save to ex_ver.
Explanation: Match "v" literally, then capture digits as major, minor, patch with literal dots between.
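One possible solution, as a sketch: the input column name version is an assumption, and because separate_wider_regex() returns character columns, the conversion to integer is a separate step (done here with dplyr).

```r
library(tidyr)
library(tibble)
library(dplyr)

ex_ver <- tibble(version = "v2.5.1") |>
  separate_wider_regex(
    version,
    patterns = c("v", major = "\\d+", "\\.", minor = "\\d+", "\\.", patch = "\\d+")
  ) |>
  # The new columns are character; convert them to integer
  mutate(across(c(major, minor, patch), as.integer))

ex_ver
```

The dots are escaped as "\\." because an unescaped "." would match any character.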
Related tidyr functions
After mastering separate_wider_regex, look at:
- separate_wider_delim(): simpler delimiter
- separate_wider_position(): fixed widths
- separate_longer_delim(): split into rows
- stringr::str_match(): lower-level vector extraction
- unite(): combine columns
FAQ
What does separate_wider_regex do in tidyr?
Splits a string column into multiple columns by matching a sequence of regex patterns. Named patterns become columns; unnamed are matched but discarded.
What is the difference between separate_wider_regex and separate_wider_delim?
separate_wider_regex() matches a sequence of regex patterns (more flexible). separate_wider_delim() splits on a single delimiter (simpler). Use the regex version when the structure is too complex for one delimiter.
Can I use separate_wider_regex with capture groups?
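Yes, implicitly. The named patterns are the capture groups; the function builds the full regex internally. If a pattern needs grouping of its own, use a non-capturing group, (?:...), so that it does not add an extra capture group.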
What happens if my input doesn't fully match the pattern?
It errors by default. Pass too_few = "align_start" to tolerate partial matches, with NA filling the columns that did not match.
Does separate_wider_regex use Perl regex?
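Not exactly. Matching is done through stringr, which uses the ICU regular expression engine rather than PCRE. Most Perl-style syntax you know from elsewhere (character classes, quantifiers, non-greedy matching, lookarounds) still applies.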