tidyr separate_wider_regex() in R: Split Column by Regex

The separate_wider_regex() function, available in tidyr 1.3.0 and later, splits a string column into multiple columns based on a sequence of REGEX PATTERNS. Each named pattern captures a part of the string into a new column.

⚡ Quick Answer
df |> separate_wider_regex(col, patterns = c(year="\\d{4}", "-", month="\\d{2}", "-", day="\\d{2}"))
df |> separate_wider_regex(col, patterns = c(letter="[A-Z]+", num="\\d+"))
df |> separate_wider_regex(col, patterns = c(name="[a-z]+", "@", domain="\\S+"))
df |> separate_wider_delim(col, delim = "-", names = c("year", "month", "day"))  # simpler alternative
df |> separate_wider_position(col, widths = c(...)) # for fixed widths

Need explanation? Read on for examples and pitfalls.

📊 Is separate_wider_regex() the right tool?
  • Complex regex pattern with named groups: separate_wider_regex()
  • Simple delimiter: separate_wider_delim() (cleaner)
  • Fixed widths: separate_wider_position()
  • Match anywhere with regex: stringr::str_match()
  • Skip characters between groups: unnamed string in patterns

What separate_wider_regex() does in one sentence

separate_wider_regex(data, cols, patterns) matches each value of cols against a CONCATENATED sequence of regex patterns; each named pattern becomes a new column. Unnamed elements of patterns must still match, but they produce no column.
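To make "concatenated" concrete, the sketch below glues the pieces into one anchored regex by hand and runs it through stringr::str_match(). It only illustrates the idea: the helper names (named, groups, full_regex) are made up for this example, and tidyr's internals may differ in detail.

```r
library(stringr)

patterns <- c(year = "\\d{4}", "-", month = "\\d{2}", "-", day = "\\d{2}")

# Named pieces become capturing groups; unnamed pieces become non-capturing groups
named <- names(patterns) != ""
groups <- paste0("(", ifelse(named, "", "?:"), patterns, ")")

# Anchor at both ends so the whole string has to match
full_regex <- paste0("^", paste(groups, collapse = ""), "$")

# Column 1 of the match matrix is the full match; the rest are the captured pieces
m <- str_match("2024-01-15", full_regex)
parts <- setNames(m[1, -1], names(patterns)[named])
parts
```

Printing full_regex shows the single pattern that every value would have to satisfy from start to end.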

Syntax

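separate_wider_regex(data, cols, patterns, ..., names_sep = NULL, names_repair = "check_unique", too_few = c("error", "debug", "align_start"), cols_remove = TRUE). patterns is a character vector: NAMED elements become columns, unnamed elements are matched but dropped.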
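As a small illustration of the cols_remove argument, the sketch below keeps the original column next to the new ones. The column names (code, letter, num) are just sample data for this example.

```r
library(tidyr)

df <- tibble(code = c("A123", "B45"))

# cols_remove = FALSE keeps the source column alongside the extracted pieces
kept <- df |>
  separate_wider_regex(
    code,
    patterns = c(letter = "[A-Z]+", num = "\\d+"),
    cols_remove = FALSE
  )

names(kept)
```

Keeping the source column can be handy while you are still checking that the patterns split each value the way you expect.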

R: Parse YYYY-MM-DD with regex

    library(tidyr)
    library(dplyr)

    df <- tibble(date_str = c("2024-01-15", "2025-03-20"))

    df |>
      separate_wider_regex(
        date_str,
        patterns = c(year = "\\d{4}", "-", month = "\\d{2}", "-", day = "\\d{2}")
      )
    #> year  month day
    #> 2024  01    15
    #> 2025  03    20

  
Tip
Unnamed elements in patterns are MATCHED but not captured. Use them as separators between named groups. They are regexes too, so escape metacharacters when you mean them literally (for example "\\." for a dot).
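Because an unnamed element is itself a regex, it can absorb separators that vary from row to row. A minimal sketch, assuming a made-up column d that mixes "-" and "/" as date separators:

```r
library(tidyr)

df <- tibble(d = c("2024-01-15", "2025/03/20"))

# The unnamed "[-/]" pieces match either separator and produce no column
out <- df |>
  separate_wider_regex(
    d,
    patterns = c(year = "\\d{4}", "[-/]", month = "\\d{2}", "[-/]", day = "\\d{2}")
  )

out
```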

Five common patterns

1. Letter prefix + number suffix

R: A123 -> letter='A', num='123'

    df <- tibble(code = c("A123", "B45"))

    df |>
      separate_wider_regex(code, patterns = c(letter = "[A-Z]+", num = "\\d+"))

  

2. Email address

R: user@domain

    df <- tibble(email = c("alice@example.com", "bob@gmail.com"))

    df |>
      separate_wider_regex(email, patterns = c(name = "[\\w.]+", "@", domain = "\\S+"))

  

3. Date with delimiter

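R: Year-Month-Day

    # Redefine df, since the previous example overwrote it
    df <- tibble(date_str = c("2024-01-15", "2025-03-20"))

    df |>
      separate_wider_regex(
        date_str,
        patterns = c(year = "\\d{4}", "-", month = "\\d{2}", "-", day = "\\d{2}")
      )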

  

4. Skip parts of input

R: Match 'ID:123' but extract only the number

    df <- tibble(s = c("ID:123", "ID:456"))

    df |>
      separate_wider_regex(s, patterns = c("ID:", id = "\\d+"))
    #> id
    #> 123
    #> 456

  

5. Multi-step regex parse

R: Complex token format

    df <- tibble(token = c("v2.5.1-beta", "v3.0.0-alpha"))

    df |>
      separate_wider_regex(
        token,
        patterns = c(
          "v", major = "\\d+", "\\.", minor = "\\d+", "\\.", patch = "\\d+",
          "-", tag = "\\w+"
        )
      )

  
Key Insight
separate_wider_regex is the regex sister of separate_wider_delim and separate_wider_position. Use regex when patterns are complex (e.g., variable-length parts, alternation). For simple delim or position, use the simpler functions.
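A case the delimiter and position helpers cannot handle directly is a value with no separator, a variable-length number, and a unit drawn from a few alternatives. The sketch below uses invented sample data (column w) and a non-capturing group for the alternation:

```r
library(tidyr)

df <- tibble(w = c("10kg", "5lbs", "72kgs"))

# value is variable-length; unit is an alternation with an optional plural "s"
out <- df |>
  separate_wider_regex(
    w,
    patterns = c(value = "\\d+", unit = "(?:kg|lb)s?")
  )

out
```

There is no delimiter to split on and no fixed width to cut at, so a regex is the natural fit here.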

separate_wider_regex() vs str_match() vs separate_wider_delim()

Function | Output | Best for
separate_wider_regex() | Multi-column tibble | Structured regex parsing
stringr::str_match() | Matrix of capture groups | One-off vector extraction
separate_wider_delim() | Multi-column tibble | Simple delimiter
separate_wider_position() | Multi-column tibble | Fixed widths

When to use which:

  • regex for complex patterns.
  • delim for simple delimiters.
  • position for fixed widths.
  • str_match for one-time extraction outside dplyr.
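For comparison, here is a minimal sketch of the str_match() route on a plain character vector. You write the capture groups yourself and get back a character matrix whose first column is the full match. The vector x is sample data for this example.

```r
library(stringr)

x <- c("A123", "B45")

# Explicit capture groups; the pattern is not anchored, so it may match anywhere
m <- str_match(x, "([A-Z]+)(\\d+)")

letters_part <- m[, 2]  # first capture group
numbers_part <- m[, 3]  # second capture group
```

This is convenient for a quick extraction, but you have to name and bind the columns yourself if the result needs to go back into a data frame.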

A practical workflow

Use separate_wider_regex when input has STRUCTURE the simpler functions can't capture.

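R: Parse log lines

    # Hypothetical sample data for illustration
    log_lines <- tibble(raw = c(
      "2024-01-15 10:32:07 [INFO] Server started",
      "2024-01-15 10:32:09 [ERROR] Connection refused"
    ))

    log_lines |>
      separate_wider_regex(
        raw,
        patterns = c(
          ts = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}",
          " \\[", level = "\\w+", "\\] ",
          msg = ".*"
        )
      )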

  

Parse log entries into timestamp, level, and message in one step.

Common pitfalls

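Pitfall 1: too_few = "error" by default. If any row fails to match the full pattern, the call errors. Pass too_few = "align_start" to keep the pieces that matched from the start and fill the rest with NA, or too_few = "debug" to add diagnostic columns that help you locate the problem rows.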
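A short sketch of too_few = "align_start", assuming a made-up column where one value ("B") lacks the numeric part:

```r
library(tidyr)

df <- tibble(code = c("A123", "B", "C7"))

# "B" has no digits, so with the default too_few = "error" this call would fail.
# "align_start" keeps what matched from the left and pads the rest with NA.
out <- df |>
  separate_wider_regex(
    code,
    patterns = c(letter = "[A-Z]+", num = "\\d+"),
    too_few = "align_start"
  )

out
```

Check the padded rows afterwards (for example with is.na(out$num)), because a silent NA can hide malformed input.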

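Pitfall 2: greedy regex eating too much. A pattern such as ".*" is greedy: it takes as much as it can while the overall match still succeeds. Use ".*?" (non-greedy) or a narrower character class such as "[^_]+" when an earlier group should stop at the first separator.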
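The difference shows up when the separator occurs more than once. In this sketch (the column s and the names first and rest are invented for the example), the greedy version splits at the last underscore and the lazy version at the first:

```r
library(tidyr)

df <- tibble(s = "a_b_c")

# Greedy: first grabs as much as possible, so the split lands on the LAST "_"
greedy <- df |>
  separate_wider_regex(s, patterns = c(first = ".*", "_", rest = ".*"))

# Lazy: first grabs as little as possible, so the split lands on the FIRST "_"
lazy <- df |>
  separate_wider_regex(s, patterns = c(first = ".*?", "_", rest = ".*"))

greedy
lazy
```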

Warning
separate_wider_regex() requires the FULL string to match the concatenated pattern. Each character of the input must be consumed by some part of patterns. Use unnamed strings to "skip" segments.
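A minimal sketch of the full-match rule, using made-up values where one row carries trailing text. Without a pattern to consume the trailing part the call fails; an unnamed ".*" at the end absorbs it.

```r
library(tidyr)

df <- tibble(s = c("ID:123 (ok)", "ID:456"))

# Fails: nothing in patterns consumes " (ok)" in the first row
res <- try(
  df |> separate_wider_regex(s, patterns = c("ID:", id = "\\d+")),
  silent = TRUE
)
inherits(res, "try-error")

# Works: the unnamed ".*" swallows whatever follows the digits
out <- df |>
  separate_wider_regex(s, patterns = c("ID:", id = "\\d+", ".*"))

out
```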

Try it yourself

Try it: Parse version strings such as "v2.5.1" into major, minor, and patch columns. Save to ex_ver.

R: Your turn: parse semver

    df <- tibble(v = c("v2.5.1", "v3.0.10"))

    ex_ver <- df |>
      # your code here

    ex_ver
    #> Expected: 3 columns major, minor, patch

  
R: Solution

    ex_ver <- df |>
      separate_wider_regex(
        v,
        patterns = c("v", major = "\\d+", "\\.", minor = "\\d+", "\\.", patch = "\\d+")
      )

    ex_ver
    #> major minor patch
    #> 2     5     1
    #> 3     0     10

  

Explanation: Match "v" literally, then capture digits as major, minor, patch with literal dots between. The resulting columns are character vectors; convert them with as.integer() if you need numbers.
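If you want numeric version parts, one option is to convert the columns after splitting. This sketch uses dplyr's mutate() with across(); the name ex_int is made up for the example.

```r
library(tidyr)
library(dplyr)

df <- tibble(v = c("v2.5.1", "v3.0.10"))

ex_int <- df |>
  separate_wider_regex(
    v,
    patterns = c("v", major = "\\d+", "\\.", minor = "\\d+", "\\.", patch = "\\d+")
  ) |>
  # separate_wider_regex() returns character columns, so convert explicitly
  mutate(across(c(major, minor, patch), as.integer))

ex_int
```

With integer columns, sorting by version compares 10 and 9 numerically rather than as text.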

After mastering separate_wider_regex, look at:

  • separate_wider_delim(): simpler delimiter
  • separate_wider_position(): fixed widths
  • separate_longer_delim(): split into rows
  • stringr::str_match(): lower-level vector extraction
  • unite(): combine columns

FAQ

What does separate_wider_regex do in tidyr?

Splits a string column into multiple columns by matching a sequence of regex patterns. Named patterns become columns; unnamed are matched but discarded.

What is the difference between separate_wider_regex and separate_wider_delim?

separate_wider_regex() matches a sequence of regex patterns (more flexible). separate_wider_delim() splits on a single delimiter (simpler). Use the regex version when the structure is too complex for a single delimiter.

Can I use separate_wider_regex with capture groups?

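Yes, implicitly. The named patterns ARE the capture groups; the function generates the regex internally. If you need grouping inside a pattern (for alternation, say), use a non-capturing group (?:...), since extra capturing groups change the number of groups the function expects.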

What happens if my input doesn't fully match the pattern?

Errors by default. Pass too_few = "align_start" to tolerate partial matches with NA fill.

Does separate_wider_regex use Perl regex?

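Not exactly. It relies on stringr, which uses the ICU regular expression engine (via stringi) rather than PCRE. Most Perl-style syntax you know from elsewhere still applies, such as \\d, \\w, and lazy quantifiers.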