tidyr separate_wider_position() in R: Split by Character Position
The separate_wider_position() function in tidyr 1.3 splits a string column into multiple columns based on FIXED CHARACTER POSITIONS. It is the right tool for fixed-width formats like dates without delimiters or coded IDs.
    df |> separate_wider_position(col, widths = c(year = 4, month = 2, day = 2))
    df |> separate_wider_position(col, widths = c(prefix = 3, code = 5, suffix = 2))
    df |> separate_wider_position(col, widths = c(year = 4, 2, month = 2))  # unnamed width = skip
    df |> separate_wider_delim(col, delim = "-")  # different: delimiter-based
    df |> separate_wider_regex(col, patterns = c(...))  # regex-based
Need explanation? Read on for examples and pitfalls.
What separate_wider_position() does in one sentence
separate_wider_position(data, cols, widths) splits each value of cols at the cumulative positions defined by the named integer vector widths. Each name becomes a new column with the corresponding number of characters.
Syntax
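separate_wider_position(data, cols, widths, ..., names_sep = NULL, too_few = c("error", "debug", "align_start"), too_many = c("error", "debug", "drop"), cols_remove = TRUE). widths is a named integer vector; the first value of too_few and too_many ("error") is the default.
Skipping characters: an unnamed width is matched but not kept, so widths = c(year = 4, 2, month = 2) skips 2 characters between year and month.
Five common patterns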
1. Date string
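A minimal sketch of the date-string pattern. The data frame, its column names, and the sample values are made up for illustration; tibble::tibble() is used only to build the toy input.

```r
library(tidyr)

# Hypothetical input: dates stored as undelimited YYYYMMDD strings
df <- tibble::tibble(
  id = 1:3,
  date_str = c("20240115", "20231231", "20220704")
)

# Split into three character columns of widths 4, 2 and 2
parsed <- df |>
  separate_wider_position(date_str, widths = c(year = 4, month = 2, day = 2))

parsed
```

The new columns are character vectors; convert them with as.integer() if numbers are needed downstream.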
2. Code with prefix and suffix
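A sketch of the prefix/code/suffix pattern, assuming every value is exactly 10 characters long. The sample codes and the column name item are hypothetical.

```r
library(tidyr)

# Hypothetical 10-character codes: 3-char prefix, 5-char code, 2-char suffix
items <- tibble::tibble(item = c("ABC12345XY", "DEF67890ZW"))

split_items <- items |>
  separate_wider_position(item, widths = c(prefix = 3, code = 5, suffix = 2))

split_items
```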
3. Skip middle characters
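A sketch of skipping characters in the middle of a string. It relies on the documented behaviour that unnamed widths are matched but left out of the output; the input values with two filler characters are invented for the example.

```r
library(tidyr)

# Hypothetical records: 4-digit year, 2 filler characters, 2-digit month
records <- tibble::tibble(raw = c("2024XX01", "2023YY12"))

# The unnamed width of 2 consumes the filler characters without creating a column
skipped <- records |>
  separate_wider_position(raw, widths = c(year = 4, 2, month = 2))

skipped
```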
4. Handle short strings with too_few
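A sketch of handling values that are shorter than the total width. With the default too_few = "error" the second row below would stop the call; too_few = "align_start" fills the missing pieces with NA instead. The sample data is hypothetical.

```r
library(tidyr)

# Hypothetical input: the second value contains only the year
mixed <- tibble::tibble(date_str = c("20240115", "2024"))

aligned <- mixed |>
  separate_wider_position(
    date_str,
    widths = c(year = 4, month = 2, day = 2),
    too_few = "align_start"  # pieces that are missing become NA
  )

aligned
```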
5. Combine with other tidy operations
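A sketch of chaining the split into a dplyr pipeline. The sales table, its column names, and the amounts are assumptions made for the example; dplyr is assumed to be installed.

```r
library(tidyr)
library(dplyr)

# Hypothetical sales records keyed by a YYYYMMDD string
sales <- tibble::tibble(
  date_str = c("20240115", "20240220", "20230301"),
  amount = c(100, 250, 75)
)

yearly <- sales |>
  separate_wider_position(date_str, widths = c(year = 4, month = 2, day = 2)) |>
  # The split produces character columns, so convert before grouping
  mutate(across(c(year, month, day), as.integer)) |>
  group_by(year) |>
  summarise(total = sum(amount), .groups = "drop")

yearly
```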
widths = c(a=3, b=2) means "first 3 chars to a, next 2 chars to b". Cumulative positions: a is chars 1-3, b is chars 4-5.
separate_wider_position() vs separate_wider_delim() vs str_sub
| Function | Splits by | Best for |
|---|---|---|
| separate_wider_position() | Character positions | Fixed-width formats |
| separate_wider_delim() | Delimiter | Variable-width parts |
| separate_wider_regex() | Regex groups | Pattern-based |
| stringr::str_sub() | Position substring | One column at a time |
When to use which:
- separate_wider_position for FIXED-WIDTH (dates, codes).
- separate_wider_delim for DELIMITED.
- separate_wider_regex for COMPLEX patterns.
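To make the comparison concrete, here is a sketch that extracts the same three fields with each function. The two input tables are hypothetical: one stores dates without delimiters, the other with dashes.

```r
library(tidyr)

fixed <- tibble::tibble(x = c("20240115", "20231231"))
delimited <- tibble::tibble(x = c("2024-01-15", "2023-12-31"))

# Fixed-width: split by character counts
by_position <- fixed |>
  separate_wider_position(x, widths = c(year = 4, month = 2, day = 2))

# Delimited: split on "-" and supply the new column names
by_delim <- delimited |>
  separate_wider_delim(x, delim = "-", names = c("year", "month", "day"))

# Pattern-based: named patterns become columns, unnamed ones are matched and dropped
by_regex <- delimited |>
  separate_wider_regex(
    x,
    patterns = c(year = "\\d{4}", "-", month = "\\d{2}", "-", day = "\\d{2}")
  )
```

All three calls return character columns named year, month, and day; the choice depends only on how the source strings are laid out.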
A practical workflow
Use it for the fixed-width data formats that are common in legacy systems.
Parse a structured ID into its semantic components in one step.
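A sketch of that workflow, assuming a hypothetical 12-character record ID made of a 2-letter country code, a 4-digit year, and a 6-digit serial number. The layout and the column names are invented for illustration; cols_remove = FALSE keeps the original ID next to its parts.

```r
library(tidyr)

# Hypothetical legacy export with a fixed-width record ID
legacy <- tibble::tibble(record_id = c("US2024000123", "DE2023004567"))

components <- legacy |>
  separate_wider_position(
    record_id,
    widths = c(country = 2, year = 4, serial = 6),
    cols_remove = FALSE  # keep record_id alongside the new columns
  )

components
```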
Common pitfalls
Pitfall 1: forgetting to name widths. Unnamed widths are matched but dropped (treated as a skip), so a segment you forgot to name disappears from the output. Always name the segments you want to keep.
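Pitfall 2: mismatched total width. If the widths sum to less than the string length, the default too_many = "error" stops with an error. Pass too_many = "drop" to discard the extra characters, or too_many = "debug" to add diagnostic columns instead.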
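A sketch of both pitfalls on toy data (all values are hypothetical). The first call shows an unnamed width removing a segment; the second and third show a too-long string under the default too_many = "error" and under too_many = "drop".

```r
library(tidyr)

# Pitfall 1: the second width has no name, so only column a is returned
ok <- tibble::tibble(x = c("ABCDE", "FGHIJ"))
only_a <- ok |>
  separate_wider_position(x, widths = c(a = 3, 2))

# Pitfall 2: the second value has 7 characters but the widths sum to 5
long <- tibble::tibble(x = c("ABCDE", "ABCDEFG"))

# Default behaviour: the call errors; the error is caught here so the script continues
strict <- tryCatch(
  separate_wider_position(long, x, widths = c(a = 3, b = 2)),
  error = function(e) "error"
)

# Opt in to truncation: characters beyond position 5 are discarded
dropped <- long |>
  separate_wider_position(x, widths = c(a = 3, b = 2), too_many = "drop")
```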
separate_wider_position() operates on CHARACTER counts, not BYTE counts. For multi-byte UTF-8 strings, the count is by codepoint, not byte.
Try it yourself
Try it: Split a 6-character ID like "A12-CD" into 3 named segments. Save to ex_parsed.
Solution: widths = c(letter = 1, num = 2, 1, code = 2).
Explanation: The unnamed width of 1 skips the dash between num and code.
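A runnable sketch of one possible solution. The segment names letter, num, and code and the second sample ID are illustrative choices, not part of the exercise.

```r
library(tidyr)

ids <- tibble::tibble(id = c("A12-CD", "B34-EF"))

# 1 letter, 2 digits, skip the dash (unnamed width), then 2 letters
ex_parsed <- ids |>
  separate_wider_position(id, widths = c(letter = 1, num = 2, 1, code = 2))

ex_parsed
```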
Related tidyr functions
After mastering separate_wider_position, look at:
- separate_wider_delim(): delimiter-based
- separate_wider_regex(): regex-based
- separate_longer_delim(): split into rows
- unite(): combine columns
- stringr::str_sub(): position-based substring
FAQ
What does separate_wider_position do in tidyr?
Splits a string column into multiple columns based on character widths. Each width specifies how many characters go into each new column.
How do I skip characters with separate_wider_position?
Leave the width unnamed: widths = c(a = 3, 1, b = 2) skips one character between a and b. Unnamed widths are matched but not included in the output.
What is the difference between separate_wider_position and separate_wider_delim?
position uses fixed character counts; delim uses a delimiter. Use position for fixed-width (YYYYMMDD); delim for variable parts (a-b-c).
Can I parse multi-byte UTF-8 with separate_wider_position?
Yes. Counts are by codepoint, not byte.
What happens if the string is shorter than the widths?
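By default it errors. Pass too_few = "align_start" to fill the missing pieces with NA, or too_few = "debug" to add diagnostic columns. The "align_end" option belongs to separate_wider_delim(), not to this function.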