tidyr separate() in R: Split One Column Into Many
The separate() function in tidyr splits a column into multiple columns by a delimiter or regex. Modern alternatives separate_wider_delim() and separate_wider_regex() are stricter and recommended for new code.
separate(df, col = name, into = c("first","last"), sep = " ")
separate(df, col = date, into = c("y","m","d"), sep = "-", convert = TRUE)
separate_wider_delim(df, cols = name, delim = " ", names = c("first","last"))
separate_wider_regex(df, cols = code, patterns = c(letter="[A-Z]+", digit="[0-9]+"))
separate(df, col = name, into = c("first","last"), sep = " ", extra = "merge") # too many pieces -> last gets rest
separate(df, col = name, into = c("first","last"), sep = " ", fill = "right") # too few pieces -> NA on right
unite(df, "full", first, last, sep = " ") # opposite of separate
Need an explanation? Read on for examples and pitfalls.
What separate() does in one sentence
separate() takes one column and splits it into N columns based on a delimiter or position. Each input row produces one output row; only the column structure changes.
For new code, prefer separate_wider_delim() and separate_wider_regex(). They are stricter (you must specify exactly how many pieces to expect) and produce clearer errors when the data does not match the expected pattern.
Syntax
separate(data, col, into, sep) is the basic form. col is the column to split; into is a character vector of new column names; sep is the delimiter. A character sep is treated as a regular expression, and the default, "[^[:alnum:]]+", splits on any run of non-alphanumeric characters. A numeric sep splits at character positions instead.
Prefer separate_wider_delim() for new code. It is stricter and gives clearer errors than the legacy separate(), which only warns and then truncates or pads with NA when the number of pieces does not match into.
Five common patterns
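A minimal sketch of the default separator, assuming tidyr 1.3.0 or later is installed. The tibble df and its column x are hypothetical; no sep is passed, so the default regex decides where to split.

```r
library(tidyr)

# Hypothetical data: a letter and a number joined by different punctuation
df <- tibble(x = c("a-1", "b.2", "c 3"))

# No sep argument: the default "[^[:alnum:]]+" treats "-", "." and " "
# as delimiters because none of them is a letter or digit
res <- separate(df, col = x, into = c("letter", "number"))
res
```

Both new columns stay character because convert is left at FALSE.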
1. Split by simple delimiter
sep = " " splits on space. The two pieces become first and last.
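A short sketch of this pattern with a hypothetical people tibble (assumes tidyr 1.3.0 or later):

```r
library(tidyr)

# Hypothetical data: two-part names separated by a single space
people <- tibble(name = c("Ada Lovelace", "Alan Turing"))

# Split on the space; the original name column is removed by default (remove = TRUE)
res <- separate(people, col = name, into = c("first", "last"), sep = " ")
res
```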
2. Convert types automatically
convert = TRUE runs type.convert() on each new column, so numeric strings become numeric.
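A sketch of convert = TRUE on hypothetical ISO-style date strings (assumes tidyr 1.3.0 or later). Leading zeros such as "01" do not block the conversion to integer:

```r
library(tidyr)

# Hypothetical data: dates stored as "YYYY-MM-DD" strings
dates <- tibble(date = c("2024-01-15", "2023-12-31"))

# convert = TRUE applies type.convert() to each new column,
# so y, m and d come back as integers instead of character
res <- separate(dates, col = date, into = c("y", "m", "d"), sep = "-", convert = TRUE)
str(res)
```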
3. Modern strict form
separate_wider_delim() errors if any row has a different number of pieces than names. That strictness surfaces data quality issues that separate() would only warn about before dropping or padding pieces.
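A sketch of the strict behaviour, assuming tidyr 1.3.0 or later and a hypothetical people tibble. The error is caught with tryCatch() only so the example keeps running:

```r
library(tidyr)

# Hypothetical data: the second name has three pieces, not two
people <- tibble(name = c("Ada Lovelace", "Mary Jane Smith"))

# A row with exactly two pieces splits cleanly
ok <- separate_wider_delim(people[1, ], cols = name, delim = " ",
                           names = c("first", "last"))

# The full data errors: the default too_many = "error" rejects "Mary Jane Smith"
err <- tryCatch(
  separate_wider_delim(people, cols = name, delim = " ", names = c("first", "last")),
  error = function(e) e
)
conditionMessage(err)
```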
4. Handle uneven splits
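extra = "merge" keeps extra pieces together in the last column. fill = "right" pads with NA on the right when a row has too few pieces. Together they handle messy data without warnings. In separate_wider_delim(), the equivalent arguments are too_many = "merge" and too_few = "align_start".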
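A sketch comparing the legacy and modern handling of uneven rows, assuming tidyr 1.3.0 or later. The messy tibble is hypothetical, with one row that is too long and one that is too short:

```r
library(tidyr)

# Hypothetical data: three pieces, two pieces, one piece
messy <- tibble(name = c("Ada Lovelace", "Mary Jane Smith", "Cher"))

# Legacy: extra = "merge" keeps "Jane Smith" together, fill = "right" pads NA
legacy <- separate(messy, col = name, into = c("first", "last"), sep = " ",
                   extra = "merge", fill = "right")

# Modern: the same choices expressed as too_many and too_few
modern <- separate_wider_delim(messy, cols = name, delim = " ",
                               names = c("first", "last"),
                               too_many = "merge", too_few = "align_start")
legacy
```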
5. Regex-based split
separate_wider_regex() lets you define each piece by a regex pattern rather than by a delimiter. This is useful when the boundary between pieces is a change in character type (letters vs. digits) rather than a fixed delimiter.
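A sketch of a letter/digit split with no delimiter at all, assuming tidyr 1.3.0 or later and a hypothetical codes tibble:

```r
library(tidyr)

# Hypothetical data: letters immediately followed by digits
codes <- tibble(code = c("AB12", "X7"))

# Each named pattern becomes a column; the boundary is where letters stop
res <- separate_wider_regex(codes, cols = code,
                            patterns = c(letter = "[A-Z]+", digit = "[0-9]+"))
res
```

The digit column is still character; separate_wider_regex() has no convert argument, so convert afterwards if you need numbers.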
separate() is superseded in tidyr 1.3.0 and later by separate_wider_delim(), separate_wider_position(), and separate_wider_regex(). The newer functions are stricter and fail early on bad data. Prefer them for new code; legacy separate() remains in tidyr for backward compatibility.
Common pitfalls
Pitfall 1: truncation in legacy separate(). With sep = " ", into = c("first", "last") and a name like "Mary Jane Smith", legacy separate() keeps "Mary" and "Jane" and discards "Smith", raising only a warning. Use extra = "merge" (which gives last = "Jane Smith") or switch to separate_wider_delim(), which errors until you decide how to handle the extra piece.
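A sketch of the pitfall and the fix, assuming tidyr 1.3.0 or later and a hypothetical one-row tibble. The warning is caught with tryCatch() to show that it is raised, then suppressed to inspect the truncated result:

```r
library(tidyr)

# Hypothetical data: a three-part name split into only two columns
people <- tibble(name = "Mary Jane Smith")

# Default extra = "warn": a warning is raised...
w <- tryCatch(
  separate(people, col = name, into = c("first", "last"), sep = " "),
  warning = function(w) w
)

# ...but the result still drops "Smith"
truncated <- suppressWarnings(
  separate(people, col = name, into = c("first", "last"), sep = " ")
)

# extra = "merge" keeps everything after the first split in the last column
merged <- separate(people, col = name, into = c("first", "last"), sep = " ",
                   extra = "merge")
```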
Pitfall 2: regex special characters in sep. A character sep is interpreted as a regular expression, so an unescaped . matches any character. To split on a literal ., escape it: sep = "\\.". Alternatively, use separate_wider_delim(), which treats delim as a fixed string.
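A sketch of the escaping problem with hypothetical file names, assuming tidyr 1.3.0 or later. The unescaped version is wrapped in suppressWarnings() because it also triggers the extra-pieces warning:

```r
library(tidyr)

# Hypothetical data: file names with one dot each
files <- tibble(file = c("report.csv", "data.txt"))

# Wrong: "." is a regex matching every character, so stem/ext are not "report"/"csv"
bad <- suppressWarnings(
  separate(files, col = file, into = c("stem", "ext"), sep = ".")
)

# Right: escape the dot for the regex-based sep
good <- separate(files, col = file, into = c("stem", "ext"), sep = "\\.")

# Also right: separate_wider_delim() treats delim as a literal string
modern <- separate_wider_delim(files, cols = file, delim = ".",
                               names = c("stem", "ext"))
```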
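separate() warns when rows have a different number of pieces than into. Pay attention to that warning: without explicit extra and fill arguments, the defaults (extra = "warn", fill = "warn") drop surplus pieces and pad short rows with NA, so the data is altered even though the code keeps running.
Try it yourself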
Try it: Split the column "version" containing strings like "v1.2.3" into three columns: major, minor, patch. Save to ex_split.
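One possible solution, sketched with separate_wider_regex() and assuming tidyr 1.3.0 or later. The input tibble versions is hypothetical, since the exercise does not name the data frame:

```r
library(tidyr)

# Hypothetical input: semantic version strings with a leading "v"
versions <- tibble(version = c("v1.2.3", "v10.0.1"))

ex_split <- separate_wider_regex(
  versions,
  cols = version,
  # Unnamed pieces ("v" and the escaped dots) must match but are not kept
  patterns = c("v", major = "[0-9]+", "\\.", minor = "[0-9]+", "\\.", patch = "[0-9]+")
)
ex_split
```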
Explanation: Unnamed patterns ("v", "\\.") must match but are dropped from the output. Named patterns become columns. The regex [0-9]+ matches one or more digits. Each value must match the pattern exactly; by default, a value that does not match raises an error.
Related tidyr functions
After mastering separate, look at:
unite(): combine columns into one (the inverse of separate())
separate_wider_position(): split by character position, not delimiter
separate_longer_delim(): split into rows instead of columns
extract(): regex-based column extraction
pivot_longer() plus separate(): combine reshaping with splitting
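A sketch of two of these relatives, assuming tidyr 1.3.0 or later and hypothetical tibbles: unite() reverses a split, and separate_longer_delim() produces extra rows instead of extra columns.

```r
library(tidyr)

# unite(): glue two columns back into one, dropping the originals by default
names_df <- tibble(first = c("Ada", "Alan"), last = c("Lovelace", "Turing"))
full <- unite(names_df, "full", first, last, sep = " ")

# separate_longer_delim(): one output row per comma-separated tag
tags <- tibble(id = 1:2, tags = c("r,tidyr", "sql"))
long <- separate_longer_delim(tags, cols = tags, delim = ",")
long
```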
For pure string operations on a single column, stringr::str_split() returns a list; stored as a list-column, it can be expanded with unnest().
FAQ
How do I split a column into multiple columns in R?
Use tidyr::separate(df, col, into = c(...), sep = ...) for the legacy form. For new code, prefer separate_wider_delim(df, cols, delim = "...", names = c(...)). Both split one column based on a delimiter.
What is the difference between separate and separate_wider_delim?
separate() is the legacy function. separate_wider_delim() (tidyr 1.3+) is stricter: it errors when a row's piece count does not match the expected count, while separate() only warns and then truncates or pads. Prefer the modern one for new code.
How do I split a column using a regex in R?
Use separate_wider_regex(df, cols, patterns = c(name = "regex", ...)). Each named pattern becomes a new column; unnamed patterns must match but are dropped.
How do I handle missing pieces with separate?
Use fill = "right" (pads NA on the right) or fill = "left" (pads on the left). For too many pieces, extra = "merge" puts extras in the last column; extra = "drop" discards them.
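A sketch of the two options not shown earlier, fill = "left" and extra = "drop", on a hypothetical tibble (assumes tidyr 1.3.0 or later). Because both arguments are set explicitly, no warning is raised:

```r
library(tidyr)

# Hypothetical data: three, two and four space-separated pieces
parts <- tibble(x = c("a b c", "b c", "a b c d"))

# fill = "left" pads the short row with NA at the start;
# extra = "drop" discards the fourth piece of the long row
res <- separate(parts, col = x, into = c("p1", "p2", "p3"), sep = " ",
                fill = "left", extra = "drop")
res
```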
What does the convert argument do in separate?
convert = TRUE runs type.convert() on the new columns, converting numeric strings to numbers, etc. Without it, all output columns are character.