tidyr unite() in R: Combine Columns Into One
The unite() function in tidyr combines multiple columns into a single column with a separator. It is the inverse of separate() and the most direct way to build composite identifiers like dates or full names.
unite(df, "full_name", first, last, sep = " ") # combine 2 cols unite(df, "id", year, month, day, sep = "-") # 3 cols with dash unite(df, "label", year, month, sep = "_", remove = FALSE) # keep originals unite(df, "date", -id, sep = "-") # all except id unite(df, "tag", a, b, sep = "_", na.rm = TRUE) # skip NAs df |> mutate(full = paste(first, last, sep = " ")) # base R alternative
Need explanation? Read on for examples and pitfalls.
What unite() does in one sentence
unite() takes multiple columns and concatenates their values into a single new column with a separator. By default, the original columns are removed; set remove = FALSE to keep them.
This is the simplest tool for building composite keys (e.g., date from y/m/d), full names, or any concatenated identifier.
Syntax
unite(data, col, ..., sep, remove, na.rm). col is the new column name; ... lists columns to combine; sep is the separator string.
sep = "_" is rarely what you want for human-readable text. Always specify sep explicitly: " " for names, "-" for dates, etc. The default underscore is more for machine identifiers.Five common patterns
1. Two columns into one
The simplest case: two columns become one with a space.
2. Keep originals with remove = FALSE
remove = FALSE adds the new column without dropping the originals.
3. Build a date from components
For zero-padded months/days, format first: mutate(m = sprintf("%02d", m), d = sprintf("%02d", d)).
4. Skip NAs
na.rm = TRUE skips NA values, joining only the non-missing pieces.
5. Tidyselect for many columns
-id selects all columns except id. Tidyselect helpers like starts_with() work too.
unite() is just paste() with a clean dplyr-friendly interface. The base R equivalent is mutate(new = paste(a, b, c, sep = "_")). Use unite when working in tidyr/dplyr pipelines; use paste when not.Common pitfalls
Pitfall 1: forgetting remove = FALSE. Default removes the original columns. If you want to keep them, set remove = FALSE explicitly.
Pitfall 2: numeric inputs converted to strings. unite() always returns a character column. Numeric inputs (like year) become "2024", which is a string. Convert back if needed: mutate(date = as.Date(date)).
na.rm = FALSE (default) propagates "NA" as a string in the result. With NA in any input column, the output is something like "Alice-NA-Smith". Always set na.rm = TRUE if NAs should be skipped, or fill NAs before unite.Try it yourself
Try it: Combine iris$Species and a new "year" column (2024) into a single "tag" column like "setosa_2024". Save to ex_tagged.
Click to reveal solution
Explanation: unite("tag", Species, year, sep = "_") combines the Species and year columns into "tag". remove = FALSE keeps the originals. Each row's tag is the species name plus year, joined with an underscore.
Related tidyr functions
After mastering unite, look at:
separate(): split one column into many (inverse)paste(),paste0(): base R column concatenationglue::glue(): template-based string interpolationstringr::str_c(): stringr's vectorized string concatenationunite()plusarrange(): build a sortable composite key
For complex composite IDs, glue("{first}_{year}_{country}") inside mutate() is more flexible than unite().
FAQ
How do I combine multiple columns into one in R?
Use tidyr::unite("new", col1, col2, sep = "_") for tidyr style. Or base R: mutate(new = paste(col1, col2, sep = "_")). Both produce a single character column from multiple inputs.
What is the difference between unite and paste in R?
unite() is a dplyr-friendly verb that adds a new column to a data frame and optionally removes the originals. paste() is a base R function that concatenates strings; you wrap it in mutate() to add to a data frame.
How do I keep the original columns after unite?
Set remove = FALSE: unite(df, "new", a, b, sep = "_", remove = FALSE). Default remove = TRUE drops the columns being combined.
How do I handle NA values in unite?
Set na.rm = TRUE: unite("new", a, b, sep = "_", na.rm = TRUE) skips NA values when concatenating. Default na.rm = FALSE includes "NA" as a string in the result.
What separator should I use for unite?
Use a separator that does NOT appear in your data. For names: " ". For dates: "-". For machine identifiers: "_". Avoid characters used in your data (e.g., "," if values contain commas).