forcats fct_collapse() in R: Merge Factor Levels by Group
fct_collapse() from the forcats package merges sets of factor levels into a smaller number of new groups in a single call. You pass it a factor and one named argument per output group, and it rewrites every listed level to that group's name.
fct_collapse(f, new = c("a", "b")) # merge a and b into new
fct_collapse(f, x = c("a","b"), y = "c") # several groups at once
fct_collapse(f, big = grp, other_level = "X") # bucket unnamed levels into X
fct_collapse(f, kept = "a") # rename one level, keep rest
df |> mutate(g = fct_collapse(g, lo = "a")) # inside a dplyr pipeline
fct_count(fct_collapse(f, lo = "a")) # tally the collapsed factor
Need explanation? Read on for examples and pitfalls.
What fct_collapse() does
fct_collapse() performs a many-to-few mapping on factor levels. You give it a factor and one named argument for each group you want in the result. Every level you list under a name is rewritten to that name, so a dozen messy levels become three or four meaningful ones.
This is the manual, many-group counterpart to fct_other(). Where fct_other() produces exactly one catch-all bucket, fct_collapse() lets you define as many target groups as you need in the same call. The return value is always a factor, ready for plots, models, and dplyr summaries.
Syntax
fct_collapse() takes a factor plus one named argument per output group. The signature is short:
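The signature itself is missing from this copy of the page. As a hedged sketch, assuming a recent forcats release (1.0.0 or later), the documented form is fct_collapse(.f, ..., other_level = NULL, group_other = "DEPRECATED"). The snippet below prints the argument names of whichever version you have installed so you can confirm them yourself:

```r
library(forcats)

# List the formal arguments of the installed fct_collapse().
# Assumption: a recent forcats release; older versions may differ.
arg_names <- names(formals(fct_collapse))
arg_names
```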
The arguments behave like this:
| Argument | What it does |
|---|---|
| .f | The factor (or character vector) to modify. |
| ... | Named arguments. Each name is a new group; its value is a character vector of old levels to merge. |
| other_level | Optional label for a catch-all group. Every level not named in ... is collapsed into it. |
Each ... name becomes a level in the output. A level you do not mention anywhere stays unchanged unless you set other_level, in which case it joins the catch-all group.
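A minimal sketch of that rule on a toy factor. The factor f and the labels ab and rest are made up for illustration:

```r
library(forcats)

# Hypothetical toy factor with four levels
f <- factor(c("a", "b", "c", "d", "a"))

# "c" and "d" are never mentioned, so they pass through unchanged
no_other <- fct_collapse(f, ab = c("a", "b"))
levels(no_other)

# With other_level set, "c" and "d" join the catch-all group
with_other <- fct_collapse(f, ab = c("a", "b"), other_level = "rest")
levels(with_other)

# The input factor itself is not modified
levels(f)
```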
fct_collapse() replaces a chain of levels(f)[levels(f) %in% c("a","b")] <- "new" assignments. The forcats version handles every group in one call, manages level ordering, and never mutates the original factor.
Examples by use case
Run these examples top to bottom. Each block builds on a shared session, so objects created earlier stay available in the blocks that follow.
Merge level sets into new groups
Pass one named argument per group to define the mapping. Here six month abbreviations collapse into two quarters.
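The original code block is not present here, so the following is a reconstruction under assumptions: the sample vector and the object names month_f and quarter_f are illustrative. The levels are set explicitly so that they run in calendar order rather than alphabetically.

```r
library(forcats)

# Illustrative data; levels given explicitly to keep calendar order
month_f <- factor(
  c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jan", "Apr"),
  levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun")
)

# One named argument per output group
quarter_f <- fct_collapse(
  month_f,
  Q1 = c("Jan", "Feb", "Mar"),
  Q2 = c("Apr", "May", "Jun")
)

levels(quarter_f)
fct_count(quarter_f)
```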
The names Q1 and Q2 become the new levels, and each appears where its first member sat in the original ordering.
Collapse a real survey factor
fct_collapse() shines on high-cardinality survey data. The gss_cat$partyid factor bundled with forcats has ten levels. The call below folds them into three political groups and leaves the rest alone.
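A sketch of such a call follows. The text does not specify the grouping, so the three group names and the assignment of the "Ind,near" levels to Independent are assumptions; the object name party3 is also illustrative.

```r
library(forcats)

# Assumed grouping: leaners are counted as Independent here
party3 <- fct_collapse(
  gss_cat$partyid,
  Republican  = c("Strong republican", "Not str republican"),
  Independent = c("Ind,near rep", "Independent", "Ind,near dem"),
  Democrat    = c("Not str democrat", "Strong democrat")
)

# Unnamed levels remain alongside the three new groups
levels(party3)
fct_count(party3)
```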
The three minor levels (No answer, Don't know, Other party) survive untouched because they were never named.
Bucket the leftover levels with other_level
Set other_level to sweep every unnamed level into one group. This combines a precise mapping with a single catch-all in the same call.
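As a hedged illustration, the call below applies an assumed three-way grouping to gss_cat$partyid and adds other_level = "Other". The grouping and the object name party4 are illustrative choices, not taken from the original example.

```r
library(forcats)

party4 <- fct_collapse(
  gss_cat$partyid,
  Republican  = c("Strong republican", "Not str republican"),
  Independent = c("Ind,near rep", "Independent", "Ind,near dem"),
  Democrat    = c("Not str democrat", "Strong democrat"),
  other_level = "Other"  # every level not listed above lands here
)

levels(party4)

# Number of rows that fell into the catch-all group
n_other <- sum(party4 == "Other")
n_other
```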
Use other_level when leftover levels would clutter a chart. Without it, the three minor partyid levels stay as separate slivers. With it, they merge into a tidy Other group of 548 rows that sorts last.
Use fct_collapse() inside a dplyr pipeline
Wrap fct_collapse() in mutate() to clean a column in place. This is the most common production use: simplify a factor right before a count() or a plot.
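One way this might look on gss_cat$marital, assuming dplyr is installed and R is 4.1 or later for the native pipe. The mapping is an assumption chosen only to produce three levels: Single and Married are named explicitly and everything else, including No answer, falls into Other.

```r
library(forcats)
library(dplyr)

marital_counts <- gss_cat |>
  mutate(
    marital = fct_collapse(
      marital,
      Single  = "Never married",
      Married = "Married",
      other_level = "Other"  # Separated, Divorced, Widowed, No answer
    )
  ) |>
  count(marital)

marital_counts
```

If the remainder should stay distinguishable, name those levels in their own group instead of relying on other_level.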
The six marital levels collapse into a clean three-level factor that drops straight into a bar chart.
fct_collapse() vs fct_recode and friends
fct_collapse() merges many levels into few; the alternatives each solve a narrower problem. Choosing the right one keeps your code readable and your intent obvious.
| Function | What it does | You specify |
|---|---|---|
| fct_collapse() | Merges named level sets into new groups | One named vector per output group |
| fct_recode() | Renames levels one-to-one | A new name for each old level |
| fct_other() | Collapses unnamed levels into one bucket | A keep or drop list |
| fct_lump_n() | Lumps rare levels by frequency | How many levels to keep (n) |
| fct_relevel() | Reorders levels without merging | The desired level order |
If each old level simply needs its own new name, use fct_recode(). The moment two or more old levels share a single new name, the operation is a collapse, and fct_collapse() expresses it in one readable call.
Common pitfalls
1. Expecting unnamed levels to disappear. A level you never mention stays in the result unchanged. If you want the remainder gone, set other_level so they collapse into a catch-all instead of lingering.
2. Misspelling an old level name. A typo is not a hard error, but forcats warns and leaves the real level uncollapsed, so a wrong result can ship if you ignore the warning.
3. Using the deprecated group_other argument. Older code passed group_other = TRUE to bucket the remainder. Modern forcats deprecates it. Use other_level = "Other" instead, which also lets you choose the label.
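The first two pitfalls can be reproduced on a small factor. This is a sketch with made-up data: the factor sizes and the misspelled level "Medum" are hypothetical.

```r
library(forcats)

sizes <- factor(c("Small", "Medium", "Large", "Huge"))

# Pitfall 1: "Large" and "Huge" are not named, so they remain as levels
lingering <- fct_collapse(sizes, Compact = c("Small", "Medium"))
nlevels(lingering)

# Setting other_level folds them into a single catch-all
tidy <- fct_collapse(sizes, Compact = c("Small", "Medium"), other_level = "Other")
nlevels(tidy)

# Pitfall 2: "Medum" is a typo; capture the warning text instead of ignoring it
typo_msg <- tryCatch(
  fct_collapse(sizes, Compact = c("Small", "Medum")),
  warning = function(w) conditionMessage(w)
)
typo_msg

# The correctly spelled level "Medium" was left uncollapsed
typo_result <- suppressWarnings(fct_collapse(sizes, Compact = c("Small", "Medum")))
levels(typo_result)
```

In a script, comparing the names you pass against levels(sizes) before the call catches the typo earlier.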
Try it yourself
Try it: Collapse gss_cat$rincome so the three highest brackets become High and the rest collapse into Other. Save the result to ex_income.
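One possible solution, sketched under the assumption that the three highest brackets are spelled "$25000 or more", "$20000 - 24999", and "$15000 - 19999" in levels(gss_cat$rincome). Check the spelling against your installed data before relying on it.

```r
library(forcats)

# Assumed level labels for the three highest income brackets
top_brackets <- c("$25000 or more", "$20000 - 24999", "$15000 - 19999")

ex_income <- fct_collapse(
  gss_cat$rincome,
  High = top_brackets,
  other_level = "Other"
)

fct_count(ex_income)
```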
Explanation: The named High argument merges the three top brackets, and other_level sweeps every remaining level into a single Other group.
Related forcats functions
These functions pair well with fct_collapse() when you reshape categorical data:
- fct_recode() renames levels one-to-one without merging them.
- fct_other() collapses every unnamed level into a single Other bucket.
- fct_lump_n() lumps the rarest levels automatically by frequency.
- fct_relevel() reorders levels after you collapse them.
- fct_count() tallies levels so you can verify the result.
For the wider picture, see the guide to categorical data in R and the official forcats reference.
FAQ
What is the difference between fct_collapse() and fct_recode()? fct_recode() renames levels one-to-one: each old level maps to exactly one new name. fct_collapse() merges many levels into few: you list several old levels under one new group name. If two or more old levels should share a single label, use fct_collapse(). If every level just needs a cleaner name, fct_recode() is the simpler choice. Both return a factor and leave the original untouched.
How do I collapse the levels I did not name? Set the other_level argument. Any level not listed in a named group collapses into a catch-all level with that label, for example other_level = "Other". Without other_level, unnamed levels stay in the result unchanged. This makes other_level the clean way to combine a precise mapping with a single remainder bucket in one call.
Does fct_collapse() work on character vectors? Yes. If you pass a character vector, forcats coerces it to a factor first and then applies the collapsing. The return value is always a factor, never a character vector. To keep your code explicit, wrap the input in factor() yourself before the call so the level set is clear to anyone reading it.
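A short sketch of both styles, using a made-up character vector. The names answers, implicit, and explicit are illustrative.

```r
library(forcats)

answers <- c("yes", "yep", "no", "nope", "yes")

# Character input is coerced to a factor before collapsing
implicit <- fct_collapse(answers, Yes = c("yes", "yep"), No = c("no", "nope"))
class(implicit)

# Explicit version: the level set is visible at the call site
explicit <- fct_collapse(
  factor(answers, levels = c("yes", "yep", "no", "nope")),
  Yes = c("yes", "yep"),
  No  = c("no", "nope")
)
levels(explicit)
```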
Why does fct_collapse() warn about unknown levels? The warning means a level name you passed inside ... does not exist in the factor, usually because of a typo. forcats reports it with a message that begins "Unknown levels in" and leaves the genuine level uncollapsed. Check the names against levels(f) and fix the spelling. The warning is easy to miss in a long script, so treat it as a real error.
Can fct_collapse() merge levels into more than one group at once? Yes, that is its main purpose. Pass one named argument per output group, such as Republican = c(...), Democrat = c(...). Each name becomes a level in the result, and every old level you list under it is rewritten to that name. This many-group capability is what separates fct_collapse() from fct_other(), which always produces a single Other bucket.