dplyr case_when() in R: Vectorized Multi-Way If-Else
The case_when() function in dplyr returns one of several values based on a sequence of conditions. It is the readable alternative to nested ifelse() calls when you have three or more outcomes.
case_when(x < 0 ~ "neg", x == 0 ~ "zero", x > 0 ~ "pos")              # 3-way
case_when(x < 0 ~ "neg", x >= 0 ~ "non-neg")                           # 2-way
case_when(x < 0 ~ "neg", TRUE ~ "non-neg")                             # with default
case_when(x < 0 ~ "neg", x == 0 ~ "zero", .default = "pos")            # explicit default
case_when(is.na(x) ~ "missing", x > 0 ~ "pos", TRUE ~ "neg-or-zero")   # NA handling
mutate(df, label = case_when(...))                                     # inside mutate
case_match(x, c(1,2) ~ "low", c(3,4) ~ "high")                         # set membership
Need explanation? Read on for examples and pitfalls.
What case_when() does in one sentence
case_when() evaluates a sequence of condition ~ value pairs and returns the value of the first matching condition for each element. It is vectorized, handles NA explicitly, and enforces consistent return types. The trailing TRUE ~ default (or the .default argument) catches anything unmatched.
Unlike base R ifelse() for multiple branches (which requires nesting), case_when() reads top-to-bottom like a switch statement. Each line is one condition; the first match wins.
Syntax
case_when() takes a sequence of condition ~ value formulas. Conditions are logical vectors. Values are the result for matched elements. Add TRUE ~ default or .default = value to handle the unmatched case.
The full signature is:
case_when(..., .default = NULL, .ptype = NULL, .size = NULL)
The ... argument is a sequence of condition ~ value formulas, evaluated in order. .default (dplyr 1.1+) provides a fallback for unmatched elements. Older code uses TRUE ~ value for the same purpose.
case_when(x < 0 ~ "neg", x > 0 ~ 0) errors because "neg" is character but 0 is numeric. Either make both numeric (-1 instead of "neg") or both character ("0" instead of 0). dplyr is strict here on purpose; base R ifelse() would silently coerce.
Six common patterns
1. Three-way classification
Conditions are evaluated top to bottom; the first match wins. mpg < 15 is checked first; if true, returns "thirsty" and stops. If false, the next condition runs.
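As a minimal sketch of this pattern, the block below classifies mtcars by fuel economy. Only the "thirsty" label and the mpg < 15 cutoff come from the text; the other cutoffs, the labels "average" and "efficient", and the column name economy are assumptions chosen for illustration.

```r
library(dplyr)

# Three-way classification of fuel economy.
# Cutoff 25 and the labels "average"/"efficient" are illustrative assumptions.
cars_classed <- mutate(
  mtcars,
  economy = case_when(
    mpg < 15  ~ "thirsty",    # checked first
    mpg < 25  ~ "average",    # only reached when mpg >= 15
    mpg >= 25 ~ "efficient"   # everything that is left
  )
)

table(cars_classed$economy)
```

Because the second condition is only reached when the first one failed, it does not need to repeat mpg >= 15.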
2. Explicit default with .default (dplyr 1.1+)
.default replaces the older TRUE ~ "large" idiom. Both work; .default is more explicit and self-documenting.
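The following sketch shows both spellings side by side, assuming dplyr 1.1 or later is installed. The wt cutoffs and the "small"/"medium" labels are hypothetical; only "large" as the fallback comes from the text.

```r
library(dplyr)

# Modern form: fallback supplied through .default
sized <- mutate(
  mtcars,
  size = case_when(
    wt < 2.5 ~ "small",    # assumed cutoff
    wt < 3.5 ~ "medium",   # assumed cutoff
    .default = "large"
  )
)

# Older form: TRUE ~ value as the last branch
sized_old <- mutate(
  mtcars,
  size = case_when(
    wt < 2.5 ~ "small",
    wt < 3.5 ~ "medium",
    TRUE ~ "large"
  )
)

# Both spellings should give the same labels
identical(sized$size, sized_old$size)
```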
3. Handling NA explicitly
is.na(score) is checked first so missing values are labeled explicitly. Without this branch, rows with a missing score would fall through to the default branch, or return NA from case_when() when there is no default.
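A small sketch of the NA-first pattern follows. The tibble results, its score values, the pass mark of 60, and the labels "pass" and "fail" are all assumptions made up for this example.

```r
library(dplyr)

# Hypothetical data: one missing score
results <- tibble(score = c(82, NA, 45, 60))

results <- mutate(
  results,
  grade = case_when(
    is.na(score) ~ "missing",  # label NAs before any comparison runs
    score >= 60  ~ "pass",     # assumed pass mark
    .default = "fail"
  )
)

results$grade
```

If the is.na() line were dropped here, the NA row would not match score >= 60 and would be labeled by the default branch instead.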
4. Multiple-condition compound branches
Each condition can reference any column or combination. Use & for AND, | for OR.
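Here is one way compound branches might look on mtcars. The condition cyl == 4 & mpg > 25 appears later in this article; the second branch, the labels, and the column name profile are illustrative assumptions.

```r
library(dplyr)

profiled <- mutate(
  mtcars,
  profile = case_when(
    cyl == 4 & mpg > 25 ~ "small and efficient",  # AND: both must hold
    cyl == 8 | hp > 200 ~ "powerful",             # OR: either is enough
    .default = "other"
  )
)

table(profiled$profile)
```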
5. Match-based version with case_match (dplyr 1.1+)
case_match() is a specialized form for matching against fixed values. Cleaner than case_when() when conditions are all x == value style.
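The sketch below contrasts case_match() with the equivalent case_when() call, assuming dplyr 1.1 or later. The labels are hypothetical.

```r
library(dplyr)

# case_match(): the vector is named once, then each branch lists values
cyl_label <- case_match(
  mtcars$cyl,
  4       ~ "four",
  c(6, 8) ~ "six or eight",
  .default = "other"
)

# The same logic written with case_when() repeats the vector in every branch
cyl_label_cw <- case_when(
  mtcars$cyl == 4          ~ "four",
  mtcars$cyl %in% c(6, 8)  ~ "six or eight",
  .default = "other"
)

identical(cyl_label, cyl_label_cw)
```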
6. case_when in summarise
case_when() works inside summarise() too, where the conditions reference aggregated values.
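As a sketch, the block below rates each cylinder group by its mean mpg. The cutoffs, the labels, and the column names avg_mpg and rating are assumptions for illustration.

```r
library(dplyr)

by_cyl <- summarise(
  group_by(mtcars, cyl),
  avg_mpg = mean(mpg),
  # The conditions refer to the aggregated value computed just above
  rating = case_when(
    avg_mpg > 25 ~ "high",
    avg_mpg > 18 ~ "mid",
    .default = "low"
  )
)

by_cyl
```

Each group yields a single mean, so case_when() returns one label per group.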
Conditions in case_when() are checked in order, and the first match wins for each element, so order matters. Put the most specific conditions first and the most general (catchall) last. Reversing them silently produces wrong results: a TRUE ~ "default" placed first would short-circuit every later condition.
case_when() vs base R nested ifelse()
Nested ifelse() becomes hard to read at three or more branches; case_when() is the modern replacement. For non-missing input both produce the same result, but case_when() is far clearer past two branches.
| Branches | dplyr | Base R |
|---|---|---|
| 2 | if_else(x>0, "p", "n") | ifelse(x>0, "p", "n") |
| 3 | case_when(x<0~"n", x==0~"z", TRUE~"p") | ifelse(x<0, "n", ifelse(x==0, "z", "p")) |
| 4+ | case_when(...) reads top-down | ifelse(...) chains nest deeply |
| Type strict | Yes (errors on mismatch) | No (silently coerces) |
| NA explicit | Yes (treat as condition) | No (returns NA) |
When to use which:
- 2 branches: if_else() (dplyr) or ifelse() (base) is fine.
- 3+ branches: always case_when().
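To make the three-branch comparison concrete, this sketch runs both forms on a small vector of non-missing values. The vector x is made up for the example.

```r
library(dplyr)

x <- c(-2, 0, 3)

# Base R: each extra branch adds another level of nesting
nested <- ifelse(x < 0, "n", ifelse(x == 0, "z", "p"))

# dplyr: one branch per line, read top to bottom
flat <- case_when(
  x < 0  ~ "n",
  x == 0 ~ "z",
  TRUE   ~ "p"
)

identical(nested, flat)
```

With missing values in x the two would differ: ifelse() propagates NA, while the TRUE ~ "p" branch labels those elements "p".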
Common pitfalls
Pitfall 1: forgetting the catchall branch. If no condition matches a row, case_when() returns NA for that row. Add TRUE ~ default or .default = value to ensure complete coverage.
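A quick sketch of Pitfall 1, using a made-up vector where zero matches neither condition (the .default form assumes dplyr 1.1+):

```r
library(dplyr)

x <- c(-1, 0, 5)

# No catchall: zero matches nothing, so that element becomes NA
no_default <- case_when(
  x < 0 ~ "neg",
  x > 0 ~ "pos"
)

# With a catchall, every element receives a label
with_default <- case_when(
  x < 0 ~ "neg",
  x > 0 ~ "pos",
  .default = "zero"
)

no_default
with_default
```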
Pitfall 2: type mismatch between branches. case_when(x < 0 ~ "neg", x > 0 ~ 0) errors. Pick one type for all return values. Numeric returns must all be numeric; character returns must all be character.
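The sketch below reproduces the mismatch from Pitfall 2 and catches the error with tryCatch() so the script keeps running. The vector x and the placeholder string "type error" are assumptions for the example; the exact error message depends on the dplyr version.

```r
library(dplyr)

x <- c(-1, 2)

# Mixing character and numeric results raises an error
mismatch <- tryCatch(
  case_when(x < 0 ~ "neg", x > 0 ~ 0),
  error = function(e) "type error"   # placeholder returned when the call fails
)

# Fix: make every branch numeric
fixed <- case_when(
  x < 0 ~ -1,
  x > 0 ~ 0
)

mismatch
fixed
```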
case_when(x > 0 ~ "pos", x > 100 ~ "huge") returns "pos" for everything above 0, and "huge" is unreachable. Put more specific conditions first.
Pitfall 3: confusing case_when() with case_match(). Use case_when() when conditions are arbitrary expressions (x > 0, is.na(x), etc.). Use case_match() when conditions are all x %in% values or x == value.
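The branch-order problem with "pos" and "huge" described in this section can be seen in a short sketch. The input vector is made up for the example.

```r
library(dplyr)

x <- c(5, 150)

# General condition first: x > 100 is never reached
wrong <- case_when(
  x > 0   ~ "pos",
  x > 100 ~ "huge"
)

# Specific condition first: 150 is now labeled "huge"
right <- case_when(
  x > 100 ~ "huge",
  x > 0   ~ "pos"
)

wrong
right
```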
Try it yourself
Try it: Use case_when() to classify mtcars$hp into "low" (< 100), "medium" (100 to 200), and "high" (> 200). Save the result as a new column hp_class in ex_classed.
Explanation: Conditions evaluate in order. The first match wins. .default = "high" catches anything not matched by the prior two conditions (i.e., hp > 200).
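One possible solution consistent with the explanation above is sketched here. It assumes dplyr 1.1+ for .default and treats exactly 200 as "medium", since the exercise defines "high" as strictly above 200.

```r
library(dplyr)

ex_classed <- mutate(
  mtcars,
  hp_class = case_when(
    hp < 100  ~ "low",
    hp <= 200 ~ "medium",   # reached only when hp >= 100
    .default = "high"       # anything above 200
  )
)

table(ex_classed$hp_class)
```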
Related dplyr functions
After mastering case_when(), look at:
- if_else(): two-branch, type-strict alternative to ifelse()
- case_match(): specialized version for value matching
- recode(): simpler value renaming (now superseded by case_match())
- cut(): base R for binning numeric ranges into categorical breaks
- forcats::fct_recode(): rename factor levels
For very many conditions or lookup-table style mappings, consider a left_join with a lookup tibble instead of a long case_when().
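A minimal sketch of the lookup-table approach follows. The tibble lookup and the column name cyl_label are hypothetical names chosen for this example.

```r
library(dplyr)

# Hypothetical lookup table: one row per value to translate
lookup <- tibble(
  cyl       = c(4, 6, 8),
  cyl_label = c("four", "six", "eight")
)

# The join attaches the label; keys missing from the lookup would get NA
labelled <- left_join(mtcars, lookup, by = "cyl")

head(labelled[, c("cyl", "cyl_label")])
```

The mapping lives in data rather than code, so adding a category means adding a row instead of another branch.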
FAQ
How do I add a default value in dplyr case_when?
There are two options: the traditional TRUE ~ default_value as the last branch (works in all versions), or the newer .default = default_value argument (dplyr 1.1+, more explicit). Both produce identical results.
How does case_when handle NA values in dplyr?
If no condition matches and there is no TRUE ~ default or .default, the result is NA for that row. A comparison against a missing value yields NA rather than TRUE or FALSE (NA > 0 is NA), and case_when() treats an NA condition as not matching. Use is.na(x) ~ "missing" as the first branch to label NAs explicitly.
What is the difference between case_when and case_match in dplyr?
case_when() takes arbitrary logical expressions (x > 0, is.na(x), cyl == 4 & mpg > 25). case_match() is specialized for matching values: case_match(x, c(1,2) ~ "low", c(3,4) ~ "high"). Use case_match() when all conditions are equality or membership tests; use case_when() for everything else.
Can I use case_when outside of mutate?
Yes. case_when() returns a vector, so you can use it anywhere a vector is expected: inside mutate(), summarise(), filter(), or assigned directly to a new variable: df$label <- case_when(...). The most common use is inside mutate().
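The sketch below shows case_when() on a bare vector, in a direct column assignment, and inside filter(). The data frame df and its values are made up for illustration, and .default assumes dplyr 1.1+.

```r
library(dplyr)

x <- c(-3, 0, 7)

# On a plain vector, no data frame involved
label <- case_when(
  x < 0  ~ "neg",
  x == 0 ~ "zero",
  x > 0  ~ "pos"
)

# Assigned directly to a new column
df <- data.frame(x = x)
df$label <- case_when(df$x < 0 ~ "neg", .default = "non-neg")

# Inside filter(): the branches return logical values
negatives <- dplyr::filter(df, case_when(x < 0 ~ TRUE, .default = FALSE))

label
negatives
```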
Why does case_when throw a type error?
Each branch returns a value; all branches must return the same type (numeric, character, logical, etc.) or types that coerce cleanly. case_when(x ~ "a", y ~ 1) errors because "a" is character and 1 is numeric. Either make all branches return character ("1") or all return numeric (-1 instead of "a").