tidyr expand() in R: Generate Combinations of Existing Columns
The expand() function in tidyr generates all unique COMBINATIONS of values from one or more columns of an existing data frame. It returns ONLY the combinations (no other columns).
df |> expand(year, product)                  # all year-product combos
df |> expand(nesting(year, quarter))         # preserve existing pairs
df |> expand(year = 2020:2024, product)      # custom values
df |> complete(year, product)                # different: merges back to original
expand_grid(year = 2020:2024, product = c("X","Y"))  # from vectors
Need explanation? Read on for examples and pitfalls.
What expand() does in one sentence
expand(data, ...) returns a tibble of all UNIQUE combinations of values from the named columns of data, dropping all other columns. In short: you get the combinations only, not the original rows.
Syntax
expand(data, ...), where data is a data frame and ... are unquoted column names or named expressions such as year = 2020:2024.
expand() returns COMBINATIONS only. Use complete() if you want the original data merged back with the missing combinations filled in.
Five common patterns
1. Cross-product of existing values
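A minimal sketch of this pattern, assuming a small made-up `sales` tibble (the data and column names are illustrative, not from any real dataset). Note that the pair (2021, "Y") never occurs in the input but still appears in the result.

```r
library(tidyr)

# Hypothetical data for illustration only
sales <- tibble(
  year    = c(2020, 2020, 2021),
  product = c("X", "Y", "X"),
  units   = c(10, 5, 8)
)

# Every observed year paired with every observed product;
# 'units' is not named in the call, so it is dropped
all_combos <- expand(sales, year, product)
all_combos
```

The result has one row per year-product pair (2 years times 2 products), regardless of how many rows the input had.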
2. Custom values
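As a sketch, a named argument can supply values that are not in the data at all. The `sales` tibble below is hypothetical; here the year range is extended to 2022 even though no 2022 rows exist.

```r
library(tidyr)

# Hypothetical data for illustration only
sales <- tibble(
  year    = c(2020, 2020, 2021),
  product = c("X", "Y", "X"),
  units   = c(10, 5, 8)
)

# 'year' comes from the supplied range, 'product' from the data
full_range <- expand(sales, year = 2020:2022, product)
full_range
```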
3. Nesting (preserve pairs)
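A small sketch contrasting nesting() with a plain cross-product, assuming a made-up `quarters` tibble. With nesting(), only (year, quarter) pairs that actually occur are kept, and those pairs are then crossed with product.

```r
library(tidyr)

# Hypothetical data for illustration only
quarters <- tibble(
  year    = c(2020, 2020, 2021),
  quarter = c(1, 2, 1),
  product = c("X", "Y", "X")
)

# Observed (year, quarter) pairs crossed with every product
nested <- expand(quarters, nesting(year, quarter), product)

# For comparison: all three columns fully crossed,
# which also creates the unobserved pair (2021, quarter 2)
crossed <- expand(quarters, year, quarter, product)

nrow(nested)
nrow(crossed)
```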
4. With group_by
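A sketch of grouped expansion, assuming dplyr is available and using a made-up `sales` tibble. On a grouped data frame, combinations are built separately within each group, so a group only sees its own values.

```r
library(dplyr)
library(tidyr)

# Hypothetical data for illustration only
sales <- tibble(
  region  = c("East", "East", "West", "West"),
  year    = c(2020, 2021, 2020, 2020),
  product = c("X", "Y", "X", "Y")
)

# East has two years and two products; West has one year and two products
by_region <- sales |>
  group_by(region) |>
  expand(year, product) |>
  ungroup()

by_region
```

West never gets a 2021 row here because 2021 does not occur within that group.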
5. Use as left side of full_join
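One way this might look, assuming dplyr for the join and a made-up `sales` tibble: the expanded combinations go on the left, the original data on the right, and unobserved combinations end up with NA in the remaining columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical data for illustration only
sales <- tibble(
  year    = c(2020, 2020, 2021),
  product = c("X", "Y", "X"),
  units   = c(10, 5, 8)
)

# Combinations on the left, original rows joined in on the right
joined <- sales |>
  expand(year, product) |>
  full_join(sales, by = c("year", "product"))

joined
```

The (2021, "Y") row has units = NA, which is what complete(sales, year, product) would also give.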
expand() and complete() are sister functions: complete = expand + full_join back to the original data. Use expand when you want JUST combinations; use complete when you want the original data with missing combinations filled in.
expand() vs complete() vs expand_grid()
| Function | Inputs | Output |
|---|---|---|
| expand(data, ...) | Data + columns | Combinations only |
| complete(data, ...) | Data + columns | Original data + missing combos |
| expand_grid(...) | Vectors / lists | Combinations from scratch |
| crossing(...) | Vectors / lists | Like expand_grid, but inputs are de-duplicated and sorted |
When to use which:
- expand for combinations from existing data.
- complete for filling in missing rows.
- expand_grid for vector inputs.
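The differences can be sketched side by side on one small, made-up `sales` tibble (the data is illustrative only). The last two calls use deliberately duplicated, unsorted input to show how expand_grid() and crossing() differ.

```r
library(tidyr)

# Hypothetical data for illustration only
sales <- tibble(
  year    = c(2020, 2020, 2021),
  product = c("X", "Y", "X"),
  units   = c(10, 5, 8)
)

only_combos <- expand(sales, year, product)    # combinations only, no 'units'
filled      <- complete(sales, year, product)  # original rows plus the missing combo
from_vecs   <- expand_grid(year = 2020:2021, product = c("X", "Y"))  # no data frame needed

# expand_grid() keeps inputs as given; crossing() de-duplicates and sorts them
grid_dups  <- expand_grid(product = c("Y", "X", "X"), year = 2020)
cross_dups <- crossing(product = c("Y", "X", "X"), year = 2020)

nrow(grid_dups)
nrow(cross_dups)
```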
A practical workflow
Use expand to generate "all expected combinations" reference tables.
Check which (month, region, product) combinations have no transactions.
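A sketch of that check, assuming dplyr's anti_join() and a made-up `transactions` tibble (names and values are hypothetical): build the reference table with expand(), then keep only the combinations that have no matching transaction.

```r
library(dplyr)
library(tidyr)

# Hypothetical data for illustration only
transactions <- tibble(
  month   = c("Jan", "Jan", "Feb"),
  region  = c("East", "West", "East"),
  product = c("X", "X", "Y"),
  amount  = c(100, 250, 75)
)

# Reference table: every month x region x product combination
expected <- expand(transactions, month, region, product)

# Combinations in the reference table with no matching transaction
missing_combos <- anti_join(
  expected, transactions,
  by = c("month", "region", "product")
)

missing_combos
```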
Common pitfalls
Pitfall 1: cross-product size. expand(year, product, region) produces (number of years) × (number of products) × (number of regions) rows. For high-cardinality data, this can be huge.
Pitfall 2: expand drops original rows. It returns only combinations. Use complete to keep the original data.
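Both pitfalls can be checked cheaply before committing to a large expansion. The sketch below, on a made-up `sales` tibble, estimates the output size from the distinct counts and confirms that columns not named in the call are gone.

```r
library(tidyr)

# Hypothetical data for illustration only
sales <- tibble(
  year    = c(2020, 2020, 2020, 2021),
  product = c("X", "X", "Y", "X"),
  region  = c("East", "West", "East", "East"),
  units   = c(10, 4, 5, 8)
)

# Pitfall 1: the row count is the product of the distinct counts
n_expected <- length(unique(sales$year)) *
  length(unique(sales$product)) *
  length(unique(sales$region))

combos <- expand(sales, year, product, region)

# Pitfall 2: 'units' was not named, so it is not in the result
n_expected
names(combos)
```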
expand() returns UNIQUE combinations only. Duplicate (year, product) rows in the input contribute one row to the output.
Try it yourself
Try it: Generate all combinations of the cyl and gear values found in mtcars. Save the result to ex_combos.
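One possible solution, using the built-in mtcars dataset:

```r
library(tidyr)

# Cross every distinct cyl value with every distinct gear value
ex_combos <- expand(mtcars, cyl, gear)
ex_combos
```

mtcars has three distinct cyl values and three distinct gear values, so the result has nine rows. That includes the pair (8, 4), even though no car in mtcars has 8 cylinders and 4 gears.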
Explanation: expand returns the cross product of the unique cyl and gear values in mtcars.
Related tidyr functions
After mastering expand, look at:
- complete(): expand + merge with original
- expand_grid(): from vectors
- crossing(): like expand_grid, but de-duplicates and sorts its inputs
- nesting(): preserve column pairs
- cross_join() (from dplyr): full Cartesian product of tables
FAQ
What does expand do in tidyr?
expand(data, ...) returns a tibble of all unique combinations of values from the named columns of data. Other columns are dropped.
What is the difference between expand and complete?
expand returns ONLY combinations (no original data). complete merges them back with the original. complete = expand + full_join.
What is the difference between expand and expand_grid?
expand operates on a data frame's existing values. expand_grid takes vectors / lists directly. Both produce combinations.
What does nesting do inside expand?
nesting(col1, col2) preserves the existing pairings of col1 and col2 (no cross-product between them). Useful for hierarchical data like (year, quarter).
Can I expand with custom values?
Yes. Pass a named argument such as year = 2020:2025 inside expand() to replace the values observed in the data for that column.