tidyr complete() in R: Fill Missing Combinations of Columns
The complete() function in tidyr ensures every COMBINATION of the specified column values is present in the data frame, inserting a row for each missing combination with NA in the remaining columns. It is the "make implicit missing values explicit" operation.
    df |> complete(year, product)                        # fill all year-product combos
    df |> complete(year, product, fill = list(qty = 0))  # new rows get qty = 0 instead of NA
    df |> complete(nesting(year, quarter), product)      # respect existing pairs
    df |> tidyr::expand(year, product)                   # similar but returns just combos
    df |> group_by(g) |> complete(...)                   # per-group complete
Need explanation? Read on for examples and pitfalls.
What complete() does in one sentence
complete(data, ...) adds rows for every combination of the named columns that doesn't already appear, with other columns set to NA (or to a fill default). It makes implicit missing combinations explicit.
Syntax
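complete(data, ..., fill = list()). The ... arguments are the columns whose combinations should be completed.
Pass fill = list(qty = 0) to give the new rows a default instead of NA. This is common for sales or metric data where a missing row means zero. In tidyr 1.2.0 and later, fill also replaces NAs that were already present in those columns unless you pass explicit = FALSE.
Five common patterns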
1. Standard complete
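A minimal sketch of the standard pattern. The `sales` tibble and its values are made up for illustration; only `complete()` itself comes from tidyr.

```r
library(tidyr)

# Hypothetical sales data: product "Y" has no row for 2025.
sales <- tibble::tibble(
  year    = c(2024, 2024, 2025),
  product = c("X", "Y", "X"),
  qty     = c(10, 5, 8)
)

# Add a row for every year-product combination that is absent.
sales_full <- sales |> complete(year, product)
sales_full
# 4 rows: the (2025, "Y") row now exists, with qty = NA.
```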
2. With fill default
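The same idea with a default value, again on an invented `sales` tibble. This sketch assumes that a missing row really does mean zero units sold.

```r
library(tidyr)

# Hypothetical sales data: product "Y" has no row for 2025.
sales <- tibble::tibble(
  year    = c(2024, 2024, 2025),
  product = c("X", "Y", "X"),
  qty     = c(10, 5, 8)
)

# New rows get qty = 0 instead of NA.
sales_zero <- sales |> complete(year, product, fill = list(qty = 0))
sales_zero
```

Columns not named in `fill` would still be NA in the new rows.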
3. Per-group complete
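A sketch of per-group completion, assuming a hypothetical `region` column as the grouping variable. Combinations are built from the values seen inside each group, so a product that only exists in one region is not added to the other.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: East sells X and Y, West sells only Z.
sales <- tibble::tibble(
  region  = c("East", "East", "East", "West", "West"),
  year    = c(2024, 2024, 2025, 2024, 2025),
  product = c("X", "Y", "X", "Z", "Z"),
  qty     = c(10, 5, 8, 4, 6)
)

# Complete year x product separately within each region.
by_region <- sales |>
  group_by(region) |>
  complete(year, product, fill = list(qty = 0)) |>
  ungroup()
nrow(by_region)   # 6: East 2 x 2, West 2 x 1

# For comparison: the ungrouped cross product of all three columns.
everything <- sales |> complete(region, year, product, fill = list(qty = 0))
nrow(everything)  # 12: 2 regions x 2 years x 3 products
```

Note that a grouping column cannot itself be passed to `complete()`.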
4. Nesting (preserve existing combinations)
nesting() keeps a subset of pairs together rather than generating their cross product.
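To illustrate the difference, here is a small sketch on invented data in which the pair (2025, "Q2") never occurs. With `nesting()`, that pair is not manufactured; without it, it is.

```r
library(tidyr)

# Hypothetical data with three observed (year, quarter) pairs.
sales <- tibble::tibble(
  year    = c(2024, 2024, 2025),
  quarter = c("Q1", "Q2", "Q1"),
  product = c("X", "Y", "X"),
  qty     = c(10, 5, 8)
)

# Only observed (year, quarter) pairs are crossed with product.
nested <- sales |> complete(nesting(year, quarter), product)
nrow(nested)   # 6: 3 observed pairs x 2 products

# Full cross product of all three columns.
crossed <- sales |> complete(year, quarter, product)
nrow(crossed)  # 8: 2 x 2 x 2, including (2025, "Q2")
```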
5. Combine with fill
Complete ensures rows; fill carries forward values for the new NA rows.
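A sketch of the two steps chained together, using a made-up `prices` series with gaps on days 3 and 4. Passing `day = 1:5` tells `complete()` which values should exist, and `fill()` then carries the last observed price downward.

```r
library(tidyr)

# Hypothetical price series: days 3 and 4 are missing.
prices <- tibble::tibble(
  day   = c(1L, 2L, 5L),
  price = c(100, 102, 110)
)

filled <- prices |>
  complete(day = 1:5) |>  # add rows for days 3 and 4 (price = NA)
  fill(price)             # default .direction = "down": carry last value forward
filled$price
# 100 102 102 102 110
```

Whether forward-filling is appropriate depends on the data; for counts, a `fill = list(...)` default of zero is usually the better choice.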
complete() exposes IMPLICIT missing values. If 2025 had no data for product Y, the row was simply absent; complete makes it appear with NA, so downstream summaries treat it correctly.
complete() vs expand() vs expand_grid() vs full_join
Four ways to fill combinations.
| Function | Inputs | Output | Best for |
|---|---|---|---|
| complete(data, ...) | Data + columns | Original data + missing rows | Make missing explicit |
| expand(data, ...) | Data + columns | Just combinations (no original data) | Pure combinations |
| expand_grid(...) | Vectors | Tibble of all combinations | Combinations from scratch |
| full_join(x, y) | Two tables | Joined | Different problem |
When to use which:
- complete to add missing rows in-place.
- expand for combinations without original data.
- expand_grid when starting from vectors.
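The three choices above can be compared side by side. This is a sketch on an invented `sales` tibble; the point is which columns come back, not the particular values.

```r
library(tidyr)

# Hypothetical sales data: product "Y" has no row for 2025.
sales <- tibble::tibble(
  year    = c(2024, 2024, 2025),
  product = c("X", "Y", "X"),
  qty     = c(10, 5, 8)
)

# complete(): original data plus the missing row, qty column kept.
completed <- sales |> complete(year, product)

# expand(): only the year and product combinations, qty dropped.
combos <- sales |> expand(year, product)

# expand_grid(): built from vectors, no data frame needed.
grid <- expand_grid(year = c(2024, 2025), product = c("X", "Y"))

names(completed)  # "year" "product" "qty"
names(combos)     # "year" "product"
```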
A practical workflow
Use complete for time-series with sparse observations to ensure regular intervals.
Ensure every (month, product) combination exists, with qty = 0 for missing.
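One way this might look, on made-up monthly data where month 3 has no observations at all. Using `full_seq()` to generate the regular sequence of months is an assumption of this sketch; without it, `complete()` would only use the months that already appear in the data.

```r
library(tidyr)

# Hypothetical sparse monthly data: nothing recorded for month 3.
monthly <- tibble::tibble(
  month   = c(1, 1, 2, 4),
  product = c("A", "B", "A", "B"),
  qty     = c(3, 7, 2, 9)
)

regular <- monthly |>
  complete(
    month = full_seq(month, period = 1),  # 1, 2, 3, 4 with no gaps
    product,
    fill = list(qty = 0)                  # missing means nothing sold
  )
nrow(regular)  # 8: 4 months x 2 products
```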
For per-subject completion:
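A sketch of per-subject completion, assuming hypothetical `subject`, `visit`, and `score` columns. Because the data is grouped, `full_seq()` is evaluated within each subject, so each subject is completed only over its own range of visits.

```r
library(dplyr)
library(tidyr)

# Hypothetical visit data: s1 skipped visit 2, s2 started at visit 2.
visits <- tibble::tibble(
  subject = c("s1", "s1", "s2", "s2", "s2"),
  visit   = c(1, 3, 2, 3, 4),
  score   = c(5.1, 5.9, 4.2, 4.8, 5.0)
)

per_subject <- visits |>
  group_by(subject) |>
  complete(visit = full_seq(visit, period = 1)) |>
  ungroup()
per_subject
# s1 gains a row for visit 2 (score = NA); s2 is already complete over 2 to 4.
```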
Common pitfalls
Pitfall 1: cross-product explosion. complete(year, product, region) generates n_year × n_product × n_region rows. For high-cardinality data, this can be huge.
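A quick way to check the size before running `complete()` is to multiply the distinct-value counts yourself. The helper `expected_rows()` below is hypothetical (it is not part of tidyr), and the sketch uses the built-in mtcars data.

```r
# Hypothetical helper: product of the number of distinct values per column.
expected_rows <- function(data, cols) {
  prod(vapply(data[cols], function(x) length(unique(x)), integer(1)))
}

expected_rows(mtcars, c("cyl", "gear", "carb"))
# 54: 3 cyl values x 3 gear values x 6 carb values, versus 32 rows in mtcars
```

If the estimate is far larger than the data, consider `nesting()` or grouping before completing.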
Pitfall 2: NA after complete. New rows have NA in non-key columns. Use fill = list(...) to default, or chain with fill() for forward-fill.
complete() does NOT restrict combinations on its own: on an ungrouped data frame it generates the FULL cross-product across the named columns. Use nesting() or group_by() to restrict it.
Try it yourself
Try it: Ensure every (cyl, gear) combination exists in a small mtcars subset, with a count column. Save to ex_complete.
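One possible solution, assuming the "subset" is simply the cyl and gear columns of mtcars summarised with `count()`:

```r
library(dplyr)
library(tidyr)

ex_complete <- mtcars |>
  count(cyl, gear) |>                       # 8 observed (cyl, gear) pairs
  complete(cyl, gear, fill = list(n = 0L))  # add the missing pair with n = 0
ex_complete
# 9 rows: the (cyl = 8, gear = 4) pair appears with n = 0.
```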
Explanation: complete inserts the missing combinations and fills n with 0.
Related tidyr functions
After mastering complete, look at:
- expand(): just combinations, no merge
- expand_grid(): combinations from vectors
- crossing(): similar to expand_grid
- nesting(): preserve existing pairs
- fill(): forward-fill values
- replace_na(): scalar fill
FAQ
What does complete do in tidyr?
complete(data, ...) adds rows for every missing combination of the named columns; other columns are set to NA (or to a fill default).
How do I avoid cross-product explosion with complete?
Use nesting(col1, col2) to keep specific column pairs together instead of generating their full cross product.
Can I fill the missing values with something other than NA?
Yes. Pass fill = list(col = 0) to set a default for the new rows.
What is the difference between complete and expand?
complete adds missing rows to the existing data frame. expand returns ONLY the combinations (without the original data). complete = expand + full_join.
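The equivalence can be checked on a small invented example. This sketch ignores the `fill` step, which `complete()` applies on top of the join.

```r
library(dplyr)
library(tidyr)

# Hypothetical sales data: product "Y" has no row for 2025.
sales <- tibble::tibble(
  year    = c(2024, 2024, 2025),
  product = c("X", "Y", "X"),
  qty     = c(10, 5, 8)
)

via_complete <- sales |> complete(year, product)

# The same result built by hand: all combinations, then join the data back on.
via_join <- sales |>
  expand(year, product) |>
  full_join(sales, by = c("year", "product"))

all.equal(via_complete, via_join)
```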
Should I group_by before complete?
If you want combinations to apply within each group only, yes: group_by(g) |> complete(...) |> ungroup(). Otherwise complete operates across the full data frame.