dplyr nest_join() in R: Join With Right Stored as List Column
The nest_join() function in dplyr joins two tables where each matched group from the RIGHT is stored as a NESTED data frame inside a list column. Each row of x ends up with a list-column of all matching y rows.
nest_join(x, y, by = "id") # right matches as list column nest_join(x, y, by = "id", name = "data") # custom column name left_join(x, y, by = "id") # different: rows multiply inner_join(x, y, by = "id") # different: only matched rows df |> nest_join(...) |> tidyr::unnest(y) # back to flat form
Need explanation? Read on for examples and pitfalls.
What nest_join() does in one sentence
nest_join(x, y, by) returns a copy of x with ONE additional list column where each row holds a tibble of all matching rows from y. Unlike left_join, the row count of x is preserved; matching y rows are bundled instead of duplicated.
This is the "nested" filter+merge pattern. Useful when you want each x row to keep ALL its y matches in one place for downstream per-row computation.
Syntax
nest_join(x, y, by = NULL, keep = FALSE, name = NULL). name is the new list-column's name (defaults to the y data frame's deparsed name).
Customer 2 has an empty tibble (no orders). The new column is named after the y argument.
nest_join() preserves x's row count exactly (one nested cell per row). This is the key difference from left_join, which multiplies rows when y has multiple matches.Five common patterns
1. Standard nest_join
2. Custom column name
3. Inspect a single row's nested data
The list column is accessed with [[i]] (each cell is a tibble).
4. Combine with map for many-models or per-row analysis
purrr::map_dbl over the list column gives per-row aggregation without losing the raw rows.
5. Unnest back to flat form
unnest reverses nest_join.
nest_join() is the dplyr way to express "for each x row, give me ALL its y matches as a sub-table". This is the "many-models" / "per-customer summary" pattern made first-class. left_join multiplies rows; nest_join keeps them and stores matches as a list column.nest_join() vs left_join() vs group_by() + nest()
Three approaches to "x with its y matches" in dplyr.
| Approach | Result shape | Best for |
|---|---|---|
nest_join(x, y) |
x's row count, list column from y | Direct nested lookup |
left_join(x, y) |
Possibly multiplied rows | Flat output |
left_join + group_by(x_id) + nest() |
Same as nest_join | More verbose; equivalent |
When to use which:
nest_joinfor direct "x with nested y matches".left_joinfor flat output with one row per match.group_by + nestif you need additional aggregation before nesting.
A practical workflow
Use nest_join when each x row needs to do per-row computation on ALL its y matches.
Per customer, compute order count, total, and average from the nested tibble. Without nest_join you'd group_by + summarise, then join back; nest_join compresses the workflow.
Common pitfalls
Pitfall 1: empty tibble for unmatched rows. Customers without orders get a 0-row tibble (not NA). Map functions need to handle empty input gracefully.
Pitfall 2: assuming nest_join is a JSON-like nested join. It is not: each cell is a TIBBLE (data frame), not a list of fields. Access with [[i]] or purrr::map.
nest_join is more memory-intensive than left_join because each cell holds a tibble. For very large data, the overhead matters. Profile before using on >1M rows.Try it yourself
Try it: For each car name in mtcars, attach a nested tibble of all cars sharing the same cyl value (a "siblings" list). Save to ex_nested.
Click to reveal solution
Explanation: Self-nest_join by cyl. Each car's "siblings" cell is a tibble of all cars (including itself) with the same cyl.
Related dplyr / tidyr functions
After mastering nest_join, look at:
left_join(): flat alternativetidyr::nest(): pure nesting on grouped datatidyr::unnest(): flatten list columnspurrr::map(): per-row computation on nested cellsdplyr::group_by()+summarise(): alternative aggregation patterntibble::tribble(): build nested tibbles by hand
For tidymodels / many-models workflows, tidyr::nest() followed by mutate(model = map(data, ~ lm(y ~ x, .))) is the canonical pattern.
FAQ
What does nest_join do in dplyr?
nest_join(x, y, by) adds a list column to x where each cell is a tibble of all matching rows from y. x's row count is preserved.
What is the difference between nest_join and left_join?
left_join multiplies x's rows when y has multiple matches. nest_join keeps x's row count and stores y's matches as a list column. Same data, different shape.
How do I unnest the result of nest_join?
tidyr::unnest(result, name_of_list_col) flattens the nested column. Equivalent to left_join (with keep_empty option for unmatched rows).
Why does my unmatched row have a 0-row tibble?
Because nest_join keeps x's row but no y rows match. The cell is an empty tibble, not NA. purrr::map(orders, nrow) returns 0 for those rows.
When should I use nest_join over left_join?
When each x row needs per-row computation on ALL its y matches (counting, summing, fitting a model). nest_join keeps the raw y rows with the x row instead of separating them.