dplyr full_join() in R: Keep All Rows From Both Tables
The full_join() function in dplyr keeps EVERY row from BOTH tables, filling unmatched rows with NA. It is the outer join: nothing is dropped.
full_join(x, y, by = "id") # all rows from both full_join(x, y, by = c("id","date")) # multi-key left_join(x, y, by = "id") # keep only x's rows inner_join(x, y, by = "id") # only matched full_join(x, y, by = join_by(id)) # tidy syntax bind_rows(x, y) # different: stack vertically
Need explanation? Read on for examples and pitfalls.
What full_join() does in one sentence
full_join(x, y, by) returns a data frame containing every row from both x and y, with columns from both sides; rows present in only one table get NA in the other side's columns. Nothing is silently dropped.
This is the outer join. Use it when missing matches are themselves the answer (e.g., reconciling two source systems).
Syntax
full_join(x, y, by = NULL, suffix = c(".x", ".y")). Same arguments as left_join.
full_join is the right tool for reconciliation tasks: "what rows are in A only, in B only, or in both?". Combined with is.na checks, it lets you classify each row.Five common patterns
1. Reconciliation between two systems
A classic full_join + classify pattern for reconciliation.
2. Multi-key join
3. Outer join then count missing
4. Mapped-key full_join
5. Tidy join_by syntax
full_join is the only join that NEVER drops a row. Use it when "row missing on one side" is itself meaningful information. For most analysis pipelines, left_join or inner_join is what you actually want.full_join() vs left_join() vs inner_join() vs union
The mutating-join family.
| Function | Keeps |
|---|---|
full_join(x, y) |
All rows from both; NA for unmatched |
left_join(x, y) |
All rows of x |
right_join(x, y) |
All rows of y |
inner_join(x, y) |
Only matched rows |
union(x, y) |
All distinct rows from BOTH (set union, requires same columns) |
When to use which:
full_joinfor reconciliation / outer joins.left_joinfor "augment my main table".inner_joinfor "complete records only".unionfor set-style row union (same shape).
A practical workflow
The "data reconciliation" pattern is full_join's signature use case.
Compare reported and actual inventory; classify each SKU. Without full_join you'd miss SKUs present in only one source.
Common pitfalls
Pitfall 1: row count grows. full_join often returns more rows than either input. Always check nrow(result) against expectations.
Pitfall 2: NAs everywhere. Unmatched rows have NA in the OTHER side's columns. Downstream code must handle NA, or use coalesce(col_a, col_b) to merge.
full_join does NOT preserve any natural row order. The result interleaves matches and unmatched rows in implementation-defined order. Always arrange() after if order matters.Why full_join is rarely the right default
full_join is necessary for reconciliation but rarely the right choice for analysis. When merging fact and dimension tables for analysis, you almost always want left_join or inner_join. Reaching for full_join by default produces an explosion of NAs in your result, which then needs careful handling in every downstream step. Use full_join only when "what's missing from each side" is the question itself. For "augment my main table", left_join is the right tool. For "complete records only", inner_join. For "reconcile two systems", full_join. Train yourself to pick by the question, not by the verb's name.
Try it yourself
Try it: Find which mtcars are also in a separate featured lookup table, classifying each as "in mtcars only", "in featured only", or "in both". Save to ex_status.
Click to reveal solution
Explanation: full_join brings together all 32 mtcars + Tesla (only in featured). case_when classifies each row.
Related dplyr functions
After mastering full_join, look at:
left_join()/right_join()/inner_join(): other mutating joinssemi_join()/anti_join(): filtering joinscoalesce(): pick first non-NA across columns (post-full_join cleanup)union(),intersect(),setdiff(): set operations on rowsbind_rows(): vertical row stackingmerge(x, y, all = TRUE): base R equivalent
For post-join NA cleanup, coalesce(col.x, col.y) is the cleanest way to merge same-named columns from both sides.
FAQ
What does full_join do in dplyr?
full_join(x, y, by) keeps EVERY row from both x and y, with columns from both. Rows present in only one table get NA in the other side's columns.
What is the difference between full_join and inner_join?
full_join keeps ALL rows from both. inner_join keeps ONLY matched rows. full_join may include many NAs; inner_join has no NAs from missing matches.
When should I use full_join?
For reconciliation tasks (comparing two source systems), or when missing matches are themselves the answer. For most analysis, left_join or inner_join is what you actually want.
How do I find rows in only one table after full_join?
Filter by NA in the other table's columns: filter(is.na(col_from_y)) keeps rows that were in x but not y.
Does full_join preserve row order?
No. The result has implementation-defined ordering. Always arrange() if you need a specific order.