dplyr right_join() in R: Keep All Rows From the Right Table
The right_join() function in dplyr merges two tables by key, KEEPING all rows from the RIGHT table and adding matching columns from the left. It is the mirror of left_join(); flipping the inputs gives an equivalent result.
right_join(x, y, by = "id") # keep all rows of y right_join(x, y, by = c("id" = "uid")) # mapped key left_join(y, x, by = "id") # equivalent (flipped) inner_join(x, y, by = "id") # only matched full_join(x, y, by = "id") # all rows from both right_join(x, y, by = join_by(id)) # tidy-style key
Need explanation? Read on for examples and pitfalls.
What right_join() does in one sentence
right_join(x, y, by) returns a data frame with EVERY row of y, plus matching columns from x joined on the by key; rows in y with no match get NA in x's columns. Functionally identical to left_join(y, x, by).
right_join is rarely the cleanest call: most dplyr style guides recommend flipping inputs and using left_join for consistency.
Syntax
right_join(x, y, by = NULL, suffix = c(".x", ".y")). Same arguments as left_join.
left_join(y, x, by = ...) over right_join(x, y, by = ...). The two are equivalent; left_join with the data flipped is more idiomatic and reads more naturally in pipes.Five common patterns
1. Keep all rows of right table
2. Equivalent left_join (preferred style)
3. Multiple keys
4. Differently-named keys
5. Empty left input
right_join(x, y) == left_join(y, x) (column order may differ). They are mathematically equivalent. dplyr keeps right_join for completeness; in practice almost everyone uses left_join with flipped arguments. Pick one convention per project.right_join() vs left_join() vs inner_join() vs full_join()
The four mutating-join family.
| Function | Keeps |
|---|---|
left_join(x, y) |
All rows of x; y matches added |
right_join(x, y) |
All rows of y; x matches added |
inner_join(x, y) |
Only rows in BOTH x and y |
full_join(x, y) |
All rows of x and y; NA for unmatched |
semi_join(x, y) |
Rows of x that have a match in y (no y cols) |
anti_join(x, y) |
Rows of x that have NO match in y |
When to use which:
left_join(with the main table on the left): almost always preferred.right_joinif it makes the data flow clearer in your specific case.inner_joinfor "where both have data".full_joinfor outer joins.
A practical workflow
The "keep right table's structure" pattern is right_join's only natural use case.
Equivalent and more readable:
The general guidance: rewrite right_join as flipped left_join unless the order of arguments matters for readability (rare).
Common pitfalls
Pitfall 1: right_join is unfamiliar to many readers. Code reviewers often misread it. Prefer left_join with flipped inputs for clarity.
Pitfall 2: row count multiplication. Like left_join, right_join multiplies rows when keys are duplicated in the OTHER table. Always check row counts.
right_join and left_join produce results with potentially DIFFERENT column ORDERING. The "keep" side's columns come first. If your downstream code expects a specific column order, this matters.When right_join is genuinely the right call
right_join exists for symmetry, but in practice almost every right_join can be rewritten as a flipped left_join. The one case where right_join reads more naturally is in functional pipelines where the right argument is the "main" table and you don't want to break the chain by reordering. For instance: transactions |> right_join(customers, by = "id") keeps every customer (the right side) and augments with their transactions. Rewriting as customers |> left_join(transactions, by = "id") is cleaner. If your code reviewer doesn't immediately see why right_join was chosen, it probably should be left_join.
Try it yourself
Try it: Augment an orders table with customer info, keeping all rows from orders. Save to ex_aug.
Click to reveal solution
Explanation: Both keep every order row. left_join is more idiomatic; right_join is equivalent but less common in modern code.
Related dplyr functions
After mastering right_join, look at:
left_join(): idiomatic alternativeinner_join(): matched rows onlyfull_join(): all rows from bothsemi_join()/anti_join(): filtering joinsjoin_by(): dplyr 1.1+ key specmerge(x, y, all.y = TRUE): base R equivalent
For tidyverse style consistency, prefer left_join with the main table on the left.
FAQ
What does right_join do in dplyr?
right_join(x, y, by) keeps all rows of y and adds matching columns from x. Rows in y without a match get NA in x's columns.
What is the difference between right_join and left_join?
right_join keeps all rows of y; left_join keeps all rows of x. right_join(x, y) == left_join(y, x) (with possibly different column order).
Should I use right_join or flip and use left_join?
Most style guides recommend flipping and using left_join. left_join is more common, more idiomatic, and easier to read.
How does right_join handle multiple keys?
Same as left_join: pass by = c("col1", "col2") for multi-column keys, or by = c("id" = "uid") for differently-named columns.
Why does my right_join produce duplicates?
Because the LEFT table has duplicate keys. right_join multiplies right rows when left has multiple matches per key. Deduplicate the left table first or check relationship.