dplyr inner_join() in R: Keep Only Matched Rows

The inner_join() function in dplyr keeps ONLY rows that have matches in BOTH tables, returning the intersection. Unmatched rows from either side are DROPPED.

⚡ Quick Answer
inner_join(x, y, by = "id")              # only matched rows
inner_join(x, y, by = c("id","date"))    # multi-column key
inner_join(x, y, by = c("id" = "uid"))   # mapped key
inner_join(x, y, by = join_by(id))       # tidy syntax
left_join(x, y, by = "id")                # different: keep all x
semi_join(x, y, by = "id")                # filter only, no y columns

Need explanation? Read on for examples and pitfalls.

📊 Is inner_join() the right tool?
STARTkeep ONLY matched rows from bothinner_join()keep ALL rows from leftleft_join()keep ALL rows from rightright_join()keep ALL rows from BOTHfull_join()filter without adding columnssemi_join() (cleaner)filter to NON-matchesanti_join()combine without keybind_cols()

What inner_join() does in one sentence

inner_join(x, y, by) returns rows where the by key appears in BOTH x and y, with columns from both sides combined. Unmatched rows from either table are dropped.

It is the strictest of the mutating joins: rows survive only if they have a partner. Useful when you only care about complete records.

Syntax

inner_join(x, y, by = NULL, suffix = c(".x", ".y")). Same arguments as left_join.

Run live
Run live, no install needed. Every R block on this page runs in your browser. Click Run, edit the code, re-run instantly. No setup.
ROnly customers with orders
library(dplyr) customers <- data.frame(id = 1:4, name = c("a","b","c","d")) orders <- data.frame(id = c(1, 1, 3, 5), amount = c(10, 20, 30, 40)) inner_join(customers, orders, by = "id") #> id name amount #> 1 1 a 10 #> 2 1 a 20 #> 3 3 c 30 #> (customer 2, 4 dropped: no orders; order 5 dropped: no customer)

  
Tip
Use inner_join when you only want to analyze records that have data in BOTH tables. It is the cleanest way to express "I need a match" without filtering after a left_join.

Five common patterns

1. Standard inner join

RMatch by id
inner_join(customers, orders, by = "id")

  

Drops customers with no orders AND orders with no customer.

2. Multi-column key

RMatch by composite key
sales <- data.frame(region=c("NA","EU","AS"), product=c("X","Y","X"), qty=c(100,200,50)) catalog <- data.frame(region=c("NA","EU"), product=c("X","Z"), price=c(10, 20)) inner_join(sales, catalog, by = c("region","product")) #> region product qty price #> 1 NA X 100 10 #> (EU/Y dropped: not in catalog; EU/Z and AS/X dropped: not in sales)

  

3. Different column names

RMap keys
inner_join(customers, orders, by = c("id" = "user_id"))

  

4. Self-inner-join

RCustomers who placed multiple orders
inner_join(orders, orders, by = "id", suffix = c("_a","_b")) |> filter(amount_a < amount_b) |> distinct(id)

  

Self-join with filter to find pairs.

5. Chained inner joins

RTriple-table intersection
sales |> inner_join(products, by = "product_id") |> inner_join(regions, by = "region_id")

  

Each step drops unmatched rows from the prior result.

Key Insight
inner_join is the SAFEST join for downstream analysis. No NAs introduced by missing matches. The trade-off: rows you forgot existed (without partners) silently disappear. Always check nrow(result) after an inner_join to make sure the drop count matches your expectation.

inner_join() vs left_join() vs semi_join() vs filter

Four ways to combine "match against y" semantics in dplyr.

Approach Keeps Adds y columns?
inner_join(x, y) x rows that match y Yes
left_join(x, y) All x rows Yes (NA where no match)
semi_join(x, y) x rows that match y No
filter(x %in% y) x rows where col is in y No

When to use which:

  • inner_join for "match AND augment".
  • semi_join for "match WITHOUT augmenting" (filter only).
  • left_join for "augment, keep unmatched as NA".
  • filter(... %in% ...) for simple membership tests.

A practical workflow

The "complete cases only" pattern is inner_join's signature use case.

RComplete cases via chained inner_join
results <- experiments |> inner_join(metadata, by = "experiment_id") |> inner_join(outcomes, by = "experiment_id")

  

Only experiments with both metadata and outcomes survive. NAs are impossible (assuming each table has no internal NAs).

For an outer report (showing what was DROPPED):

RAudit unmatched rows
unmatched <- experiments |> anti_join(metadata, by = "experiment_id")

  

Reports experiments without metadata; useful for data-quality auditing.

Common pitfalls

Pitfall 1: silent row drops. inner_join silently removes rows without a match. Always check nrow(result) against expectations, especially in production pipelines.

Pitfall 2: row multiplication. If both tables have duplicate keys, inner_join multiplies rows (cartesian product per key). Use relationship = "one-to-one" or "many-to-one" to error instead.

Warning
inner_join is NOT the same as left_join + filter(!is.na(y_col)). They look similar but the filter approach is slower and may drop rows that legitimately had NA in y_col before the join.

Try it yourself

Try it: Find cars in mtcars with cyl values present in a lookup table cyl_info. Save to ex_match.

RYour turn: keep only matched cyl values
cyl_info <- data.frame(cyl = c(4, 8), label = c("eco","power")) ex_match <- mtcars |> # your code here nrow(ex_match) #> Expected: 25 (cyl=6 dropped, only 4 and 8 in lookup)

  
Click to reveal solution
RSolution
ex_match <- mtcars |> inner_join(cyl_info, by = "cyl") nrow(ex_match) #> [1] 25 (11 cyl=4 + 14 cyl=8; 7 cyl=6 dropped)

  

Explanation: inner_join drops cars whose cyl is not in cyl_info (cyl=6) and adds the label column.

After mastering inner_join, look at:

  • left_join() / right_join() / full_join(): other mutating joins
  • semi_join(): same row-filtering as inner_join, no extra columns
  • anti_join(): opposite of semi_join
  • join_by(): dplyr 1.1+ key spec; supports inequality
  • intersect(): set intersection on whole rows
  • data.table::merge(): very fast for big data

For very large data, the data.table package's joins are typically much faster.

FAQ

What does inner_join do in dplyr?

inner_join(x, y, by) returns rows where the by key matches in both x and y, with columns from both. Unmatched rows from either side are dropped.

What is the difference between inner_join and left_join?

inner_join keeps only rows that match in BOTH tables. left_join keeps ALL rows of x, putting NA in y's columns where no match. inner may return fewer rows; left always has at least as many as x.

What is the difference between inner_join and semi_join?

inner_join adds y's columns; semi_join does not. Use semi_join when you only need to filter, not augment.

How do I do a many-to-many inner join?

Pass relationship = "many-to-many" to suppress the warning. Result is the cartesian product per key.

Why did my inner_join return fewer rows than expected?

Because some rows in one table had keys that didn't appear in the other. Use anti_join(x, y) to inspect which rows of x were dropped.