dplyr anti_join() in R: Keep Rows Without Match in Right

The anti_join() function in dplyr keeps rows from the LEFT table that DO NOT have a match in the right table. It is the opposite of semi_join() and the standard tool for "find rows missing from the master list".

By Selva Prabhakaran · Published May 12, 2026 · Last updated May 12, 2026

⚡ Quick Answer

anti_join(x, y, by = "id")             # x rows NOT in y
anti_join(x, y, by = c("id","date"))   # multi-key
semi_join(x, y, by = "id")             # opposite: x rows IN y
filter(x, !(id %in% y$id))             # equivalent for one-col key
anti_join(orders, customers, by = "id") # orphan orders
anti_join(actual, expected, by = "id")  # missing from expected

Need explanation? Read on for examples and pitfalls.

📊 Is anti_join() the right tool?

What anti_join() does in one sentence

anti_join(x, y, by) returns rows of x whose by key DOES NOT appear in y, with NO columns added from y. It is a filter: same column count as x, fewer (or equal) rows.

The cleanest tool for "what's in A but not in B?" questions, especially with multi-column keys.

Syntax

anti_join(x, y, by = NULL). Same key arguments as semi_join.

Run live

Run live, no install needed. Every R block on this page runs in your browser. Click Run, edit the code, re-run instantly. No setup.

RCustomers without orders

library(dplyr) library(tibble) customers <- data.frame(id = 1:4, name = c("a","b","c","d")) orders <- data.frame(id = c(1, 1, 3, 5), amount = c(10, 20, 30, 40)) anti_join(customers, orders, by = "id") #> id name #> 1 2 b #> 2 4 d #> (customers 1 and 3 dropped: they have orders)

Tip

anti_join is the dplyr idiom for data-quality audits: "what records exist in source A but not in source B?". Cleaner than the filter(... !%in% ...) workaround, especially with multi-column keys.

Five common patterns

1. Find missing customers

RCustomers without any order

anti_join(customers, orders, by = "id")

2. Multi-column anti-join

RSales of products NOT in catalog

sales <- data.frame(region=c("NA","EU","AS"), product=c("X","Y","Z")) catalog <- data.frame(region=c("NA","EU"), product=c("X","Y")) anti_join(sales, catalog, by = c("region","product")) #> region product #> 1 AS Z #> (only AS/Z is missing from catalog)

For multi-column negation, filter(... %in% ...) is awkward; anti_join is clean.

3. Find orphan records

ROrders without a known customer

anti_join(orders, customers, by = "id") #> id amount #> 1 5 40 #> (order 5's customer doesn't exist)

4. New rows since last snapshot

RCurrent rows not in previous snapshot

prev_snapshot <- data.frame(id = 1:5) curr_data <- data.frame(id = 1:8, value = 10:17) new_rows <- anti_join(curr_data, prev_snapshot, by = "id") new_rows #> id value #> 1 6 15 #> 2 7 16 #> 3 8 17

A diff pattern: rows in current but not previous.

5. Pair with semi_join for partition

RSplit into matched and unmatched

matched <- semi_join(customers, orders, by = "id") unmatched <- anti_join(customers, orders, by = "id") nrow(matched) + nrow(unmatched) == nrow(customers) #> [1] TRUE

semi + anti partitions x into two non-overlapping groups.

Key Insight

anti_join is the ONLY clean way to express "in left but not in right" for multi-column keys. With single columns you could use filter(x, !id %in% y$id); with two-column keys, the equivalent filter is awkward and error-prone. anti_join is unambiguous.

anti_join() vs semi_join() vs filter() vs setdiff()

Four ways to express "exclude" in R.

Approach	Best for
`anti_join(x, y)`	Multi-key membership negation
`semi_join(x, y)`	Opposite (positive membership)
`filter(x, !(col %in% y$col))`	Single-column negation in a quick script
`setdiff(x, y)`	Whole-row difference (all columns must match)

When to use which:

anti_join for key-based exclusion.
setdiff for whole-row set difference.
filter for single-column scripts.

A practical workflow

Use anti_join in data audits to find what's expected vs what's there.

RBidirectional audit

# Audit: which IDs are in actual but not expected? unexpected <- actual |> anti_join(expected, by = "id") # Which IDs are expected but missing from actual? missing <- expected |> anti_join(actual, by = "id")

A pair of anti_joins gives a complete diff between two sources.

For incremental loads:

RFind new rows for incremental load

existing <- read_csv("history.csv") new_batch <- read_csv("today.csv") new_rows <- new_batch |> anti_join(existing, by = "id")

Only insert rows that don't already exist.

Common pitfalls

Pitfall 1: forgetting which side has the negation. anti_join(x, y) returns rows from X NOT in Y. Reversing the args inverts the meaning.

Pitfall 2: NA in the join column. NA == NA is NA, so rows with NA in the key may NOT match (and thus appear in anti_join output). Filter NAs first if surprising.

Warning

anti_join does NOT count duplicates in y. A single match in y is enough to exclude an x row, regardless of how many duplicates y has. Same semantics as semi_join (membership only).

Why anti_join beats filter(!%in%)

For multi-column keys, anti_join is the only clean expression of "exclude these tuples". The filter equivalent filter(!(col1 %in% y$col1) & !(col2 %in% y$col2)) is wrong: it tests each column independently, not as a tuple. The correct filter form requires building a composite key string or using paste0 tricks, both of which are clunky. anti_join handles tuple membership natively and clearly. Single-column negation can use filter(!col %in% y$col), which is fine for quick scripts; for production code, anti_join is more robust against accidental key changes.

Try it yourself

Try it: Find all mtcars cars whose name is NOT in a featured lookup. Save to ex_unfeatured.

RYour turn: find non-featured cars

featured <- data.frame(car = c("Mazda RX4", "Honda Civic")) mtcars_named <- mtcars |> tibble::rownames_to_column("car") ex_unfeatured <- mtcars_named |> # your code here nrow(ex_unfeatured) #> Expected: 30 (32 cars - 2 featured)

Click to reveal solution

RSolution

ex_unfeatured <- mtcars_named |> anti_join(featured, by = "car") nrow(ex_unfeatured) #> [1] 30 # Equivalent (single-column): ex_alt <- mtcars_named |> filter(!(car %in% featured$car))

Explanation: anti_join keeps cars whose name is NOT in featured. 30 of 32 mtcars survive.

After mastering anti_join, look at:

semi_join(): opposite filtering join
inner_join() / left_join(): mutating joins
setdiff(): whole-row set difference (requires same columns)
filter(): single-column negation
match(), %in%: base R membership tests

For "rows that exist in both AND match value-for-value", intersect() works on whole-row equality.

FAQ

What does anti_join do in dplyr?

anti_join(x, y, by) keeps rows of x whose by key does NOT appear in y. Used to find records missing from a master list or new since a previous snapshot.

What is the difference between anti_join and semi_join?

They are opposites. semi_join keeps x rows that DO match in y. anti_join keeps x rows that DO NOT match in y. Together they partition x.

How does anti_join differ from setdiff?

anti_join uses a key (subset of columns); setdiff requires whole-row equality. Use anti_join for key-based exclusion; setdiff for full-row difference.

Can anti_join handle multi-column keys?

Yes. anti_join(x, y, by = c("region","product")) excludes x rows whose (region, product) tuple appears in y.

Why does anti_join handle NA strangely?

Because NA == NA is NA, not TRUE. NA values in the join key may not match, so rows with NA appear in anti_join output. Filter NAs first if needed.

Navigate

Tidyverse packages

Deep dives

Wrangling & EDA

Statistics

Machine Learning

Time Series

By Industry

Reporting & Apps

Levels

dplyr anti_join() in R: Keep Rows Without Match in Right

What anti_join() does in one sentence

Syntax

Five common patterns

1. Find missing customers

2. Multi-column anti-join

3. Find orphan records

4. New rows since last snapshot

5. Pair with semi_join for partition

anti_join() vs semi_join() vs filter() vs setdiff()

A practical workflow

Common pitfalls

Why anti_join beats filter(!%in%)

Try it yourself

FAQ

Navigate

Tidyverse packages

Deep dives

Wrangling & EDA

Statistics

Machine Learning

Time Series

By Industry

Reporting & Apps

Levels

dplyr anti_join() in R: Keep Rows Without Match in Right

What anti_join() does in one sentence

Syntax

Five common patterns

1. Find missing customers

2. Multi-column anti-join

3. Find orphan records

4. New rows since last snapshot

5. Pair with semi_join for partition

anti_join() vs semi_join() vs filter() vs setdiff()

A practical workflow

Common pitfalls

Why anti_join beats filter(!%in%)

Try it yourself

Related dplyr functions

FAQ