dplyr rows_delete() in R: Remove Rows by Key
The rows_delete() function in dplyr removes rows from x whose keys appear in y. It is the SQL DELETE equivalent: precise key-based removal in a dplyr-friendly verb.
rows_delete(x, y, by = "id") # remove rows matching y's keys rows_delete(x, y, by = "id", unmatched = "ignore") # tolerate y keys not in x filter(x, !id %in% y$id) # base alternative for one-col key anti_join(x, y, by = "id") # equivalent (idiomatic alternative)
Need explanation? Read on for examples and pitfalls.
What rows_delete() does in one sentence
rows_delete(x, y, by) returns x with the rows whose by key appears in y removed; errors if y has keys not present in x (use unmatched = "ignore" to tolerate). Equivalent to anti_join(x, y, by) but expressed as a row-mutation operation.
Part of the rows_* family (rows_insert, rows_update, rows_upsert, rows_delete, rows_patch).
Syntax
rows_delete(x, y, by = NULL, unmatched = "error", in_place = FALSE). unmatched = "ignore" skips y rows whose keys aren't in x.
rows_delete and anti_join produce the same result for one-table-key deletion. Pick the one that fits your pipeline's style. rows_delete emphasizes the mutation semantics; anti_join emphasizes the filtering semantics.Five common patterns
1. Delete by single key
2. Error on unmatched (default)
3. Ignore unmatched
4. Composite key
5. Equivalent anti_join
rows_delete and anti_join are functionally equivalent.* Both keep x rows whose key is NOT in y. Choose based on framing: rows_delete reads as "I am removing"; anti_join reads as "I am filtering". Within the rows_ family, rows_delete fits the SQL CRUD vocabulary.rows_delete() vs anti_join() vs filter()
Three ways to remove rows by key.
| Approach | Multi-key | Errors on bad key | Best for |
|---|---|---|---|
rows_delete(x, y, by) |
Yes | Default yes (validates) | SQL-style mutation |
anti_join(x, y, by) |
Yes | No | Filtering pipelines |
filter(x, !col %in% y$col) |
One col only | No | Quick scripts |
When to use which:
rows_deletefor SQL-style row mutations + validation.anti_joinfor tidy filtering.filterfor one-column negation.
A practical workflow
Use rows_delete in audit-validated deletion pipelines.
If deletions.csv has stray IDs, the call errors. This catches bugs where the deletion list doesn't match the master.
For tolerance-needed cases (e.g., re-running a deletion that has already partially applied):
Common pitfalls
Pitfall 1: errors on unmatched in y by default. A single stray key in y crashes the call. Use unmatched = "ignore" for resilience.
Pitfall 2: rows_delete returns a NEW data frame. It doesn't modify x in place (unless in_place = TRUE for data.tables). Always assign the result.
rows_delete does NOT count duplicates. A single match in y is enough to remove an x row. If x has duplicates with the same key, ALL are deleted. For exact-row deletion, use setdiff() (full-row matching).Why validate-then-delete matters
The default unmatched = "error" for rows_delete is a feature, not an annoyance. It catches bugs where your deletion list references IDs that don't exist in the master, which usually means the deletion list is from a stale source or has typos. Erroring out forces you to investigate before silently corrupting your data. For repeatable / retry-able pipelines, switch to unmatched = "ignore" once you've confirmed the deletion list is clean. This pattern (validate first, then ignore for production retries) is common in ETL pipelines.
Try it yourself
Try it: Remove cars from a small mtcars_top (first 5 rows of mtcars) whose names appear in a to_drop list. Save to ex_kept.
Click to reveal solution
Explanation: rows_delete removes the 2 cars whose names match to_drop, leaving 3.
Related dplyr functions
After mastering rows_delete, look at:
anti_join(): filter-style alternativerows_insert()/rows_update()/rows_upsert()/rows_patch(): companion CRUD verbsfilter(): arbitrary conditiondistinct(): drop duplicatessetdiff(): full-row set difference
For "delete by complex condition" (not just key), use filter() directly.
FAQ
What does rows_delete do in dplyr?
rows_delete(x, y, by) removes rows from x whose by key appears in y. Errors by default if y has keys not in x.
What is the difference between rows_delete and anti_join?
Functionally equivalent. rows_delete is part of the rows_* SQL-style family; anti_join is part of the join family. Choose based on framing.
What is the unmatched argument?
unmatched = "error" (default) errors if y has keys not in x. unmatched = "ignore" silently skips. Use ignore when y may legitimately have extra IDs.
Does rows_delete handle composite keys?
Yes: rows_delete(x, y, by = c("col1","col2")). Tuple matching applies.
How do I delete by condition instead of key?
Use filter(): filter(x, !condition). rows_delete is for KEY-based deletion only.