dplyr rows_update() in R: Update Existing Rows by Key
The rows_update() function in dplyr modifies columns of EXISTING rows in x using values from y, matched by key. It errors if y has keys not present in x (use rows_upsert() for insert-or-update semantics).
rows_update(x, y, by = "id")                        # update existing; error on missing
rows_update(x, y, by = "id", unmatched = "ignore")  # ignore unmatched in y
rows_upsert(x, y, by = "id")                        # insert OR update
rows_insert(x, y, by = "id")                        # insert only
rows_patch(x, y, by = "id")                         # only update NAs in x
Need explanation? Read on for examples and pitfalls.
What rows_update() does in one sentence
rows_update(x, y, by) returns x where, for each row whose key appears in y, the columns present in y are replaced by y's values; it errors if y has keys not in x. It is "UPDATE existing only".
Part of the rows_* family (rows_insert, rows_update, rows_upsert, rows_delete, rows_patch) for SQL-style row mutations.
Syntax
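rows_update(x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = FALSE, in_place = FALSE). by names the key column(s); when it is NULL, dplyr uses the first column of y and prints a message saying so. unmatched = "error" is the default; unmatched = "ignore" skips y's rows whose keys aren't in x.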
Use rows_update when y is a list of CHANGES and every key in y must exist in x. For mixed insert+update, use rows_upsert instead.
Five common patterns
1. Update existing rows
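A minimal sketch of the basic case, using made-up tibbles x and y keyed by a hypothetical id column. Every key in y exists in x, so the call succeeds and only the matching rows change.

```r
library(dplyr)

# Hypothetical example data: a small table keyed by id
x <- tibble(id = 1:3, score = c(10, 20, 30))

# Changes to apply: one row per key that should be updated
y <- tibble(id = c(2L, 3L), score = c(25, 35))

# Rows 2 and 3 take their score from y; row 1 is left alone
updated <- rows_update(x, y, by = "id")
updated
```

The result keeps x's row count and row order; only the score values for ids 2 and 3 differ.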
2. Error on unmatched in y
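To illustrate the default behaviour, the sketch below feeds rows_update a y that contains an id (99, chosen arbitrarily) that is not in x. The call is wrapped in tryCatch so the script keeps running; the data and the NULL fallback are illustrative assumptions.

```r
library(dplyr)

x <- tibble(id = 1:3, score = c(10, 20, 30))
y <- tibble(id = c(2L, 99L), score = c(25, 0))  # id 99 does not exist in x

# With the default unmatched = "error", the call fails instead of
# silently dropping or inserting the stray row
result <- tryCatch(
  rows_update(x, y, by = "id"),
  error = function(e) {
    message("rows_update failed: ", conditionMessage(e))
    NULL
  }
)
```

Here result is NULL because the error handler ran; x itself is untouched.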
3. Ignore unmatched
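The same made-up data can be run with unmatched = "ignore". This sketch assumes dplyr 1.1.0 or later, where the unmatched argument is available.

```r
library(dplyr)

x <- tibble(id = 1:3, score = c(10, 20, 30))
y <- tibble(id = c(2L, 99L), score = c(25, 0))  # id 99 does not exist in x

# The row for id 99 is skipped; id 2 is still updated
updated <- rows_update(x, y, by = "id", unmatched = "ignore")
updated
```

No row is added for id 99, so the result still has three rows.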
4. Multi-column update
All non-key columns in y overwrite the corresponding columns in x.
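As a small illustration with hypothetical name and score columns, a single row in y can change several columns of the matching row in x at once.

```r
library(dplyr)

x <- tibble(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(10, 20, 30)
)

# Both non-key columns in y are written into the row with id 2
y <- tibble(id = 2L, name = "B", score = 25)

updated <- rows_update(x, y, by = "id")
updated
```

Columns of x that do not appear in y are left unchanged, so y only needs to carry the key plus the columns being corrected.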
5. Composite key
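A sketch of a two-column key, using invented region/year sales data. A row in x is updated only when both key columns match the row in y.

```r
library(dplyr)

x <- tibble(
  region = c("east", "east", "west"),
  year   = c(2023L, 2024L, 2024L),
  sales  = c(100, 110, 90)
)

# Only the (east, 2024) row matches on both key columns
y <- tibble(region = "east", year = 2024L, sales = 120)

updated <- rows_update(x, y, by = c("region", "year"))
updated
```

The (east, 2023) and (west, 2024) rows each share one key value with y but not both, so they keep their original sales.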
bind_rows + filter + summarise for incremental updates.
rows_update() vs rows_upsert() vs rows_patch()
Three update patterns in dplyr's rows_* family, with rows_insert and rows_delete shown for comparison.
| Function | Inserts new? | Updates existing? | Touches NA only? |
|---|---|---|---|
| rows_insert(x, y) | Yes | No | n/a |
| rows_update(x, y) | No | Yes | No (overwrites all) |
| rows_upsert(x, y) | Yes | Yes | No (overwrites all) |
| rows_patch(x, y) | No | Only NA cells | Yes |
| rows_delete(x, y) | No (removes) | n/a | n/a |
When to use which:
- rows_update to apply CORRECTIONS to existing records.
- rows_upsert to SYNC from an authoritative source.
- rows_patch to FILL IN missing values without overwriting good ones.
A practical workflow
Use rows_update for "apply corrections" workflows.
The error-on-unmatched behaviour catches bugs where corrections.csv has stray IDs not in master.
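The corrections workflow might look like the sketch below. The master and corrections tibbles are invented stand-ins (corrections plays the role of corrections.csv once it has been read in), and the city column is hypothetical.

```r
library(dplyr)

# Hypothetical master table
master <- tibble(
  id   = 1:4,
  city = c("Oslo", "Rome", "Lima", "Kyiv")
)

# Stand-in for the contents of corrections.csv: key plus corrected columns
corrections <- tibble(
  id   = c(2L, 4L),
  city = c("Milan", "Lviv")
)

# Fails loudly if corrections contains an id that master does not have
master_fixed <- rows_update(master, corrections, by = "id")
master_fixed
```

Because the default is unmatched = "error", a typo in an id inside the corrections data stops the pipeline instead of being dropped without notice.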
For incremental sync (add new + update existing), use rows_upsert instead.
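A minimal sketch of such a sync, assuming a hypothetical feed tibble that contains one existing id and one new id:

```r
library(dplyr)

master <- tibble(id = 1:3, qty = c(5, 8, 2))

# Hypothetical incoming feed: id 3 already exists, id 4 is new
feed <- tibble(id = c(3L, 4L), qty = c(7, 1))

# Existing keys are updated, new keys are added as extra rows
synced <- rows_upsert(master, feed, by = "id")
synced
```

Running rows_update on the same inputs would error on id 4, which is the behaviour to choose when new ids are not expected.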
Common pitfalls
Pitfall 1: errors on unmatched in y. Default unmatched = "error" errors if y has keys not in x. Use "ignore" to silently skip.
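Pitfall 2: columns and types must be compatible. Every column in y must exist in x, and y's values are cast to the types of x's columns; the order of columns in y does not matter. A missing column or an incompatible type errors at the call site.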
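The two failure modes can be reproduced with the sketch below. The fails() helper is hypothetical and exists only to turn an error into TRUE so the script can continue; the grade column and the "high" value are made up.

```r
library(dplyr)

x <- tibble(id = 1:3, score = c(10, 20, 30))

# Hypothetical helper for illustration: TRUE if evaluating expr raises an error
fails <- function(expr) {
  tryCatch({ expr; FALSE }, error = function(e) TRUE)
}

# y carries a column (grade) that x does not have
extra_col <- fails(rows_update(x, tibble(id = 2L, grade = "B"), by = "id"))

# y's score is character while x's score is double
bad_type <- fails(rows_update(x, tibble(id = 2L, score = "high"), by = "id"))

c(extra_col = extra_col, bad_type = bad_type)
```

Both calls are expected to error before any row is modified, so x is never left half-updated.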
rows_update overwrites ALL non-key columns in y, even if the value is NA. If y has NA in a column, the corresponding x value is set to NA. Use rows_patch if you only want to update NA values in x.
Why rows_update beats hand-rolled filter+mutate
Before rows_update existed, applying corrections required filter to the matching rows, modifying their columns, then binding back. This is verbose, easy to misorder, and silently corrupts data if you mistakenly join on the wrong key. rows_update encapsulates the safe pattern: validate keys, replace columns, return a clean result. The error-on-unmatched default catches a common bug class (corrections referencing non-existent IDs). For data-quality-sensitive pipelines, this safety check pays for itself within the first few runs.
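For comparison, here is one possible hand-rolled version next to the rows_update call. The manual pipeline is an illustrative assumption about how such code is often written, not the only way to do it.

```r
library(dplyr)

x <- tibble(id = 1:3, score = c(10, 20, 30))
y <- tibble(id = c(2L, 3L), score = c(25, 35))

# Hand-rolled: drop the rows being corrected, bind the corrections back,
# then re-sort. Nothing here checks that every id in y exists in x.
manual <- x %>%
  filter(!id %in% y$id) %>%
  bind_rows(y) %>%
  arrange(id)

# rows_update: one call, with key validation and x's row order preserved
auto <- rows_update(x, y, by = "id")
```

On this input both approaches give the same scores, but the manual version would quietly append a row for a stray id in y, while rows_update would stop with an error.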
Try it yourself
Try it: Update mpg for two cars in a small mtcars_top (the first 3 rows of mtcars, with the car name as a key column). Save the result to ex_updated.
Explanation: rows_update overwrites mpg for matching car names (Mazda RX4 and Datsun 710); Mazda RX4 Wag keeps its original mpg.
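One possible solution consistent with the explanation above. The car key column, the new_mpg tibble, and the replacement mpg values (22.5 and 24.0) are invented for illustration.

```r
library(dplyr)

# First 3 rows of mtcars, with row names moved into a key column called car
mtcars_top <- as_tibble(head(mtcars, 3), rownames = "car") %>%
  select(car, mpg)

# Hypothetical new mpg values for two of the three cars
new_mpg <- tibble(
  car = c("Mazda RX4", "Datsun 710"),
  mpg = c(22.5, 24.0)
)

ex_updated <- rows_update(mtcars_top, new_mpg, by = "car")
ex_updated
```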
Related dplyr functions
After mastering rows_update, look at:
- rows_insert(): insert only
- rows_upsert(): insert OR update
- rows_patch(): only fill NA cells
- rows_delete(): remove by key
- mutate(): change values without key matching
- case_when(): conditional update inside mutate
For "selective update by condition" (not key-based), mutate(col = case_when(condition ~ new_value, TRUE ~ col)) is the right pattern.
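A short sketch of the condition-based pattern, using a made-up rule that negative scores should be replaced with 0:

```r
library(dplyr)

x <- tibble(id = 1:4, score = c(10, -5, 30, -1))

# Rows are chosen by a condition on their values, not by matching a key
cleaned <- x %>%
  mutate(score = case_when(
    score < 0 ~ 0,      # replace negative scores
    TRUE      ~ score   # keep everything else as is
  ))
cleaned
```

No second table is involved here, which is the main difference from the rows_* functions.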
FAQ
What does rows_update do in dplyr?
rows_update(x, y, by) updates rows in x whose key matches a row in y, replacing non-key columns with y's values. Errors if y has keys not in x.
What is the difference between rows_update and rows_upsert?
rows_update only updates existing rows (errors on new keys in y). rows_upsert updates existing AND inserts new. Use upsert for sync; update for corrections.
What is the unmatched argument?
unmatched = "error" (default) errors if y has keys not in x. unmatched = "ignore" silently skips them. Use ignore when y may legitimately have extra keys.
Does rows_update overwrite NAs?
Yes. If y has NA in a column, the corresponding x value is set to NA. Use rows_patch to update ONLY x's existing NA cells.
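The difference is easiest to see on a tiny example. In this sketch (invented data), y has an NA for id 1 and a real value for id 2, while x has the opposite.

```r
library(dplyr)

x <- tibble(id = 1:3, score = c(10, NA, 30))
y <- tibble(id = c(1L, 2L), score = c(NA, 20))

# rows_update copies y's values as they are, including the NA for id 1
upd <- rows_update(x, y, by = "id")

# rows_patch only fills cells that are NA in x, so id 1 keeps its 10
pat <- rows_patch(x, y, by = "id")

upd$score
pat$score
```

With rows_update the good value for id 1 is lost; with rows_patch it survives and only the gap at id 2 is filled.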
Can rows_update handle composite keys?
Yes: rows_update(x, y, by = c("col1", "col2")). A row in x matches only when every key column equals the corresponding value in a row of y.