dplyr rows_patch() in R: Fill NA Cells Without Overwriting
The rows_patch() function in dplyr updates ONLY the NA cells in x with values from y, matched by key. Existing non-NA values in x are preserved, even if y has different values.
rows_patch(x, y, by = "id") # fill NAs in x from y rows_update(x, y, by = "id") # different: overwrites all rows_upsert(x, y, by = "id") # different: insert + overwrite
Need explanation? Read on for examples and pitfalls.
What rows_patch() does in one sentence
rows_patch(x, y, by) returns x with each cell that is NA replaced by the corresponding value from y (when the key matches); non-NA cells in x are PRESERVED. The "patch" semantic: fill in missing data, don't overwrite good data.
This is the safest mutation in the rows_* family. Useful when y is a "best-effort" data source that may have stale or wrong values for some cells.
Syntax
rows_patch(x, y, by = NULL, unmatched = "error", in_place = FALSE). NA cells in x are filled from y; non-NA cells stay.
rows_patch is the dplyr equivalent of "fill NA from a lookup" without overwriting good data. Use when y is supplementary/fallback information rather than authoritative.Five common patterns
1. Fill missing values from y
2. Multi-column patch
3. Patch from a fallback source
4. Compare with rows_update
rows_update overwrites every non-key column; rows_patch only fills NAs.
5. Composite key
rows_patch is the SAFE version of rows_update. rows_update overwrites everything; rows_patch only fills NAs. When merging from a less-trusted source (e.g., a cache or third-party feed), rows_patch protects your authoritative data while still benefiting from the supplementary information.rows_patch() vs rows_update() vs coalesce()
Three ways to fill missing data in dplyr.
| Function | Scope | Best for |
|---|---|---|
rows_patch(x, y, by) |
NA cells only | Key-based merge with existing values preserved |
rows_update(x, y, by) |
All non-key columns | Overwriting authoritative values |
coalesce(x, y) |
Element-wise (no key) | Simple "first non-NA" fill |
tidyr::replace_na() |
Scalar default | "If NA then 0" |
When to use which:
rows_patchfor selective key-based fills.rows_updatewhen y is more authoritative than x.coalescefor simple vector / column-wise fills.replace_nafor scalar defaults.
A practical workflow
The "augment with lookup, preserve manual edits" pattern is rows_patch's sweet spot.
Manual edits stay; missing values are filled from the auto lookup. Re-running is idempotent.
For data-quality auditing of what was patched:
Common pitfalls
Pitfall 1: confusing rows_patch with rows_update. rows_patch only fills NAs. rows_update overwrites all non-key columns. Always pick the right one for your data trust hierarchy.
Pitfall 2: errors on unmatched in y. Default unmatched = "error". Use "ignore" if y may have extra keys.
rows_patch does NOT fill NAs in y itself, only in x. If y has NAs where x has values, x's values are kept. If both have NAs, the result has NA. Patch goes from y -> x for missing values only.Trust hierarchy: when to patch vs update
The choice between rows_patch and rows_update encodes a TRUST hierarchy in your data sources. rows_patch says "x is authoritative; y supplies missing values". rows_update says "y is authoritative; overwrite x where y has a match". Most workflows have one source you trust more than the other. If your master record has been manually edited and the lookup is auto-fetched, patch protects the manual work. If a daily refresh from a system-of-record should always win, update is correct. Articulating the trust hierarchy explicitly (in code or comments) prevents subtle bugs.
Try it yourself
Try it: Fill missing mpg values in a small data frame using a fallback lookup. Save to ex_filled.
Click to reveal solution
Explanation: rows_patch fills the NA mpg values (id 2 and 4) from y, keeps the non-NA values from x intact.
Related dplyr functions
After mastering rows_patch, look at:
rows_insert(): insert new rowsrows_update(): overwrite existingrows_upsert(): insert OR overwriterows_delete(): remove by keycoalesce(): column-wise NA filltidyr::replace_na(): scalar NA fill
For mass NA filling in a single column from a fallback column, coalesce() is more direct than rows_patch.
FAQ
What does rows_patch do in dplyr?
rows_patch(x, y, by) fills NA cells in x using values from y, matched by key. Non-NA values in x are preserved.
What is the difference between rows_patch and rows_update?
rows_update overwrites ALL non-key columns. rows_patch ONLY fills NAs in x. Use patch when x's existing values are authoritative; use update when y is.
Does rows_patch overwrite NAs in y?
No. If x has a non-NA value, it stays. If y has a value where x has NA, x is patched. The direction is y -> x for NAs only.
What if both x and y have NA in the same cell?
The result has NA. rows_patch can only fill what y provides.
Can rows_patch handle composite keys?
Yes: rows_patch(x, y, by = c("col1","col2")). Tuple matching applies.