dplyr rows_upsert() in R: Insert New OR Update Existing

The rows_upsert() function in dplyr is the SQL-style UPSERT: rows from y replace matching rows in x by key, and rows in y with new keys are appended. It is "insert or update" in one verb.

By Selva Prabhakaran · Published May 12, 2026 · Last updated May 12, 2026

⚡ Quick Answer

rows_upsert(x, y, by = "id")              # insert new + update existing
rows_insert(x, y, by = "id")              # insert only; error on dup
rows_update(x, y, by = "id")              # update only; error on new
rows_patch(x, y, by = "id")               # update only NA cells
bind_rows(x, y) |> distinct(id, .keep_all = TRUE) # rough alternative

Need explanation? Read on for examples and pitfalls.

📊 Is rows_upsert() the right tool?

What rows_upsert() does in one sentence

rows_upsert(x, y, by) returns x with: existing rows whose key matches y replaced by y's values, AND new rows from y (whose keys aren't in x) appended. It is "insert or update".

This is the standard incremental-sync operation: take a fresh batch, merge it with the master, end up with the union of both.

Syntax

rows_upsert(x, y, by = NULL, in_place = FALSE). Combines rows_insert + rows_update.

Run live

Run live, no install needed. Every R block on this page runs in your browser. Click Run, edit the code, re-run instantly. No setup.

RSync master with batch

library(dplyr) x <- data.frame(id = 1:3, val = c(10, 20, 30)) y <- data.frame(id = c(2, 4), val = c(99, 40)) rows_upsert(x, y, by = "id") #> id val #> 1 1 10 #> 2 2 99 <-- updated #> 3 3 30 #> 4 4 40 <-- inserted

Tip

rows_upsert is the dplyr equivalent of SQL INSERT ... ON CONFLICT (key) DO UPDATE. It is the standard sync verb when y might contain a mix of new and updated records.

Five common patterns

1. Standard upsert

RSync master with batch

master <- data.frame(id = 1:3, status = c("a","b","c")) batch <- data.frame(id = c(2, 4), status = c("X","Z")) rows_upsert(master, batch, by = "id") #> 4 rows: id=1, 2 (updated), 3, 4 (inserted)

2. Multi-column update

RUpdate multiple fields

master <- data.frame(id=1:3, name=c("a","b","c"), score=c(10,20,30)) batch <- data.frame(id=c(1,4), name=c("ALICE","D"), score=c(15, 40)) rows_upsert(master, batch, by = "id")

All non-key columns in batch overwrite (or extend) master's columns.

3. Composite key

RUpsert on multi-column key

master <- data.frame(region=c("NA","EU"), product=c("X","Y"), qty=c(100, 50)) batch <- data.frame(region=c("NA","AS"), product=c("X","Z"), qty=c(99, 30)) rows_upsert(master, batch, by = c("region","product")) #> 3 rows: (NA,X) updated, (EU,Y) unchanged, (AS,Z) inserted

4. Subset of columns updated

RUpdate only some fields

master <- data.frame(id=1:3, name=c("a","b","c"), score=c(10,20,30)) batch_score_only <- data.frame(id=c(1,2), score=c(99, 88)) rows_upsert(master, batch_score_only, by = "id") #> id name score #> 1 1 a 99 #> 2 2 b 88 #> 3 3 c 30

batch_score_only has no name column, so master's name is preserved.

5. Idempotent batch

RRe-running upsert is safe

master_v1 <- rows_upsert(master, batch, by = "id") master_v2 <- rows_upsert(master_v1, batch, by = "id") identical(master_v1, master_v2) #> [1] TRUE

upsert is idempotent: applying the same batch twice gives the same result.

Key Insight

rows_upsert is "merge with replacement". Existing rows are OVERWRITTEN by their match in y; new rows are APPENDED. The semantics match SQL MERGE and Postgres INSERT ON CONFLICT DO UPDATE. Use it for incremental sync where the batch is the source of truth for the keys it contains.

rows_upsert() vs rows_insert() vs rows_update() vs bind_rows()

Four "merge x and y" strategies in dplyr.

Function	Updates?	Inserts?	Errors
`rows_insert`	No	Yes	On duplicate key
`rows_update`	Yes	No	On new key in y
`rows_upsert`	Yes	Yes	Never (in normal usage)
`bind_rows`	No	Appends all	Never (no key)
`rows_patch`	Only NA cells	No	On new key in y

When to use which:

rows_upsert for incremental sync.
rows_insert when duplicates indicate a bug.
rows_update for corrections-only.
bind_rows when keys don't matter.

A practical workflow

The "incremental ETL load" pattern uses rows_upsert.

RDaily ETL upsert

library(dplyr) master <- read_csv("master.csv") daily <- read_csv("today.csv") master_v2 <- master |> rows_upsert(daily, by = "id") write_csv(master_v2, "master.csv")

Each day's load merges with the master. New keys append; existing keys update. No row count surprises; idempotent.

For audit logging, capture diffs first:

RAudit upsert diffs

new_keys <- daily |> anti_join(master, by = "id") |> pull(id) updated_keys <- daily |> semi_join(master, by = "id") |> pull(id) cat("Inserted:", length(new_keys), " Updated:", length(updated_keys), "\n") master_v2 <- rows_upsert(master, daily, by = "id")

Common pitfalls

Pitfall 1: column order matters when types differ. y's columns must match x's types. A type mismatch errors at the call site.

Pitfall 2: NAs in batch overwrite good data. If batch has NA in a column, the corresponding master row gets NA. Use rows_patch if you only want to fill NAs in master.

Warning

rows_upsert is NOT in-place by default. It returns a new data frame. For data.tables with in_place = TRUE, modification can be in place; for tibbles always assign the result.

Idempotency makes upsert pipelines safe

Idempotency means applying the same operation twice produces the same result as applying it once. rows_upsert is idempotent: re-running yesterday's batch on an already-updated master leaves the master unchanged. This property is enormously useful for ETL pipelines that may retry on failure or be re-run after manual fixes. Without idempotency, retries can multiply rows or corrupt data; with it, retries are safe by construction. The same property holds for rows_update and rows_patch, but not for rows_insert (which errors on retry due to duplicate keys) or bind_rows (which appends every time).

Try it yourself

Try it: Apply a batch update + insert to a small master table. Save to ex_synced.

RYour turn: sync master with batch

master <- data.frame(id = 1:3, mpg = c(21, 22, 18)) batch <- data.frame(id = c(2, 4), mpg = c(99, 40)) ex_synced <- master |> # your code here ex_synced #> Expected: 4 rows; id=2 updated, id=4 inserted

Click to reveal solution

RSolution

ex_synced <- master |> rows_upsert(batch, by = "id") ex_synced #> id mpg #> 1 1 21 #> 2 2 99 #> 3 3 18 #> 4 4 40

Explanation: rows_upsert updates id=2 with the batch value and appends id=4 as a new row.

After mastering rows_upsert, look at:

rows_insert(): insert only
rows_update(): update only
rows_patch(): only fill NA cells
rows_delete(): remove by key
rows_append(): bind_rows alias (dplyr 1.1+)
mutate(): modify without key matching

For point-in-time data versioning (preserve history), tools like dplyr::rows_* aren't sufficient: you need a proper SCD-2 pattern with effective dates.

FAQ

What does rows_upsert do in dplyr?

rows_upsert(x, y, by) updates rows of x whose keys match y, AND inserts y's rows whose keys aren't in x. The "insert or update" verb.

What is the difference between rows_upsert and rows_insert?

rows_insert errors if any of y's keys already exist. rows_upsert overwrites the existing rows AND inserts new ones. Use upsert for sync; insert for strict no-duplicates.

Is rows_upsert idempotent?

Yes. Applying the same batch twice produces the same result. Useful for retry-safe pipelines.

Does rows_upsert overwrite NAs?

Yes. If y has NA in a column, the corresponding x row gets NA. Use rows_patch to fill ONLY x's existing NAs without overwriting non-NA values.

Can rows_upsert handle composite keys?

Yes: rows_upsert(x, y, by = c("col1","col2")). Tuple matching applies.

Navigate

Tidyverse packages

Deep dives

Wrangling & EDA

Statistics

Machine Learning

Time Series

By Industry

Reporting & Apps

Levels

dplyr rows_upsert() in R: Insert New OR Update Existing

What rows_upsert() does in one sentence

Syntax

Five common patterns

1. Standard upsert

2. Multi-column update

3. Composite key

4. Subset of columns updated

5. Idempotent batch

rows_upsert() vs rows_insert() vs rows_update() vs bind_rows()

A practical workflow

Common pitfalls

Idempotency makes upsert pipelines safe

Try it yourself

FAQ

Navigate

Tidyverse packages

Deep dives

Wrangling & EDA

Statistics

Machine Learning

Time Series

By Industry

Reporting & Apps

Levels

dplyr rows_upsert() in R: Insert New OR Update Existing

What rows_upsert() does in one sentence

Syntax

Five common patterns

1. Standard upsert

2. Multi-column update

3. Composite key

4. Subset of columns updated

5. Idempotent batch

rows_upsert() vs rows_insert() vs rows_update() vs bind_rows()

A practical workflow

Common pitfalls

Idempotency makes upsert pipelines safe

Try it yourself

Related dplyr functions

FAQ