dplyr rows_insert() in R: Add New Rows by Key
The rows_insert() function in dplyr appends rows from y to x and errors if y contains keys that already exist in x. It is the SQL INSERT equivalent: insert new, never overwrite.
rows_insert(x, y, by = "id")                       # error on duplicate keys
rows_insert(x, y, by = "id", conflict = "ignore")  # silently skip duplicates
rows_upsert(x, y, by = "id")                       # insert OR update
rows_update(x, y, by = "id")                       # update only
bind_rows(x, y)                                    # appends without key check
rows_append(x, y)                                  # append (no key check, dplyr 1.1+)
Need explanation? Read on for examples and pitfalls.
What rows_insert() does in one sentence
rows_insert(x, y, by) returns x with all rows from y appended, IF none of y's keys already exist in x; otherwise it errors (unless conflict = "ignore"). It is "insert new keys only".
Together with the rest of the rows_* family (rows_update, rows_upsert, rows_delete, rows_patch), it implements SQL-style row mutations on data frames.
Syntax
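rows_insert(x, y, by = NULL, ..., conflict = c("error", "ignore"), copy = FALSE, in_place = FALSE). When by is NULL, dplyr uses the first column of y as the key and prints a message saying so; pass by explicitly to avoid surprises.
rows_insert() errors on duplicate keys by default, and that default is what enforces the invariant "key is unique". Pass conflict = "ignore" (dplyr 1.1+) to silently skip rows whose keys already exist in x.
Five common patterns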
1. Insert new keys
2. Conflict on existing key
3. Ignore conflicts
4. Multi-column key
5. In-place modification (data.table style)
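The five patterns above can be sketched in one short script. This is a minimal illustration, not the original article's code: the tibbles x, new_rows, clash, sales and more_sales are made-up examples, and conflict = "ignore" assumes dplyr 1.1.0 or later.

```r
library(dplyr)

x <- tibble(id = 1:3, name = c("a", "b", "c"))

# 1. Insert new keys: ids 4 and 5 are not in x, so both rows are appended
new_rows <- tibble(id = 4:5, name = c("d", "e"))
inserted <- rows_insert(x, new_rows, by = "id")

# 2. Conflict on existing key: id 3 already exists, so the default errors
clash <- tibble(id = 3:4, name = c("c2", "d"))
conflict_result <- tryCatch(
  rows_insert(x, clash, by = "id"),
  error = function(e) "error"
)

# 3. Ignore conflicts: id 3 is skipped, id 4 is inserted
ignored <- rows_insert(x, clash, by = "id", conflict = "ignore")

# 4. Multi-column key: a row conflicts only if region AND year both match
sales <- tibble(region = c("N", "S"), year = c(2023, 2023), total = c(10, 20))
more_sales <- tibble(region = c("N", "S"), year = c(2024, 2024), total = c(12, 22))
multi <- rows_insert(sales, more_sales, by = c("region", "year"))

# 5. With the default in_place = FALSE a new tibble is returned;
#    x itself still has its original 3 rows
nrow(x)
```

In pattern 2 the error is caught with tryCatch() only so the script keeps running; in a pipeline you would normally let it stop the job.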
rows_insert() vs bind_rows() vs rows_upsert()
Five append/update functions in dplyr.
| Function | Behavior | Best for |
|---|---|---|
| rows_insert(x, y, by) | Error on duplicate keys, append new | Strict "no duplicates" |
| rows_upsert(x, y, by) | Insert new, update existing | Sync from updated source |
| rows_update(x, y, by) | Update existing, error on new | Patch existing only |
| bind_rows(x, y) | Append blindly, no key check | Quick stack |
| rows_append(x, y) | Append (no key check) | dplyr 1.1+ explicit append |
When to use which:
- rows_insert for adding new records, validating no duplicates.
- rows_upsert for incremental sync from a source.
- rows_update for patching existing records.
- bind_rows for ignore-the-key vertical stacking.
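To make the comparison concrete, here is a small sketch that runs the alternatives on the same pair of tables. The tibbles x and y are invented for illustration, and rows_append() assumes dplyr 1.1.0 or later.

```r
library(dplyr)

x <- tibble(id = 1:2, val = c("a", "b"))
y <- tibble(id = 2:3, val = c("B", "c"))  # id 2 overlaps with x, id 3 is new

# rows_upsert: id 2 is updated to "B", id 3 is appended
upserted <- rows_upsert(x, y, by = "id")

# bind_rows / rows_append: no key check, so id 2 ends up in the result twice
stacked <- bind_rows(x, y)
appended <- rows_append(x, y)

# rows_update: only existing keys are allowed, so pass just the id 2 row
updated <- rows_update(x, tibble(id = 2L, val = "B"), by = "id")
```

Calling rows_insert(x, y, by = "id") on the same inputs would error, because id 2 already exists in x.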
A practical workflow
Use rows_insert in incremental load pipelines where duplicate keys are bugs.
If today's batch has any id that already exists in master, the insert errors out: a forced data-quality check.
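A minimal sketch of that load step, assuming a hypothetical master table and a daily batch called today (both invented here):

```r
library(dplyr)

master <- tibble(id = 1:3, amount = c(10, 20, 30))
today <- tibble(id = 4:5, amount = c(40, 50))

# All ids in today's batch are new, so the insert succeeds
master <- rows_insert(master, today, by = "id")

# Loading the same batch a second time is a bug; rows_insert refuses it.
# tryCatch() is used here only to capture the message instead of stopping.
rerun <- tryCatch(
  rows_insert(master, today, by = "id"),
  error = function(e) conditionMessage(e)
)
```

Here rerun holds the error message, while master keeps the five rows from the first, successful load.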
For "either insert new or update existing":
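A sketch with rows_upsert(), again using made-up master and today tables:

```r
library(dplyr)

master <- tibble(id = 1:3, amount = c(10, 20, 30))
today <- tibble(id = 3:4, amount = c(35, 40))  # id 3 exists, id 4 is new

# id 3 is overwritten with the new amount, id 4 is appended
master <- rows_upsert(master, today, by = "id")
```

The trade-off is that a repeated or corrected key no longer raises an error, so this variant gives up the data-quality check that rows_insert provides.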
Common pitfalls
Pitfall 1: forgetting that conflict defaults to "error". A single duplicate key aborts the whole call. Use conflict = "ignore" to silently skip duplicates, or pre-filter y so that its keys are all new.
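Pitfall 2: columns and types must be compatible. Every column in y must also exist in x (y may contain only a subset of x's columns; the missing ones are filled with NA), and a column that exists only in y is an error. Columns are matched by name, so order does not matter, but incompatible types error.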
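The column and type rules can be checked with a small sketch. The tibbles below are invented for illustration, and the comments describe the behavior I would expect from current dplyr releases.

```r
library(dplyr)

x <- tibble(id = 1:2, name = c("a", "b"), score = c(1.5, 2.5))

# y with a subset of x's columns: the missing score is filled with NA
y_subset <- tibble(id = 3L, name = "c")
res <- rows_insert(x, y_subset, by = "id")

# y with a column that x does not have: error
y_extra <- tibble(id = 4L, name = "d", extra = TRUE)
extra_result <- tryCatch(
  rows_insert(x, y_extra, by = "id"),
  error = function(e) "error"
)

# y whose key has an incompatible type (character vs integer): error
y_type <- tibble(id = "5", name = "e")
type_result <- tryCatch(
  rows_insert(x, y_type, by = "id"),
  error = function(e) "error"
)
```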
Pitfall 3: rows_insert and friends do not modify x in place by default. They return a new data frame, so the result must be assigned. The in_place = TRUE option is only relevant for mutable backends (for example databases and data.tables); plain data frames and tibbles support only in_place = FALSE.
Why the rows_* family exists
Before the rows_* family (added in dplyr 1.0), incremental updates to data frames required hand-rolled patterns: filter the rows to update, modify them, bind_rows the pieces back together, and watch for duplicates. This was error-prone and verbose. The rows_* family encapsulates the SQL-style operations as named verbs: rows_insert says "append new", rows_update says "patch existing", rows_upsert says "do both", rows_delete says "remove", and rows_patch says "fill NA only". Each verb has one clear meaning that is easy to reason about. For analytic workflows that mimic production data pipelines (incremental loads, corrections, deletions), this family is the right toolkit.
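As a quick illustration of the two verbs not shown elsewhere in this article, here is a sketch of rows_patch() and rows_delete() on an invented tibble:

```r
library(dplyr)

x <- tibble(id = 1:3, val = c("a", NA, "c"))

# rows_patch fills only NA values: id 2 becomes "b", id 3 keeps its "c"
patched <- rows_patch(x, tibble(id = 2:3, val = c("b", "z")), by = "id")

# rows_delete removes the rows whose keys appear in y
deleted <- rows_delete(x, tibble(id = 1L), by = "id")
```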
Try it yourself
Try it: Append 3 new car rows to a small mtcars_top (first 3 rows of mtcars). Verify the result has 6 rows. Save to ex_added.
Explanation: rows_insert appends the 3 Tesla rows because their car names don't conflict with the existing 3.
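One possible solution, sketched under a few assumptions: the car names are moved from row names into a car column that serves as the key, only three columns are kept for brevity, and the Tesla rows carry NA placeholders rather than real measurements. The name new_cars is hypothetical.

```r
library(dplyr)

# First 3 rows of mtcars, with the row names turned into a key column
mtcars_top <- as_tibble(head(mtcars, 3), rownames = "car") %>%
  select(car, mpg, cyl)

# Three new cars; values are NA placeholders, not real data
new_cars <- tibble(
  car = c("Tesla Model 3", "Tesla Model S", "Tesla Model Y"),
  mpg = NA_real_,
  cyl = NA_real_
)

ex_added <- rows_insert(mtcars_top, new_cars, by = "car")
nrow(ex_added)
```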
Related dplyr functions
After mastering rows_insert, look at:
- rows_update(): update only existing rows
- rows_upsert(): insert or update
- rows_delete(): remove rows by key
- rows_patch(): update only NA values
- bind_rows(): append without key check
- rows_append(): dplyr 1.1+; explicit append
For SQL-style upsert workflows on data frames, the rows_* family covers the standard CRUD operations.
FAQ
What does rows_insert do in dplyr?
rows_insert(x, y, by) appends rows from y to x and errors if any of y's keys already exist in x. These are "insert new only" semantics.
What is the difference between rows_insert and bind_rows?
bind_rows ignores keys and just stacks the tables vertically. rows_insert validates that y's keys don't collide with x's. Use rows_insert when key uniqueness matters.
What is the conflict argument in rows_insert?
conflict = "error" (default) errors on duplicate keys. conflict = "ignore" silently skips conflicting rows. There is no replace option; use rows_upsert for that.
Can rows_insert handle multi-column keys?
Yes: rows_insert(x, y, by = c("col1","col2")). The composite is the key; conflicts are detected by tuple matching.
Does rows_insert modify x in place?
For tibbles and plain data frames, no: it returns a new data frame, and in_place = TRUE is not supported. in_place = TRUE is only relevant for mutable backends such as databases and data.tables (rare in dplyr workflows).