dplyr union() in R: Combine Rows From X and Y, Deduplicated

The union() function in dplyr stacks rows from x and y vertically, REMOVING DUPLICATES. It is the SQL UNION equivalent. Both inputs must have the same columns.

⚡ Quick Answer
union(x, y)                          # rows in x OR y (deduplicated)
union_all(x, y)                       # union without dedup (= bind_rows)
intersect(x, y)                       # rows in BOTH
setdiff(x, y)                         # rows in x not in y
bind_rows(x, y)                       # stack vertically (no dedup)

Need explanation? Read on for examples and pitfalls.

📊 Is union() the right tool?
STARTcombine x + y rows, dedupunion()combine x + y rows, keep allunion_all() or bind_rows()only common rowsintersect()only unique to xsetdiff()combine with column-name flexibilitybind_rows()

What union() does in one sentence

union(x, y) stacks rows from both x and y and removes duplicates based on whole-row equality. Both inputs must have the same columns and types.

It is the data-frame version of mathematical set union. Use it when you want a deduplicated combined table.

Syntax

union(x, y). Both must have identical columns (same names and order).

Run live
Run live, no install needed. Every R block on this page runs in your browser. Click Run, edit the code, re-run instantly. No setup.
RCombine two tables, dedup
library(dplyr) library(purrr) library(tibble) x <- data.frame(id = c(1, 2, 3)) y <- data.frame(id = c(2, 3, 4)) union(x, y) #> id #> 1 1 #> 2 2 #> 3 3 #> 4 4

  
Tip
Use union_all (or bind_rows) if you need to keep duplicates. union always deduplicates, which can hide important counts.

Five common patterns

1. Combine and dedup

Rx | y, no duplicates
x <- data.frame(id = c(1, 2, 3)) y <- data.frame(id = c(2, 3, 4)) union(x, y)

  

2. Multi-column dedup

RWhole-row equality
x <- data.frame(region=c("NA","EU"), product=c("X","Y")) y <- data.frame(region=c("NA","AS"), product=c("X","Z")) union(x, y) #> region product #> 1 NA X #> 2 EU Y #> 3 AS Z #> ((NA,X) appears in both; deduped to one)

  

3. union_all (no dedup)

RLike bind_rows
union_all(x, y) #> region product #> 1 NA X #> 2 EU Y #> 3 NA X <-- duplicate kept #> 4 AS Z

  

union_all and bind_rows produce the same result when columns align.

4. Combine three tables

Runion is variadic via reduce
x <- data.frame(id = 1:2) y <- data.frame(id = 2:3) z <- data.frame(id = 3:4) purrr::reduce(list(x, y, z), union) #> id #> 1 1 #> 2 2 #> 3 3 #> 4 4

  

union takes 2 args; chain with reduce for many.

5. Compare with intersect and setdiff

RThe set-theoretic trio
x <- data.frame(id = 1:4) y <- data.frame(id = c(3, 4, 5)) union(x, y)$id #> 1 2 3 4 5 (all unique) intersect(x, y)$id #> 3 4 (in both) setdiff(x, y)$id #> 1 2 (in x not y)

  
Key Insight
dplyr's union, intersect, setdiff operate on WHOLE ROWS, not on keys. Two rows are "equal" only if every column matches. For key-based set operations, use semi_join (intersect) or anti_join (setdiff). For non-deduplicating combination, use bind_rows.

union() vs union_all() vs bind_rows() vs base union

Four ways to combine rows in R.

Function Dedup Same columns required
dplyr::union(x, y) Yes Yes
dplyr::union_all(x, y) No Yes
dplyr::bind_rows(x, y) No No (fills missing with NA)
base::union(x, y) Yes Vector input, not df
rbind(x, y) No Yes (errors if mismatch)

When to use which:

  • union for set-style combination (dedup).
  • union_all for "stack but enforce same columns".
  • bind_rows for flexible combination (different columns ok).
  • base::union for vector inputs.

A practical workflow

Use union for "merge two sources of records, drop duplicates" workflows.

RUnion two sources, dedup
batch_a <- read_csv("source_a.csv") batch_b <- read_csv("source_b.csv") combined <- union(batch_a, batch_b)

  

Both sources contribute rows; identical rows appear once. For "any row in either" without caring about duplication, union_all is faster.

For multi-source combination:

RUnion of multiple sources via reduce
sources <- list(read_csv("a.csv"), read_csv("b.csv"), read_csv("c.csv")) combined <- purrr::reduce(sources, union)

  

Common pitfalls

Pitfall 1: column mismatch errors. union(x, y) requires identical column names AND order. Use bind_rows or select to align first.

Pitfall 2: dedup hides count information. union(x, x) returns x (deduplicated), not 2x. If row counts matter, use union_all or bind_rows.

Warning
dplyr::union masks base::union. After library(dplyr), plain union(...) calls dplyr's data-frame version. For vector inputs, use base::union(...) explicitly.

When union beats bind_rows + distinct

The two equivalent ways to "combine and dedup" are union(x, y) and bind_rows(x, y) |> distinct(). They produce the same result when x and y have identical columns. union is more concise and signals intent (set semantics). The bind_rows + distinct chain is more flexible (allows different columns, fills with NA) but verbose. For pipelines where you control both inputs and want strict structure validation, union is preferred. For combining heterogeneous sources where columns may differ, bind_rows is more forgiving.

Try it yourself

Try it: Combine two halves of mtcars (rows 1-16 and 15-32, with overlap) and dedup. Save to ex_full.

RYour turn: union two overlapping halves
half_a <- mtcars[1:16, ] |> tibble::rownames_to_column("car") half_b <- mtcars[15:32, ] |> tibble::rownames_to_column("car") ex_full <- # your code here nrow(ex_full) #> Expected: 32 (overlapping rows deduplicated)

  
Click to reveal solution
RSolution
ex_full <- union(half_a, half_b) nrow(ex_full) #> [1] 32

  

Explanation: Rows 15 and 16 appear in both halves. union deduplicates them, returning 32 unique rows.

After mastering union, look at:

  • dplyr::intersect(): rows in both
  • dplyr::setdiff(): rows in x not in y
  • dplyr::union_all(): union without dedup
  • dplyr::bind_rows(): flexible combination
  • base::rbind(): strict same-shape rbind
  • base::union() / base::intersect() / base::setdiff(): vector versions

For combining tables with different columns, bind_rows() is more flexible than union.

FAQ

What does union do in dplyr?

union(x, y) combines rows from x and y, removing duplicates based on whole-row equality. Both inputs must have the same columns.

What is the difference between union and union_all in dplyr?

union deduplicates; union_all keeps all rows (equivalent to bind_rows when columns align).

What is the difference between union and bind_rows?

union dedupes and requires identical columns. bind_rows keeps duplicates and fills missing columns with NA. bind_rows is more flexible; union enforces structure.

Does dplyr::union mask base::union?

Yes. After loading dplyr, plain union calls the data-frame version. For vector union, use base::union(...).

Can I union three or more tables?

Yes, via reduce: purrr::reduce(list(x, y, z), union) or chained: x |> union(y) |> union(z).