dplyr left_join() in R: Keep All Rows From the Left Table

The left_join() function in dplyr merges two tables by key, KEEPING all rows from the left table and adding matching columns from the right. Unmatched rows get NA in the new columns.

By Selva Prabhakaran · Published May 12, 2026 · Last updated May 12, 2026

⚡ Quick Answer

left_join(x, y, by = "id")                 # match by id
left_join(x, y, by = c("id", "date"))      # multi-column key
left_join(x, y, by = c("id" = "user_id"))  # different column names
left_join(x, y, by = join_by(id))          # tidy-style key spec
inner_join(x, y, by = "id")                # only matching rows
right_join(x, y, by = "id")                # all rows from RIGHT

Need explanation? Read on for examples and pitfalls.

📊 Is left_join() the right tool?

What left_join() does in one sentence

left_join(x, y, by) returns a data frame with EVERY row of x, plus matching columns from y joined on the by key; rows in x with no match get NA in y's columns. This is the most common merge in data analysis: "augment my main table with lookup info".

The "left" preserves the left table's row count regardless of matches in y.

Syntax

left_join(x, y, by = NULL, suffix = c(".x", ".y")). by is a character vector or named vector; auto-detected if NULL (matches all common columns, with a message).

Run live

Run live, no install needed. Every R block on this page runs in your browser. Click Run, edit the code, re-run instantly. No setup.

RAugment customer table with order count

library(dplyr) customers <- data.frame(id = 1:4, name = c("a","b","c","d")) orders <- data.frame(id = c(1, 1, 3, 5), amount = c(10, 20, 30, 40)) left_join(customers, orders, by = "id") #> id name amount #> 1 1 a 10 #> 2 1 a 20 <-- duplicates because customer 1 has 2 orders #> 3 2 b NA <-- no orders #> 4 3 c 30 #> 5 4 d NA

Tip

Always specify by explicitly. Auto-detection works but emits a message and can match unintended columns. by = "id" is unambiguous.

Five common patterns

1. Single-column key

RMatch by id

left_join(customers, orders, by = "id")

2. Multi-column key

RMatch by composite key

sales <- data.frame( region = c("NA","NA","EU"), product = c("X","Y","X"), qty = c(100, 200, 50) ) catalog <- data.frame( region = c("NA","EU"), product = c("X","X"), price = c(10, 12) ) left_join(sales, catalog, by = c("region","product")) #> region product qty price #> 1 NA X 100 10 #> 2 NA Y 200 NA <-- no Y in catalog #> 3 EU X 50 12

3. Differently-named keys

RMap customer.id to order.user_id

customers <- data.frame(id = 1:3, name = c("a","b","c")) orders <- data.frame(user_id = c(1, 1, 2), amount = c(10, 20, 30)) left_join(customers, orders, by = c("id" = "user_id")) #> id name amount #> 1 1 a 10 #> 2 1 a 20 #> 3 2 b 30 #> 4 3 c NA

4. Modern join_by syntax (dplyr 1.1+)

RSame as above, modern syntax

left_join(customers, orders, by = join_by(id == user_id))

join_by() supports inequality and rolling joins (helpful for time-series).

5. Many-to-many warning

RWatch for unintended row multiplication

df_a <- data.frame(id = c(1, 1), x = c("a","b")) df_b <- data.frame(id = c(1, 1), y = c("p","q")) left_join(df_a, df_b, by = "id", relationship = "many-to-many") #> id x y #> 1 1 a p #> 2 1 a q #> 3 1 b p #> 4 1 b q

Each combination of left and right rows for the same id is produced. Specifying relationship = "one-to-one" (or "many-to-one") errors instead, helping catch surprises.

Key Insight

left_join PRESERVES the LEFT table's row count UNLESS keys are duplicated in the right table. If the right table has multiple matching rows per left key, the left rows are DUPLICATED. This is why left_join can grow your data unexpectedly.

left_join() vs inner_join() vs right_join() vs full_join()

The four mutating-join family in dplyr.

Function	Keeps
`left_join(x, y)`	All rows of x; adds y's matches
`inner_join(x, y)`	Only rows in BOTH x and y
`right_join(x, y)`	All rows of y; adds x's matches
`full_join(x, y)`	All rows of x and y; NA for unmatched
`semi_join(x, y)`	Rows of x that have a match in y (no y columns)
`anti_join(x, y)`	Rows of x that have NO match in y

When to use which:

left_join for "augment my main table".
inner_join for "where both have data".
right_join rarely; usually flip and left_join.
full_join for outer joins (rarely needed).
semi_join / anti_join for filter-without-augment.

A practical workflow

The "augment with lookup" pattern is left_join's most common use.

RStar schema augmentation

sales |> left_join(products, by = "product_id") |> left_join(regions, by = "region_id") |> left_join(dates, by = "date")

Successive left_joins enrich the sales fact table with dimension data. Each join keeps all sales rows; missing matches show NA.

For point-in-time joins (e.g., "match each event to the most recent price"), use dplyr::join_by(event_date >= price_date) (1.1+).

Common pitfalls

Pitfall 1: row count multiplication. If the right table has duplicate keys, left_join multiplies left rows. Always check nrow(result) == nrow(x) after a join.

Pitfall 2: column name collision. When both tables have value, the result has value.x and value.y. Use suffix to customize, or rename before joining.

Warning

left_join with auto-by is convenient but RISKY. It joins on ALL common column names, which may include columns you didn't intend. Always specify by explicitly in production code.

Try it yourself

Try it: Augment mtcars with an efficiency label by joining a lookup table mapping cyl to label. Save to ex_aug.

RYour turn: augment mtcars with cyl labels

cyl_labels <- data.frame(cyl = c(4, 6, 8), label = c("eco","balanced","power")) ex_aug <- mtcars |> # your code here head(ex_aug[, c("mpg","cyl","label")]) #> Expected: original + label column

Click to reveal solution

RSolution

ex_aug <- mtcars |> left_join(cyl_labels, by = "cyl") head(ex_aug[, c("mpg","cyl","label")]) #> mpg cyl label #> Mazda RX4 21.0 6 balanced #> Mazda RX4 Wag 21.0 6 balanced #> Datsun 710 22.8 4 eco #> Hornet 4 Drive 21.4 6 balanced #> Hornet Sportabout 18.7 8 power #> Valiant 18.1 6 balanced

Explanation: left_join(cyl_labels, by = "cyl") augments each car with its cyl-group label. All 32 mtcars rows preserved.

After mastering left_join, look at:

inner_join() / right_join() / full_join(): other join types
semi_join() / anti_join(): filtering joins
join_by(): dplyr 1.1+ key spec, supports inequality
bind_cols(): position-based combine (no key)
merge(): base R alternative
data.table::merge(): very fast for large data

For very large tables, the data.table package's joins are often 10x faster than dplyr's.

FAQ

What does left_join do in dplyr?

left_join(x, y, by) keeps every row of x and adds matching columns from y. Rows in x without a match get NA in y's columns.

What is the difference between left_join and inner_join?

left_join keeps ALL rows of x. inner_join keeps only rows that match in BOTH x and y. inner_join may return fewer rows; left_join never drops left rows.

How do I join on multiple columns?

Pass a character vector: by = c("region", "product"). Both must match for rows to merge.

How do I join on differently-named columns?

Use a named vector: by = c("id" = "user_id"). The left side's id matches the right side's user_id.

Why did my left_join return more rows than the original?

Because the right table has duplicate keys. left_join multiplies left rows when right has multiple matches. Use relationship = "one-to-one" to error instead, or deduplicate the right table first.

Navigate

Tidyverse packages

Deep dives

Wrangling & EDA

Statistics

Machine Learning

Time Series

By Industry

Reporting & Apps

Levels

dplyr left_join() in R: Keep All Rows From the Left Table

What left_join() does in one sentence

Syntax

Five common patterns

1. Single-column key

2. Multi-column key

3. Differently-named keys

4. Modern join_by syntax (dplyr 1.1+)

5. Many-to-many warning

left_join() vs inner_join() vs right_join() vs full_join()

A practical workflow

Common pitfalls

Try it yourself

FAQ

Navigate

Tidyverse packages

Deep dives

Wrangling & EDA

Statistics

Machine Learning

Time Series

By Industry

Reporting & Apps

Levels

dplyr left_join() in R: Keep All Rows From the Left Table

What left_join() does in one sentence

Syntax

Five common patterns

1. Single-column key

2. Multi-column key

3. Differently-named keys

4. Modern join_by syntax (dplyr 1.1+)

5. Many-to-many warning

left_join() vs inner_join() vs right_join() vs full_join()

A practical workflow

Common pitfalls

Try it yourself

Related dplyr functions

FAQ