dplyr bind_rows() and bind_cols() in R: Combine Tables
The bind_rows() and bind_cols() functions in dplyr stack data frames vertically (more rows) or horizontally (more columns). Unlike base R rbind() and cbind(), they handle column-name mismatches gracefully and return a data frame of the same class as the first input (a tibble when you pass tibbles).
bind_rows(df1, df2)                   # stack vertically
bind_rows(df1, df2, df3)              # multiple data frames
bind_rows(list_of_dfs)                # from a list
bind_rows(df1, df2, .id = "source")   # add origin column
bind_cols(df1, df2)                   # stack horizontally (must match nrows)
bind_rows(df1, df2)                   # missing cols filled with NA
do.call(bind_rows, list_of_dfs)       # equivalent to bind_rows(list)
Need explanation? Read on for examples and pitfalls.
What bind_rows() and bind_cols() do in one sentence
bind_rows() stacks data frames VERTICALLY (more rows); bind_cols() stacks them HORIZONTALLY (more columns). They are pipe-friendly replacements for base R rbind() and cbind(), with smarter handling of mismatched columns and output that keeps the class of the first input (tibble in, tibble out).
bind_rows() is forgiving: missing columns in one input become NA in the output. bind_cols() is strict: the inputs must have the same number of rows.
Syntax
Both functions accept any number of data frames as separate arguments OR a single list of data frames. .id adds a column tracking which input each row came from.
The full signatures:
bind_rows(..., .id = NULL)
bind_cols(..., .name_repair = c("unique","universal","check_unique","minimal"))
... accepts any number of data frames (or a single list of them). .id (bind_rows only) creates an origin tracking column.
bind_rows(list_of_dfs) accepts a list, so no do.call() is needed. purrr::map(files, read_csv) |> bind_rows() is the standard idiom for combining many files into one data frame.
Six common patterns
1. Stack vertically with bind_rows
The two data frames share columns; the result has the union of all rows.
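A minimal sketch of this pattern, assuming two small hypothetical tibbles (`df1` and `df2` and their contents are made up for illustration):

```r
library(dplyr)

# Hypothetical example data: two tibbles with identical columns
df1 <- tibble(id = 1:2, score = c(90, 85))
df2 <- tibble(id = 3:4, score = c(70, 95))

# Rows of df2 are appended below the rows of df1
stacked <- bind_rows(df1, df2)
stacked
```

The result has four rows and the same two columns as the inputs.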
2. Track origin with .id
When inputs are NAMED arguments, .id = "quarter" creates a column with the name of each input.
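As an illustration, here is a small sketch with two hypothetical quarterly tibbles (`q1`, `q2`, and the labels `Q1`/`Q2` are assumptions, not data from the original article):

```r
library(dplyr)

# Hypothetical quarterly sales tables
q1 <- tibble(region = c("North", "South"), sales = c(100, 120))
q2 <- tibble(region = c("North", "South"), sales = c(110, 130))

# The argument names (Q1, Q2) become the values of the new "quarter" column
by_quarter <- bind_rows(Q1 = q1, Q2 = q2, .id = "quarter")
by_quarter
```

The `quarter` column is placed first, and each row carries the name of the input it came from.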
3. Stack horizontally with bind_cols
bind_cols() requires the same number of rows. It does NOT match by key; it concatenates positionally (row 1 with row 1, etc.).
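A short sketch of positional column binding, using two hypothetical tibbles (`people` and `scores`) that are assumed to already be in the same row order:

```r
library(dplyr)

# Hypothetical data: both tibbles have 3 rows, in the same order
people <- tibble(name = c("Ana", "Ben", "Cy"))
scores <- tibble(score = c(88, 92, 79))

# Row 1 is paired with row 1, row 2 with row 2, and so on
wide <- bind_cols(people, scores)
wide
```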
4. Mismatched columns get NA
b exists in df1 but not df2; c exists in df2 but not df1. The bound result has both columns; missing values are NA.
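The behavior described above can be sketched as follows; the column contents are hypothetical and chosen only to make the NA filling visible:

```r
library(dplyr)

# df1 has columns a and b; df2 has columns a and c
df1 <- tibble(a = 1:2, b = c("x", "y"))
df2 <- tibble(a = 3:4, c = c(TRUE, FALSE))

# The result has columns a, b, c; cells with no source value are NA
combined <- bind_rows(df1, df2)
combined
```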
5. Stack a list of data frames
A common pattern: read N files via lapply()/map(), combine via bind_rows().
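A sketch of the list pattern without touching the file system: the helper `make_batch()` below is a hypothetical stand-in for a file reader such as read_csv(), returning one tibble per call.

```r
library(dplyr)

# Hypothetical stand-in for "read one file": returns a small tibble per batch
make_batch <- function(i) {
  tibble(batch = i, value = c(i * 10, i * 10 + 1))
}

# Build a list of data frames, then stack them in one call
list_of_dfs <- lapply(1:3, make_batch)
all_batches <- bind_rows(list_of_dfs)
all_batches
```

With real files, the same shape would apply: replace `make_batch` with the reader and `1:3` with a vector of file paths.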
6. bind_cols with name conflict handling
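When column names conflict, the default .name_repair = "unique" appends a position suffix (for example, id becomes id...1 and id...3) so that every name is unique. .name_repair = "universal" does the same and additionally makes names syntactic (for example, replacing spaces with dots).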
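A sketch comparing two .name_repair settings, assuming two hypothetical tibbles that share an `id` column and where one column name contains a space:

```r
library(dplyr)

# Hypothetical data: both inputs have an "id" column; one name is non-syntactic
df1 <- tibble(id = 1:2, value = c(10, 20))
df2 <- tibble(id = 1:2, `total value` = c(30, 40))

# Default ("unique"): duplicated names get a position suffix
default_names <- suppressMessages(bind_cols(df1, df2))
names(default_names)

# "universal": names are made unique and syntactic
universal_names <- suppressMessages(
  bind_cols(df1, df2, .name_repair = "universal")
)
names(universal_names)
```

suppressMessages() is used here only to silence the "New names" message that the repair step prints.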
Use bind_cols() ONLY when row order is meaningful and you trust the alignment. It zips rows positionally with no matching by key. If you have an id column and want to combine by matching, use left_join() instead. bind_cols() is dangerous when row counts match by coincidence rather than design.
bind_rows() / bind_cols() vs base R
Base R rbind() and cbind() behave similarly but are stricter about column-name matches and convert to data.frame (not tibble).
| Task | dplyr | Base R |
|---|---|---|
| Stack rows | bind_rows(df1, df2) | rbind(df1, df2) |
| Stack rows, mismatched cols | bind_rows() (NA fills missing) | rbind() errors |
| Stack rows, list input | bind_rows(list_of_dfs) | do.call(rbind, list_of_dfs) |
| Track origin | bind_rows(.id = "src") | (manual: add column before binding) |
| Stack cols | bind_cols(df1, df2) | cbind(df1, df2) |
| Output type | same class as first input (tibble in, tibble out) | data.frame |
When to use which:
- Use bind_rows() when columns might differ between inputs.
- Use rbind() when columns are guaranteed identical and you want zero dependencies.
- Use bind_cols() when you work with tibbles and want .name_repair control; use cbind() for base data.frame workflows.
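To make the mismatched-column difference concrete, here is a sketch with two hypothetical data frames whose column names differ. tryCatch() is used only so the base R error can be captured as a message instead of stopping the script.

```r
library(dplyr)

# Hypothetical data: second column is named differently in each input
df1 <- data.frame(a = 1:2, b = c("x", "y"))
df2 <- data.frame(a = 3:4, c = c("p", "q"))

# Base R: rbind() refuses inputs whose column names do not match
base_result <- tryCatch(
  rbind(df1, df2),
  error = function(e) conditionMessage(e)
)
base_result

# dplyr: the union of columns is kept and gaps are filled with NA
dplyr_result <- bind_rows(df1, df2)
dplyr_result
```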
Common pitfalls
Pitfall 1: bind_cols silently aligns by row position, not key. If you have customer_data and order_data both with 1000 rows but different orders, bind_cols() returns garbage. Use left_join() to match by key.
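A small sketch of this pitfall, with hypothetical `customers` and `orders` tibbles whose rows are deliberately in different orders:

```r
library(dplyr)

# Hypothetical data: same ids, but orders is sorted differently
customers <- tibble(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
orders    <- tibble(id = c(3, 1, 2), total = c(30, 10, 20))

# Positional binding: Ana (id 1) is paired with the total for id 3
misaligned <- bind_cols(customers, select(orders, total))
misaligned

# Key-based join: each customer gets the total for their own id
matched <- left_join(customers, orders, by = "id")
matched
```

Both calls run without any warning, which is why the positional version is easy to miss.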
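Pitfall 2: type coercion with bind_rows. Compatible types are promoted silently: if column x is integer in df1 and double in df2, the bound result is double. Incompatible types are not: in dplyr 1.0.0 and later, binding an integer x with a character x raises a "Can't combine" error rather than coercing to character. Convert the column explicitly (for example with as.character()) before binding.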
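A sketch of how column types interact when binding, assuming dplyr 1.0.0 or later (where type combination is handled by vctrs). The tibbles `ints`, `dbls`, and `chrs` are hypothetical.

```r
library(dplyr)

ints <- tibble(x = 1:2)
dbls <- tibble(x = c(2.5, 3.5))
chrs <- tibble(x = c("a", "b"))

# Integer + double: promoted to double without any message
promoted <- bind_rows(ints, dbls)
class(promoted$x)

# Integer + character: treated as incompatible, so an error is raised
result <- tryCatch(
  bind_rows(ints, chrs),
  error = function(e) "error"
)
result

# Converting explicitly before binding makes the intent clear
fixed <- bind_rows(mutate(ints, x = as.character(x)), chrs)
fixed
```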
bind_cols() requires matching row counts. bind_rows() does not require matching column counts. Mixing these up causes confusing errors. Remember: rows = vertical (more rows ok), cols = horizontal (must align rows).
Pitfall 3: factor columns may bind unexpectedly. Two factor columns with different levels are combined into a factor with the union of levels in dplyr 1.0.0 and later, while older versions coerced them to character with a warning. For predictable results, convert factors to character before binding, or use forcats::fct_unify() first.
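One version-independent way to handle factors is to convert them to character before binding. The sketch below assumes two hypothetical tibbles with a factor column `grp` whose levels differ.

```r
library(dplyr)

# Hypothetical data: same column, different factor levels
df1 <- tibble(grp = factor(c("a", "b")))
df2 <- tibble(grp = factor(c("b", "c")))

# Convert to character first so the result does not depend on
# how a given dplyr version reconciles factor levels
bound <- bind_rows(
  mutate(df1, grp = as.character(grp)),
  mutate(df2, grp = as.character(grp))
)
bound

# Re-create the factor afterwards if one is needed
bound <- mutate(bound, grp_f = factor(grp))
levels(bound$grp_f)
```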
Try it yourself
Try it: Combine these two tibbles with bind_rows() and add a column tracking which one each row came from.
Explanation: Naming the inputs (A = storeA, B = storeB) lets .id = "origin" use those names as the origin column values. Without naming, .id would use position numbers ("1", "2") instead.
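The exercise's original tibbles are not reproduced here, so the following is only a sketch of the solution described above, with made-up contents for `storeA` and `storeB`:

```r
library(dplyr)

# Hypothetical exercise data (contents are assumptions)
storeA <- tibble(item = c("apple", "pear"), qty = c(5, 3))
storeB <- tibble(item = "plum", qty = 7)

# Named inputs supply the values for the "origin" column
both <- bind_rows(A = storeA, B = storeB, .id = "origin")
both
```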
Related dplyr functions
After mastering bind_rows() and bind_cols(), look at:
- left_join(), inner_join(), full_join(): combine by KEY (not position)
- rows_insert(), rows_update(), rows_upsert(): surgical row updates by key
- Base R rbind(), cbind(): zero-dependency alternatives
- purrr::list_rbind(), list_cbind(): variants for purrr workflows
- do.call(rbind, list_of_dfs): pre-dplyr idiom, still works
For most data combination tasks, bind_rows() is the right choice. Reach for bind_cols() only when you really mean positional column-stacking (rare).
FAQ
What is the difference between bind_rows and rbind in R?
bind_rows() is from dplyr; rbind() is base R. Both stack data frames vertically. Differences: bind_rows() handles mismatched columns by filling NA (rbind() errors), accepts lists directly, and returns a data frame of the same class as its first input (a tibble when given tibbles). rbind() needs no extra packages.
How do I combine multiple data frames in dplyr?
For VERTICAL stacking: bind_rows(df1, df2, df3) or bind_rows(list_of_dfs). For HORIZONTAL stacking: bind_cols(df1, df2) (rows must match). For COMBINING BY KEY: left_join() or other join verbs.
What is the difference between bind_cols and cbind?
bind_cols() is from dplyr; cbind() is base R. Both stack horizontally. bind_cols() keeps the class of its first input (a tibble when given tibbles) and uses .name_repair to resolve column-name conflicts. cbind() returns a data.frame and may error or recycle on mismatched row counts.
How do I track the source of each row when binding?
Use .id with named inputs: bind_rows(a = df1, b = df2, .id = "source"). The result has a source column with values "a" and "b" indicating where each row came from.
Can I bind data frames with different columns in dplyr?
Yes, with bind_rows(). Missing columns are filled with NA. bind_rows(df1, df2) where df1 has columns (a, b) and df2 has columns (a, c) returns a data frame with columns (a, b, c) and NAs in the missing cells.