purrr map_df() in R: Build a Data Frame From a List
The map_df() function in purrr applies a function to each element of a list or vector and row-binds the per-element results into a single data frame. It is the data-frame-returning variant of map(), and an alias for map_dfr().
    map_df(x, ~ data.frame(n = length(.x)))   # one row per element
    map_df(by_grp, summary_fn)                # each call returns a row
    map_df(x, fn, .id = "source")             # add a label column
    map2_df(x, y, fn)                         # two inputs in parallel
    pmap_df(list(a, b, c), fn)                # many inputs in parallel
    map_dfc(x, fn)                            # bind columns, not rows
    map(x, fn) |> list_rbind()                # modern purrr 1.0 form
Need an explanation? Read on for examples and pitfalls.
What map_df() does in one sentence
map_df() turns many small results into one data frame. It loops over .x, calls .f on each element, and expects each call to return a data frame (or a named value that can become one row). purrr then stacks those pieces with dplyr::bind_rows(), so the final output is a single tidy data frame. Columns are matched by name across pieces, and any gaps are filled with NA.
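The name matching and NA filling described above can be seen in a small sketch. The `records` list and its fields are hypothetical, made up only for illustration, and the code assumes dplyr is installed alongside purrr.

```r
library(purrr)  # map_df() also needs dplyr installed for the row-binding step

# Hypothetical input: two records that share only the `id` field
records <- list(
  list(id = 1, score = 90),
  list(id = 2, grade = "B")
)

# Each call returns a one-row data frame; columns are matched by name
stacked <- map_df(records, ~ as.data.frame(.x))

# `score` is NA for the second record and `grade` is NA for the first
stacked
```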
Syntax
map_df() takes a list, a function, and an optional id column. The four arguments are:
- .x: a list or atomic vector to iterate over. A data frame counts as a list of its columns.
- .f: the function to apply. It must return a data frame, or a named value that coerces to one row.
- ...: extra arguments passed unchanged to every call of .f.
- .id: a string. When .x is named, this adds a column holding each element's name.
The .f argument accepts several forms. Here the lambda formula and the native anonymous function are applied to a named list of vectors.
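A minimal sketch of both forms follows. The `scores` list is a hypothetical input chosen for illustration, and the `\(v)` shorthand assumes R 4.1 or later.

```r
library(purrr)  # map_df() also needs dplyr installed

# Hypothetical named list of numeric vectors
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# purrr lambda formula: .x stands for the current element
by_formula <- map_df(scores, ~ data.frame(n = length(.x), mean = mean(.x)))

# Native anonymous function: the argument name is your choice
by_function <- map_df(scores, \(v) data.frame(n = length(v), mean = mean(v)))

# Both forms should produce the same three-row data frame
all.equal(by_formula, by_function)
```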
The lambda form is the most common because .x reads cleanly inside short summaries.
When .x carries names, .id = "name" captures them as a column, so you never lose track of which row came from which element.

Worked examples
These four examples cover the most common map_df() jobs. Each uses a built-in dataset so you can run them directly.
Example 1: summarise groups into rows. Split mtcars by cylinder count, then collapse each group to one summary row.
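One way this could look is sketched below. The summary columns (`n`, `mean_mpg`, `mean_hp`) are illustrative choices, not the only sensible ones.

```r
library(purrr)  # map_df() also needs dplyr installed

# Split mtcars into a named list of data frames, one per cylinder count
by_cyl <- split(mtcars, mtcars$cyl)

# Collapse each group to a single summary row, then stack the rows
cyl_summary <- map_df(by_cyl, ~ data.frame(
  n        = nrow(.x),
  mean_mpg = mean(.x$mpg),
  mean_hp  = mean(.x$hp)
))

cyl_summary
```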
Example 2: label each row with .id. Add a column showing which group a row came from.
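A sketch of the same grouping with a label column added. The column name `cyl` is an arbitrary choice for this illustration.

```r
library(purrr)  # map_df() also needs dplyr installed

by_cyl <- split(mtcars, mtcars$cyl)

# .id = "cyl" copies the list names ("4", "6", "8") into a character column
cyl_labelled <- map_df(
  by_cyl,
  ~ data.frame(n = nrow(.x), mean_mpg = mean(.x$mpg)),
  .id = "cyl"
)

cyl_labelled
```

Because the labels come from list names, the `cyl` column is character, not numeric. Convert it with `as.numeric()` if you need to compute on it.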
Example 3: summarise every column of a data frame. A data frame is a list of columns, so map_df() iterates columns directly.
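As a sketch, the following computes the mean and standard deviation of every mtcars column. The choice of statistics and the `variable` label are assumptions made for illustration.

```r
library(purrr)  # map_df() also needs dplyr installed

# Each column of mtcars becomes one row of the result
col_summary <- map_df(
  mtcars,
  ~ data.frame(mean = mean(.x), sd = sd(.x)),
  .id = "variable"
)

head(col_summary)
```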
Example 4: combine two inputs with map2_df(). When each row needs values from two lists, map2_df() walks them in lockstep.
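A minimal sketch with two hypothetical parallel vectors, `items` and `prices`, which are invented here for illustration. Inside the lambda, `.x` is the element from the first input and `.y` the element from the second.

```r
library(purrr)  # map2_df() also needs dplyr installed

# Hypothetical parallel inputs of equal length
items  <- c("apple", "pear", "plum")
prices <- c(0.5, 0.8, 0.3)

# One row per pair of elements
price_table <- map2_df(
  items, prices,
  ~ data.frame(item = .x, price = .y, stringsAsFactors = FALSE)
)

price_table
```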
If column names differ between calls, you get columns padded with NA instead of a clean stack. Keep the column names identical across every call of .f.

map_df() vs map_dfc() and list_rbind()
map_df() is one of three ways purrr assembles a data frame. The variant you pick depends on direction and on your purrr version.
| Function | Combines by | Use when |
|---|---|---|
| map_df() / map_dfr() | rows (stacked) | each call returns a row or block of rows |
| map_dfc() | columns (side by side) | each call returns a new column |
| map() + list_rbind() | rows (stacked) | you are on purrr 1.0.0 or later |
map_df() and map_dfr() are the same function. map_df() is simply the older alias, so existing code keeps working. map_dfc() is the column-binding twin: it places each result next to the last instead of below it.
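To illustrate the column-binding direction, here is a small sketch that rescales two mtcars columns. The rescaling itself is just an example task; each call returns a vector, and the input names are assumed to carry over as column names.

```r
library(purrr)  # map_dfc() also needs dplyr installed

# Each call returns one vector; map_dfc() places the results side by side
scaled <- map_dfc(mtcars[c("mpg", "hp")], ~ .x / max(.x))

# Same number of rows as mtcars, one column per input column
dim(scaled)
names(scaled)
```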
As of purrr 1.0.0, map_dfr() and map_dfc() are superseded. The recommended replacement is map() followed by list_rbind() or list_cbind(), which separates iteration from combination and produces clearer errors.

Common pitfalls
Most map_df() failures come from the function not returning a data frame. The fix is usually to reshape the per-element result.
Pitfall 1: the function returns a bare vector. map_df() needs each result to be a data frame or a named value.
Fix: wrap the result, ~ data.frame(sq = .x ^ 2).
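The failing and fixed versions can be compared in a short sketch. `tryCatch()` is used here only so the script keeps running after the error.

```r
library(purrr)  # map_df() also needs dplyr installed

# Fails: each call returns a bare, unnamed number
bad <- tryCatch(map_df(1:3, ~ .x^2), error = function(e) e)
inherits(bad, "error")

# Works: each call returns a one-row data frame
good <- map_df(1:3, ~ data.frame(sq = .x^2))
good
```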
Pitfall 2: .id is just numbers when the input is unnamed. The .id column reads from the names of .x.
Fix: name the list first, list(a = 1:3, b = 4:6).
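A sketch of the difference, using a hypothetical `total` summary and a label column called `source`:

```r
library(purrr)  # map_df() also needs dplyr installed

unnamed <- list(1:3, 4:6)
named   <- list(a = 1:3, b = 4:6)

# Unnamed input: the label column holds positions written as text
from_unnamed <- map_df(unnamed, ~ data.frame(total = sum(.x)), .id = "source")

# Named input: the label column holds the element names
from_named <- map_df(named, ~ data.frame(total = sum(.x)), .id = "source")

from_unnamed$source
from_named$source
```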
Pitfall 3: inconsistent column types across pieces. If one call returns a column as text and another as numeric, the bind fails.
Fix: make .f return a consistent type, or coerce inside the function.
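The type clash can be reproduced with a deliberately mixed, hypothetical input list. Coercing to character inside the function is one possible fix; which type to coerce to depends on your data.

```r
library(purrr)  # map_df() also needs dplyr installed

# Hypothetical input mixing a number and a string
inputs <- list(1, "two")

# Fails: `value` is numeric in one piece and character in the other
mixed <- tryCatch(
  map_df(inputs, ~ data.frame(value = .x, stringsAsFactors = FALSE)),
  error = function(e) e
)
inherits(mixed, "error")

# Works: every piece returns `value` as character
fixed <- map_df(
  inputs,
  ~ data.frame(value = as.character(.x), stringsAsFactors = FALSE)
)
fixed
```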
Try it yourself
Try it: Use map_df() to build a one-row-per-species summary of iris: the mean Sepal.Length and the row count for each species. Save it to ex_summary.
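One possible solution is sketched below. The column names `mean_sepal_length` and `n` are assumptions, since the exercise does not prescribe them.

```r
library(purrr)  # map_df() also needs dplyr installed

# One summary row per species, labelled with the species name
ex_summary <- map_df(
  split(iris, iris$Species),
  ~ data.frame(
    mean_sepal_length = mean(.x$Sepal.Length),
    n                 = nrow(.x)
  ),
  .id = "species"
)

ex_summary
```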
Explanation: split() turns iris into a named list of three data frames. map_df() reduces each to a one-row summary and stacks them, while .id = "species" keeps the group label as a column.
Related purrr functions
map_df() sits in purrr's data-frame-output family. Reach for a sibling when row-binding is not what you need:
- map(): returns a plain list when results are not data frames.
- map_dbl(): returns one number per element instead of rows.
- map2(): iterates two vectors in lockstep.
- pmap(): iterates over any number of inputs, including data frame rows.
- reduce(): collapses a list to a single value.
See the broader Functional Programming in R guide for context, and the official purrr map_dfr reference for the superseded-status notes.
FAQ
What is the difference between map_df() and map_dfr() in R?
There is no functional difference. map_df() is an alias for map_dfr(), and both row-bind the per-element results into one data frame. The map_dfr() name was introduced to pair clearly with map_dfc(), where the "c" stands for columns. Older code and tutorials use map_df(), so it stays available, but map_dfr() is the name the documentation now prefers.
Why does map_df() say the argument must be a data frame?
The function you passed returned something that cannot become a row, usually a bare unnamed vector. map_df() row-binds with bind_rows(), which needs each piece to be a data frame or a named atomic vector. Wrap your result in data.frame() or tibble(), or give the vector names, and the error disappears.
Is map_df() deprecated?
Not deprecated, but superseded as of purrr 1.0.0. Superseded means it still works and will not be removed, yet it is no longer the recommended approach. The current advice is to call map() and then list_rbind(), which separates iteration from combination. Existing map_df() code is safe to keep, while new code can prefer the list_rbind() form.
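As a sketch of the migration, the two styles are shown side by side. This assumes purrr 1.0.0 or later and R 4.1 or later for the native pipe, and `summarise_group()` is a hypothetical helper written for this illustration. Note that list_rbind() takes `names_to` where map_df() takes `.id`.

```r
library(purrr)  # list_rbind() requires purrr 1.0.0 or later

by_cyl <- split(mtcars, mtcars$cyl)

# Hypothetical helper: one summary row per group
summarise_group <- function(df) {
  data.frame(n = nrow(df), mean_mpg = mean(df$mpg))
}

# Superseded style: iterate and combine in one call
old_style <- map_df(by_cyl, summarise_group, .id = "cyl")

# Current style: iterate with map(), then combine with list_rbind()
new_style <- map(by_cyl, summarise_group) |> list_rbind(names_to = "cyl")

new_style
```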
How do I add a column showing which element each row came from?
Pass .id with a column name, for example map_df(x, fn, .id = "source"). The .id column is populated from the names of .x. If your input list is unnamed, .id falls back to the integer position written as text. Name the list elements first when you want meaningful labels rather than 1, 2, 3.