purrr map() for Data Wrangling: Read Files, Transform Lists & Fit Models
purrr's map() family shines in data wrangling: reading multiple CSV files at once, extracting stats from nested data, fitting models per group, and safely handling messy inputs. These are the patterns you'll use daily.
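As a rough sketch of the file-reading pattern, the example below writes two small CSV files to a temporary directory, reads every path with map(), and stacks the results into one data frame. The file names, column names, and the safe_read helper are invented for illustration. read.csv() is base R, and wrapping it in possibly() is one way to make an unreadable file yield NULL instead of stopping the whole run.

```r
library(purrr)
library(dplyr)

# Create two small CSV files in a fresh temporary directory (illustrative data)
tmp_dir <- tempfile("purrr_demo_")
dir.create(tmp_dir)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(tmp_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(tmp_dir, "b.csv"), row.names = FALSE)

# Collect the paths and add one that does not exist, to mimic a messy input
paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
paths <- c(paths, file.path(tmp_dir, "missing.csv"))
names(paths) <- basename(paths)

# possibly() returns `otherwise` instead of raising an error
safe_read <- possibly(read.csv, otherwise = NULL)

combined <- paths |>
  map(safe_read) |>            # one data frame (or NULL) per path
  compact() |>                 # drop the paths that failed to read
  bind_rows(.id = "source")    # stack rows and record the originating file

print(combined)
```

Naming the paths before mapping is what lets bind_rows(.id = "source") label each row with its file. R may still print a warning for the missing file, but the pipeline completes.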
library(purrr)
library(dplyr)
# Nested data: each row has a vector of scores
students <- tibble(
  name = c("Alice", "Bob", "Carol"),
  scores = list(c(88, 92, 79), c(76, 81, 85), c(92, 95, 88))
)
students |>
  mutate(
    avg = map_dbl(scores, mean),
    best = map_dbl(scores, max),
    n = map_int(scores, length)
  )
Pattern 3: Split → Map → Combine (Model Fitting)
library(purrr)
library(dplyr)
# Fit a model per cylinder group
models <- mtcars |>
  split(mtcars$cyl) |>
  map(\(df) lm(mpg ~ wt + hp, data = df))
# Extract R-squared per group
r_sq <- map_dbl(models, \(m) summary(m)$r.squared)
cat("R² per cyl group:\n")
print(round(r_sq, 3))
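The "combine" step can also produce a data frame rather than a named vector. The sketch below refits the same per-cylinder models so it runs on its own, then uses imap() to build one row per model and bind_rows() to stack them. The column names (r_squared, wt_slope, n_obs) are chosen here for illustration.

```r
library(purrr)
library(dplyr)

# Split: one data frame per cylinder group; Map: one model per group
models <- mtcars |>
  split(mtcars$cyl) |>
  map(\(df) lm(mpg ~ wt + hp, data = df))

# Combine: imap() passes each model and its list name (the cyl value as text)
model_summary <- models |>
  imap(\(m, cyl) tibble(
    cyl       = cyl,
    r_squared = summary(m)$r.squared,
    wt_slope  = coef(m)[["wt"]],
    n_obs     = nobs(m)
  )) |>
  bind_rows()

print(model_summary)
```

Keeping the group label as a column makes the result easy to filter, join, or plot afterwards.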
Use map() when each iteration is independent and returns a value. Use a for loop when iterations depend on previous results or need break/next control flow.
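To make the distinction concrete, here is a small sketch with made-up numbers. The first computation is independent per element, so map_dbl() fits. The second carries a running balance from one step to the next and stops early, which is a natural fit for a for loop with break.

```r
library(purrr)

# Independent iterations: each result depends only on its own input
squares <- map_dbl(1:5, \(x) x^2)

# Dependent iterations: each balance builds on the previous one,
# and the loop stops as soon as the target is reached
balance <- 100
years <- 0
for (i in 1:50) {
  balance <- balance * 1.05   # 5% growth per year (illustrative rate)
  years <- i
  if (balance >= 150) break
}

print(squares)
print(years)
```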
How does map() compare to lapply()?
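At their core they do the same thing: both apply a function to every element of a list or vector and return a list. map() adds typed variants (map_dbl, map_chr) that return an atomic vector or fail if a result has the wrong type, formula shorthand (~ .x + 1), and element extraction by name (map(list, "field")). Use lapply() when you want zero package dependencies, and map() for everything else.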
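A short side-by-side sketch of those differences, using a small made-up list of records:

```r
library(purrr)

# Illustrative data: a list of records, each with a name and an age
people <- list(
  list(name = "Alice", age = 31),
  list(name = "Bob", age = 45)
)

# lapply() always returns a list, whatever the function returns
ages_list <- lapply(people, function(p) p$age)

# map_dbl() returns a double vector and errors if a result is not a single number
ages_dbl <- map_dbl(people, \(p) p$age)

# Passing a string extracts the element with that name from each record
names_chr <- map_chr(people, "name")

str(ages_list)
print(ages_dbl)
print(names_chr)
```

The typed variants catch shape problems early: if one record had a missing or non-numeric age, map_dbl() would stop with an error instead of silently returning a mixed list.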