broom Bootstrap in R: Tidy Resampled Estimates
A broom bootstrap workflow combines rsample::bootstraps() (or a manual map_dfr() loop) with broom::tidy() so every resample produces a one-row-per-term tibble. Stack those tibbles and you get bootstrap distributions for every coefficient, ready for confidence intervals, density plots, or report tables.
map_dfr(1:B, ~tidy(lm(y ~ x, data = slice_sample(df, prop = 1, replace = TRUE)))) # manual loop bootstraps(df, times = 1000) # rsample resamples map_dfr(boots$splits, ~tidy(lm(y ~ x, data = analysis(.x))), .id = "id") # tidy each fit boot_estimates |> group_by(term) |> summarise(lo = quantile(estimate, 0.025)) # percentile CI boot_estimates |> ggplot(aes(estimate)) + geom_histogram() + facet_wrap(~term) # bootstrap density int_pctl(boot_results, results) # rsample percentile CI
Need explanation? Read on for examples and pitfalls.
What broom does for bootstrap output in one sentence
broom turns each resampled model fit into a tidy tibble that you can row-bind across resamples. A bootstrap routine produces hundreds or thousands of model fits, one per resample. Without tidying, every fit is an S3 list with coefficients, standard errors, fit statistics, and residuals buried in nested slots.
broom::tidy() rectangularises one fit into a per-term data frame. Loop the tidier across all resamples and bind_rows() (or map_dfr()) gives you a long table keyed by term and resample id. That table is the input to every downstream task: percentile intervals, bias estimates, distribution plots, and side-by-side comparisons.
Syntax
The pattern is loop, tidy, stack. You generate bootstrap resamples, fit a model to each, tidy each fit, and combine the tibbles. The looping primitive can be purrr::map_dfr(), replicate() plus do.call(rbind, ...), or rsample::bootstraps() paired with map_dfr().
The three arguments that matter when wiring broom into a bootstrap loop are:
- the resample generator (manual
slice_sample(replace = TRUE),rsample::bootstraps(), ormodelr::bootstrap()) - the model formula (passed once to
lm(),glm(),nls(), etc.) - the tidier (
tidy()for coefficients,glance()for fit stats,augment()for predictions)
set.seed() once before map_dfr() or bootstraps() and you get reproducible yet varied resamples.Common patterns
1. Manual bootstrap with map_dfr and tidy
Each row is one term from one resample. boot_id keeps the resamples separate so you can group later. With B = 500 and three terms (intercept, wt, hp) the result has 1500 rows.
2. Percentile confidence intervals from the bootstrap table
The 2.5% and 97.5% quantiles of the bootstrap distribution are the percentile 95% confidence interval. Compare these limits with the parametric intervals from confint(lm(mpg ~ wt + hp, data = df)) to see how much the normal-theory assumption matters for your data.
3. Bootstrap with rsample::bootstraps()
bootstraps() returns a tibble of rsplit objects. analysis() extracts the resampled data frame from each split; tidy() runs once per fit. The unnested result is the same long table as the manual version, but rsample keeps the split objects available for paired holdout analysis.
4. Bootstrap density plot per term
Faceting by term is only possible because tidy() puts every coefficient on its own row. Heavy tails or visible skew flag terms whose normal-theory standard errors are misleading.
lm(), glm(), or nls(). The same group_by(term) |> summarise() pipeline produces intervals for all of them.Compare with alternatives
Four routes generate bootstrap resamples; broom tidies whichever you pick. The choice between manual, rsample, modelr, and boot is about resample bookkeeping and interval type, not about whether broom plays nicely.
| Approach | Strength | When to use |
|---|---|---|
map_dfr() + slice_sample() + tidy() |
No extra dependency; transparent loop | Quick exploratory bootstrap, teaching |
rsample::bootstraps() + tidy() |
Holds splits as objects; integrates with tidymodels | Production workflows, paired holdout |
modelr::bootstrap() + tidy() |
Compact list-column workflow | Modeling chapters in R for Data Science style |
boot::boot() + boot::boot.ci() |
BCa and studentized intervals out of the box | Inference where percentile intervals are insufficient |
The boot package is the rigorous choice when you need BCa or studentized intervals. Everything else (manual, rsample, modelr) wraps the same resample-then-fit pattern; broom is what makes their output drillable.
Common pitfalls
Bootstrap output that ignores the intercept term. A naive summarise(mean(estimate)) without group_by(term) averages intercepts with slopes and the answer is meaningless. Always group by term before summarising; broom guarantees the column exists for every model class it supports.
Resampling rows of a paired or time-series dataset. Standard bootstrap assumes exchangeable observations. For time series, use rsample::sliding_period() or a block bootstrap; for paired data, resample pairs (not individuals). Otherwise the resamples violate the dependence structure and the intervals shrink.
Treating bootstrap p-values as gospel. Bootstrap percentile intervals are valid; bootstrap p-values from boot_fits$p.value are per-resample test statistics, not corrected inferential p-values. Compute p-values from the bootstrap distribution of the estimate itself, not from the per-resample fits.
broom::bootstrap() function. Older broom releases shipped a bootstrap() helper that worked with dplyr::do(). It was removed when do() was retired. Use rsample::bootstraps() or a manual map_dfr() loop instead; both are demonstrated above.Try it yourself
Try it: Bootstrap 200 linear fits of mpg ~ wt on mtcars and report the 95% percentile confidence interval for the wt coefficient. Save the interval to ex_wt_ci.
Click to reveal solution
Explanation: The loop fits lm(mpg ~ wt) to 200 row-resamples of mtcars, tidies each fit, filters to the wt term, and takes the empirical 2.5% and 97.5% quantiles. Both bounds are negative, matching the well-known negative association between weight and mileage.
Related broom functions
broom::tidy()for per-term coefficient tibbles, the workhorse of every bootstrap loopbroom::glance()for one-row model summaries (r.squared,sigma,AIC) per resamplebroom::augment()for per-observation predictions when bootstrapping prediction intervalsbroom.mixed::tidy()for mixed models inside the resample looprsample::int_pctl()for percentile intervals computed directly from an rsample object
FAQ
Does broom have a bootstrap function?
Older broom releases shipped broom::bootstrap() for use with dplyr::do(), but it was deprecated and removed. Modern workflows replace it with rsample::bootstraps() for the resample generation step and reserve broom for tidying the fits. Reaching for broom::bootstrap() in current code triggers a "could not find function" error; substitute rsample::bootstraps() and the rest of the pipeline is identical.
How many bootstrap resamples should I use with broom::tidy()?
For point estimates and 95% percentile intervals, 1,000 resamples is the conventional minimum and 2,000 to 5,000 is safer for tails. Each resample is cheap because broom only tidies the fit once it converges, so the bottleneck is lm() or glm(), not tidy(). For 99% intervals or BCa intervals from the boot package, push to 10,000 resamples; the tidier handles the extra rows without code changes.
Can I bootstrap glm or nls with the same workflow?
Yes. broom::tidy() dispatches on model class, so swap lm() for glm(), nls(), survival::survreg(), or lme4::lmer() and the surrounding map_dfr() plus summarise() pipeline is unchanged. For mixed models, load broom.mixed first so its tidy.merMod() method is registered. The output columns (term, estimate, std.error, statistic) are stable across classes, which is the entire point of using broom.
Why use rsample::bootstraps() instead of manual resampling?
bootstraps() returns split objects that hold both the analysis (resampled) and assessment (out-of-bag) rows. That separation matters for the .632 bootstrap, oob error estimates, and any nested cross-validation. A manual slice_sample() loop is shorter but throws away the oob rows. If you only need percentile intervals on coefficients, manual is fine; for predictive performance, use rsample.
How do I compute BCa or studentized confidence intervals?
Use the boot package: boot::boot(df, function(d, i) coef(lm(y ~ x, data = d[i, ])), R = 2000) produces a boot object, and boot::boot.ci(out, type = c("bca", "stud")) returns the BCa and studentized intervals. broom does not implement BCa directly because the correction requires the Jacobian of the influence function. Stack boot (for the interval math) with broom (for the per-fit summary) when you need both.
Authoritative reference: the broom tidiers index and the rsample bootstrap article cover every supported class and split type.