Rr‑statistics.co

Bootstrap CI Calculator

Bootstrapping resamples your data thousands of times to build a confidence interval for any statistic, even ones with no closed-form formula (medians, ratios, custom functions). Paste your numbers, pick a statistic, and get percentile, basic, or BCa intervals along with reproducible boot::boot() R code.

New to bootstrapping? Read the 4-min primer

What the bootstrap does. Resample your data with replacement many times, recompute the statistic on each resample, and use the spread of those values as a stand-in for the sampling distribution. The bootstrap turns "I have one sample" into "here is what the statistic would have looked like across many samples", without assuming a parametric form.

Three ways to get the CI from the resamples. The percentile CI sorts the bootstrap distribution and takes the α/2 and 1−α/2 quantiles. The basic CI (a.k.a. reverse percentile) reflects the distribution about the original estimate. The BCa method adjusts for bias and skewness using a jackknife-derived acceleration; it is the best-practice default for moderately skewed statistics.

Bias and SE. Bias is the mean of the bootstrap distribution minus the original statistic; if it is large relative to the SE the parametric CI is suspect. The bootstrap SE is the SD of the bootstrap distribution and is a drop-in replacement for the analytic standard error.

When the bootstrap struggles. It struggles with very small n (under 10), heavy-tailed data whose variance is infinite, and statistics driven by a single extreme order statistic (like the maximum). In those cases, increase B, prefer BCa over percentile, and treat the CI as approximate.

6 worked examples · raw data only · percentile · basic · BCa · Runs in your browser

Pick a real-world example to load.

The bootstrap math, end to end
x* = sample(x, n, replace = TRUE)
θ* = stat(x*)
repeat B times
Resampling. Draw n observations from the original sample with replacement, compute the statistic on the resample, and store it. After B repetitions you have the bootstrap distribution θ*₁, θ*₂, …, θ*_B. The seeded Mulberry32 RNG makes results reproducible across reloads.
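The loop above, as a minimal Python sketch (the calculator itself runs in the browser with a seeded Mulberry32 RNG; `random.Random`, the function name, and the demo data here are illustrative stand-ins):

```python
import random
from statistics import median

def bootstrap_distribution(x, stat, B=2000, seed=42):
    """Draw B resamples of x with replacement; return stat of each."""
    rng = random.Random(seed)  # seeded, so results reproduce across runs
    n = len(x)
    return [stat([x[rng.randrange(n)] for _ in range(n)]) for _ in range(B)]

data = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 7.1, 5.3]
thetas = bootstrap_distribution(data, median)  # the bootstrap distribution
```

Rerunning with the same seed reproduces `thetas` exactly, mirroring the reproducibility claim above.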
CI_perc = (θ*_(α/2 · B), θ*_((1-α/2) · B))
Percentile. Sort the bootstrap distribution; read off the α/2 and 1−α/2 quantiles. Simple, but ignores any bias in the bootstrap distribution.
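A percentile-CI sketch in Python (the helper name is illustrative, and quantile index conventions differ slightly between implementations):

```python
def percentile_ci(thetas, alpha=0.05):
    """Percentile CI: the empirical alpha/2 and 1-alpha/2 quantiles
    of the bootstrap distribution."""
    s = sorted(thetas)
    B = len(s)
    return s[int(alpha / 2 * B)], s[int((1 - alpha / 2) * B) - 1]

# stand-in bootstrap distribution: 1000 evenly spread values
lo, hi = percentile_ci(list(range(1000)))  # roughly the 2.5% and 97.5% points
```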
CI_basic = (2θ̂ − θ*_((1-α/2) · B), 2θ̂ − θ*_(α/2 · B))
Basic (reverse percentile). Reflects the percentile bounds about the original estimate. Equivalent to assuming the bootstrap distribution of θ* − θ̂ mirrors the sampling distribution of θ̂ − θ.
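The reflection step in Python, assuming the same sorted-quantile convention (names are illustrative):

```python
def basic_ci(theta_hat, thetas, alpha=0.05):
    """Basic (reverse-percentile) CI: reflect the percentile bounds
    about the original point estimate theta_hat."""
    s = sorted(thetas)
    B = len(s)
    lo_p = s[int(alpha / 2 * B)]            # percentile lower bound
    hi_p = s[int((1 - alpha / 2) * B) - 1]  # percentile upper bound
    return 2 * theta_hat - hi_p, 2 * theta_hat - lo_p

lo, hi = basic_ci(500, list(range(1000)))  # symmetric toy case
```

In this symmetric toy case the basic and percentile bounds nearly coincide; they diverge when the bootstrap distribution is skewed relative to θ̂.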
z₀ = Φ⁻¹(#{θ* < θ̂} / B)
a = Σ(θ̄ − θ_(−i))³ / [6 (Σ(θ̄ − θ_(−i))²)^(3/2)], where θ̄ is the mean of the jackknife estimates θ_(−i)
α_lo = Φ(z₀ + (z₀ + z_(α/2)) / (1 − a(z₀ + z_(α/2))))
α_hi = Φ(z₀ + (z₀ + z_(1−α/2)) / (1 − a(z₀ + z_(1−α/2))))
BCa (Efron 1987). z₀ is the bias correction; it shifts the percentiles to account for the bootstrap mean differing from the point estimate. The acceleration a comes from the jackknife: leave one observation out, recompute the statistic, and look at the third-versus-second moment of those n leave-one-out estimates. BCa is second-order accurate and almost always preferable to percentile when the bootstrap distribution is skewed.
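A self-contained BCa sketch in Python, using the stdlib `statistics.NormalDist` for Φ and Φ⁻¹ (function names and demo data are illustrative, not the calculator's implementation):

```python
import random
from statistics import NormalDist, mean

def bca_ci(x, stat, thetas, alpha=0.05):
    """BCa interval: bias-corrected, accelerated percentile bounds."""
    nd = NormalDist()
    theta_hat = stat(x)
    B = len(thetas)
    # z0: bias correction from the share of resample stats below theta_hat
    z0 = nd.inv_cdf(sum(t < theta_hat for t in thetas) / B)
    # a: acceleration from the jackknife (leave-one-out) estimates
    jack = [stat(x[:i] + x[i + 1:]) for i in range(len(x))]
    jbar = mean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0
    def adj(z):  # shift-and-stretch the nominal quantile level
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    s = sorted(thetas)
    a_lo = adj(nd.inv_cdf(alpha / 2))
    a_hi = adj(nd.inv_cdf(1 - alpha / 2))
    return s[int(a_lo * B)], s[min(B - 1, int(a_hi * B))]

data = [2.3, 1.9, 3.1, 2.8, 4.0, 2.2, 3.5, 2.9, 3.3, 2.6]
rng = random.Random(7)
n = len(data)
thetas = [mean([data[rng.randrange(n)] for _ in range(n)]) for _ in range(2000)]
lo, hi = bca_ci(data, mean, thetas)  # adjusted 95% bounds for the mean
```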
bias = mean(θ*) − θ̂
SE = sd(θ*)
Bias and SE. Bias is the mean of the bootstrap distribution minus the original statistic; for an unbiased estimator it should be near zero. The bootstrap SE is the SD of the bootstrap distribution and replaces the parametric standard error in non-Gaussian settings.
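Both quantities in Python (names and the toy distribution are illustrative):

```python
from statistics import mean, stdev

def bootstrap_bias_se(theta_hat, thetas):
    """Bias: mean of the bootstrap distribution minus the original statistic.
    SE: sample standard deviation of the bootstrap distribution."""
    return mean(thetas) - theta_hat, stdev(thetas)

bias, se = bootstrap_bias_se(2.5, [1.0, 2.0, 3.0])  # toy bootstrap distribution
```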
Caveats When the bootstrap is the wrong tool
If you have…
Use instead
n < 10 with no model
The bootstrap needs enough information in the sample to mimic the population. With n < 10 the resamples are too repetitive. Try a fully parametric model with a likelihood-based CI.
A statistic depending only on the maximum or minimum
Order-statistic-driven statistics have non-smooth sampling distributions; the bootstrap can be inconsistent. Use the parametric extreme-value asymptotics or subsampling.
Time-series or autocorrelated data
The independent-resample assumption is broken. Use a block bootstrap (moving / circular / stationary) sized to the autocorrelation range.
A regression coefficient
The bootstrap CI is fine, but for clean inference use the model-based CI from the lm output interpreter or its glm sibling. For non-Gaussian residuals, a residual or wild bootstrap is preferred.
A simple mean of mildly normal data
The parametric t-CI is exact under normality, faster, and gives the same answer to 2 decimals. Use the Confidence Interval Calculator.
Heavy-tailed data with infinite variance (Cauchy-like)
The bootstrap will systematically underestimate the spread. Use a truncated-mean estimator or shift to a robust scale (MAD, Qₙ).
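The moving-block idea from the table above, sketched in Python (the helper name and parameters are illustrative; in a real analysis block_len should be chosen from the autocorrelation range):

```python
import random

def moving_block_resample(x, block_len, seed=0):
    """One moving-block bootstrap resample: concatenate randomly chosen
    contiguous blocks of length block_len, then trim to the series length."""
    rng = random.Random(seed)
    n = len(x)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)  # uniform block start
        out.extend(x[start:start + block_len])    # keeps short-range order
    return out[:n]

series = list(range(24))
resample = moving_block_resample(series, block_len=4)
```

Within each block the original time order (and hence the short-range autocorrelation) is preserved; only the block boundaries break dependence.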
Further reading

Math: seeded Mulberry32 RNG; percentile and basic CIs from the sorted resample distribution; BCa via the Efron (1987) bias correction plus jackknife acceleration, α-adjusted with Wichura AS 241 inverse normal.
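For reference, a Python port of Mulberry32's 32-bit arithmetic (a sketch; the calculator runs the JavaScript original, so this mirrors it via explicit masking):

```python
def mulberry32(seed):
    """Python port of the Mulberry32 PRNG; & 0xFFFFFFFF emulates
    JavaScript's 32-bit integer wraparound."""
    state = seed & 0xFFFFFFFF
    def rand():
        nonlocal state
        state = (state + 0x6D2B79F5) & 0xFFFFFFFF
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & 0xFFFFFFFF
        t = (t ^ (t + (((t ^ (t >> 7)) * (t | 61)) & 0xFFFFFFFF))) & 0xFFFFFFFF
        return (t ^ (t >> 14)) / 4294967296  # uniform in [0, 1)
    return rand

r = mulberry32(2024)
draws = [r() for _ in range(5)]  # same seed, same sequence on every run
```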