Asymptotic Theory in R: Consistency, Asymptotic Normality & Delta Method

Asymptotic theory studies how estimators behave as the sample size grows. Three results do almost all the work in applied statistics: consistency (the estimator converges to the truth), asymptotic normality (it does so on a Gaussian scale after multiplying by $\sqrt{n}$), and the delta method (smooth transformations of asymptotically normal estimators are themselves asymptotically normal). This tutorial shows what each of them means by simulating all three in R.

What does it mean for an estimator to have good asymptotic behavior?

Large-sample guarantees let you trust a formula that would be hopeless to verify on one small sample. Instead of asking "how good is my estimator at this sample size?", asymptotic theory asks what happens as $n$ grows without bound. We'll start with the most visible guarantee, consistency, and watch it in action by drawing samples of increasing size from a Normal distribution with known mean.

R: Sample mean converges to the truth as n grows
set.seed(99)
mu_true <- 5
sigma_true <- 2
ns <- c(10, 100, 1000, 10000, 100000)
means_tbl <- data.frame(
  n = ns,
  sample_mean = sapply(ns, function(n) mean(rnorm(n, mu_true, sigma_true))),
  abs_error = NA
)
means_tbl$abs_error <- round(abs(means_tbl$sample_mean - mu_true), 4)
means_tbl$sample_mean <- round(means_tbl$sample_mean, 4)
means_tbl
#>        n sample_mean abs_error
#> 1     10      4.4812    0.5188
#> 2    100      4.9361    0.0639
#> 3   1000      5.0103    0.0103
#> 4  10000      4.9888    0.0112
#> 5 100000      5.0014    0.0014

  

The absolute error shrinks by roughly a factor of 10 for every 100-fold increase in sample size, matching the $1/\sqrt{n}$ rate we expect. At $n = 100{,}000$ the sample mean is off by less than 0.002. That is consistency in one table: the estimator homes in on the true parameter as more data arrives.

Formally, an estimator $\hat\theta_n$ is consistent for $\theta$ when it converges in probability:

$$\hat\theta_n \xrightarrow{\;p\;} \theta \quad \Longleftrightarrow \quad \lim_{n \to \infty} P(|\hat\theta_n - \theta| > \varepsilon) = 0 \text{ for every } \varepsilon > 0.$$

Where:

  • $\hat\theta_n$ = the estimator computed from a sample of size $n$
  • $\theta$ = the true parameter
  • $\xrightarrow{p}$ = convergence in probability
  • $\varepsilon$ = any small tolerance you pick
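The limiting probability in this definition can be checked directly by Monte Carlo: fix a tolerance $\varepsilon$ and estimate $P(|\bar X_n - \mu| > \varepsilon)$ at growing $n$. A sketch with the session's $\mu = 5$, $\sigma = 2$; the tolerance, seed, sample sizes, and replication count are illustrative choices, not part of the original analysis:

```r
# Estimate P(|xbar - mu| > eps) at increasing n (eps and ns are illustrative)
set.seed(1)
eps <- 0.1
ns_def <- c(50, 500, 5000)
probs <- sapply(ns_def, function(n) {
  xbars <- replicate(2000, mean(rnorm(n, 5, 2)))
  mean(abs(xbars - 5) > eps)   # empirical exceedance probability
})
round(probs, 3)
```

The exceedance probability drops toward zero as $n$ grows, which is exactly the statement $\lim_n P(|\hat\theta_n - \theta| > \varepsilon) = 0$ made empirical.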

There are three properties asymptotic theory cares about:

  1. Consistency: the estimator converges to the truth.
  2. Asymptotic normality: rescaled errors $\sqrt{n}(\hat\theta_n - \theta)$ converge to a normal distribution.
  3. Asymptotic efficiency: the asymptotic variance reaches the Cramér-Rao lower bound.
Note
"Asymptotic" does not mean "at infinity." It means "for sufficiently large $n$ the behavior is well approximated by the limit." A rule of thumb is that the sample mean of a well-behaved distribution looks Normal by $n \approx 30$, but tail-heavy cases may need $n$ in the thousands.

Try it: The sample median should also be consistent for the true median (which equals $\mu$ for symmetric Normal data). Compute the sample median at $n = 10, 100, 1000, 10000$ and print the absolute errors.

R: Your turn: sample median convergence
set.seed(99)
# mu_true = 5, sigma_true = 2 are already in the session
test_ns <- c(10, 100, 1000, 10000)
# your code here: for each n, draw rnorm(n, mu_true, sigma_true),
# take the sample median, and report abs(median - mu_true)

  
Click to reveal solution
R: Sample median convergence solution
set.seed(99)
test_ns <- c(10, 100, 1000, 10000)
med_tbl <- data.frame(
  n = test_ns,
  sample_med = sapply(test_ns, function(n) median(rnorm(n, mu_true, sigma_true))),
  abs_error = NA
)
med_tbl$abs_error <- round(abs(med_tbl$sample_med - mu_true), 4)
med_tbl$sample_med <- round(med_tbl$sample_med, 4)
med_tbl
#>       n sample_med abs_error
#> 1    10     4.6068    0.3932
#> 2   100     4.9036    0.0964
#> 3  1000     4.9918    0.0082
#> 4 10000     5.0085    0.0085

  

Explanation: The median homes in on 5 at a similar rate to the mean. Both estimators are consistent for $\mu$ in a symmetric Normal model, but the median has higher asymptotic variance ($\pi \sigma^2 / 2$ versus $\sigma^2$) so its errors are a bit larger at each $n$.

How do you verify consistency with a simulation in R?

A single run with shrinking error is anecdotal evidence, but consistency is a statement about the whole probability distribution of $\hat\theta_n - \theta$. The cleanest empirical check is to confirm that the mean squared error goes to zero: if MSE $\to 0$, then $\hat\theta_n \to_p \theta$ automatically (apply Markov's inequality to the squared error). We'll estimate MSE at several sample sizes and plot it against $n$ on a log-log scale.

R: MSE vs n: log-log plot of the sample mean
set.seed(77)
ns_grid <- c(20, 50, 100, 200, 500, 1000, 2000, 5000)
reps <- 3000
mse_tbl <- data.frame(
  n = ns_grid,
  mse = sapply(ns_grid, function(n) {
    xbars <- replicate(reps, mean(rnorm(n, mu_true, sigma_true)))
    mean((xbars - mu_true)^2)
  })
)
mse_tbl$theory <- sigma_true^2 / mse_tbl$n
round(mse_tbl, 5)
#>      n     mse  theory
#> 1   20 0.19894 0.20000
#> 2   50 0.08061 0.08000
#> 3  100 0.04033 0.04000
#> 4  200 0.01976 0.02000
#> 5  500 0.00811 0.00800
#> 6 1000 0.00400 0.00400
#> 7 2000 0.00197 0.00200
#> 8 5000 0.00081 0.00080

  

Empirical MSE tracks the theoretical value $\sigma^2/n$ almost exactly across two orders of magnitude in sample size. The ratio mse/theory stays within a few percent of 1, and both columns fall at the same rate. A log-log plot makes the rate visible at a glance.

R: Visualize the 1/n decay
plot(log10(mse_tbl$n), log10(mse_tbl$mse), type = "b", pch = 19,
     col = "steelblue", xlab = "log10(n)", ylab = "log10(MSE)",
     main = "MSE of sample mean decays as 1/n")
lines(log10(mse_tbl$n), log10(mse_tbl$theory), col = "red", lty = 2)
legend("topright", c("empirical", "theory sigma^2/n"),
       col = c("steelblue", "red"), lty = c(1, 2), pch = c(19, NA), bty = "n")

  

The empirical and theoretical curves overlay almost exactly, and the slope of the line is $-1$: doubling $n$ halves the MSE. Any estimator whose MSE shrinks to zero is consistent; the slope $-1$ pins the rate at $1/n$. A flat line (slope 0) would mean the MSE never vanishes, the signature of an inconsistent estimator.

Tip
Consistency is preserved under continuous transformations. If $\hat\theta_n \to_p \theta$ and $g$ is continuous at $\theta$, then $g(\hat\theta_n) \to_p g(\theta)$. This is the continuous mapping theorem, and it means you rarely need to re-derive consistency for derived quantities like $\log(\hat\theta)$ or $1/\hat\theta$.
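A quick sketch of the continuous mapping theorem in action, reusing the session's $\mu = 5$, $\sigma = 2$. The transformation $g(x) = 1/x$ and the seed are illustrative choices; $g$ is continuous at $\mu = 5$, so $1/\bar X_n \to_p 1/5$ without any new derivation:

```r
# Continuous mapping: xbar ->p mu implies 1/xbar ->p 1/mu (g continuous at mu = 5)
set.seed(101)
ns_cmt <- c(100, 10000, 1000000)
g_err <- sapply(ns_cmt, function(n) abs(1 / mean(rnorm(n, 5, 2)) - 1 / 5))
round(g_err, 5)   # absolute error of 1/xbar; typically shrinks as n grows
```

The same one-liner works for $\log(\bar X_n)$, $\bar X_n^2$, or any other transformation continuous at the true parameter.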

Try it: The sample variance $S^2$ should be consistent for $\sigma^2 = 4$. Use var() inside replicate() to estimate the MSE at $n = 100, 500, 2000$ and check that the values shrink.

R: Your turn: sample variance is consistent
set.seed(55)
# sigma_true = 2 is in the session, so sigma_true^2 = 4 is the target
test_ns <- c(100, 500, 2000)
# your code here: for each n, compute mean((var(rnorm(n, 5, 2)) - 4)^2)
# over 2000 replications and print the table.

  
Click to reveal solution
R: Sample variance consistency solution
set.seed(55)
test_ns <- c(100, 500, 2000)
reps <- 2000
var_tbl <- data.frame(
  n = test_ns,
  mse_s2 = sapply(test_ns, function(n) {
    s2s <- replicate(reps, var(rnorm(n, 5, 2)))
    mean((s2s - 4)^2)
  })
)
round(var_tbl, 5)
#>      n  mse_s2
#> 1  100 0.32671
#> 2  500 0.06425
#> 3 2000 0.01623

  

Explanation: MSE shrinks roughly in proportion to $1/n$: by about a factor of 5 as $n$ grows from 100 to 500, and by about a factor of 4 from 500 to 2000. Sample variance is consistent for $\sigma^2$, and its asymptotic variance for Normal data is $2\sigma^4/n = 32/n$, which matches the numbers closely.

What is asymptotic normality, and why does it matter?

Consistency tells you the estimator lands on the target eventually, but not how the errors are distributed. Asymptotic normality is the refinement that says: after you rescale $\hat\theta_n - \theta$ by $\sqrt{n}$, the distribution of the rescaled error approaches a Normal curve regardless of the population's shape. This is the statistical law that lets you build confidence intervals as $\hat\theta \pm 1.96 \times \text{SE}$ for almost every estimator you ever meet.

Formally, $\hat\theta_n$ is asymptotically normal with asymptotic variance $V$ when

$$\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{\;d\;} N(0, V).$$

Where:

  • $\xrightarrow{d}$ = convergence in distribution (the CDF of the left side converges to the CDF of the Normal)
  • $V$ = the asymptotic variance (often called the "sandwich" variance for complicated models)
  • The $\sqrt{n}$ factor is what makes the limit non-degenerate: without it, the left side would collapse to zero by consistency.

Let's see the central limit theorem deliver asymptotic normality for the sample mean. We'll draw 10,000 samples of size $n = 50$ from a skewed Gamma distribution, compute the standardized statistic $Z_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma$, and overlay the histogram with the $N(0, 1)$ density.

R: CLT delivers asymptotic normality from non-normal data
set.seed(321)
n <- 50
reps <- 10000
shape <- 2  # Gamma(2, 1) is right-skewed
rate <- 1
mu_gamma <- shape / rate
sd_gamma <- sqrt(shape) / rate
z_scores <- replicate(reps, {
  x <- rgamma(n, shape = shape, rate = rate)
  sqrt(n) * (mean(x) - mu_gamma) / sd_gamma
})
hist(z_scores, breaks = 60, freq = FALSE, col = "lightblue", border = "white",
     xlab = "sqrt(n) (xbar - mu) / sigma",
     main = "Standardized sample mean is asymptotically N(0, 1)")
curve(dnorm(x), add = TRUE, col = "red", lwd = 2)

  

The histogram is visually indistinguishable from the red Normal curve even though the raw Gamma data is skewed. Asymptotic normality doesn't care about the population distribution, only that it has finite variance. The $\sqrt{n}$ scaling is what produced the $N(0, 1)$ limit: without it, the numerator would shrink to zero.

Key Insight
Asymptotic normality is what powers every Wald confidence interval in statistics. Once you know $\sqrt{n}(\hat\theta_n - \theta) \to N(0, V)$, you can rearrange to get $\hat\theta_n \pm z_{1-\alpha/2} \sqrt{V/n}$ as a large-sample CI. This single result covers MLEs for any regular likelihood, GMM estimators, robust sandwich estimators, and much more.

A practical companion result is Slutsky's theorem: if you replace an unknown $V$ with a consistent estimator $\hat V$, the CI still has the right coverage. In other words, plug-in standard errors are asymptotically valid.
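A sketch of Slutsky's theorem at work with the session's $\mu = 5$, $\sigma = 2$: a z-interval built with the estimated SD sd(x) in place of $\sigma$ still covers $\mu$ about 95% of the time. The seed, sample size, and replication count are illustrative choices:

```r
# Wald CI with a plug-in SE: coverage stays near the nominal 95% (Slutsky)
set.seed(10)
n_slutsky <- 200
cover <- replicate(4000, {
  x <- rnorm(n_slutsky, 5, 2)
  se_hat <- sd(x) / sqrt(n_slutsky)   # consistent estimator replaces sigma
  ci <- mean(x) + c(-1, 1) * 1.96 * se_hat
  ci[1] <= 5 && 5 <= ci[2]
})
mean(cover)   # close to 0.95
```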

Try it: Repeat the CLT experiment with the sample median (median() instead of mean()). For Normal data the sample median is also asymptotically normal, with a different asymptotic variance. Produce the histogram and check that it looks standard Normal after scaling by $\sqrt{n}$ and dividing by the correct SD $\sigma\sqrt{\pi/2}$.

R: Your turn: asymptotic normality of the sample median
set.seed(12)
# mu_true = 5, sigma_true = 2 already in session
n <- 100
reps <- 5000
# your code here: draw rnorm samples, compute medians, standardize with
# sqrt(n) * (median - mu_true) / (sigma_true * sqrt(pi / 2)),
# and histogram the result over a N(0,1) curve.

  
Click to reveal solution
R: Asymptotic normality of median solution
set.seed(12)
n <- 100
reps <- 5000
sd_med <- sigma_true * sqrt(pi / 2)
z_med <- replicate(reps, {
  x <- rnorm(n, mu_true, sigma_true)
  sqrt(n) * (median(x) - mu_true) / sd_med
})
hist(z_med, breaks = 50, freq = FALSE, col = "lightpink", border = "white",
     xlab = "standardized median",
     main = "sqrt(n) (median - mu) / (sigma sqrt(pi/2))")
curve(dnorm(x), add = TRUE, col = "red", lwd = 2)
round(c(mean = mean(z_med), sd = sd(z_med)), 3)
#>  mean    sd
#> 0.001 1.004

  

Explanation: The mean is nearly zero and the SD is nearly one, confirming a standard-Normal asymptotic distribution. The sample median is asymptotically $N(\mu, \pi\sigma^2/(2n))$ for Normal data, so it is less efficient than the mean ($\pi/2 \approx 1.57$ times the variance) but has the same $\sqrt{n}$-normal shape.

How does the delta method turn asymptotic normality into CIs for transformations?

You often need a CI not for $\theta$ itself but for some function $g(\theta)$: a log income, an odds ratio, a ratio of rates, a predicted probability. The delta method gives you the asymptotic distribution of $g(\hat\theta_n)$ for free, as long as $g$ is smooth at $\theta$. The idea is a one-term Taylor expansion: near $\theta$, $g(\hat\theta_n) \approx g(\theta) + g'(\theta)(\hat\theta_n - \theta)$. Combined with asymptotic normality of $\hat\theta_n$, this gives

$$\sqrt{n}\,(g(\hat\theta_n) - g(\theta)) \xrightarrow{\;d\;} N(0,\; [g'(\theta)]^2\, V).$$

Where:

  • $g'(\theta)$ = the derivative of $g$ at the true parameter
  • $V$ = the asymptotic variance of $\hat\theta_n$
  • $[g'(\theta)]^2 V$ = the asymptotic variance of $g(\hat\theta_n)$

Let's check the formula with $g(\mu) = \log(\mu)$ on Normal data. The sample mean has asymptotic variance $V = \sigma^2$, and $g'(\mu) = 1/\mu$, so the delta method predicts an asymptotic variance of $\sigma^2 / \mu^2$ for $\log(\bar X_n)$. We simulate and compare.

R: Delta method prediction vs simulation for log(xbar)
set.seed(88)
n <- 500
reps <- 10000
delta_sim <- replicate(reps, {
  x <- rnorm(n, mu_true, sigma_true)
  sqrt(n) * (log(mean(x)) - log(mu_true))
})
sim_var <- var(delta_sim)
dm_var_pred <- sigma_true^2 / mu_true^2
round(c(simulated = sim_var, delta_method = dm_var_pred,
        ratio = sim_var / dm_var_pred), 4)
#>    simulated delta_method        ratio
#>       0.1581       0.1600       0.9882

  

Simulation and theory agree to about 1%. The delta method captures the spread of the log-transformed estimator almost perfectly. In practice this means you can write a Wald CI for $\log(\mu)$ as $\log(\bar X_n) \pm 1.96 \cdot (s / (\bar X_n \sqrt{n}))$, where $s$ is the sample SD plugged in for $\sigma$ (Slutsky again). Exponentiating the endpoints gives a back-transformed CI for $\mu$ that is valid asymptotically and often has better coverage than a naive CI built on the raw mean, especially for positively skewed data.
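Here is that interval computed on one simulated sample; the seed and draw are illustrative, not part of the original analysis:

```r
# Wald CI for log(mu) via the delta method, then back-transform for mu
set.seed(5)
x <- rnorm(500, 5, 2)
log_ci <- log(mean(x)) + c(-1, 1) * 1.96 * sd(x) / (mean(x) * sqrt(500))
round(exp(log_ci), 4)   # back-transformed, asymmetric CI for mu
```

Note the back-transformed interval is not symmetric about $\bar X_n$: the symmetry lives on the log scale, which is exactly what helps with skewed data.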

Tip
The delta method is the first tool you reach for when you need a CI on a function of an estimator. The ingredients are always the same: a $\sqrt{n}$-asymptotic-normal $\hat\theta$ with variance $V$, a smooth function $g$, and its derivative. Everything else is bookkeeping.

Try it: Apply the delta method to $g(\mu) = \mu^2$. The predicted asymptotic variance of $\bar{X}_n^2$ is $[g'(\mu)]^2 \sigma^2 = 4\mu^2 \sigma^2$. Simulate at $n = 500$ and compare.

R: Your turn: delta method for g(mu) = mu^2
set.seed(4)
# mu_true = 5, sigma_true = 2
n <- 500
reps <- 10000
# your code here: simulate sqrt(n) * (mean(x)^2 - mu_true^2) over reps,
# compute its variance, compare to 4 * mu_true^2 * sigma_true^2.

  
Click to reveal solution
R: Delta method for mu squared solution
set.seed(4)
n <- 500
reps <- 10000
sq_sim <- replicate(reps, {
  x <- rnorm(n, mu_true, sigma_true)
  sqrt(n) * (mean(x)^2 - mu_true^2)
})
round(c(simulated = var(sq_sim),
        delta_method = 4 * mu_true^2 * sigma_true^2,
        ratio = var(sq_sim) / (4 * mu_true^2 * sigma_true^2)), 4)
#>    simulated delta_method        ratio
#>     393.4900     400.0000       0.9837

  

Explanation: Simulated and predicted variances agree within Monte Carlo noise. The derivative $g'(\mu) = 2\mu = 10$ enters squared, so the predicted asymptotic variance is $[g'(\mu)]^2 \sigma^2 = 100 \times 4 = 400$, which the simulation reproduces.

What does the multivariate delta method look like?

When the parameter is a vector $\boldsymbol\theta = (\theta_1, \dots, \theta_k)$ and $g$ is a real-valued function of the whole vector, the derivative $g'$ is replaced by the gradient $\nabla g$ and the variance $V$ becomes the asymptotic covariance matrix $\Sigma$. The multivariate delta method reads

$$\sqrt{n}\,(g(\hat{\boldsymbol\theta}_n) - g(\boldsymbol\theta)) \xrightarrow{\;d\;} N\!\big(0,\; \nabla g(\boldsymbol\theta)^\top\, \Sigma\, \nabla g(\boldsymbol\theta)\big).$$

Where:

  • $\nabla g(\boldsymbol\theta)$ = the gradient vector of $g$ evaluated at the true parameter
  • $\Sigma$ = the asymptotic covariance matrix of $\sqrt{n}(\hat{\boldsymbol\theta}_n - \boldsymbol\theta)$
  • $\nabla g^\top \Sigma \nabla g$ = a scalar, the asymptotic variance of $g(\hat{\boldsymbol\theta}_n)$

A classic application is the log-odds-ratio from two independent Bernoulli samples. Let $\hat p_1$ and $\hat p_2$ be sample proportions from independent groups of size $n$ each, and define $g(p_1, p_2) = \log(p_1/(1-p_1)) - \log(p_2/(1-p_2))$. The gradient is $(1/[p_1(1-p_1)], -1/[p_2(1-p_2)])$. The asymptotic covariance is diagonal with entries $p_i(1-p_i)$, so the multivariate delta method collapses to a clean closed form.

R: Multivariate delta method: log-odds-ratio SE
set.seed(2026)
n <- 300
p1 <- 0.30
p2 <- 0.50
reps <- 10000
log_or_sim <- replicate(reps, {
  x1 <- rbinom(1, n, p1) / n
  x2 <- rbinom(1, n, p2) / n
  # trim to avoid log(0)
  x1 <- pmin(pmax(x1, 1 / n), 1 - 1 / n)
  x2 <- pmin(pmax(x2, 1 / n), 1 - 1 / n)
  log(x1 / (1 - x1)) - log(x2 / (1 - x2))
})
log_or_true <- log(p1 / (1 - p1)) - log(p2 / (1 - p2))
log_or_var_dm <- 1 / (n * p1 * (1 - p1)) + 1 / (n * p2 * (1 - p2))
round(c(sim_var = var(log_or_sim), delta_method = log_or_var_dm,
        sim_mean = mean(log_or_sim), theory_mean = log_or_true), 4)
#>      sim_var delta_method     sim_mean  theory_mean
#>       0.0293       0.0292      -0.8489      -0.8473

  

The simulated variance (0.0293) sits on top of the delta-method prediction (0.0292), and the simulated mean matches the true log odds ratio. To build a 95% CI for the log-OR, you'd use $\hat\ell \pm 1.96 \sqrt{\hat V_{\text{dm}}}$ with $\hat p_i$ plugged into the formula (Slutsky again). Exponentiating gives a CI for the odds ratio itself.

Warning
The delta method fails when $g'(\theta) = 0$. If the derivative of $g$ at the true parameter is zero, the first-order Taylor term vanishes and you need the second-order delta method, which gives a chi-square limit, not a normal one. We'll see this failure explicitly in Exercise 3.

Try it: For the log-odds-ratio example, compute the delta-method 95% CI for the log-OR from a single sample of size $n = 300$ per group, using the plug-in formula $\hat V = 1/(n \hat p_1 (1 - \hat p_1)) + 1/(n \hat p_2 (1 - \hat p_2))$.

R: Your turn: log-OR Wald CI from one sample
set.seed(3030)
n <- 300
p1 <- 0.30
p2 <- 0.50
x1 <- rbinom(1, n, p1) / n
x2 <- rbinom(1, n, p2) / n
# your code here:
# 1. compute log_or_hat = log(x1/(1-x1)) - log(x2/(1-x2))
# 2. compute se_hat = sqrt(1/(n*x1*(1-x1)) + 1/(n*x2*(1-x2)))
# 3. ci = log_or_hat +/- 1.96 * se_hat
# Print log_or_hat, se_hat, and the CI.

  
Click to reveal solution
R: Log-OR Wald CI solution
set.seed(3030)
n <- 300
p1 <- 0.30
p2 <- 0.50
x1 <- rbinom(1, n, p1) / n
x2 <- rbinom(1, n, p2) / n
log_or_hat <- log(x1 / (1 - x1)) - log(x2 / (1 - x2))
se_hat <- sqrt(1 / (n * x1 * (1 - x1)) + 1 / (n * x2 * (1 - x2)))
ci <- log_or_hat + c(-1, 1) * 1.96 * se_hat
round(c(log_or_hat = log_or_hat, se = se_hat,
        lower = ci[1], upper = ci[2]), 3)
#> log_or_hat         se      lower      upper
#>     -0.877      0.172     -1.214     -0.540

  

Explanation: The CI $(-1.21, -0.54)$ contains the true log-OR $-0.847$, as the 95% coverage guarantees asymptotically. Exponentiate the endpoints to get a CI for the odds ratio itself: $(0.30, 0.58)$.

Practice Exercises

Exercise 1: Fit the MSE decay slope

Compute the empirical MSE of the sample mean at $n = 20, 50, 100, 500, 1000$ from Normal(0, 3), then regress $\log_{10}(\text{MSE})$ on $\log_{10}(n)$. The fitted slope should be close to $-1$, confirming the $1/n$ rate and hence consistency.

R: Exercise 1: MSE decay slope
# Hint: use replicate() for each n, then lm(log10(mse) ~ log10(n))
# Write your code below:

  
Click to reveal solution
R: Exercise 1 solution
set.seed(123)
ex_ns <- c(20, 50, 100, 500, 1000)
ex_reps <- 2000
ex_mse <- sapply(ex_ns, function(n) {
  xbars <- replicate(ex_reps, mean(rnorm(n, 0, 3)))
  mean(xbars^2)
})
ex_mse_fit <- lm(log10(ex_mse) ~ log10(ex_ns))
round(coef(ex_mse_fit), 3)
#>  (Intercept) log10(ex_ns)
#>        0.950       -0.999

  

Explanation: The slope $-0.999$ is essentially $-1$, so MSE scales like $1/n$. The intercept close to $\log_{10}(9) \approx 0.954$ matches the theoretical constant $\sigma^2 = 9$. The fitted line reproduces the theory almost exactly.

Exercise 2: Delta-method CI for the coefficient of variation

The coefficient of variation is $\text{CV} = \sigma/\mu$, with natural estimator $\widehat{\text{CV}} = S / \bar X$. For Normal data the delta-method asymptotic variance is approximately $\text{CV}^2\,(1/2 + \text{CV}^2)/n$. Build a Wald CI from the plug-in delta-method SE within each sample, then evaluate coverage by drawing 2000 samples of size $n = 200$ from Normal(10, 2) and counting how often the 95% CI covers $\text{CV} = 0.2$.

R: Exercise 2: CV coverage
# Hint:
# For each of 2000 samples:
#   compute cv_hat = sd(x) / mean(x)
#   compute se_hat approximately via: se_hat = cv_hat * sqrt((0.5 + cv_hat^2) / n)
#   CI = cv_hat +/- 1.96 * se_hat
# Report mean(CI_contains_0.2)
# Write your code below:

  
Click to reveal solution
R: Exercise 2 solution
set.seed(45)
ex_n <- 200
ex_reps <- 2000
cv_true <- 2 / 10
cv_cov <- replicate(ex_reps, {
  x <- rnorm(ex_n, mean = 10, sd = 2)
  cv_hat <- sd(x) / mean(x)
  se_hat <- cv_hat * sqrt((0.5 + cv_hat^2) / ex_n)
  ci <- cv_hat + c(-1, 1) * 1.96 * se_hat
  as.integer(ci[1] <= cv_true && cv_true <= ci[2])
})
round(mean(cv_cov), 3)
#> [1] 0.942

  

Explanation: Coverage is 94.2%, close to the nominal 95%. The slight undercoverage is expected because the delta-method CI uses a first-order Taylor approximation; at $n = 200$ the residual error still shows up at the second decimal of coverage.

Exercise 3: Detect delta-method failure at $g'(\theta) = 0$

Set $g(\mu) = \mu^2$ again, but this time with $\mu = 0$. Now $g'(\mu) = 2\mu = 0$, so the delta-method formula predicts zero asymptotic variance, which is clearly wrong. Simulate $\sqrt{n}\,\bar{X}_n^2$ at $n = 500$ from Normal(0, 1) and show the distribution is NOT normal; it is actually a scaled chi-square.

R: Exercise 3: delta method breakdown
# Hint: simulate sqrt(n) * mean(x)^2 over 10000 reps at n = 500, sigma = 1,
# then compare a histogram to a chi-square(1) distribution scaled by sigma^2.
# Write your code below:

  
Click to reveal solution
R: Exercise 3 solution
set.seed(7)
ex_n <- 500
ex_reps <- 10000
root_n_xbar_sq <- replicate(ex_reps, sqrt(ex_n) * mean(rnorm(ex_n, 0, 1))^2)
hist(root_n_xbar_sq, breaks = 60, freq = FALSE, col = "lightyellow",
     border = "white",
     main = "sqrt(n) xbar^2 is NOT normal when mu = 0",
     xlab = "sqrt(n) xbar_n^2")
curve(dchisq(x * sqrt(ex_n), df = 1) * sqrt(ex_n), add = TRUE,
      col = "red", lwd = 2)
round(c(mean = mean(root_n_xbar_sq), var = var(root_n_xbar_sq)), 3)
#>  mean   var
#> 0.044 0.004

  

Explanation: The histogram is heavily right-skewed and concentrated near zero, nothing like a Normal curve. The correct scaling here is $n$, not $\sqrt{n}$: since $n \bar X_n^2 \to_d \sigma^2 \chi^2_1$, the simulated quantity $\sqrt{n}\,\bar X_n^2$ is a $\sigma^2 \chi^2_1$ variable shrunk by $1/\sqrt{n}$ (its mean $1/\sqrt{500} \approx 0.045$ matches the output). When $g'(\theta) = 0$, the first-order delta method collapses and you need the second-order expansion (with the second derivative) to describe the limit. This is the standard failure mode to remember.

Complete Example: Delta-method CI for a ratio of means

Let's put every tool together. You have two independent samples of log-income (in log dollars), and you want a confidence interval for the ratio $\mu_1 / \mu_2$ of expected log-incomes. We'll use the multivariate delta method on the log scale to build a symmetric CI, then exponentiate, and compare against a bootstrap CI on the same data to sanity-check.

R: End-to-end delta-method CI for a ratio of means
set.seed(2026)
n1 <- 250
n2 <- 300
x1 <- rnorm(n1, mean = 10.2, sd = 0.6)  # group 1 log-income
x2 <- rnorm(n2, mean = 9.8, sd = 0.5)   # group 2 log-income
m1 <- mean(x1); m2 <- mean(x2)
s1 <- sd(x1);   s2 <- sd(x2)
# Target: mu1 / mu2. Work on log scale, g(mu1, mu2) = log(mu1) - log(mu2).
# Gradient = (1/mu1, -1/mu2). Diagonal covariance of (xbar1, xbar2).
log_ratio <- log(m1) - log(m2)
se_log <- sqrt(s1^2 / (n1 * m1^2) + s2^2 / (n2 * m2^2))
ratio_ci_dm <- exp(log_ratio + c(-1, 1) * 1.96 * se_log)
# Bootstrap for comparison
B <- 2000
boot_ratio <- replicate(B, {
  b1 <- sample(x1, n1, replace = TRUE)
  b2 <- sample(x2, n2, replace = TRUE)
  mean(b1) / mean(b2)
})
ratio_ci_boot <- quantile(boot_ratio, c(0.025, 0.975))
data.frame(
  method = c("Delta method (log scale)", "Bootstrap"),
  lower = round(c(ratio_ci_dm[1], ratio_ci_boot[1]), 4),
  upper = round(c(ratio_ci_dm[2], ratio_ci_boot[2]), 4),
  width = round(c(diff(ratio_ci_dm), diff(ratio_ci_boot)), 4)
)
#>                     method  lower  upper  width
#> 1 Delta method (log scale) 1.0260 1.0681 0.0421
#> 2                Bootstrap 1.0262 1.0682 0.0420

  

The delta-method CI and the bootstrap CI agree to about the third decimal in both endpoints and width. That is the payoff of asymptotic theory: one derivative and one Normal quantile give you essentially the same interval a bootstrap would compute by resampling 2,000 times, at a tiny fraction of the cost. For large $n$ with smooth transformations, the delta method is virtually always the right first pass; you fall back to the bootstrap only when $n$ is small or the transformation is badly non-smooth.

Summary

The asymptotic-theory ladder: consistency, then asymptotic normality, then delta method for transformations.

Figure 1: The asymptotic-theory ladder, climbed one rung at a time.

Property | Meaning | Formula | R pattern
Consistency | estimator converges to truth | $\hat\theta_n \to_p \theta$ | MSE shrinks with $n$; log-log slope $-1$
Asymptotic normality | CLT for estimators | $\sqrt{n}(\hat\theta_n - \theta) \to_d N(0, V)$ | histogram of standardized stat over N(0, 1)
Delta method (univariate) | CI for $g(\hat\theta)$ | $\sqrt{n}(g(\hat\theta_n)-g(\theta)) \to_d N(0, [g'(\theta)]^2 V)$ | supply $g'(\theta)$, plug in $\hat V$
Delta method (multivariate) | multi-parameter version | $\sqrt{n}(g(\hat{\boldsymbol\theta}_n)-g(\boldsymbol\theta)) \to_d N(0, \nabla g^\top \Sigma \nabla g)$ | gradient times covariance times gradient

Think of it as a ladder. The first rung, consistency, guarantees you get to the right place. The second, asymptotic normality, guarantees the errors are shaped like a Gaussian on the $1/\sqrt{n}$ scale. The third, the delta method, extends the Gaussian story to any smooth function of the estimator. Together they explain why almost every inference in applied statistics takes the form $\hat\theta \pm 1.96 \times \text{SE}$.

References

  1. van der Vaart, A. W., Asymptotic Statistics. Cambridge University Press (2000). Chapters 2 and 3.
  2. Casella, G. & Berger, R. L., Statistical Inference, 2nd ed. Duxbury (2002). Chapter 10.
  3. Lehmann, E. L., Elements of Large-Sample Theory. Springer (1999).
  4. Wasserman, L., All of Statistics. Springer (2004). Chapter 5.
  5. Wikipedia, "Delta method".
  6. StatLect, "Delta method".
  7. Stanford STATS 300A Lecture Notes, Large-sample theory.
