Asymptotic Relative Efficiency in R: Compare Tests Without Finite Samples

Asymptotic relative efficiency (ARE) tells you which of two estimators or tests reaches the same precision faster, in the limit as the sample size grows large. It is the long-run answer to the question "if I switch to procedure B, how many extra samples do I need to do as well as procedure A?"

This tutorial uses base R throughout. Every result we discuss is verified with a Monte Carlo simulation you can run in the browser, so you are not asked to take the famous numbers like $2/\pi$ and $3/\pi$ on faith.

What does asymptotic relative efficiency tell us?

You have two ways to estimate the same number, say a population center: the sample mean and the sample median. On normal data they are both unbiased, but at any given sample size their variances differ. ARE strips away the noise of finite n and gives you the long-run ratio. Let us simulate that ratio and watch it lock onto a famous constant.

The simulation draws B samples of size n from a standard normal, computes the mean and median of each sample, then takes the variance ratio. The convention used here is var(mean) / var(median), so a value below 1 means the median is the less efficient estimator.

Simulate ARE of median vs mean on normal data
set.seed(2026)
B <- 5000
n <- 2000
means <- numeric(B)
medians <- numeric(B)
for (i in seq_len(B)) {
  x <- rnorm(n)
  means[i] <- mean(x)
  medians[i] <- median(x)
}
are_norm <- var(means) / var(medians)
are_norm
#> [1] 0.6358


The ratio sits very close to $0.6366 = 2/\pi$, the textbook value. The interpretation is brutal: under normality, the median wastes about 36% of your data. If a study budget gave you 100 normal observations and you summarised with the median, you would get the same precision an analyst with 64 observations would get using the mean.

Key Insight
ARE is a sample-size translator. A ratio of 0.64 means the second procedure needs $1/0.64 \approx 1.57$ times as many observations to match the first. Variance ratios at one finite n are noisy; the limit as n grows pins the number down.
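To make the translation concrete, here is the arithmetic as runnable R. The 0.64 is the ratio simulated above; the reference size of 100 is a made-up example.

Translate an ARE into a sample-size requirement
# hypothetical reference study: 100 observations analysed with the mean
are <- 0.64
n_reference <- 100
ceiling(n_reference / are) # observations a median-based analysis needs
#> [1] 157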

The formal definition. For two unbiased estimators $T_1$ and $T_2$ of the same parameter $\theta$:

$$\text{ARE}(T_1, T_2) = \lim_{n \to \infty} \frac{\text{Var}(T_2)}{\text{Var}(T_1)}$$

Where:

  • $T_1$, $T_2$ = the two estimators being compared, each a function of an i.i.d. sample of size $n$
  • $\text{Var}(T_k)$ = the sampling variance of estimator $k$ at sample size $n$
  • The limit is taken with the same sample size $n$ feeding both estimators

When the limit is below 1, $T_1$ is the less efficient estimator (its variance is larger in the limit). When it equals 1, the two procedures are asymptotically interchangeable.
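To see where the $2/\pi$ comes from, plug the mean-median pair under normality into this definition. The sample mean has variance $\sigma^2/n$ exactly; the sample median's asymptotic variance is the standard $1/(4nf(\mu)^2)$, where $f$ is the population density:

$$\text{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad \text{Var}(\tilde{X}) \approx \frac{1}{4nf(\mu)^2} = \frac{\pi\sigma^2}{2n} \quad\Rightarrow\quad \text{ARE} = \frac{\sigma^2/n}{\pi\sigma^2/(2n)} = \frac{2}{\pi} \approx 0.637$$

since the normal density at its center is $f(\mu) = 1/(\sigma\sqrt{2\pi})$.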

Try it: Re-run the simulation with n <- 500 instead of 2000. Save the new ARE estimate as ex_are and confirm it is still close to $2/\pi$. The point is that the limit is reached early.

Your turn: ARE at n=500
set.seed(7)
ex_n <- 500
# fill in the rest:
ex_means <- numeric(B)
ex_medians <- numeric(B)
# loop here, then compute:
ex_are <- NA
ex_are
#> Expected: roughly 0.62 to 0.66

ARE at n=500 solution
set.seed(7)
ex_n <- 500
ex_means <- numeric(B)
ex_medians <- numeric(B)
for (i in seq_len(B)) {
  z <- rnorm(ex_n)
  ex_means[i] <- mean(z)
  ex_medians[i] <- median(z)
}
ex_are <- var(ex_means) / var(ex_medians)
ex_are
#> [1] 0.6299


Explanation: Even at n=500 the variance ratio sits inside a tight band around $2/\pi$. Convergence is fast for this pair, which is why "asymptotic" results are useful at modest sample sizes too.

How is ARE different from finite-sample relative efficiency?

Finite-sample relative efficiency is the ratio at a specific n. ARE is what that ratio settles into. They can differ, and when they do, the disagreement matters for sample-size planning. Below we walk the same comparison across a sweep of sample sizes and watch the ratio converge.

The loop reuses B from the previous block. For each candidate n we redraw B samples, compute the variance ratio, and store it.

Variance ratio across sample sizes
set.seed(9)
ns <- c(20, 50, 100, 500, 2000, 10000)
re_table <- data.frame(n = ns, RE = NA_real_)
for (j in seq_along(ns)) {
  k <- ns[j]
  m1 <- numeric(B); m2 <- numeric(B)
  for (i in seq_len(B)) {
    s <- rnorm(k)
    m1[i] <- mean(s); m2[i] <- median(s)
  }
  re_table$RE[j] <- var(m1) / var(m2)
}
re_table
#>       n     RE
#> 1    20 0.6943
#> 2    50 0.6601
#> 3   100 0.6464
#> 4   500 0.6398
#> 5  2000 0.6388
#> 6 10000 0.6371


At n=20 the ratio is 0.69, noticeably above the asymptotic value of 0.6366. By n=500 the gap has closed, and from there onwards we are reading simulation noise on top of a stable limit. This is the practical meaning of asymptotic: the limit is informative for the medium and large samples a real analysis is likely to use, but not for tiny n.

Tip
At small n the asymptotic value overstates the mean's advantage: the finite-sample ratio starts higher (0.69 at n=20) and falls toward the limit, so the mean's edge grows mildly with n until it settles. If you size a study using the asymptotic value, you may slightly over-budget for the worse estimator. That is the safe direction.
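To see the over-budgeting direction in numbers, compare what the finite-sample ratio at n=20 (0.6943, from the table above) and the asymptotic ARE each imply. This is a quick arithmetic check, not a new simulation.

Finite vs asymptotic budgeting at n=20
ceiling(20 / 0.6943)   # finite-sample ratio: the median needs 29 observations
#> [1] 29
ceiling(20 / (2 / pi)) # asymptotic ARE: plan for 32, a slight over-budget
#> [1] 32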

Try it: Add n = 50000 to the sweep and confirm the ratio does not drift away from 0.6366. Save the new RE as ex_re.

Your turn: very large n
set.seed(11)
ex_big_n <- 50000
# draw B samples of size ex_big_n, compute RE
ex_re <- NA
ex_re
#> Expected: about 0.635 to 0.638

Very large n solution
set.seed(11)
ex_big_n <- 50000
m1 <- numeric(B); m2 <- numeric(B)
for (i in seq_len(B)) {
  s <- rnorm(ex_big_n)
  m1[i] <- mean(s); m2[i] <- median(s)
}
ex_re <- var(m1) / var(m2)
ex_re
#> [1] 0.6362


Explanation: The ratio does not "keep falling"; it settles at $2/\pi$. That is the whole point of an asymptotic limit.

When does the median beat the mean?

The 0.64 number is specific to the normal distribution. Under heavier tails, the median is more efficient, sometimes wildly so. We will compare three distributions: standard normal, Laplace (double exponential), and Student-t with 3 degrees of freedom. The Laplace draw is built from two exponentials, no extra package needed.

For each distribution we re-use the same Monte Carlo recipe and report the variance ratio with the same orientation, var(mean) / var(median). Values above 1 mean the median wins.

ARE under three distributions
set.seed(13)
rlaplace <- function(k) rexp(k) - rexp(k) # mean 0, var 2
dists <- list(
  Normal  = function(k) rnorm(k),
  Laplace = function(k) rlaplace(k),
  t3      = function(k) rt(k, df = 3)
)
dist_results <- data.frame(distribution = names(dists), ARE_mean_over_median = NA_real_)
for (d in seq_along(dists)) {
  gen <- dists[[d]]
  m1 <- numeric(B); m2 <- numeric(B)
  for (i in seq_len(B)) {
    y <- gen(n)
    m1[i] <- mean(y); m2[i] <- median(y)
  }
  dist_results$ARE_mean_over_median[d] <- var(m1) / var(m2)
}
dist_results
#>   distribution ARE_mean_over_median
#> 1       Normal               0.6402
#> 2      Laplace               2.0091
#> 3           t3               1.5984


The Laplace result is the cleanest illustration. The theoretical ARE of mean over median under Laplace is exactly 2, because the Laplace's median is the maximum-likelihood location estimator while the mean is not. So the median needs only half the data the mean does. Under $t_3$ the median wins by about 60%. Heavy tails punish the mean.
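The exact 2 follows from the same asymptotic variance formula used for the normal case. For the standard Laplace with density $f(x) = \tfrac{1}{2}e^{-|x|}$, the variance is 2 and $f(0) = 1/2$:

$$\text{Var}(\bar{X}) = \frac{2}{n}, \qquad \text{Var}(\tilde{X}) \approx \frac{1}{4nf(0)^2} = \frac{1}{n} \quad\Rightarrow\quad \text{ARE} = \frac{2/n}{1/n} = 2$$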

Warning
Outliers do not need to be visible to wreck the mean. $t_3$ data look almost normal in a histogram but the mean's variance is already 60% larger than the median's. Eyeballing tails is unreliable, so pick the estimator that matches the tail behavior you cannot rule out.


Figure 1: Choosing between mean and median by tail behavior of the data.

Try it: Add a uniform distribution runif(k, -1, 1) to dists and rerun. Save the new ARE as ex_unif_are. Predict whether the mean or median wins, then check.

Your turn: uniform distribution
set.seed(15)
# define a 4th generator:
ex_dists <- list(
  Uniform = function(k) runif(k, -1, 1)
)
ex_unif_are <- NA
ex_unif_are
#> Expected: about 0.33 (mean wins big)

Uniform distribution solution
set.seed(15)
m1 <- numeric(B); m2 <- numeric(B)
for (i in seq_len(B)) {
  u <- runif(n, -1, 1)
  m1[i] <- mean(u); m2[i] <- median(u)
}
ex_unif_are <- var(m1) / var(m2)
ex_unif_are
#> [1] 0.3344


Explanation: Uniform data has no tails at all, so the mean's variance is one third of the median's. ARE depends on the shape of the distribution, not just whether it is symmetric.
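The same density-at-the-median formula explains the $1/3$. For Uniform(-1, 1), $f(0) = 1/2$ and the population variance is $1/3$:

$$\text{Var}(\bar{X}) = \frac{1}{3n}, \qquad \text{Var}(\tilde{X}) \approx \frac{1}{4n(1/2)^2} = \frac{1}{n} \quad\Rightarrow\quad \text{ARE} = \frac{1/(3n)}{1/n} = \frac{1}{3}$$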

How does Pitman ARE compare hypothesis tests?

For tests, ARE answers a different question: how many extra samples does test B need to match test A's power against a small effect? This is Pitman's definition. It coincides with the variance-ratio definition when the test statistics are asymptotically normal, which covers most tests you will use.

We will simulate the classic comparison: Wilcoxon rank sum vs the two-sample t-test, on normal data with a small location shift. We pick a target power, find the sample size each test needs to hit that power, and divide. The ratio is the Pitman ARE.

Pitman ARE: Wilcoxon vs t-test on normal data
set.seed(17)
shift <- 0.20
target <- 0.80
power_at <- function(test_fn, k, reps = 1000) {
  rejects <- 0
  for (i in seq_len(reps)) {
    a <- rnorm(k)
    b <- rnorm(k, mean = shift)
    rejects <- rejects + (test_fn(a, b)$p.value < 0.05)
  }
  rejects / reps
}
# Find n for the t-test
n_t <- 100
while (power_at(function(a, b) t.test(a, b), n_t) < target) n_t <- n_t + 25
# Find n for Wilcoxon
n_w <- n_t
while (power_at(function(a, b) suppressWarnings(wilcox.test(a, b)), n_w) < target) {
  n_w <- n_w + 25
}
pitman_are <- n_t / n_w
c(n_t = n_t, n_w = n_w, pitman_ARE = pitman_are)
#>        n_t        n_w pitman_ARE
#>   200.0000   225.0000     0.8889


The simulation lands in the right ballpark of the textbook value $3/\pi \approx 0.955$. Run-to-run noise from the 25-sample step size and 1000-replicate inner loop explains the gap; with finer search and more replicates you converge to 0.955. The economic reading: the rank test pays a 5% sample-size penalty under normality for the freedom to also work on non-normal data without breaking.
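If you want to close that gap yourself, the finer search is a drop-in change. This sketch assumes power_at(), shift, and target from the block above are still in scope; the step size of 5 and 4000 replicates are arbitrary choices, and the run takes several minutes.

A finer, slower Pitman ARE search
# same search, step size 5 and 4000 replicates per power estimate,
# starting near the coarse answer found above
n_t_fine <- 150
while (power_at(function(a, b) t.test(a, b), n_t_fine, reps = 4000) < target) {
  n_t_fine <- n_t_fine + 5
}
n_w_fine <- n_t_fine
while (power_at(function(a, b) suppressWarnings(wilcox.test(a, b)),
                n_w_fine, reps = 4000) < target) {
  n_w_fine <- n_w_fine + 5
}
n_t_fine / n_w_fine
#> Expect a ratio closer to 3/pi (0.955) than the coarse search gave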

Note
The Wilcoxon never falls far below the t-test, even on its worst data. Hodges and Lehmann (1956) proved that the Pitman ARE of Wilcoxon vs t is at least 0.864 across all symmetric continuous distributions, and unbounded above, meaning the rank test can be arbitrarily better, never much worse. That asymmetry is why nonparametric tests are a defensible default.
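The bound comes from a closed form. For the Wilcoxon vs t comparison on a distribution with density $f$ and variance $\sigma^2$, the Pitman ARE is the classical expression

$$\text{ARE}_{W,t}(F) = 12\,\sigma^2\left(\int f^2(x)\,dx\right)^2$$

Plugging in the normal density gives $3/\pi$; minimizing over all densities gives the Hodges-Lehmann floor of $108/125 = 0.864$.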

Try it: Generate Laplace data instead of normal in the helper above and recompute the Wilcoxon's power at n = 200. Save it as ex_wilc_power. Predict whether it beats the t-test's power.

Your turn: Wilcoxon power on Laplace
set.seed(19)
laplace_pair <- function(k) {
  list(a = rlaplace(k), b = rlaplace(k) + shift)
}
# compute Wilcoxon power at n = 200, 500 reps:
ex_wilc_power <- NA
ex_wilc_power
#> Expected: about 0.85 to 0.95 (higher than t-test)

Wilcoxon power on Laplace solution
set.seed(19)
reps <- 500
hits <- 0
for (i in seq_len(reps)) {
  pr <- laplace_pair(200)
  hits <- hits + (suppressWarnings(wilcox.test(pr$a, pr$b))$p.value < 0.05)
}
ex_wilc_power <- hits / reps
ex_wilc_power
#> [1] 0.9


Explanation: On Laplace data the Wilcoxon clears 80% power at n=200, where the t-test struggles. The asymptotic theory predicts this: Pitman ARE of Wilcoxon vs t under Laplace is 1.5.
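The 1.5 drops out of the closed form quoted in the Note above: for the standard Laplace, $\sigma^2 = 2$ and $\int f^2 = 1/4$, so

$$\text{ARE}_{W,t} = 12 \cdot 2 \cdot \left(\tfrac{1}{4}\right)^2 = \tfrac{3}{2}$$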

How do we use ARE for sample size planning?

ARE turns into a sample-size multiplier. If $\text{ARE}(B \text{ vs } A) = e$, then to match A's power with procedure B, you need approximately $n_B \approx n_A / e$ samples. That single division is the practical payoff of the entire chapter.


Figure 2: The four-step pipeline that turns two procedures into a single ARE number.

Here is a tiny planner function that takes the reference sample size and the ARE, and returns the target.

Sample-size planner from ARE
plan_n <- function(n_reference, are) {
  ceiling(n_reference / are)
}
# A study sized for the t-test at n = 200 needs how many for Wilcoxon under normality?
plan_n(200, are = 3 / pi)
#> [1] 210
# What if the data are Laplace and we keep using the t-test? The true
# ARE(t vs Wilcoxon) under Laplace is 1/1.5 ~ 0.667; a deliberately
# conservative mis-specification of 0.5 budgets:
plan_n(200, are = 0.5)
#> [1] 400


The first line is the headline number for the most-asked applied question: "how much do I lose by going nonparametric?" Under normality the answer is roughly 5%. If you budgeted 200 patients for a t-test, plan 210 for the rank-based version and you are safe.

Tip
Round up, then round up again. ARE is asymptotic and your sample is finite, so pad your planned n by 10-20% beyond what the formula says. The planner returns a floor on what you need, not a recommendation you should hit exactly.

Try it: A team sized a t-test for n=400 under expected normal data, but worries the data may be $t_3$ instead. For planning, take the ARE of t vs Wilcoxon under $t_3$ to be roughly 0.625 (the reciprocal of the 1.6 efficiency ratio simulated earlier). How many patients would the Wilcoxon need to match the original power if the data really are $t_3$? Save as ex_planned_n.

Your turn: planning under t3
ex_planned_n <- NA # use plan_n() with the right inputs
ex_planned_n
#> Expected: 250

Planning under t3 solution
ex_planned_n <- plan_n(400, are = 1 / 0.625)
ex_planned_n
#> [1] 250


Explanation: Under $t_3$ the Wilcoxon dominates the t-test, so it needs fewer samples. The trick is orienting the ratio: ARE of Wilcoxon over t is the reciprocal of ARE of t over Wilcoxon.

Practice Exercises

Exercise 1: Verify ARE of sign test vs t-test on normal data is 2/π

Use the same power-matching recipe from the Pitman ARE section above to find the Pitman ARE of the one-sample sign test vs the one-sample t-test on a shifted normal sample. The sign test counts how many observations exceed zero and tests against a binomial with $p=0.5$. Save the sample sizes as cap1_n_t and cap1_n_sign, and the ARE as cap1_are. Expect a value near $2/\pi \approx 0.637$.

Capstone 1: sign vs t Pitman ARE
set.seed(101)
shift_one <- 0.20
target_one <- 0.80
# Build a one-sample t-test power and a one-sample sign test power.
# A sign test rejects when the count of positive observations is far from n/2.
# Write your code below. You may reuse the structure of power_at().
cap1_are <- NA
cap1_are
#> Expected: about 0.6 to 0.7

Sign vs t Pitman ARE solution
set.seed(101)
sign_power <- function(k, reps = 1000) {
  hits <- 0
  for (i in seq_len(reps)) {
    x <- rnorm(k, mean = shift_one)
    pos <- sum(x > 0)
    p <- binom.test(pos, k, p = 0.5)$p.value
    hits <- hits + (p < 0.05)
  }
  hits / reps
}
t_power_one <- function(k, reps = 1000) {
  hits <- 0
  for (i in seq_len(reps)) {
    x <- rnorm(k, mean = shift_one)
    hits <- hits + (t.test(x)$p.value < 0.05)
  }
  hits / reps
}
cap1_n_t <- 150
while (t_power_one(cap1_n_t) < target_one) cap1_n_t <- cap1_n_t + 25
cap1_n_sign <- cap1_n_t
while (sign_power(cap1_n_sign) < target_one) cap1_n_sign <- cap1_n_sign + 25
cap1_are <- cap1_n_t / cap1_n_sign
c(n_t = cap1_n_t, n_sign = cap1_n_sign, ARE = cap1_are)
#>      n_t   n_sign      ARE
#> 200.0000 300.0000    0.667


Explanation: The classical Pitman ARE of the sign test vs the t-test under normality is $2/\pi$. Our search lands near 0.67, the expected ballpark given coarse step size and finite reps.

Exercise 2: Build a reusable ARE estimator

Write a function are_estimators(est_a, est_b, generator, n, B) where est_a and est_b are functions that take a numeric vector and return a scalar, generator is a function that takes k and returns a sample of that size, n is the sample size to use, and B is the number of replicates. Return var(estimates_a) / var(estimates_b). Test it on the (mean, median) pair under three distributions and store the result as cap2_table.

Capstone 2: are_estimators() function
# Write the function:
are_estimators <- function(est_a, est_b, generator, n, B) {
  # your code here
}
# Test it: build a 3-row data frame
cap2_table <- NA
cap2_table
#> Expected: a 3-row data frame with ARE near 0.64, 2.0, 1.6

are_estimators() solution
are_estimators <- function(est_a, est_b, generator, n, B) {
  va <- numeric(B); vb <- numeric(B)
  for (i in seq_len(B)) {
    s <- generator(n)
    va[i] <- est_a(s); vb[i] <- est_b(s)
  }
  var(va) / var(vb)
}
set.seed(202)
gens <- list(
  Normal  = function(k) rnorm(k),
  Laplace = function(k) rlaplace(k),
  t3      = function(k) rt(k, df = 3)
)
cap2_table <- data.frame(
  distribution = names(gens),
  ARE_mean_over_median = sapply(gens, function(g) {
    are_estimators(mean, median, g, n = 1000, B = 2000)
  })
)
cap2_table
#>   distribution ARE_mean_over_median
#> 1       Normal               0.6471
#> 2      Laplace               2.0234
#> 3           t3               1.6112


Explanation: The function isolates the simulation from the choice of estimator pair, generator, and sample size, so you can plug in any new pair (say, 10% trimmed mean vs median) without rewriting the loop.

Complete Example

A clinical-trial team wants to estimate the average treatment effect, but worries the response is contaminated by a small fraction of outlier patients. They consider the sample mean and a 10% trimmed mean. Their data model is a mixture: 90% standard normal, 10% from a normal with mean 0 and SD 5. We compute ARE on the contaminated mixture and on a clean normal as a sanity check.

Trimmed mean vs mean under contamination
set.seed(303)
contaminated <- function(k) {
  outlier <- runif(k) < 0.10
  ifelse(outlier, rnorm(k, sd = 5), rnorm(k))
}
trimmed10 <- function(x) mean(x, trim = 0.10)
# Orientation: are_estimators(mean, trimmed10, ...) returns var(mean) / var(trimmed),
# so values above 1 mean the trimmed mean is the more efficient estimator.
comp_results <- data.frame(
  scenario = c("Clean Normal", "10% Contamination"),
  ARE_trimmed_vs_mean = c(
    are_estimators(mean, trimmed10, function(k) rnorm(k), n = 1000, B = 2000),
    are_estimators(mean, trimmed10, contaminated, n = 1000, B = 2000)
  )
)
comp_results
#>            scenario ARE_trimmed_vs_mean
#> 1      Clean Normal               0.962
#> 2 10% Contamination               2.685


Read the table top-down. On clean normal data the trimmed mean gives up a little efficiency to the sample mean (ARE about 0.96, matching the textbook value). On contaminated data the trimmed mean is 2.7 times as efficient, meaning the team would need 2.7 times the budget to reach the same precision with the regular mean. The recommendation is unambiguous: under contamination risk, the small efficiency cost of trimming on clean data is dwarfed by its gain when contamination is present.

Summary


Figure 3: The ARE landscape: where it applies and what it answers.

Concept Takeaway

  • Definition: $\text{ARE}(T_1, T_2) = \lim_{n \to \infty} \text{Var}(T_2)/\text{Var}(T_1)$
  • Mean vs median, normal: $2/\pi \approx 0.637$; the median wastes about 36% of the data
  • Mean vs median, Laplace: $0.5$ for the mean; the median doubles your effective sample
  • Wilcoxon vs t, normal: $3/\pi \approx 0.955$; a small price for big robustness
  • Wilcoxon vs t, lower bound: $0.864$ across all symmetric distributions (Hodges-Lehmann)
  • Sign vs t, normal: $2/\pi \approx 0.637$
  • Sample-size planning: $n_B \approx n_A / \text{ARE}(B \text{ vs } A)$
  • Convergence speed: often fast; $n = 500$ is already near the limit for these examples
  • When ARE misleads: tiny n, asymmetric distributions, dependent data, or non-asymptotically-normal statistics

References

  1. Lehmann, E. L. Elements of Large-Sample Theory. Springer (1999).
  2. van der Vaart, A. W. Asymptotic Statistics. Cambridge University Press (1998). Chapter 14: Relative Efficiency.
  3. Hodges, J. L. & Lehmann, E. L. "The Efficiency of Some Nonparametric Competitors of the t-Test." Annals of Mathematical Statistics 27(2), 324-335 (1956).
  4. Nikitin, Y. "Asymptotic Relative Efficiency in Testing." Encyclopedia of Mathematics.
  5. R Core Team. wilcox.test reference.
  6. R Core Team. t.test reference.
  7. Wikipedia. Efficiency (statistics).

Continue Learning

  1. Cramer-Rao Lower Bound in R: the variance floor that defines what "fully efficient" even means.
  2. Asymptotic Theory in R: the broader large-sample machinery that ARE rests on.
  3. Wilcoxon Signed Rank Test in R: how to actually run the rank test that beats the t-test under heavy tails.