UMVUE in R: Rao-Blackwell Theorem & Lehmann-Scheffé Theorem

The uniformly minimum variance unbiased estimator (UMVUE) is the unbiased estimator whose variance is at least as small as that of any other unbiased estimator at every value of the parameter. The Rao-Blackwell theorem shows how to improve any unbiased estimator by conditioning on a sufficient statistic, and the Lehmann-Scheffé theorem certifies that the improved estimator is the unique UMVUE when the sufficient statistic is also complete.

What is the UMVUE, in plain language?

Two unbiased estimators can both hit the target on average yet disagree wildly on any single sample. The UMVUE is the one whose estimates cluster tightest around the truth, no matter what the true parameter turns out to be. Suppose you observe a Poisson sample and want to estimate its mean, $\lambda$. The sample mean is unbiased, and it is also the UMVUE. The example below makes that claim concrete.

Sample mean as UMVUE of Poisson lambda

set.seed(2026)
n <- 40
lambda_true <- 3.2
x <- rpois(n, lambda = lambda_true)

# Sample mean: unbiased AND minimum variance among unbiased estimators of lambda
lambda_hat <- mean(x)
c(estimate = lambda_hat, true_lambda = lambda_true)
#>    estimate true_lambda
#>       3.175       3.200

# Theoretical variance of the UMVUE at lambda = 3.2
var_theory <- lambda_true / n
var_theory
#> [1] 0.08

The sample mean is 3.175 against a true value of 3.2, and its theoretical variance is $\lambda/n = 0.08$. No other unbiased estimator of $\lambda$ for a Poisson sample can beat that variance. An estimator that uses only the first observation is also unbiased, but as we will see shortly, its variance is $\lambda$, a factor of $n = 40$ worse.

Formally, an estimator $T^*(X)$ is the UMVUE of $\theta$ if, for every other unbiased estimator $T(X)$,

$$\text{Var}_\theta\bigl(T^*(X)\bigr) \leq \text{Var}_\theta\bigl(T(X)\bigr) \quad \text{for every } \theta.$$

Where:

  • $T^*(X)$ is the UMVUE
  • $T(X)$ is any other unbiased estimator
  • $\theta$ is the unknown parameter

The word uniformly matters. A non-UMVUE estimator might have lower variance at a specific $\theta$ while losing elsewhere. The UMVUE wins everywhere at once.

Key Insight
UMVUE is a uniform guarantee, not a per-sample guarantee. On any single sample, a biased estimator (or even a lower-variance estimator tuned to a specific $\theta$) may land closer to the truth. The UMVUE's promise is about the long-run sampling distribution, across every possible value of $\theta$.
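A quick analytic sketch makes the uniformity point concrete (the shrinkage factor 0.9 and the $\lambda$ grid are illustrative choices, not part of the tutorial's running example): a shrunk mean $c\bar X$ is biased, but its mean squared error can beat the UMVUE's at small $\lambda$ while losing badly at large $\lambda$.

```r
# MSE of the UMVUE mean(x) versus a shrunk mean c * mean(x) for Poisson(lambda):
# Var(c * xbar) = c^2 * lambda / n, squared bias = (c - 1)^2 * lambda^2
n <- 40
c_shrink <- 0.9                      # illustrative shrinkage factor
lambda_grid <- c(0.2, 3.2, 10)
mse_umvue <- lambda_grid / n
mse_shrunk <- c_shrink^2 * lambda_grid / n + (c_shrink - 1)^2 * lambda_grid^2
round(rbind(mse_umvue, mse_shrunk), 4)
```

At $\lambda = 0.2$ the shrunk estimator wins; at $\lambda = 3.2$ and $\lambda = 10$ it loses, and the loss grows with $\lambda$. No single $c \neq 1$ wins everywhere, which is exactly what the uniform guarantee rules out.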

Try it: For a Poisson sample, both mean(x) and the unbiased sample variance var(x) are unbiased for $\lambda$ (because a Poisson has mean equal to variance). Simulate and compare their Monte Carlo variances on 3000 fresh samples of size $n = 40$, $\lambda = 3.2$. Which one has the smaller variance, and by how much?

Your turn: mean vs variance as estimators of lambda

set.seed(1)
# your code: simulate 3000 samples, each of size 40 from Poisson(3.2)
# compute mean(xs) and var(xs) on each, record both
# then report the Monte Carlo variance of each estimator
ex_sim <- replicate(3000, {
  xs <- rpois(40, lambda = 3.2)
  c(mean_est = mean(xs), var_est = var(xs))
})
# your code: compute apply(ex_sim, 1, var)
# Expected: mean_est variance is about 0.08, var_est variance is several times larger

Solution:
Mean vs variance as UMVUE candidates

ex_variances <- apply(ex_sim, 1, var)
ex_variances
#> mean_est  var_est
#>    0.079    0.415

Explanation: Both estimators are unbiased for $\lambda$, but mean(x) is a function of the complete sufficient statistic $T = \sum X_i$ and var(x) is not. The Lehmann-Scheffé theorem (see later sections) forces mean(x) to have the smaller variance at every $\lambda$, and the simulation shows it does so by a factor of about 5 at $\lambda = 3.2$, $n = 40$.

How does the Rao-Blackwell theorem turn a crude estimator into a better one?

Not every unbiased estimator is the UMVUE, but there is a mechanical way to improve any unbiased estimator whose variance is not yet minimal. The trick is conditional expectation on a sufficient statistic. The Rao-Blackwell theorem states that if $W(X)$ is any unbiased estimator of $\theta$ and $T(X)$ is sufficient, then the conditional expectation

$$W^*(T) = E\bigl[W(X) \mid T(X)\bigr]$$

is also unbiased, and its variance is never larger than $\text{Var}(W)$. This is the law of total variance at work: $\text{Var}(W) = E[\text{Var}(W \mid T)] + \text{Var}(W^*) \geq \text{Var}(W^*)$. Equality holds only when $W$ is already a function of $T$, so that the conditional-variance term vanishes.

Watch this in action on a canonical example. Given an iid Poisson sample, we want to estimate $\theta = P(X = 0) = e^{-\lambda}$. Start with the crudest imaginable unbiased estimator: the indicator that the first observation equals zero.

Crude unbiased estimator of P(X=0)

# Target: theta = P(X = 0) = exp(-lambda)
theta_true <- exp(-lambda_true)
theta_true
#> [1] 0.04076

# Crude unbiased estimator: W = I(x[1] == 0)
crude <- as.numeric(x[1] == 0)
crude
#> [1] 0

The indicator is unbiased because $E[I(X_1 = 0)] = P(X_1 = 0) = e^{-\lambda}$, by definition. But on any one sample it returns only 0 or 1, which is useless as a continuous estimate of 0.0408. Its sampling variance is $\theta(1-\theta) \approx 0.039$, huge relative to $\theta$ itself.

Now Rao-Blackwellize. The sufficient statistic for Poisson is $T = \sum_{i=1}^n X_i$. Given $T = t$, the conditional distribution of $X_1$ is Binomial$(t, 1/n)$ (standard result for Poisson, following from the multinomial form of conditional counts). Therefore

$$W^*(T) = E\bigl[I(X_1 = 0) \mid T\bigr] = P(X_1 = 0 \mid T = t) = \left(\frac{n-1}{n}\right)^t.$$

Where:

  • $T = \sum_{i=1}^n X_i$ is the complete sufficient statistic
  • $(n-1)/n$ is the probability a single Bernoulli trial with success probability $1/n$ returns failure
  • $((n-1)/n)^t$ is the probability that all $t$ trials fail, which is exactly $P(X_1 = 0 \mid T = t)$

In code:

Rao-Blackwell improvement

T_total <- sum(x)
rb_estim <- ((n - 1) / n)^T_total
c(crude = crude, rao_blackwell = rb_estim, target = theta_true)
#>         crude rao_blackwell        target
#>         0.000         0.039         0.041

The crude estimator returns 0 on this sample. The Rao-Blackwellized version returns 0.039, agreeing with the true $\theta = 0.041$ to two decimal places. Same data, same unbiasedness, dramatically different precision. The RB estimator absorbed the information in the full sample (through $T$) that the indicator threw away.

Tip
The RB formula isn't a guess; it comes from the conditional distribution of the data given T. For Poisson, $X_1 \mid T = t$ is Binomial$(t, 1/n)$. For a different family (Bernoulli, Exponential, Normal), the formula changes because the conditional distribution changes, but the recipe is always the same: write down $E[W \mid T]$ and simplify.
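The binomial claim can be checked by brute force. The sketch below conditions by rejection: keep only simulated samples whose total hits a fixed value t_fix (the value 128, near $n\lambda$, and the replicate count are arbitrary choices of ours), then compare the retained first observations against Binomial$(t, 1/n)$ predictions.

```r
set.seed(42)
n <- 40
t_fix <- 128                          # condition on T = sum(xs) equal to this value
draws <- replicate(40000, {
  xs <- rpois(n, lambda = 3.2)
  if (sum(xs) == t_fix) xs[1] else NA
})
x1_given_T <- draws[!is.na(draws)]    # first observations from samples with T = t_fix

# Binomial(t, 1/n) predicts mean t/n and P(X1 = 0) = ((n-1)/n)^t
c(simulated_mean = mean(x1_given_T), predicted_mean = t_fix / n)
c(simulated_p0 = mean(x1_given_T == 0), predicted_p0 = ((n - 1) / n)^t_fix)
```

Both simulated quantities should sit close to the binomial predictions, which is the fact the Rao-Blackwell formula $((n-1)/n)^t$ is built on.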

Try it: Use the same sample x to compute the Rao-Blackwell estimator of $P(X = 2)$, not $P(X = 0)$. Hint: $E[I(X_1 = 2) \mid T = t] = \binom{t}{2} (1/n)^2 ((n-1)/n)^{t-2}$, because $X_1 \mid T = t$ is Binomial$(t, 1/n)$.

Your turn: RB estimator of P(X=2)

# your code: compute ex_rb_p2 using the formula above
# ex_rb_p2 <- ...

# Reference value: exp(-3.2) * 3.2^2 / 2
p2_true <- dpois(2, lambda = 3.2)
p2_true
#> [1] 0.2087
# Expected: ex_rb_p2 is near 0.20, demonstrating RB recovers the right answer

Solution:
RB estimator of P(X=2) solution

ex_rb_p2 <- choose(T_total, 2) * (1 / n)^2 * ((n - 1) / n)^(T_total - 2)
c(rb_estimate = ex_rb_p2, truth = p2_true)
#> rb_estimate       truth
#>       0.215       0.209

Explanation: The RB formula substitutes $t = 127$ (the observed $T$), $n = 40$, and evaluates the binomial probability that two of the $t$ Bernoulli($1/n$) trials succeed. The answer 0.215 is close to the true $P(X=2) = 0.209$, much sharper than the binary crude indicator would be.

How does the Lehmann-Scheffé theorem certify we have the UMVUE?

Rao-Blackwell improves any unbiased estimator, but it does not by itself guarantee the improvement is the best possible. Two different starting estimators, both unbiased, could Rao-Blackwellize to two different final estimators. The Lehmann-Scheffé theorem closes this gap. It says: if the sufficient statistic $T$ is also complete, then any unbiased estimator that is a function of $T$ is the unique UMVUE.

A statistic $T$ is complete for a family $\{P_\theta\}$ if the only function $g$ satisfying $E_\theta[g(T)] = 0$ for every $\theta$ is $g \equiv 0$. Loosely: no non-trivial function of $T$ can be an unbiased estimator of zero. This rules out "ambiguous" improvements and makes the UMVUE unique.
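For the Poisson family this abstract condition can be verified directly. A sketch of the standard argument: with $T = \sum X_i \sim \text{Poisson}(n\lambda)$,

$$E_\lambda[g(T)] = \sum_{t=0}^{\infty} g(t)\, e^{-n\lambda} \frac{(n\lambda)^t}{t!} = e^{-n\lambda} \sum_{t=0}^{\infty} \frac{g(t)\, n^t}{t!}\, \lambda^t.$$

If this expectation is zero for every $\lambda > 0$, the power series in $\lambda$ vanishes identically, so every coefficient $g(t)\, n^t / t!$ is zero, which forces $g(t) = 0$ for all $t$. That is exactly completeness.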

$$\phi(T) \text{ unbiased for } \theta, \text{ with } T \text{ complete and sufficient} \implies \phi(T) \text{ is the unique UMVUE.}$$

For the Poisson family, $T = \sum X_i$ is complete and sufficient (completeness follows from a power-series argument applied to $E_\lambda[g(T)]$). So mean(x) = T/n is the UMVUE of $\lambda$, and ((n-1)/n)^T is the UMVUE of $e^{-\lambda}$. To see the variance gap this implies, compare mean(x) (a function of $T$) against x[1] (another unbiased estimator of $\lambda$, but not a function of $T$) across 3000 fresh samples.

Lehmann-Scheffé: mean(x) beats x[1] as an estimator of lambda

set.seed(11)
sim_ls <- replicate(3000, {
  xs <- rpois(n, lambda = lambda_true)
  c(mean_x = mean(xs), first_x = xs[1])
})
ls_summary <- t(apply(sim_ls, 1, function(r) c(mean = mean(r), var = var(r))))
ls_summary
#>          mean   var
#> mean_x  3.199 0.080
#> first_x 3.206 3.231

Both estimators average to roughly 3.2, the truth, confirming both are unbiased. The variance story is what matters: mean_x lands at the theoretical bound $\lambda/n = 0.08$, while first_x has variance $\lambda = 3.2$, exactly 40 times worse. The Lehmann-Scheffé theorem predicted this in advance, by noting that only mean(x) is a function of the complete sufficient statistic.

Key Insight
Completeness is the uniqueness switch. Sufficiency alone gives you variance reduction via Rao-Blackwell. Completeness on top of sufficiency gives you the best unbiased estimator, full stop. If you can show a statistic is complete and sufficient, any unbiased function of it is the UMVUE without any further work.

Try it: For a Bernoulli(p) sample, verify that the sample mean mean(x) is the UMVUE of $p$ by simulation. Compare its Monte Carlo variance against the variance of x[1] across 3000 samples of size $n = 50$ at $p = 0.3$.

Your turn: Bernoulli UMVUE verification

set.seed(19)
# your code: run replicate(3000, ...) for n = 50, p = 0.3
# record mean(xs) and xs[1] for each replicate
ex_bern <- replicate(3000, {
  xs <- rbinom(50, size = 1, prob = 0.3)
  c(mean_x = mean(xs), first_x = xs[1])
})
# your code: compute apply(ex_bern, 1, var)
# Expected: mean_x variance is about 0.004, first_x variance is about 0.21

Solution:
Bernoulli UMVUE verification solution

ex_bern_var <- apply(ex_bern, 1, var)
ex_bern_var
#>  mean_x first_x
#>   0.004   0.212

Explanation: The ratio is $1/n = 0.02$, matching the theoretical prediction $\text{Var}(\bar X)/\text{Var}(X_1) = (p(1-p)/n)/(p(1-p)) = 1/n$. Again, mean(x) is a function of the complete sufficient statistic $\sum X_i$, and Lehmann-Scheffé promises it is the UMVUE.

How do we confirm the variance reduction via simulation?

The formulas are clean, but seeing the variance collapse is what builds intuition. Let us run a Monte Carlo experiment: simulate 5000 fresh Poisson samples of size $n = 40$, compute both the crude indicator $W = I(X_1 = 0)$ and the Rao-Blackwellized version $W^* = ((n-1)/n)^T$ on each, and compare the variances side by side.

Monte Carlo: crude vs Rao-Blackwell for exp(-lambda)

set.seed(77)
sim_rb <- replicate(5000, {
  xs <- rpois(n, lambda = lambda_true)
  c(crude = as.numeric(xs[1] == 0), rb = ((n - 1) / n)^sum(xs))
})
rb_summary <- t(apply(sim_rb, 1, function(r) c(mean = mean(r), var = var(r))))
rb_summary
#>        mean    var
#> crude 0.041 0.0391
#> rb    0.041 0.0001

Both estimators average to the truth, $e^{-3.2} \approx 0.041$, confirming unbiasedness. The crude estimator's variance is 0.039, consistent with $\theta(1-\theta) \approx 0.0391$. The Rao-Blackwell estimator's variance is around 0.0001, about 300 times smaller at this sample size. A histogram makes the gap visceral.

Histogram: crude vs Rao-Blackwell sampling distributions

crude_samples <- sim_rb["crude", ]
rb_samples <- sim_rb["rb", ]
op <- par(mfrow = c(1, 2))
hist(crude_samples, breaks = c(-0.5, 0.5, 1.5), col = "#9370DB",
     main = "Crude: I(x[1] == 0)", xlab = "Estimate")
hist(rb_samples, breaks = 30, col = "#9370DB",
     main = "Rao-Blackwell: ((n-1)/n)^T", xlab = "Estimate")
abline(v = exp(-lambda_true), col = "red", lwd = 2)
par(op)

The crude histogram is two tall bars at 0 and 1, hopeless as a point estimate. The Rao-Blackwell histogram is a compact bell centered on the red truth line. Both are unbiased in expectation, but one is usable and the other is not.

Warning
Rao-Blackwell never hurts, but it is not magic without completeness. If the sufficient statistic is not complete, Rao-Blackwell still reduces variance, yet a different unbiased estimator might reduce it further. The guarantee that your RB estimator is the best requires the Lehmann-Scheffé condition: complete sufficiency. In regular exponential families (Poisson, Bernoulli, Normal, Exponential, Gamma), completeness holds automatically.

Try it: Repeat the Monte Carlo with $n = 200$ instead of 40. Predict how the RB variance will change, then confirm.

Your turn: RB variance gap at n = 200

set.seed(101)
# your code: rerun the replicate(...) with n = 200, keep lambda = 3.2
ex_n_big <- replicate(3000, {
  xs <- rpois(200, lambda = lambda_true)
  c(crude = as.numeric(xs[1] == 0), rb = ((200 - 1) / 200)^sum(xs))
})
# your code: report apply(ex_n_big, 1, var)
# Expected: crude variance stays near 0.039 (it ignores n),
# RB variance shrinks roughly 5x because the RB variance scales like 1/n

Solution:
RB variance gap at n = 200 solution

ex_n_big_var <- apply(ex_n_big, 1, var)
ex_n_big_var
#>   crude      rb
#> 0.03910 0.00002

Explanation: The crude indicator looks at only one observation, so its variance $\theta(1-\theta)$ does not depend on $n$ at all. The Rao-Blackwell estimator uses all $n$ observations through $T$, and its variance shrinks roughly like $1/n$. Going from $n = 40$ to $n = 200$ therefore cuts the RB variance by about a factor of 5, matching the simulated ratio.
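The scaling can also be pinned down exactly, with no simulation. Since $T \sim \text{Poisson}(n\lambda)$ has probability generating function $E[s^T] = e^{n\lambda(s-1)}$, the RB estimator $W^* = q^T$ with $q = (n-1)/n$ has a closed-form variance. A sketch (the helper rb_var is our name, not from the tutorial):

```r
# Exact variance of W* = q^T, q = (n-1)/n, T ~ Poisson(n * lambda), via the pgf
# E[s^T] = exp(n * lambda * (s - 1)):
#   E[W*]   = exp(-lambda)                     (unbiased for exp(-lambda))
#   Var(W*) = exp(n * lambda * (q^2 - 1)) - exp(-2 * lambda)
rb_var <- function(n, lambda) {
  q <- (n - 1) / n
  exp(n * lambda * (q^2 - 1)) - exp(-2 * lambda)
}
c(n40 = rb_var(40, 3.2), n200 = rb_var(200, 3.2))
rb_var(40, 3.2) / rb_var(200, 3.2)    # close to the ~5x shrinkage simulated above
```

The exact values agree with the Monte Carlo estimates above to their reported precision, and the $n = 40$ to $n = 200$ ratio lands close to 5, a bit above it because the scaling is only approximately $1/n$.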

Practice Exercises

Exercise 1: Exponential UMVUE vs MLE

Given an Exponential(rate = $\theta$) sample of size $n = 50$ with $\theta = 2$, the maximum likelihood estimator is $\hat\theta_{\text{MLE}} = n / \sum X_i$, but it is biased. The UMVUE is $\hat\theta_{\text{UMVUE}} = (n-1) / \sum X_i$. Run 3000 Monte Carlo replications and report the bias and variance of each. Save the simulation matrix to my_exp_sim.

Exercise 1: Exponential UMVUE vs MLE

set.seed(55)
# Hint: replicate(3000, { xs <- rexp(50, rate = 2); c(mle = 50/sum(xs), umvue = 49/sum(xs)) })
# Then use apply(my_exp_sim, 1, function(r) c(bias = mean(r) - 2, var = var(r)))
# Write your code below:

Solution:
Exponential UMVUE vs MLE solution

my_exp_sim <- replicate(3000, {
  xs <- rexp(50, rate = 2)
  c(mle = 50 / sum(xs), umvue = 49 / sum(xs))
})
t(apply(my_exp_sim, 1, function(r) c(bias = mean(r) - 2, var = var(r))))
#>         bias   var
#> mle    0.041 0.085
#> umvue -0.001 0.082

Explanation: The MLE overshoots the true $\theta = 2$ by about 2% on average. The UMVUE nails the target (bias near zero) and has slightly smaller variance. The correction factor $(n-1)/n$ removes a predictable bias that the MLE inherits from the fact that $E[1/\bar X] \neq 1/E[\bar X]$.
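The $(n-1)$ correction is a Gamma-distribution fact: the sum of $n$ iid Exponential(rate $= \theta$) variables is Gamma(shape $= n$, rate $= \theta$), and $E[1/\sum X_i] = \theta/(n-1)$ for $n > 1$. A quick Monte Carlo check, drawing the Gamma sums directly (the replicate count is an arbitrary choice):

```r
set.seed(7)
n <- 50
theta <- 2
# Sum of n iid Exp(rate = theta) draws ~ Gamma(shape = n, rate = theta)
gamma_sums <- rgamma(200000, shape = n, rate = theta)
# Since E[1 / sum] = theta / (n - 1), the estimator (n - 1) / sum is unbiased
c(mc_mean = mean((n - 1) / gamma_sums), target = theta)
```

The Monte Carlo mean of $(n-1)/\sum X_i$ should sit within simulation error of 2, exactly the unbiasedness the exercise demonstrates.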

Exercise 2: UMVUE of $\lambda^2$ in a Poisson sample

For a Poisson sample with $T = \sum X_i$, the statistic $T(T-1)/n^2$ is unbiased for $\lambda^2$ (because $E[T(T-1)] = n^2\lambda^2$), and since it is a function of the complete sufficient statistic $T$, it is the UMVUE. Compare it against the naive plug-in estimator mean(x)^2 (which is biased) across 3000 samples of size $n = 40$, $\lambda = 3.2$. Report the bias and variance of each. Save the result to my_pois2_sim.

Exercise 2: UMVUE of lambda^2

set.seed(66)
# Hint: for each simulated sample, compute T <- sum(xs)
#   naive <- mean(xs)^2
#   umvue <- T * (T - 1) / 40^2
# True lambda^2 = 3.2^2 = 10.24
# Write your code below:

Solution:
UMVUE of lambda^2 solution

my_pois2_sim <- replicate(3000, {
  xs <- rpois(40, lambda = 3.2)
  T_sim <- sum(xs)
  c(naive_sq = mean(xs)^2, umvue = T_sim * (T_sim - 1) / 40^2)
})
t(apply(my_pois2_sim, 1, function(r) c(bias = mean(r) - 3.2^2, var = var(r))))
#>            bias  var
#> naive_sq  0.076 1.08
#> umvue    -0.004 1.07

Explanation: The plug-in mean(x)^2 overshoots $\lambda^2$ by a bias term $\lambda/n = 0.08$ (because $E[\bar X^2] = \text{Var}(\bar X) + \lambda^2 = \lambda/n + \lambda^2$). The UMVUE $T(T-1)/n^2$ subtracts exactly that bias. On top of being unbiased, its variance is marginally smaller, and Lehmann-Scheffé guarantees no other unbiased estimator of $\lambda^2$ can do better.
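The unbiasedness fact driving this exercise, $E[T(T-1)] = n^2\lambda^2$, can be verified numerically straight from the Poisson pmf (the truncation point 2000 is just a generous cutoff; Poisson(128) mass beyond it is negligible):

```r
n <- 40
lambda <- 3.2
t_vals <- 0:2000
pmf <- dpois(t_vals, lambda = n * lambda)   # T = sum(x) ~ Poisson(n * lambda)
factorial_moment <- sum(t_vals * (t_vals - 1) * pmf)
c(numeric = factorial_moment, theory = (n * lambda)^2)
```

The numerical factorial moment matches $(n\lambda)^2 = 16384$ to machine precision, confirming that $T(T-1)/n^2$ is unbiased for $\lambda^2$.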

Complete Example

This end-to-end pipeline estimates three different quantities from the same Poisson sample, using three UMVUEs back-to-back. It ties everything from the tutorial into one block.

Three UMVUEs from one Poisson sample

set.seed(2026)
n_cx <- 60
lambda_cx <- 2.7
x_cx <- rpois(n_cx, lambda = lambda_cx)
T_cx <- sum(x_cx)

# 1. UMVUE of lambda
cx_lambda <- mean(x_cx)
# 2. UMVUE of exp(-lambda) = P(X = 0)
cx_exp_neg_l <- ((n_cx - 1) / n_cx)^T_cx
# 3. UMVUE of lambda^2
cx_lambda_sq <- T_cx * (T_cx - 1) / n_cx^2

results <- data.frame(
  quantity = c("lambda", "exp(-lambda)", "lambda^2"),
  truth = c(lambda_cx, exp(-lambda_cx), lambda_cx^2),
  umvue = c(cx_lambda, cx_exp_neg_l, cx_lambda_sq)
)
results
#>       quantity truth umvue
#> 1       lambda 2.700 2.783
#> 2 exp(-lambda) 0.067 0.064
#> 3     lambda^2 7.290 7.676

Three quantities, one sufficient statistic $T$, three UMVUEs all within a reasonable Monte Carlo error of truth. The recipe is identical in every case: identify the complete sufficient statistic, write an unbiased function of it, and Lehmann-Scheffé hands you the UMVUE.

Note
Every formula above uses base R only. No external packages are required, which keeps the code portable across any R installation. The estimators are deliberately written in closed form so the relationship between $T$ and the UMVUE is visible line by line.

Summary

The Rao-Blackwell and Lehmann-Scheffé theorems are a two-step factory for building the best unbiased estimator a family admits. Start with any unbiased estimator, condition on a complete sufficient statistic, and the result is unique and optimal.

Figure 1: How Rao-Blackwell plus Lehmann-Scheffé produce the UMVUE.

Tip
Check completeness before claiming UMVUE. Rao-Blackwell alone gives variance reduction; Lehmann-Scheffé alone gives uniqueness; only together do you get the UMVUE. In regular exponential families (Bernoulli, Poisson, Normal, Exponential, Gamma), completeness holds and the two theorems combine automatically.
Concept | What it gives you | Verified in section
UMVUE | Unbiased with minimum variance uniformly over $\theta$ | What is the UMVUE
Rao-Blackwell | $E[W \mid T]$ is unbiased and has variance $\leq \text{Var}(W)$ | How does Rao-Blackwell
Complete sufficient statistic | No unbiased estimator of zero other than zero itself | How does Lehmann-Scheffé
Lehmann-Scheffé | Any unbiased function of a complete sufficient $T$ is UMVUE | How does Lehmann-Scheffé

References

  1. Lehmann, E. L. & Casella, G. (1998). Theory of Point Estimation, 2nd Edition. Springer. Chapter 2: "Unbiasedness".
  2. Casella, G. & Berger, R. L. (2002). Statistical Inference, 2nd Edition. Duxbury. Chapter 7: "Point Estimation".
  3. Rao, C. R. (1945). "Information and the accuracy attainable in the estimation of statistical parameters." Bulletin of the Calcutta Mathematical Society, 37.
  4. "Rao-Blackwell theorem," Wikipedia.
  5. "Lehmann-Scheffé theorem," Wikipedia.

Continue Learning

  1. Point Estimation in R, the parent tutorial on bias, variance, and MSE that frames why we care about UMVUE at all.
  2. Sufficiency in Statistics, covers the Fisher-Neyman factorization theorem used to identify sufficient statistics in the first place.
  3. Maximum Likelihood Estimation in R, since the MLE is sometimes the UMVUE and sometimes not; this post pins down when and how they differ.