Central Limit Theorem in R: Simulate It From Skewed, Bimodal, and Uniform Distributions
The Central Limit Theorem (CLT) says the distribution of sample means approaches a normal distribution as the sample size grows — no matter what the parent distribution looks like. This post uses R simulations to show the CLT at work on skewed, bimodal, and uniform populations, so you can watch normality emerge with your own eyes.
How does the Central Limit Theorem work in R?
Here's the claim the CLT makes, and why it feels strange the first time: if you repeatedly draw small samples from any population and average each sample, those averages start to form a bell curve — even when the population itself looks nothing like a bell. The fastest way to believe this is to watch it happen. Let's draw 1,000 sample means from a heavily right-skewed exponential distribution and plot them.
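Here is a minimal way to run that experiment in base R (the seed, bin count, and labels are illustrative choices, not the post's exact code):

```r
set.seed(42)  # any seed works; results vary only slightly

n <- 30    # observations per sample
k <- 1000  # number of samples

# Draw k samples of size n from Exponential(rate = 1) and average each one
means <- replicate(k, mean(rexp(n, rate = 1)))

# Histogram of the sample means, with the CLT's predicted normal overlaid
hist(means, breaks = 30, freq = FALSE,
     main = "1,000 sample means, Exponential(1), n = 30",
     xlab = "Sample mean")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, col = "red", lwd = 2)
```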
The histogram is a textbook bell curve centered close to 1. The red line is the normal density the CLT predicts: mean 1, standard deviation $1/\sqrt{30} \approx 0.183$. The original exponential was strongly right-skewed, yet its means are nearly symmetric. That is the whole magic trick of the CLT, in one plot.

Figure 1: The five-step recipe any CLT simulation follows — draw, average, repeat, plot.
To appreciate what just happened, compare the parent population to what came out. Here is the exponential itself — 10,000 raw draws, no averaging.
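A sketch of that step (plotting details are illustrative):

```r
set.seed(42)
pop <- rexp(10000, rate = 1)  # 10,000 raw draws, no averaging

hist(pop, breaks = 50, freq = FALSE,
     main = "Parent population: Exponential(rate = 1)",
     xlab = "x")
```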
Side by side, the transformation is obvious. The parent has a long right tail and a hard edge at zero. After averaging samples of size 30, the skew is gone and the result looks normal. The parent's shape stopped mattering once we started averaging.
Formally, if $X_1, X_2, \ldots, X_n$ are independent draws from a distribution with mean $\mu$ and finite variance $\sigma^2$, then for large $n$ the sample mean is approximately distributed as:
$$\bar{X}_n \;\sim\; \mathcal{N}\!\left(\mu,\; \frac{\sigma^2}{n}\right)$$
Where:
- $\bar{X}_n$ = the mean of a sample of size $n$
- $\mu$ = the population mean
- $\sigma^2$ = the population variance
- $\sigma / \sqrt{n}$ = the standard error — how tightly the sample means cluster around $\mu$
The standard error shrinks with $\sqrt{n}$, which is why doubling your sample size doesn't halve uncertainty — you need four times the sample to cut the standard error in half.
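The arithmetic is quick to check:

```r
# Standard error of the mean for a population with sd = 1, e.g. Exponential(1)
se <- function(n) 1 / sqrt(n)

se(30)   # baseline
se(60)   # doubling n shrinks the SE by a factor of sqrt(2), not 2
se(120)  # quadrupling n is what halves it
```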
Try it: Change the exponential rate from 1 to 0.5 and predict the mean and standard error of the resulting sample means at n = 30. Hint: for Exponential(rate = r), $\mu = 1/r$ and $\sigma = 1/r$.
Click to reveal solution
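One way to check the prediction (the seed is arbitrary):

```r
set.seed(1)
means <- replicate(1000, mean(rexp(30, rate = 0.5)))
mean(means)  # should land near 2
sd(means)    # should land near 2 / sqrt(30), about 0.365
```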
Explanation: Exponential(rate = 0.5) has population mean and sd both equal to $1 / 0.5 = 2$. The CLT predicts sample means at n = 30 have mean 2 and standard error $2/\sqrt{30} \approx 0.365$. The simulation matches.
How does sample size change CLT convergence?
Sample size is the dial that controls how close the sampling distribution gets to normal. Small $n$ means lumpy, skewed histograms. Large $n$ means a tight, symmetric bell. Rather than eyeball three separate plots, let's build a small helper function and then run the experiment across three sample sizes at once.
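A plausible reconstruction of the helper — the post gives its name and purpose but not its exact signature, so the arguments here are an assumption:

```r
# k sample means of size n, drawn by calling rdist(n)
simulate_means <- function(rdist, n, k = 1000) {
  replicate(k, mean(rdist(n)))
}

set.seed(42)
means30 <- simulate_means(function(n) rexp(n, rate = 1), n = 30)
mean(means30)  # near 1, the population mean
sd(means30)    # near 1 / sqrt(30), about 0.18
```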
The mean of means lands on 1.0 (the population mean) and the observed spread lands on 0.181, within 1% of the theoretical $1/\sqrt{30} \approx 0.183$. That's the CLT formula working in practice, not just on paper. With simulate_means() in hand, we can now run the same experiment across any distribution with one line.
replicate(k, expr) evaluates expr k times and collects the results into a vector, no manual indexing required. It's faster to write and easier to read than an explicit for loop.

Now compare three sample sizes on the same exponential. We stack the results into a data frame and use facet_wrap() so the three histograms share axes.
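A sketch of that comparison, assuming ggplot2 is installed (bin count and titles are illustrative):

```r
library(ggplot2)

set.seed(42)
sizes <- c(5, 30, 100)

# One data frame with 1,000 sample means per sample size
df <- do.call(rbind, lapply(sizes, function(n) {
  data.frame(n = paste("n =", n),
             mean = replicate(1000, mean(rexp(n, rate = 1))))
}))
df$n <- factor(df$n, levels = paste("n =", sizes))  # keep panels in order

ggplot(df, aes(x = mean)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ n) +
  labs(x = "Sample mean", y = "Count",
       title = "Exponential(1) sample means at three sample sizes")
```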
At $n = 5$ the sampling distribution still leans right — you can see the tail. At $n = 30$ it's noticeably symmetric. At $n = 100$ it's a crisp bell. This matches the standard error shrinking from $1/\sqrt{5} \approx 0.447$ to $1/\sqrt{100} = 0.1$. The lesson: there is no single "magic" $n$ — how fast you converge depends on how skewed the parent is.
Try it: Simulate 1,000 sample means of size n = 50 from Exponential(rate = 1) and verify that the standard error is close to $1/\sqrt{50}$.
Click to reveal solution
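A possible solution (the seed is arbitrary):

```r
set.seed(42)
means <- replicate(1000, mean(rexp(50, rate = 1)))
sd(means)     # simulated standard error
1 / sqrt(50)  # theoretical value, about 0.141
```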
Explanation: The simulated standard error (0.141) is within 1% of the theoretical $1/\sqrt{50} = 0.141$. Each time you quadruple $n$, the standard error is roughly halved — the $\sqrt{n}$ law.
Does the CLT work for uniform and bimodal distributions?
Skewed data was the dramatic case. Two other shapes test the theorem from opposite ends: a uniform distribution (symmetric, bounded, no tails) and a bimodal mixture (two separate humps). Uniform parents converge fastest because they are already symmetric. Bimodal parents are the most visually interesting because averaging literally fuses the two peaks into one.
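A sketch of the uniform experiment at two sample sizes (plotting details are illustrative):

```r
set.seed(42)
means_u5  <- replicate(1000, mean(runif(5)))   # Uniform(0, 1), n = 5
means_u30 <- replicate(1000, mean(runif(30)))  # n = 30

par(mfrow = c(1, 2))
hist(means_u5,  breaks = 30, main = "Uniform means, n = 5",  xlab = "Sample mean")
hist(means_u30, breaks = 30, main = "Uniform means, n = 30", xlab = "Sample mean")
par(mfrow = c(1, 1))
```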
Even at $n = 5$ the uniform means are approximately bell-shaped (technically they follow the Bates distribution, the mean of $n$ uniforms, which is already close enough to normal that your eye barely notices). By $n = 30$ they're indistinguishable from normal. Symmetric parents are the easy case for the CLT.
Bimodal is more striking. Let's build a population that draws from one of two normal components 50/50 — a visibly double-humped shape — and then compute sample means.
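One way to build that mixture — the component standard deviation of 1 is an assumption; the post only pins the humps near $-2$ and $+2$:

```r
set.seed(42)

# 50/50 mixture: each draw picks a hump, then samples a normal around it
rbimodal <- function(n) {
  hump <- sample(c(-2, 2), n, replace = TRUE)
  rnorm(n, mean = hump, sd = 1)
}

pop   <- rbimodal(10000)                      # the double-humped parent
means <- replicate(1000, mean(rbimodal(30)))  # sampling distribution of the mean

par(mfrow = c(1, 2))
hist(pop,   breaks = 50, main = "Bimodal parent",       xlab = "x")
hist(means, breaks = 30, main = "Sample means, n = 30", xlab = "Sample mean")
par(mfrow = c(1, 1))
```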
The parent has two obvious modes near $-2$ and $+2$. The sampling distribution of the mean has one mode near $0$. Why? Most samples of size 30 pull roughly half their points from each hump, and the average of those pulls lands near zero. The two modes are smoothed out by averaging — which is exactly what the CLT predicts should happen.
Try it: Simulate 1,000 sample means of size n = 20 from Uniform(-1, 1) and confirm the mean of means is close to 0.
Click to reveal solution
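A possible solution (the seed is arbitrary):

```r
set.seed(42)
means <- replicate(1000, mean(runif(20, min = -1, max = 1)))
mean(means)  # near 0
sd(means)    # near 0.577 / sqrt(20), about 0.129
```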
Explanation: Uniform(-1, 1) has mean 0 and variance $1/3$, so standard deviation $\sqrt{1/3} \approx 0.577$. The CLT predicts sample means at n = 20 have SE $= 0.577 / \sqrt{20} \approx 0.129$ — the simulation agrees.
How do you confirm normality with Q-Q plots and Shapiro-Wilk?
Eyeballing histograms is a start. For a sharper diagnostic, statisticians use two tools: the Q-Q plot (visual) and the Shapiro-Wilk test (numeric). A Q-Q plot compares the quantiles of your data against the quantiles a normal distribution would have — points that hug a straight line suggest the data is approximately normal. Shapiro-Wilk returns a p-value where a high p (typically > 0.05) means you cannot reject the hypothesis that the data is normal.
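In base R the Q-Q plot is two calls (the seed is illustrative):

```r
set.seed(42)
means <- replicate(1000, mean(rexp(30, rate = 1)))

qqnorm(means, main = "Q-Q plot: Exponential(1) sample means, n = 30")
qqline(means, col = "red")  # reference line through the quartiles
```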
The points hug the reference line across the bulk of the distribution. A slight curl in the upper-right corner is the last ghost of the exponential's skew — there is still a hint of a heavier right tail at $n = 30$. At $n = 100$ even this curl would disappear.
Now the numeric test. We run Shapiro-Wilk across three sample sizes to see the p-value change.
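A compact way to run it (exact p-values depend on the seed):

```r
set.seed(42)
for (n in c(5, 30, 100)) {
  means <- replicate(1000, mean(rexp(n, rate = 1)))
  p <- shapiro.test(means)$p.value
  cat(sprintf("n = %3d: Shapiro-Wilk p = %.4g\n", n, p))
}
```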
At $n = 5$ the p-value is essentially zero — the means are clearly non-normal. At $n = 30$ it's small but far from zero. At $n = 100$ it crosses the usual 0.05 threshold — you can no longer reject normality. Shapiro-Wilk puts a number on the progression our eyes saw earlier.
Try it: Run Shapiro-Wilk on 1,000 uniform U(0,1) sample means at n = 5 and predict whether the p-value will be large or small.
Click to reveal solution
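A possible solution — the p-value varies with the seed, so treat the number in the explanation as one run's result:

```r
set.seed(42)
means <- replicate(1000, mean(runif(5)))
shapiro.test(means)$p.value
```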
Explanation: The p-value (~0.2) is far above 0.05, so we fail to reject normality. This matches intuition: the uniform is symmetric and bounded, so even n = 5 already looks roughly bell-shaped.
When does the Central Limit Theorem fail?
The CLT is powerful, but it has two requirements written in fine print: (1) the population must have a finite variance, and (2) the samples must be independent. Break either one and the theorem stops working. The textbook counter-example for the variance rule is the Cauchy distribution — it has infinite variance, and its sample means do not converge to normal at any $n$.
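A sketch of the Cauchy experiment (plotting details are illustrative):

```r
set.seed(42)
means_30   <- replicate(1000, mean(rcauchy(30)))
means_1000 <- replicate(1000, mean(rcauchy(1000)))

# The mean of n standard Cauchy draws is itself standard Cauchy,
# so averaging buys nothing at any n
par(mfrow = c(1, 2))
hist(means_30,   breaks = 50, main = "Cauchy means, n = 30",   xlab = "Sample mean")
hist(means_1000, breaks = 50, main = "Cauchy means, n = 1000", xlab = "Sample mean")
par(mfrow = c(1, 1))
```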
Both panels look terrible. The spread at $n = 1000$ is not meaningfully smaller than at $n = 30$. That's because the Cauchy distribution has no population mean or variance to converge to — the tails are heavy enough that a single extreme draw can dominate any sample. Increasing $n$ doesn't help.
Independence is the second trap. Time-series data with strong autocorrelation breaks the CLT because consecutive observations carry shared information — your effective sample size is much smaller than the number of rows.
Try it: Compute the spread (standard deviation) of Cauchy sample means at n = 30 and n = 1000 and confirm that increasing n does NOT shrink it.
Click to reveal solution
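A possible solution (the exact numbers are wildly seed-dependent, which is itself the point):

```r
set.seed(42)
s30   <- sd(replicate(1000, mean(rcauchy(30))))
s1000 <- sd(replicate(1000, mean(rcauchy(1000))))
c(s30, s1000)  # both large; rerun with another seed and they change drastically
```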
Explanation: Both standard deviations are enormous and of similar order. Compare to the exponential, where SE shrank from 0.18 (n=30) to 0.10 (n=100). For Cauchy, the $1/\sqrt{n}$ law doesn't apply because variance is undefined — a classic CLT-fails case.
Practice Exercises
Three capstone exercises that combine multiple ideas from the post. The simulate_means() helper from earlier is still in scope — reuse it.
Exercise 1: Compare how fast lognormal vs exponential means converge
Both Lognormal(0, 1) and Exponential(1) are right-skewed, but they skew differently. Draw 1,000 sample means of size n = 30 from each, run Shapiro-Wilk on both, and save the p-values to my_lnorm and my_exp. Which one converges to normal faster at n = 30?
Click to reveal solution
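A possible solution (the seed is arbitrary):

```r
set.seed(42)
lnorm_means <- replicate(1000, mean(rlnorm(30, meanlog = 0, sdlog = 1)))
exp_means   <- replicate(1000, mean(rexp(30, rate = 1)))

my_lnorm <- shapiro.test(lnorm_means)$p.value
my_exp   <- shapiro.test(exp_means)$p.value
c(my_lnorm, my_exp)  # the exponential p-value is typically the larger
```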
Explanation: Exponential means are closer to normal at n = 30 than lognormal means. Lognormal has a heavier right tail — its skewness is about 6 versus 2 for the exponential — so convergence needs a bigger sample.
Exercise 2: Write a function that finds the smallest n passing Shapiro-Wilk
Write needed_n(rdist, threshold = 0.05) that returns the smallest value from c(5, 10, 20, 30, 50, 100) whose simulated sample means at that size pass Shapiro-Wilk with p > threshold. If none pass, return NA. Test it on Uniform(0,1) and Exponential(1).
Click to reveal solution
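One way to write it — the candidate sizes come from the prompt, and results wobble across seeds, as the explanation notes:

```r
needed_n <- function(rdist, threshold = 0.05) {
  for (n in c(5, 10, 20, 30, 50, 100)) {
    means <- replicate(1000, mean(rdist(n)))
    if (shapiro.test(means)$p.value > threshold) return(n)
  }
  NA
}

set.seed(42)
needed_n(runif)                          # symmetric parent: passes early
needed_n(function(n) rexp(n, rate = 1))  # skewed parent: needs a larger n
```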
Explanation: Uniform passes at the smallest n tested (5) because it's already symmetric. Exponential doesn't pass until n = 100 — the heavy skew takes work to smooth out. Simulation results carry randomness, so expect a slightly different answer if you change the seed.
Exercise 3: Three-component mixture
Create a mixture population that draws equally from N(-3, 0.5), N(0, 0.5), and N(3, 0.5). Plot the parent (10,000 draws) and the sampling distribution of the mean at n = 30 side by side. Confirm the sampling distribution is unimodal and roughly normal.
Click to reveal solution
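A possible solution (seed and plot details arbitrary):

```r
set.seed(42)

# Equal-weight mixture of N(-3, 0.5), N(0, 0.5), and N(3, 0.5)
rmix3 <- function(n) {
  centers <- sample(c(-3, 0, 3), n, replace = TRUE)
  rnorm(n, mean = centers, sd = 0.5)
}

par(mfrow = c(1, 2))
hist(rmix3(10000), breaks = 60, main = "Three-hump parent", xlab = "x")
hist(replicate(1000, mean(rmix3(30))), breaks = 30,
     main = "Sample means, n = 30", xlab = "Sample mean")
par(mfrow = c(1, 1))
```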
Explanation: Three humps in the parent become one hump in the mean. Samples of size 30 blend across all three components, and the result is unimodal and approximately normal — the CLT at work.
Complete Example
Let's bring everything together in one four-panel plot: sample means at n = 30 from each of the four populations we studied — exponential, uniform, bimodal, and Cauchy. Three of the four should look normal. One should not.
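A sketch of the four-panel figure in base graphics (the bimodal component sd of 1 is an assumption carried over from earlier):

```r
set.seed(42)
sims <- list(
  Exponential = function(n) rexp(n, rate = 1),
  Uniform     = function(n) runif(n),
  Bimodal     = function(n) rnorm(n, mean = sample(c(-2, 2), n, replace = TRUE)),
  Cauchy      = function(n) rcauchy(n)
)

par(mfrow = c(2, 2))
for (name in names(sims)) {
  means <- replicate(1000, mean(sims[[name]](30)))
  hist(means, breaks = 30, main = paste(name, "means, n = 30"),
       xlab = "Sample mean")
}
par(mfrow = c(1, 1))
```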
Three of the four panels show neat bell curves, regardless of how wildly different their parent distributions are. The Cauchy panel is visibly broken — no bell, extreme outliers, no convergence. That single picture contains the post's entire message: the CLT works for finite-variance distributions, and the parent's shape stops mattering once you average.
Summary
The CLT turns any well-behaved population into a normal sampling distribution, given enough samples. How many you need depends on the parent.
| Parent shape | Typical example | Convergence speed | Practical n |
|---|---|---|---|
| Symmetric, bounded | Uniform, Normal | Very fast | 5–15 |
| Bimodal / mixture | Two-normal mixture | Moderate | ~30 |
| Right-skewed | Exponential, Lognormal | Slower | 30–100 |
| Heavy-tailed | Cauchy, stable laws (α < 2) | Never | — |

Figure 2: How fast sample means converge depends on the parent shape — heavy tails never converge.
The three rules to remember: (1) the CLT is about the sampling distribution of the mean, not the raw data; (2) the standard error is $\sigma/\sqrt{n}$, so quadrupling $n$ halves uncertainty; (3) check finite variance and independence before you trust any CLT-based inference.
References
- Rice, J.A. — Mathematical Statistics and Data Analysis, 3rd Edition. Duxbury (2007). Chapter 5 covers the CLT and its proof.
- Casella, G. & Berger, R.L. — Statistical Inference, 2nd Edition. Duxbury (2002). Section 5.5 on convergence in distribution.
- Wasserman, L. — All of Statistics. Springer (2004). Section 5.4 on the CLT.
- Hesterberg, T. — "What Teachers Should Know About the Bootstrap". The American Statistician 69(4), 2015. Discusses when CLT approximations break and bootstrap alternatives.
- R Core Team — Base R distribution functions: `?rexp`, `?runif`, `?rcauchy`, `?shapiro.test`.
- ggplot2 documentation — `stat_function()` for overlaying theoretical densities.
- Wikipedia — Central Limit Theorem entry, useful for the formal proof and historical context.
Continue Learning
- Normal, t, F, and Chi-Squared Distributions in R — the normal distribution the CLT produces, plus three close relatives used in testing.
- Binomial and Poisson Distributions in R — discrete distributions whose normal approximations rest on the CLT.
- Sample Size Planning in R — use the $\sigma/\sqrt{n}$ relationship to plan how much data you need for a target precision.