Normality Test Picker
Many statistical tests assume your data come from a normal (bell-curve) distribution. Shapiro-Wilk, Anderson-Darling, Lilliefors, and Jarque-Bera each check this differently. Paste your data to pick the right test for your sample size, see the verdict and Q-Q plot, and get a recommendation on parametric vs non-parametric.
New to normality testing? Read the 4-min primer ▾
What it is. A normality test is a formal check of the question “does my data look like it came from a bell-shaped curve?”. The test returns a p-value: small p means “the shape is suspicious”, large p means “no strong evidence against normal”. The Q-Q plot is the visual companion - sort your data, plot it against where a normal sample would land, and read the shape.
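The Q-Q construction just described can be sketched in a few lines of Python (assuming NumPy and SciPy; this is an illustration, not the tool's own code):

```python
import numpy as np
from scipy.stats import norm

def qq_points(data):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    # Plotting positions: where the i-th of n ordered normal draws is expected to land
    probs = (np.arange(1, n + 1) - 0.5) / n
    return norm.ppf(probs), x

rng = np.random.default_rng(7)
theo, obs = qq_points(rng.normal(size=200))
# For near-normal data the points hug a straight line, so their correlation is close to 1
r = np.corrcoef(theo, obs)[0, 1]
print(round(r, 3))
```

Plotting `obs` against `theo` gives the Q-Q plot; deviations from a straight line are the "shape" you read.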
How to read p. p < 0.05 is the conventional reject line. p = 0.034 means: if the data really came from a perfect normal, we'd see a sample this weird only 3.4% of the time - so we doubt the normality assumption. p = 0.42 means we have no quarrel with normal. Failing to reject is not proof of normality; with small n, the test is often just under-powered.
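The meaning of the 5% line is easy to check by simulation: when the null is true (the data really are normal), p-values are roughly uniform, so about 5% of runs cross the reject line by chance. A quick sketch in Python/SciPy, purely illustrative:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
# 500 experiments where the null is TRUE: every sample really is normal
pvals = [shapiro(rng.normal(size=30)).pvalue for _ in range(500)]
false_rejects = sum(p < 0.05 for p in pvals)
# Roughly 5% of 500 (~25) should fall below 0.05 purely by chance
print(false_rejects / 500)
```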
Picking the right test. Small to moderate samples (n ≤ 50) → Shapiro-Wilk, the most powerful general test. Care about tail behaviour or outliers? → Anderson-Darling, weighted toward the tails. Want simple intuition? → Jarque-Bera, built from skew and kurtosis. Estimating the mean & SD from the same data? → Lilliefors (a corrected Kolmogorov-Smirnov). The Q-Q plot beats them all when n is small or huge.
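In SciPy the first three of those tests look like this (Lilliefors lives in statsmodels as `statsmodels.stats.diagnostic.lilliefors`; again a sketch, not the tool's implementation):

```python
import numpy as np
from scipy.stats import shapiro, anderson, jarque_bera

rng = np.random.default_rng(42)
x = rng.normal(loc=10, scale=2, size=40)

sw = shapiro(x)                # best general power at small to moderate n
ad = anderson(x, dist='norm')  # tail-weighted; returns critical values, not a p-value
jb = jarque_bera(x)            # built from skew and kurtosis; asymptotic, prefers larger n
print(f"Shapiro-Wilk     W={sw.statistic:.3f}  p={sw.pvalue:.3f}")
print(f"Anderson-Darling A2={ad.statistic:.3f}  5% critical={ad.critical_values[2]:.3f}")
print(f"Jarque-Bera      JB={jb.statistic:.3f}  p={jb.pvalue:.3f}")
```

Note the Anderson-Darling quirk: you compare the statistic against the critical value for your chosen level (index 2 is the 5% level in SciPy's output) rather than reading a p-value.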
When normality matters. t-tests, ANOVA and regression CIs assume residuals are roughly normal - but they're robust enough that mild deviations don't matter once n > 30 (CLT). Variance / SD CIs and exact tail probabilities are far more sensitive. For n > 5000, every formal test will reject; trust the Q-Q plot, not the p-value.
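The large-n over-rejection is easy to demonstrate with a deterministic "ideal sample" from a t-distribution with 10 degrees of freedom - visually almost indistinguishable from normal, yet at n = 5000 Jarque-Bera rejects it decisively. An illustrative Python sketch (the grid-of-quantiles trick is mine, not the tool's):

```python
import numpy as np
from scipy.stats import t, jarque_bera

def ideal_sample(n, df=10):
    # Deterministic "perfect" t(df) sample: quantiles at evenly spaced probabilities
    probs = (np.arange(1, n + 1) - 0.5) / n
    return t.ppf(probs, df)

# Same shape of deviation from normal in both samples; only n changes
small = jarque_bera(ideal_sample(40))
big = jarque_bera(ideal_sample(5000))
print(f"n=40:   p={small.pvalue:.3f}")   # mildly heavy tails, invisible at this n
print(f"n=5000: p={big.pvalue:.2e}")    # the same mild tails -> decisive rejection
```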
Try a real-world example: twenty values from a standard normal. The tests should not reject, and the Q-Q plot should look like a clean diagonal.
Read more: Anatomy of normality testing ▾
Caveats: when this is the wrong tool
If you have… → use instead:
- Ordered or categorical data → normality is undefined for ordinal / categorical scales; use a χ² goodness-of-fit test or non-parametric methods.
- n < 8 → tests have almost no power, so non-rejection is meaningless; read the Q-Q plot, then assume nothing.
- n > 5000 → formal tests over-reject - they catch tiny, irrelevant deviations; the Q-Q plot is the only honest answer.
- An equivalence-style question ("is the data close enough to normal?") → a Bayes factor or TOST-style equivalence test - coming in a later batch.
- Multivariate normality → Mardia's test or Henze-Zirkler - out of scope here; see the MVN R package.
- A regression / ANOVA assumption check → run the test on the residuals, not the raw outcome: shapiro.test(residuals(fit)).
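The R one-liner for the residual check has a direct Python analogue. In this sketch the straight-line model and simulated data are made up for illustration:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 80)
y = 3.0 * x + 2.0 + rng.normal(scale=1.5, size=x.size)  # linear truth plus noise

# Fit a straight line, then test the residuals -- not the raw outcome y,
# which is spread along the trend and strongly non-normal on its own
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)
print(f"Shapiro-Wilk on residuals: p={shapiro(resid).pvalue:.3f}")
```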
Related reading:
- Why normality tests under-perform on big data - the over-rejection problem and what to do.
- Reading a Q-Q plot - the four classic shapes and what they mean.
- The Shapiro-Wilk test, intuitively - why it has such good small-n power.
- Confidence Interval Calculator - the next step once you've checked normality.
Numerical accuracy: Φ(z) accurate to ~7.5 × 10⁻⁸ (Hart's algorithm); Shapiro-Wilk via Royston AS R94; Anderson-Darling via Stephens (1986); Lilliefors p via Dallal-Wilkinson (1986).
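For comparison, double-precision `erfc` reaches the same accuracy regime as the quoted Φ(z) figure. A minimal standard-normal CDF (this is not the tool's Hart implementation, just an independent spot-check):

```python
import math

def phi(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

print(phi(0.0))            # 0.5 exactly
print(round(phi(1.96), 7))
print(round(phi(-1.96), 7))
```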