
Normality Test Picker

Many statistical tests assume your data come from a normal (bell-curve) distribution. Shapiro-Wilk, Anderson-Darling, Lilliefors, and Jarque-Bera each check this differently. Paste your data to pick the right test for your sample size, see the verdict and Q-Q plot, and get a recommendation on parametric vs non-parametric.

New to normality testing? Read the 4-min primer

What it is. A normality test is a formal check of the question “do my data look like they came from a bell-shaped curve?”. The test returns a p-value: a small p means “the shape is suspicious”; a large p means “no strong evidence against normal”. The Q-Q plot is the visual companion - sort your data, plot them against where a normal sample would land, and read the shape.

How to read p. p < 0.05 is the conventional reject line. p = 0.034 means: if the truth were a perfect normal, we'd see data at least this far from normal only 3.4% of the time - so we doubt the truth. p = 0.42 means we have no quarrel with normal. Failing to reject is not proof of normality; with small n, the test is often just under-powered.

Picking the right test. Small to moderate samples (n ≤ 50) → Shapiro-Wilk, the most powerful general test. Care about tail behaviour or outliers? → Anderson-Darling, weighted toward the tails. Want simple intuition? → Jarque-Bera, built from skew and kurtosis. Estimating the mean & SD from the same data? → Lilliefors (a corrected Kolmogorov-Smirnov). The Q-Q plot beats them all when n is small or huge.
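
A minimal R sketch of running all four tests on one vector (the nortest and tseries packages are an assumption of this sketch; only shapiro.test() ships with base R):

    # Assumes the 'nortest' and 'tseries' packages are installed:
    # install.packages(c("nortest", "tseries"))
    library(nortest)    # ad.test(), lillie.test()
    library(tseries)    # jarque.bera.test()

    set.seed(42)
    x <- rnorm(40, mean = 10, sd = 2)   # toy data - paste your own vector here

    shapiro.test(x)       # Shapiro-Wilk: best default for n <= 50
    ad.test(x)            # Anderson-Darling: tail-weighted
    lillie.test(x)        # Lilliefors: KS with estimated mean / SD
    jarque.bera.test(x)   # Jarque-Bera: skewness + kurtosis, asymptotic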

When normality matters. t-tests, ANOVA and regression CIs assume residuals are roughly normal - but they're robust enough that mild deviations don't matter once n > 30 (CLT). Variance / SD CIs and exact tail probabilities are far more sensitive. For n > 5000, every formal test will reject; trust the Q-Q plot, not the p-value.
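
A quick simulation sketch of that last point (t with 20 degrees of freedom is an arbitrary stand-in for "practically normal" data; nortest is assumed because shapiro.test() caps n at 5000):

    set.seed(1)
    x <- rt(100000, df = 20)   # nearly normal: excess kurtosis is only 0.375

    nortest::ad.test(x)        # with n = 100,000 this typically rejects (p << 0.05)
    qqnorm(x); qqline(x)       # ...yet the Q-Q plot is essentially a straight line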

4 tests + Q-Q plot · one tool · Shapiro-Wilk · Anderson-Darling · Lilliefors · Jarque-Bera · Runs in your browser

Try a real-world example - pick one below to load it.

📊 Small sample

Twenty values from a standard normal. Tests should not reject; the Q-Q plot should look like a clean diagonal.

R code (runnable) - reproduce in R:
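
A sketch of the same scenario in R (the seed is arbitrary, so the numbers will not match the tool's preset exactly):

    set.seed(2024)
    x <- rnorm(20)          # twenty draws from a standard normal

    shapiro.test(x)         # expect p well above 0.05: no evidence against normality
    qqnorm(x); qqline(x)    # points should hug the diagonal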

        

Read more: Anatomy of normality testing
Live recap - your inputs plugged in
Pick a scenario or paste data to see the derivation chain.
W = (Σ aᵢ x⁽ᵢ⁾)² / Σ (xᵢ − x̄)² H₀: data is normal ⇒ W ≈ 1
Shapiro-Wilk. Compares the order-statistics-weighted sum to the sample variance. The aᵢ weights are the expected normal order statistics; deviations from a normal shape pull W below 1. Royston AS R94 gives p-values up to n = 5000.
IN: x = your raw data (n ≥ 3). OUT: W in the headline; the p-value drives the REJECT / KEEP verdict pill against your α.
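
A sketch of that verdict logic in R (α = 0.05 is assumed as the default threshold):

    x <- rnorm(40)          # stand-in for your raw data, 3 <= n <= 5000
    alpha <- 0.05
    res <- shapiro.test(x)

    res$statistic           # W: close to 1 under normality
    res$p.value
    if (res$p.value < alpha) "REJECT normality" else "KEEP - no evidence against"
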
A² = −n − (1/n) Σᵢ (2i−1) · [ln Φ(z₍ᵢ₎) + ln(1 − Φ(z₍ₙ₊₁₋ᵢ₎))], with z₍ᵢ₎ the standardized values in ascending order;  A* = A² · (1 + 0.75/n + 2.25/n²)
Anderson-Darling. A weighted KS-style distance, weighted to be more sensitive in the tails. Stephens (1986) gives the small-sample correction A* and a closed-form p-value approximation. Reach for this when outliers / tail behaviour matter.
IN: x = your raw data (n ≥ 5). OUT: A² in the headline; the recap shows the modified A* and a tail-sensitive flag.
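
A sketch of the computation by hand, checked against nortest::ad.test() (the package is an assumption; its reported statistic is the uncorrected A²):

    x <- rnorm(50)                      # example data
    n <- length(x)
    z <- sort((x - mean(x)) / sd(x))    # standardized values, ascending
    i <- seq_len(n)

    A2    <- -n - mean((2 * i - 1) * (log(pnorm(z)) + log(1 - pnorm(rev(z)))))
    Astar <- A2 * (1 + 0.75 / n + 2.25 / n^2)

    c(A2 = A2, Astar = Astar)
    nortest::ad.test(x)$statistic       # should agree with A2 up to rounding
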
D = maxᵢ |Fₙ(xᵢ) − Φ(zᵢ)| where zᵢ = (xᵢ − x̄) / s
Lilliefors. The Kolmogorov-Smirnov statistic, but with critical values corrected for the fact that the mean and SD were estimated from the data themselves. Dallal-Wilkinson (1986) gives an analytic approximation to the p-value.
IN: x = your raw data (n ≥ 4). OUT: D in the headline; p-value compared to your α.
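
A hand-computation sketch (the headline formula shows only the upper gap; implementations such as nortest::lillie.test() take the larger of the two one-sided gaps, as below):

    x <- rnorm(50)                        # example data
    n <- length(x)
    z <- sort((x - mean(x)) / sd(x))      # standardized values, ascending
    p <- pnorm(z)
    i <- seq_len(n)

    Dplus  <- max(i / n - p)              # empirical CDF above Phi
    Dminus <- max(p - (i - 1) / n)        # Phi above empirical CDF
    D      <- max(Dplus, Dminus)

    D
    nortest::lillie.test(x)$statistic     # should match D
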
JB = (n/6) · (S² + (K − 3)² / 4) JB ~ χ²(2) under H₀
Jarque-Bera. Combines sample skewness S and excess kurtosis K−3 into a single χ²(2) statistic. Asymptotic - needs n ≥ 30 to be reliable. Cheap and intuitive, but lower power than Shapiro-Wilk at small n.
IN: x = your raw data (n ≥ 4 in practice; trust at n ≥ 30). OUT: JB + skew + excess kurtosis in the headline / recap.
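
A sketch of JB from first principles, compared with tseries::jarque.bera.test() (the package is an assumption; the two should agree):

    x <- rnorm(200)                      # example data; asymptotic, so prefer n >= 30
    n <- length(x)
    m <- mean(x)
    m2 <- mean((x - m)^2)                # (1/n) central moments, as in the JB formula

    S  <- mean((x - m)^3) / m2^1.5       # sample skewness
    K  <- mean((x - m)^4) / m2^2         # sample kurtosis (3 under normality)
    JB <- n / 6 * (S^2 + (K - 3)^2 / 4)

    c(JB = JB, p = pchisq(JB, df = 2, lower.tail = FALSE))
    tseries::jarque.bera.test(x)         # should agree
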
Q-Q plot: sample order stats vs. Φ⁻¹((i − 0.5)/n) ⇒ straight line if normal
Q-Q plot is the test you should always look at. Points on a straight line ⇒ normal. Curves at the ends ⇒ heavy / light tails. S-shape ⇒ bimodal or skewed. For very large n, formal tests will reject everything; the plot is the only honest verdict.
IN: x = your raw data. OUT: the chart on the right; toggle to histogram + normal-density overlay for a second view.
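
A sketch that builds the plot exactly as the formula above reads, plus base R's built-in shortcut:

    x <- rnorm(100)                               # example data
    n <- length(x)
    theo <- qnorm((seq_len(n) - 0.5) / n)         # Phi^-1((i - 0.5)/n)

    plot(theo, sort(x),
         xlab = "Theoretical normal quantiles",
         ylab = "Sample order statistics")
    abline(lm(sort(x) ~ theo))                    # reference line: straight => normal

    qqnorm(x); qqline(x)                          # base R equivalent
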
Caveats - when this is the wrong tool
If you have… → use instead:
Ordered or categorical data → normality is undefined for ordinal / categorical scales. Use a χ² goodness-of-fit test or non-parametric methods.
n < 8 → tests have almost no power; non-rejection is meaningless. Read the Q-Q plot, then assume nothing.
n > 5000 → formal tests over-reject - they catch tiny, irrelevant deviations. The Q-Q plot is the only honest answer.
Need an equivalence-style "data is close enough to normal" → use a Bayes factor or a TOST-style equivalence test - coming in a later batch.
Multivariate normality → Mardia's test or Henze-Zirkler - out of scope here. Use the MVN R package.
Regression / ANOVA assumption check → run the test on the residuals, not the raw outcome: shapiro.test(residuals(fit)) - see the sketch below.
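
A minimal sketch of that residual check (mtcars and the mpg ~ wt + hp model are purely illustrative):

    fit <- lm(mpg ~ wt + hp, data = mtcars)   # any fitted model
    r <- residuals(fit)

    shapiro.test(r)                           # formal test on the residuals
    qqnorm(r); qqline(r)                      # and the visual check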

Numerical accuracy: Φ(z) accurate to ~7.5 × 10⁻⁸ (Hart's algorithm); Shapiro-Wilk via Royston AS R94; Anderson-Darling via Stephens (1986); Lilliefors p via Dallal-Wilkinson (1986).