Shapiro-Wilk Test in R: Test Normality With shapiro.test()
The Shapiro-Wilk test in R checks whether a numeric sample is consistent with a normal distribution. Use shapiro.test(x); a p-value below your threshold (e.g., 0.05) suggests non-normality.
shapiro.test(x)                        # default test
shapiro.test(x)$p.value                # extract p-value
shapiro.test(x)$statistic              # W statistic (close to 1 = normal)
qqnorm(x); qqline(x)                   # visual Q-Q plot
ks.test(x, "pnorm", mean(x), sd(x))    # alternative: KS test
nortest::ad.test(x)                    # Anderson-Darling alternative
Need explanation? Read on for examples and pitfalls.
What Shapiro-Wilk does in one sentence
Shapiro-Wilk computes a test statistic W that compares the sample's order statistics to those expected from a normal distribution; W close to 1 is consistent with normality, and lower values indicate a departure from it. The p-value is the probability of seeing a W at least this low if the data really came from a normal distribution.
The test is widely used as an automated normality check before t-tests, ANOVA, or regression. R's shapiro.test() accepts between 3 and 5000 non-missing values; it works best from about 7 observations upward, because very small samples give it little power.
Syntax
shapiro.test(x) returns the W statistic and p-value.
For example, an output of W = 0.95 with p = 0.12 reads as follows: W is close to 1, which suggests normality, and the p-value is not significant, so you cannot reject normality.
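A minimal sketch of the call and the object it returns. The data here (the built-in mtcars$mpg column) is my own choice for illustration, not necessarily the sample the numbers above came from.

```r
# mtcars ships with R; using its mpg column is an illustrative assumption.
x <- mtcars$mpg

res <- shapiro.test(x)      # returns an object of class "htest"
print(res)                  # prints W and the p-value

names(res)                  # "statistic" "p.value" "method" "data.name"
w <- unname(res$statistic)  # W as a plain number
p <- res$p.value            # p-value as a plain number
```

Pulling w and p out as plain numbers is handy when you want to store results in a table or use them in an if() condition.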
For a visual check, qqnorm(x); qqline(x) is the standard one-liner.
Five common patterns
1. Basic normality test
A p-value above 0.05 means the data are CONSISTENT with normal. This is not "proof of normality"; it just means you cannot reject it.
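The sketch below runs the test on two simulated samples and words the result carefully. The samples and the helper name describe_normality() are hypothetical, made up for this illustration.

```r
set.seed(123)
normal_x <- rnorm(200, mean = 50, sd = 10)  # simulated normal data
skewed_x <- rexp(200, rate = 1)             # simulated right-skewed data

# Hypothetical helper: run the test and phrase the conclusion cautiously.
describe_normality <- function(x, alpha = 0.05) {
  res <- shapiro.test(x)
  verdict <- if (res$p.value < alpha) {
    "evidence against normality"
  } else {
    "consistent with normality (not proof of it)"
  }
  sprintf("W = %.3f, p = %.4f: %s", res$statistic, res$p.value, verdict)
}

describe_normality(normal_x)
describe_normality(skewed_x)
```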
2. Test residuals from a regression
Many regression assumptions concern RESIDUALS, not raw data. Test residuals(fit) after fitting your model.
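As a rough example, assuming a simple linear model on the built-in mtcars data (the formula mpg ~ wt is only an illustration):

```r
# Fit a model first, then test its residuals rather than the raw response.
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

shapiro.test(res)           # normality test on the residuals

# Same idea for ANOVA-style models: residuals(aov(...)) works the same way.
```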
3. Pair with Q-Q plot
Points on the line in the Q-Q plot indicate normality. Curvature suggests skew; outliers at the ends suggest heavy tails.
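A short sketch of pairing the two, again using mtcars$mpg as an assumed example sample:

```r
x <- mtcars$mpg

# Q-Q plot: sample quantiles against theoretical normal quantiles.
qq <- qqnorm(x, main = "Normal Q-Q plot: mpg")
qqline(x, col = "red")      # reference line through the quartiles

# Formal test on the same data, to read alongside the plot.
shapiro.test(x)
```

qqnorm() also returns the plotted coordinates invisibly, so qq$x and qq$y are available if you want to build the plot yourself.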
4. Compare normality across groups
Use by() (base R) or dplyr::group_by() %>% summarise() to apply the test per group. Useful before deciding ANOVA vs Kruskal-Wallis.
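One way to do this in base R, sketched on the built-in iris data (the choice of Sepal.Length by Species is an assumption for the example):

```r
# Full printed test output, one block per group.
by(iris$Sepal.Length, iris$Species, shapiro.test)

# Compact version: one column per group, rows for W and the p-value.
by_species <- split(iris$Sepal.Length, iris$Species)
results <- sapply(by_species, function(v) {
  res <- shapiro.test(v)
  c(W = unname(res$statistic), p.value = res$p.value)
})
t(results)                  # transpose so each group is a row
```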
5. Alternative tests for n > 5000
nortest::ad.test() (Anderson-Darling) has no upper limit on sample size (it does need more than 7 observations). Other options: nortest::lillie.test() (Lilliefors-corrected KS), or just visual Q-Q plots.
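The sketch below shows the error on a simulated sample of 6000 values and then falls back to the alternatives. It assumes the nortest package may or may not be installed, so those calls are guarded.

```r
set.seed(7)
big_x <- rnorm(6000)        # simulated sample, deliberately above 5000

# shapiro.test() stops with an error here; capture the message instead.
err <- tryCatch(shapiro.test(big_x),
                error = function(e) conditionMessage(e))
err

# Alternatives without the 5000 cap (only run if nortest is available).
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(big_x))
  print(nortest::lillie.test(big_x))
}

qqnorm(big_x); qqline(big_x)  # the visual check works at any size
```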
Shapiro-Wilk vs other normality tests
Several normality tests exist, each with different sample-size limits and sensitivities. Pick based on your data and what kind of deviation matters.
| Test | Best for | Strengths | Weaknesses |
|---|---|---|---|
| Shapiro-Wilk | n in 7-5000 | Most powerful for typical samples | Errors above n=5000 |
| Anderson-Darling | n > 7, no upper limit | Sensitive to tail differences | Requires nortest package |
| Lilliefors (KS) | Any n | Simple; KS works on any continuous distribution | Less powerful than Shapiro |
| Jarque-Bera | Large n | Tests skew + kurtosis directly | Sensitive only to higher moments |
When to use which:
- Use Shapiro-Wilk for typical sample sizes (7 to 5000).
- Use Anderson-Darling for n > 5000 or when tails matter.
- Always pair with a Q-Q plot for visual confirmation.
Common pitfalls
Pitfall 1: relying on a non-significant p-value as PROOF of normality. A non-significant test means you cannot REJECT normality, not that data ARE normal. With small samples, the test has low power to detect non-normality.
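A small simulation makes the power problem concrete. This is a sketch under assumed settings (exponential data, 500 repetitions, alpha = 0.05); the helper reject_rate() is hypothetical.

```r
set.seed(2024)

# Hypothetical helper: share of clearly non-normal (exponential) samples
# of size n that the test flags at the given alpha.
reject_rate <- function(n, reps = 500, alpha = 0.05) {
  mean(replicate(reps, shapiro.test(rexp(n))$p.value < alpha))
}

rates <- sapply(c(10, 30, 100), reject_rate)
names(rates) <- c("n=10", "n=30", "n=100")
rates                       # rejection rate rises with sample size
```

Every sample here is non-normal by construction, so any rate below 1 is the test missing a real deviation.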
Pitfall 2: testing raw data that pools several groups. shapiro.test() treats its input as one sample, so data pooled from groups with different means can look non-normal even when each group is normal. If you suspect different group means, test residuals after fitting a model, not the raw data.
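To see the difference, here is a sketch with two simulated groups that are each normal but have different means (all values are made up for the illustration):

```r
set.seed(99)
g <- factor(rep(c("a", "b"), each = 100))
y <- rnorm(200, mean = ifelse(g == "a", 0, 6), sd = 1)

# Pooled raw data: two well-separated humps, so the test rejects.
p_pooled <- shapiro.test(y)$p.value

# Residuals after removing the group means: this is what the model assumes.
fit <- lm(y ~ g)
p_resid <- shapiro.test(residuals(fit))$p.value

c(pooled = p_pooled, residuals = p_resid)
```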
For samples larger than 5000, use nortest::ad.test(), ks.test(), or just rely on Q-Q plots. The CLT also makes inferential tests robust at large n, so normality testing matters less.
Pitfall 3: using Shapiro on count, ordinal, or categorical data. These data types are not normal by construction. Skip the test and use methods designed for the data type (Poisson regression, ordinal regression, etc.).
Try it yourself
Try it: Test whether iris$Sepal.Length for the species "setosa" is normally distributed. Save to ex_test.
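One possible solution, sketched in base R (the intermediate name setosa_len is my own):

```r
# Keep only the setosa rows, then test their sepal lengths.
setosa_len <- iris$Sepal.Length[iris$Species == "setosa"]

ex_test <- shapiro.test(setosa_len)
ex_test
```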
Explanation: W = 0.98 (very close to 1) and p = 0.46 (well above 0.05). The data are consistent with normality. Pairing this with a Q-Q plot would visually confirm the points fall close to the reference line.
Related tests
After mastering Shapiro-Wilk, look at:
- qqnorm(), qqline(): Q-Q plot for visual normality assessment
- ks.test(x, "pnorm", mean, sd): Kolmogorov-Smirnov against any specified distribution
- nortest::ad.test(): Anderson-Darling (no upper limit on n)
- nortest::lillie.test(): Lilliefors-corrected KS test
- moments::skewness(), moments::kurtosis(): direct measures of distribution shape
- bartlett.test(), car::leveneTest(): equal-variance tests (related but different)
For multivariate normality, MVN::mvn() provides multiple tests including Mardia and Henze-Zirkler.
FAQ
How do I test for normality in R?
shapiro.test(x) for sample sizes 7 to 5000. Pair with qqnorm(x); qqline(x) for a visual check. P-value above 0.05 means data are consistent with normal; below means evidence against normality.
What is a good Shapiro-Wilk W statistic?
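W ranges from 0 to 1, and W = 1 means perfect normality. There is no fixed cutoff for a "good" W, because the value needed to reject normality depends on the sample size: W = 0.95 can be unremarkable at n = 20 and strongly significant at n = 1000. Read W together with the p-value rather than on its own.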
Why does Shapiro-Wilk error with large samples?
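R's shapiro.test() stops with an error when n is above 5000 (or below 3). The limit reflects the range of sample sizes that the underlying approximation (Royston's algorithm) was developed for. Separately, the test becomes hypersensitive to trivial deviations at large sizes (large n + any tiny non-normality = "significant"). For larger samples, use Anderson-Darling (nortest::ad.test) or visual Q-Q plots.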
How do I test if model residuals are normal?
Fit the model, extract residuals, then run shapiro: fit <- lm(y ~ x); shapiro.test(residuals(fit)). Residual normality is what regression and ANOVA assume, not raw-data normality.
What if Shapiro-Wilk says my data are not normal?
Three options. 1) Transform the data (log, sqrt, Box-Cox) and retest. 2) Use a non-parametric alternative (Wilcoxon instead of t-test, Kruskal-Wallis instead of ANOVA). 3) Use a more flexible model (GLM, robust regression, bootstrapping). The choice depends on your downstream analysis.
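A rough sketch of the first two options. The log-normal sample is simulated so that a log transform is known to help; with real data the right transform is a judgment call.

```r
set.seed(5)
skewed <- rlnorm(80)        # simulated log-normal (right-skewed) sample

# Option 1: transform and retest.
p_raw <- shapiro.test(skewed)$p.value
p_log <- shapiro.test(log(skewed))$p.value
c(raw = p_raw, log = p_log)

# Option 2: a non-parametric test in place of one-way ANOVA.
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)
kw
```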