Wilcoxon Test in R: Signed-Rank and Rank-Sum With Examples
The wilcox.test() function in R runs Wilcoxon tests: signed-rank (paired samples) and rank-sum (independent samples, also known as Mann-Whitney U). Use them when t-test assumptions are violated.
wilcox.test(x, mu = 0)                      # one-sample signed-rank
wilcox.test(x, y)                           # two-sample (Mann-Whitney U)
wilcox.test(x, y, paired = TRUE)            # paired signed-rank
wilcox.test(x, y, alternative = "greater")  # one-sided
wilcox.test(x, y, conf.int = TRUE)          # add Hodges-Lehmann CI
wilcox.test(x, y)$p.value                   # extract p-value
wilcox.test(value ~ group, data = df)       # formula syntax
Need explanation? Read on for examples and pitfalls.
What Wilcoxon does in one sentence
Wilcoxon tests compare distributions using RANKS instead of raw values, so they do not assume normality. Signed-rank ranks the absolute differences from a center (or between paired values) and compares the rank sums of the positive and negative differences; rank-sum (Mann-Whitney U) uses the combined ranks of two samples to test whether one tends to produce larger values.
Both are non-parametric alternatives to the t-test. Use them when sample sizes are small AND data are non-normal. For larger samples, the t-test is usually robust enough to non-normality that switching to Wilcoxon, and accepting its slight power loss on normal data, is unnecessary.
Syntax
wilcox.test(x, y) runs rank-sum; wilcox.test(x, y, paired = TRUE) runs signed-rank on differences.
The test statistic (reported as W for rank-sum and V for signed-rank) and the p-value tell you whether x tends to produce different values than y.
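As a minimal sketch of the two calls side by side, the snippet below runs both on the same simulated vectors. The data, the seed, and the names `before` and `after` are assumptions made up for illustration; only `wilcox.test()` and its `paired` argument come from the text above.

```r
# Simulated data (hypothetical; seed and parameters chosen only for illustration)
set.seed(42)
before <- rnorm(12, mean = 50, sd = 8)
after  <- before + rnorm(12, mean = 3, sd = 4)

# Rank-sum: treats the two vectors as independent samples
rank_sum <- wilcox.test(before, after)

# Signed-rank: ranks the 12 within-pair differences (before - after)
signed_rank <- wilcox.test(before, after, paired = TRUE)

names(rank_sum$statistic)     # label of the rank-sum statistic
names(signed_rank$statistic)  # label of the signed-rank statistic

rank_sum$p.value
signed_rank$p.value
```

The two calls answer different questions about the same numbers, so their p-values are not expected to agree.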
Five common patterns
1. Two-sample (Mann-Whitney U / rank-sum)
Formula syntax: wilcox.test(value ~ group, data = df). Tests whether value differs by group.
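A small sketch of the formula interface follows. The data frame is hypothetical: it reuses the `value` and `group` column names from the call above, with made-up numbers and no ties so that R can compute an exact p-value.

```r
# Hypothetical data frame with the column names used above (value, group)
df <- data.frame(
  value = c(12.1, 14.3, 15.8, 11.7, 16.9, 13.4, 18.2, 15.1,   # group "a"
            10.2,  9.8, 12.9, 11.1,  8.7, 13.0, 10.9,  9.3),  # group "b"
  group = rep(c("a", "b"), each = 8)
)

res <- wilcox.test(value ~ group, data = df)

# W counts the (a, b) pairs in which the "a" value is the larger one;
# the first factor level ("a") plays the role of x
res$statistic
res$p.value
```

With 8 observations per group there are 64 pairs, so a W far from 32 indicates that one group tends to produce larger values.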
2. Paired / repeated measures
paired = TRUE runs signed-rank on the differences. The signed-rank looks at WITHIN-subject changes, which is more powerful when measurements are correlated.
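As an illustration, the built-in `sleep` dataset records extra hours of sleep for the same 10 patients under two drugs, which makes it a natural paired example. The choice of dataset and the `exact = FALSE` argument are assumptions of this sketch, not something the text above prescribes.

```r
# sleep: extra hours of sleep for the same 10 patients under two drugs;
# rows are ordered by patient ID within each group, so the vectors line up
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]

# exact = FALSE asks for the normal approximation up front, because the
# differences contain a zero and a tie, which rule out an exact p-value
res <- wilcox.test(drug1, drug2, paired = TRUE, exact = FALSE)

res$statistic  # V: sum of the ranks of the positive (drug1 - drug2) differences
res$p.value
```

Here no patient slept longer on drug 1 than on drug 2, so there are no positive differences and V is 0.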
3. One-sample signed-rank
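wilcox.test(x, mu = 100) tests whether x is centered on 100. The null value refers to the (pseudo)median, which equals the median when the distribution is symmetric. Non-parametric alternative to the one-sample t-test.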
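A minimal sketch of the one-sample case, using ten made-up measurements (the vector `x` below is hypothetical) tested against a center of 100:

```r
# Hypothetical measurements; no value equals 100 and no two are equally far from it
x <- c(96.1, 101.3, 104.8, 99.2, 108.5, 110.4, 103.7, 107.9, 112.6, 105.0)

res <- wilcox.test(x, mu = 100)

# V is the sum of the ranks of |x - 100| for the observations above 100.
# Only 96.1 and 99.2 fall below 100, with ranks 4 and 1, so V = 55 - 5 = 50.
res$statistic
res$p.value
```

The maximum possible V for n = 10 is 55, so a value of 50 means almost all of the rank weight sits above the hypothesized center.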
4. Hodges-Lehmann confidence interval
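conf.int = TRUE returns the Hodges-Lehmann estimate of the location shift together with its confidence interval. In the two-sample case the estimate is the median of all pairwise differences between x and y, not the difference between the two sample medians. More informative than just a p-value.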
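The sketch below requests the interval on two small made-up samples (hypothetical values, no ties) and compares the reported estimate with the median of all pairwise differences computed by hand.

```r
# Hypothetical samples without ties, so R uses the exact method
x <- c(12.1, 14.3, 15.8, 11.7, 16.9, 13.4, 18.2, 15.1)
y <- c(10.2,  9.8, 12.9, 11.1,  8.7, 13.0, 10.9,  9.3)

res <- wilcox.test(x, y, conf.int = TRUE)

res$estimate   # Hodges-Lehmann estimate, labelled "difference in location"
res$conf.int   # 95% interval by default; change it with conf.level

# For comparison: median of all 64 pairwise differences x[i] - y[j]
median(outer(x, y, "-"))

# The difference of the sample medians is generally a different number
median(x) - median(y)
```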
5. One-sided Wilcoxon
Set alternative = "greater" or alternative = "less" to test a single direction. The direction must be specified BEFORE seeing data. Post-hoc directional tests inflate false-positive rates.
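A short sketch of a one-sided test on made-up data (the vectors are hypothetical), with the two-sided result alongside for comparison:

```r
# Hypothetical samples; suppose the hypothesis "x tends to exceed y" was fixed in advance
x <- c(12.1, 14.3, 15.8, 11.7, 16.9, 13.4, 18.2, 15.1)
y <- c(10.2,  9.8, 12.9, 11.1,  8.7, 13.0, 10.9,  9.3)

two_sided <- wilcox.test(x, y)
one_sided <- wilcox.test(x, y, alternative = "greater")  # H1: x shifted above y

two_sided$p.value
one_sided$p.value  # half the two-sided value here, since the data fall in the predicted direction
```

Had the data fallen in the opposite direction, the one-sided p-value would have been close to 1 instead, which is the price of committing to a direction.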
Wilcoxon vs t-test comparison
The choice between Wilcoxon and t-test depends on data shape and sample size. Both test similar things but with different assumptions.
| Property | t-test | Wilcoxon |
|---|---|---|
| Normality assumption | Yes (or large N) | No |
| Outlier sensitivity | High | Low (uses ranks) |
| Power on truly normal data | Higher | Slightly lower |
| Power on non-normal data | Lower | Higher |
| Returns | Mean difference + CI | Location shift + CI |
| Best for small N | If normal | If non-normal |
When to use which:
- Use t-test when normality is plausible OR N >= 30.
- Use Wilcoxon for small N with non-normal data or outliers.
- For ordinal data, Wilcoxon is the right choice (a t-test treats the gaps between ordered categories as equal, which is rarely justified).
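To make the outlier row of the table concrete, here is a sketch on made-up data in which every `x` value exceeds every `y` value except for a single extreme `y`. The numbers are hypothetical and chosen to exaggerate the effect.

```r
# Hypothetical samples: x is consistently higher, but y contains one large outlier
x <- c(5.1, 5.4, 5.6, 5.8, 6.0, 6.2, 6.5, 6.9)
y <- c(4.0, 4.2, 4.3, 4.5, 4.6, 4.8, 4.9, 25.0)

t_res <- t.test(x, y)       # Welch t-test; the outlier inflates y's mean and variance
w_res <- wilcox.test(x, y)  # the outlier is just the highest rank, however extreme it is

t_res$p.value
w_res$p.value
w_res$statistic  # 56 of the 64 (x, y) pairs have x larger
```

The t-test loses the signal because one value dominates the variance estimate, while the rank-based test still sees that x is larger in almost every pair.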
Common pitfalls
Pitfall 1: confusing signed-rank with rank-sum. Signed-rank is for PAIRED data (same subjects, two measurements). Rank-sum (Mann-Whitney U) is for INDEPENDENT samples (different subjects). Specify paired = TRUE when paired.
Pitfall 2: claiming "Wilcoxon compares medians". Strictly, it tests location shift assuming both distributions have the same shape. With different shapes, the result depends on the entire distribution, not just medians.
Exact permutation tests from the coin package can give better p-values: coin::wilcox_test(value ~ group, data = df).
Pitfall 3: using Wilcoxon when a t-test would work. With sample size 50+ and roughly symmetric data, the t-test is typically the more powerful choice. Reach for Wilcoxon only when t-test assumptions clearly fail.
Try it yourself
Try it: Use Wilcoxon to compare mpg between automatic (am=0) and manual (am=1) cars in mtcars. Save to ex_test.
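One possible solution, as a sketch. The `exact = FALSE` argument is an addition here: mpg contains ties, so R would fall back to the normal approximation anyway and warn about it, and asking for the approximation explicitly gives the same result without the warning.

```r
# mtcars ships with R; am is 0 for automatic and 1 for manual
ex_test <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

ex_test$statistic  # W, computed for the first group (am = 0)
ex_test$p.value
```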
Explanation: The Mann-Whitney U test on mpg by am gives W = 42, p = 0.0019. This is strong evidence that the distribution of mpg differs between automatic and manual cars. Manual cars (am=1) tend to have higher mpg.
Related tests
After mastering Wilcoxon, look at:
- t.test(): parametric counterpart for normal data
- kruskal.test(): extension to 3+ groups (non-parametric ANOVA)
- friedman.test(): 3+ paired groups (non-parametric repeated-measures ANOVA)
- coin::wilcox_test(): exact Wilcoxon with permutation
- mood.test(): tests for scale (variance) differences
- ks.test(): Kolmogorov-Smirnov for whole distribution comparison
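For effect size after Wilcoxon, the rank-biserial correlation r = 1 - 2*W/(n1*n2) works for the rank-sum case, where W is the statistic R reports and n1, n2 are the two sample sizes. With this sign convention, a positive r means the second group tends to have the larger values.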
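A sketch of that calculation on the mtcars comparison of mpg by transmission. The variable names (`auto`, `manual`, `r_rb`) are hypothetical, and `exact = FALSE` is used only to avoid the ties warning.

```r
auto   <- mtcars$mpg[mtcars$am == 0]  # first sample (x)
manual <- mtcars$mpg[mtcars$am == 1]  # second sample (y)

res <- wilcox.test(auto, manual, exact = FALSE)

W  <- unname(res$statistic)
n1 <- length(auto)
n2 <- length(manual)

# Rank-biserial correlation using the formula above
r_rb <- 1 - 2 * W / (n1 * n2)
r_rb  # positive: the second sample (manual) tends to have higher mpg
```

The value ranges from -1 to 1, with 0 meaning that neither group tends to exceed the other.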
FAQ
What is the difference between Wilcoxon signed-rank and rank-sum tests?
Signed-rank is for PAIRED data: same subjects, two measurements. It tests differences within pairs. Rank-sum (Mann-Whitney U) is for INDEPENDENT samples: two different groups. It tests whether one group tends to produce higher values.
When should I use Wilcoxon instead of t-test in R?
Use Wilcoxon when sample sizes are small AND the data are non-normal (skewed, heavy-tailed, or with outliers). For sample sizes 30+, the t-test is robust enough that Wilcoxon's advantage shrinks.
How do I run Mann-Whitney U test in R?
wilcox.test(x, y) (without paired = TRUE) IS the Mann-Whitney U test. R uses the equivalent rank-sum formulation. Both names refer to the same test; "Mann-Whitney U" is more common in non-statistics fields.
Does Wilcoxon test compare medians?
Not strictly. It tests for a LOCATION SHIFT, assuming both distributions have the same shape. If shapes differ, the test responds to whether values from one group tend to exceed values from the other, which can happen even when the medians are equal. To formally compare medians, use quantile regression or a median test.
How do I get a p-value from wilcox.test in R?
result <- wilcox.test(x, y); result$p.value. Other useful fields: $statistic (W or V), $conf.int (with conf.int = TRUE), $estimate (Hodges-Lehmann shift estimate).