Non-Parametric Test Picker
Non-parametric tests (Mann-Whitney, Wilcoxon signed-rank, Kruskal-Wallis, sign test) compare groups when your data are skewed, ordinal, or otherwise not normal. Paste raw values, pick the right test for your design, and get the test statistic, p-value, rank-based effect size, and a Hodges-Lehmann confidence interval.
New to rank tests? Start with the short primer below.
Why rank-based tests. When the data are skewed, ordinal, or full of ties and outliers, the t-test and ANOVA can misstate the p-value. Rank tests replace the raw values with their ranks, so the result depends only on the order of the observations. That makes them robust to skew and outliers, at the cost of a small efficiency loss when the data really are normal.
Picking the right one. Two independent groups of continuous or ordinal data: Mann-Whitney U. One sample, or paired differences, with a symmetric distribution: Wilcoxon signed-rank. Three or more independent groups: Kruskal-Wallis. Heavy ties or direction-only data (better/worse, +/-): the Sign test, which counts only how many differences fall on each side of zero.
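In SciPy terms (an assumption on my part; the calculator itself need not use SciPy), the decision rule above maps to one function per design. A minimal sketch with made-up data:

```python
# Sketch: one SciPy call per design. All data below are illustrative.
from scipy import stats

a = [1.2, 3.4, 2.2, 5.1, 0.9, 4.0]   # group 1
b = [2.5, 4.8, 3.9, 6.0, 5.5, 4.1]   # group 2
c = [0.4, 1.1, 0.8, 2.0, 1.5, 0.7]   # group 3
before = [10, 12, 9, 14, 11, 13]
after  = [11, 14, 9, 15, 13, 16]

# Two independent groups -> Mann-Whitney U
u, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")

# Paired data -> Wilcoxon signed-rank (zeros are dropped by default)
w, p_w = stats.wilcoxon(before, after)

# Three or more independent groups -> Kruskal-Wallis
h, p_h = stats.kruskal(a, b, c)

# Direction-only / heavy ties -> Sign test via a binomial test on the signs
diffs = [y - x for x, y in zip(before, after)]
pos = sum(d > 0 for d in diffs)
nonzero = sum(d != 0 for d in diffs)
p_sign = stats.binomtest(pos, nonzero, 0.5).pvalue
```

The sign test reduces to a binomial test because, under the null, each nonzero difference is equally likely to fall on either side of zero.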
What gets reported. A test statistic (U / W / H), a p-value, an effect size (rank-biserial correlation or Cliff's delta for two groups, epsilon-squared for Kruskal-Wallis), and a Hodges-Lehmann estimate of the location shift with a confidence interval. The HL estimate is the median of all pairwise (or Walsh-average) differences and is the rank-based analogue of a mean difference.
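For the two-group case, both reported quantities are short computations. A sketch assuming SciPy and illustrative data: the rank-biserial correlation derived from U, and the Hodges-Lehmann shift as the median of all pairwise differences:

```python
# Sketch (illustrative data): rank-biserial effect size from U, and the
# Hodges-Lehmann estimate of the shift b - a.
from itertools import product
from statistics import median
from scipy import stats

a = [1.2, 3.4, 2.2, 5.1, 0.9, 4.0]
b = [2.5, 4.8, 3.9, 6.0, 5.5, 4.1]

u, p = stats.mannwhitneyu(a, b, alternative="two-sided")
n1, n2 = len(a), len(b)

# Rank-biserial correlation: r = 2U/(n1*n2) - 1, always in [-1, 1]
r_rb = 2 * u / (n1 * n2) - 1

# Hodges-Lehmann estimate: median of all n1*n2 pairwise differences
hl = median(y - x for x, y in product(a, b))
```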
What rank tests do not test. They are not strictly tests of medians unless the distributions share the same shape. More precisely, Mann-Whitney tests whether one distribution is stochastically larger; Kruskal-Wallis tests whether at least one group is stochastically larger than another. A rejection means “the groups differ in location somewhere,” not “the medians differ by exactly X.”
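The “stochastically larger” claim can be made concrete: U/(n1·n2) estimates P(X > Y) (plus half the probability of a tie), the common-language effect size. A sketch with illustrative data, assuming SciPy:

```python
# Sketch: the quantity Mann-Whitney actually targets, estimated from U.
# Data are illustrative only.
from scipy import stats

a = [1.2, 3.4, 2.2, 5.1, 0.9, 4.0]
b = [2.5, 4.8, 3.9, 6.0, 5.5, 4.1]

u, _ = stats.mannwhitneyu(a, b, alternative="two-sided")
p_x_gt_y = u / (len(a) * len(b))   # estimate of P(X > Y) + 0.5 * P(X == Y)
```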
Recap
Read more: "The rank-test math, end to end" covers what rank(x, ties.method="average") does and what every test below assumes.
Caveats: When a rank test is the wrong tool
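The midrank ("average") tie method that every test here assumes can be seen directly. A sketch using SciPy's rankdata, which mirrors R's rank(x, ties.method="average") (values are illustrative):

```python
# Sketch: midranks, the "average" tie method every rank test here assumes.
from scipy.stats import rankdata

x = [3, 1, 4, 1, 5]
ranks = rankdata(x, method="average")
# The two 1s would occupy ranks 1 and 2, so each gets the midrank 1.5:
# ranks -> [3. , 1.5, 4. , 1.5, 5. ]
```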
| If you have… | Use instead |
| --- | --- |
| Two normal-looking groups, large n | The Welch t-test is more powerful and the difference of means is easier to interpret. See the t-Test Calculator. |
| Three or more groups with normal residuals | One-way ANOVA + Tukey HSD. Kruskal-Wallis loses about 5% efficiency on truly normal data. |
| Counts or proportions | Chi-squared or Fisher's exact for tables; Poisson regression for rates. Rank tests work but are less informative than the count-based GLM. |
| Repeated-measures with three or more conditions | Friedman's test (rank-based, blocked) or a mixed-effects ANOVA on transformed data. |
| A clear monotone but non-linear relationship | Spearman or Kendall rank correlation, which directly measures monotone association rather than a location shift. |
| Heavily tied data with only a direction (better/worse) | The Sign test is the right primitive; the signed-rank assumption of symmetric differences is doubtful. |
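Several of the alternatives named above are also one-liners in SciPy (again an assumption, with illustrative data):

```python
# Sketch of a few alternatives from the table. Data are illustrative.
from scipy import stats

a = [1.2, 3.4, 2.2, 5.1, 0.9, 4.0]
b = [2.5, 4.8, 3.9, 6.0, 5.5, 4.1]
c = [0.4, 1.1, 0.8, 2.0, 1.5, 0.7]

# Welch t-test for two normal-looking groups (unequal variances allowed)
t, p_t = stats.ttest_ind(a, b, equal_var=False)

# Friedman test for repeated measures across 3+ conditions
chi2, p_f = stats.friedmanchisquare(a, b, c)

# Spearman rank correlation for monotone association
rho, p_s = stats.spearmanr(a, b)
```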
- Wilcoxon, Mann-Whitney & Kruskal-Wallis in R – the long-form companion: assumptions, R syntax, follow-up tests.
- When to use nonparametric tests – the decision tree: normality, outliers, ordinal data.
- Mann-Whitney U test in R – the deep dive on U, the normal approximation, and exact p-values.
- Wilcoxon signed-rank test in R – one-sample and paired-sample workflows.
- Kruskal-Wallis test in R – multi-group rank ANOVA + Dunn post-hoc.
- t-Test Calculator – the parametric counterpart for two means.
Math: midranks for ties; large-sample normal approximation with optional continuity correction; tie-corrected Kruskal-Wallis H; Hodges-Lehmann from sorted Walsh averages / pairwise differences; chi-squared and normal CDFs from the same series used elsewhere in this toolkit.
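The Hodges-Lehmann step is short enough to sketch: for the one-sample/paired case it is the median of the sorted Walsh averages, i.e. of (d_i + d_j)/2 over all pairs i ≤ j (illustrative paired differences below):

```python
# Sketch: one-sample Hodges-Lehmann estimate from Walsh averages.
# The paired differences are illustrative.
from itertools import combinations_with_replacement
from statistics import median

d = [1, 2, 0, 1, 2, 3]   # paired differences

# Walsh averages: (d_i + d_j)/2 over all pairs with i <= j -> n(n+1)/2 values
walsh = sorted((x + y) / 2 for x, y in combinations_with_replacement(d, 2))
hl = median(walsh)
```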