R-statistics.co

Non-Parametric Test Picker

Non-parametric tests (Mann-Whitney, Wilcoxon signed-rank, Kruskal-Wallis, sign test) compare groups when your data are skewed, ordinal, or otherwise not normal. Paste raw values, pick the right test for your design, and get the test statistic, p-value, rank-based effect size, and a Hodges-Lehmann confidence interval.

New to rank tests? Read the 4-min primer.

Why rank-based tests. When the data are skewed, ordinal, or full of ties and outliers, the t-test and ANOVA can misstate the p-value. Rank tests replace the raw values with their ranks, so the test depends only on the order of the observations. That makes them robust to skew and outliers, at the cost of a small efficiency loss when the data really are normal.

Picking the right one. Two independent groups of continuous-or-ordinal data: Mann-Whitney U. One group, or paired data, of symmetric continuous-or-ordinal data: Wilcoxon signed-rank. Three or more independent groups: Kruskal-Wallis. Heavy ties or direction-only data (better/worse, +/-): the Sign test, which only counts how many differences fall on each side of zero.
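The decision rules above can be sketched as a small helper. This is an illustrative mapping only (the function name and parameters are ours, not the calculator's), and it ignores edge cases such as the symmetry assumption of the signed-rank test:

```python
def pick_test(n_groups: int, paired: bool, direction_only: bool = False) -> str:
    """Map a study design to a rank test, following the rules above (sketch)."""
    if direction_only:
        return "Sign test"                 # only +/- information survives
    if paired or n_groups == 1:
        return "Wilcoxon signed-rank"      # one sample, or paired differences
    if n_groups == 2:
        return "Mann-Whitney U"            # two independent groups
    return "Kruskal-Wallis"                # three or more independent groups
```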

What gets reported. A test statistic (U / W / H), a p-value, an effect size (rank-biserial correlation or Cliff's delta for two groups, epsilon-squared for Kruskal-Wallis), and a Hodges-Lehmann estimate of the location shift with a confidence interval. The HL estimate is the median of all pairwise (or Walsh-average) differences and is the rank-based analogue of a mean difference.

What rank tests do not test. They are not strictly tests of medians unless the two distributions have the same shape. More precisely, Mann-Whitney tests whether one distribution is stochastically larger; Kruskal-Wallis tests whether at least one group is stochastically larger than another. Reject means “the groups differ in location somewhere,” not “the medians differ by exactly X.”

4 tests · raw data only · U / W / H + effect size + HL · Runs in your browser

Try a real-world example: pick a scenario to load sample data, and the calculator returns the test statistic, a plain-language recap, runnable R code to reproduce the result, an interactive jittered rank dot plot (x = rank, colour = group), and the inference summary.
The rank-test math, end to end
R(x) = midrank of x in the pooled sample; ties get the average of the ranks they would have occupied
Step 1: rank. Combine all observations, sort, assign ranks 1..N. When values tie, give every tied observation the mean of the ranks they cover (the so-called midrank or average-rank rule). This is what R's rank(x, ties.method="average") does and what every test below assumes.
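The midrank rule is short enough to write out in full. A minimal pure-Python sketch (the function name is ours), matching R's rank(x, ties.method="average"):

```python
def midranks(values):
    """Ranks 1..N, with tied values sharing the mean of the ranks they cover,
    as in R's rank(x, ties.method="average")."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks
```

For example, midranks([3, 1, 4, 1]) gives [3.0, 1.5, 4.0, 1.5]: the two tied 1s split ranks 1 and 2.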
U₁ = R₁ − n₁(n₁+1)/2;  Z = (U₁ − n₁n₂/2) / √(n₁n₂(N+1)/12 − tie correction)
Mann-Whitney U. Sum the ranks in group 1 (call it R₁) and subtract the minimum rank sum that group could have. The result U₁ is small when group 1 has the small values and large when it has the large values. For n > 20 per group, the standardised statistic Z is approximately N(0,1); we apply a 0.5 continuity correction to the numerator unless you turn it off. The tie correction subtracts a term proportional to Σ(t³-t)/12 to keep the variance honest.
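The whole recipe, ranking through the continuity-corrected normal approximation, fits in one function. A self-contained sketch of the formulas above (function name ours; use a stats library in practice):

```python
import math
from collections import Counter

def mann_whitney(x, y):
    """U1, tie- and continuity-corrected z, and two-sided normal-approx p."""
    n1, n2 = len(x), len(y)
    N = n1 + n2
    pooled = list(x) + list(y)
    # midranks of the pooled sample
    order = sorted(range(N), key=lambda i: pooled[i])
    ranks = [0.0] * N
    i = 0
    while i < N:
        j = i
        while j + 1 < N and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    R1 = sum(ranks[:n1])                       # rank sum of group 1
    U1 = R1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    ties = sum(t**3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((N + 1) - ties / (N * (N - 1)))
    diff = U1 - mu
    cc = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)  # continuity correction
    z = (diff - cc) / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return U1, z, p
```

With completely separated groups, e.g. mann_whitney([1, 2, 3], [4, 5, 6]), U₁ is 0 (group 1 holds all the small values).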
dᵢ = xᵢ − yᵢ (drop dᵢ = 0);  W⁺ = Σ rank(|dᵢ|) for dᵢ > 0;  Z = (W⁺ − n(n+1)/4) / √(n(n+1)(2n+1)/24 − tie correction)
Wilcoxon signed-rank. Compute the differences (or x − μ₀ for a one-sample test), drop zero differences, rank the absolute differences, then sum the ranks of the positive differences. The expected sum under H₀ is n(n+1)/4. Standardise as Z and read p from the normal distribution. The signed-rank test assumes the differences are symmetric about their median; if they are not, prefer the sign test.
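The same steps in code, covering both the paired and the one-sample form. A self-contained sketch of the formulas above (function name ours):

```python
import math
from collections import Counter

def wilcoxon_signed_rank(x, y=None, mu0=0.0):
    """W+, tie- and continuity-corrected z, and two-sided normal-approx p.
    Paired test if y is given; one-sample test against mu0 otherwise."""
    d = [xi - yi for xi, yi in zip(x, y)] if y is not None else [xi - mu0 for xi in x]
    d = [di for di in d if di != 0]            # drop zero differences
    n = len(d)
    absd = [abs(di) for di in d]
    # midranks of |d|
    order = sorted(range(n), key=lambda i: absd[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and absd[order[j + 1]] == absd[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    W = sum(r for r, di in zip(ranks, d) if di > 0)   # sum ranks of positive d
    mu = n * (n + 1) / 4
    ties = sum(t**3 - t for t in Counter(absd).values())
    var = n * (n + 1) * (2 * n + 1) / 24 - ties / 48
    diff = W - mu
    cc = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)  # continuity correction
    z = (diff - cc) / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return W, z, p
```

When every difference is positive, W⁺ hits its maximum n(n+1)/2; for five all-positive differences that is 15.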
H = (12 / (N(N+1))) · Σⱼ Rⱼ²/nⱼ − 3(N+1);  H′ = H / (1 − Σ(t³ − t) / (N³ − N));  p = P(χ²₍k₋₁₎ > H′)
Kruskal-Wallis. Pool, rank, then for each of the k groups compute the rank sum Rⱼ and divide by the group size nⱼ. The H statistic measures how much the group rank-means differ from the overall rank-mean (N+1)/2. Apply the tie correction H′ = H/(1−…) and refer to a chi-squared distribution with k−1 degrees of freedom. A significant H means at least one group is stochastically larger; follow up with pairwise Mann-Whitney U tests plus a Bonferroni correction if you need to know which.
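A self-contained sketch for the common three-group case (so df = 2, where the chi-squared tail has the closed form exp(−H′/2)); for general k, use a chi-squared survival function from a stats library. Function name ours:

```python
import math
from collections import Counter

def kruskal_wallis_3(groups):
    """Tie-corrected H and its p-value for exactly three groups (df = 2)."""
    assert len(groups) == 3
    pooled = [v for g in groups for v in g]
    N = len(pooled)
    # midranks of the pooled sample
    order = sorted(range(N), key=lambda i: pooled[i])
    ranks = [0.0] * N
    i = 0
    while i < N:
        j = i
        while j + 1 < N and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    H, pos = 0.0, 0
    for g in groups:
        Rg = sum(ranks[pos:pos + len(g)])      # rank sum of this group
        H += Rg * Rg / len(g)
        pos += len(g)
    H = 12 / (N * (N + 1)) * H - 3 * (N + 1)
    ties = sum(t**3 - t for t in Counter(pooled).values())
    H_corr = H / (1 - ties / (N**3 - N))
    p = math.exp(-H_corr / 2)                  # chi-squared sf at df = 2
    return H_corr, p
```

Three fully separated groups of three, e.g. [[1,2,3],[4,5,6],[7,8,9]], give H = 7.2 and p ≈ 0.027.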
HL = median{ (dᵢ + dⱼ) / 2 : i ≤ j } (paired / one-sample, Walsh averages of the differences);  HL = median{ xᵢ − yⱼ : i, j } (two-sample);  CI: order the Walsh averages (or pairwise differences) and take the indices set by the Wilcoxon critical values
Hodges-Lehmann. The HL point estimate is the median of all pairwise differences (or Walsh averages for paired data). It is the rank-based analogue of a mean difference and is robust to outliers. The 95% CI is read off the same sorted vector at indices set by the Wilcoxon distribution's 2.5% and 97.5% quantiles, which keeps the CI consistent with the test it accompanies.
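The two-sample version is a few lines. A sketch (function name ours) that uses the large-sample normal approximation to pick the CI indices; exact CIs read the indices off the Wilcoxon critical values instead:

```python
import math
from statistics import median

def hodges_lehmann(x, y, z_crit=1.96):
    """Two-sample HL shift estimate (median of all pairwise differences)
    plus an approximate 95% CI from the sorted differences."""
    diffs = sorted(xi - yj for xi in x for yj in y)
    est = median(diffs)
    n1, n2 = len(x), len(y)
    # normal-approximation index into the sorted pairwise differences
    k = math.floor(n1 * n2 / 2 - z_crit * math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12))
    k = max(k, 0)
    return est, (diffs[k], diffs[len(diffs) - 1 - k])
```

For x = [10, 12, 14] and y = [1, 2, 3], the nine pairwise differences have median 10, so the HL shift estimate is 10.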
r_rb = 2U₁ / (n₁n₂) − 1 (rank-biserial);  δ = (#{x>y} − #{x<y}) / (n₁n₂) (Cliff's δ);  ε² = H / (N − 1) (Kruskal-Wallis effect size)
Effect sizes. The rank-biserial correlation r_rb runs from −1 to +1; |0.10| is small, |0.30| medium, |0.50| large by the usual Cohen-style benchmarks. Cliff's δ is numerically identical when there are no ties. Epsilon-squared is the Kruskal-Wallis analogue of η² and tells you the share of rank variance explained by the grouping; it is bounded above by 1.
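All three effect sizes are one-liners once the test statistics are in hand. A sketch (function names ours):

```python
def cliffs_delta(x, y):
    """delta = (#{x>y} - #{x<y}) / (n1*n2), by direct counting."""
    gt = sum(1 for a in x for b in y if a > b)
    lt = sum(1 for a in x for b in y if a < b)
    return (gt - lt) / (len(x) * len(y))

def rank_biserial(u1, n1, n2):
    """r_rb from the group-1 Mann-Whitney U; numerically equal to
    Cliff's delta in the absence of ties."""
    return 2 * u1 / (n1 * n2) - 1

def epsilon_squared(h, n):
    """Kruskal-Wallis effect size: share of rank variance explained."""
    return h / (n - 1)
```

With fully separated groups (x = [4, 5, 6] vs y = [1, 2, 3]), U₁ = 9 and both δ and r_rb reach their maximum of 1.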
Caveats: when a rank test is the wrong tool
If you have… → use instead:

Two normal-looking groups, large n → the Welch t-test is more powerful and the difference of means is easier to interpret. See the t-Test Calculator.
Three or more groups with normal residuals → one-way ANOVA + Tukey HSD. Kruskal-Wallis loses about 5% efficiency on truly normal data.
Counts or proportions → chi-squared or Fisher's exact for tables; Poisson regression for rates. Rank tests work but are less informative than the count-based GLM.
Repeated measures with three or more conditions → Friedman's test (rank-based, blocked) or a mixed-effects ANOVA on transformed data.
A clear monotone but non-linear relationship → Spearman or Kendall rank correlation, which directly measures monotone association rather than a location shift.
Heavily tied data with only a direction (better/worse) → the Sign test is the right primitive; the signed-rank assumption of symmetric differences is doubtful.
Math: midrank for ties; large-sample normal approximation with optional continuity correction; Kruskal-Wallis tie-corrected; Hodges-Lehmann from sorted Walsh averages / pairwise differences; chi-squared and normal CDFs from the same series used elsewhere in this toolkit.