R-statistics.co

Non-Parametric Test Picker

Non-parametric tests (Mann-Whitney, Wilcoxon signed-rank, Kruskal-Wallis, sign test) compare groups when your data are skewed, ordinal, or otherwise not normal. Paste raw values, pick the right test for your design, and get the test statistic, p-value, rank-based effect size, and a Hodges-Lehmann confidence interval.

New to rank tests? Read the 4-min primer.

Why rank-based tests. When the data are skewed, ordinal, or full of ties and outliers, the t-test and ANOVA can misstate the p-value. Rank tests replace the raw values with their ranks, so the test depends only on the order of the observations. That makes them robust to skew and outliers, at the cost of a small efficiency loss when the data really are normal.

Picking the right one. Two independent groups of continuous-or-ordinal data: Mann-Whitney U. One group, or paired data, of symmetric continuous-or-ordinal data: Wilcoxon signed-rank. Three or more independent groups: Kruskal-Wallis. Heavy ties or direction-only data (better/worse, +/-): the Sign test, which only counts how many differences fall on each side of zero.
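The decision rules above can be sketched as a small helper. This is an illustrative mapping only (the function name and parameters are ours, not the calculator's), and it ignores edge cases such as the symmetry assumption of the signed-rank test:

```python
def pick_test(n_groups: int, paired: bool, direction_only: bool = False) -> str:
    """Map a study design to a rank test, following the rules above (sketch)."""
    if direction_only:
        return "Sign test"                 # only +/- information survives
    if paired or n_groups == 1:
        return "Wilcoxon signed-rank"      # one sample, or paired differences
    if n_groups == 2:
        return "Mann-Whitney U"            # two independent groups
    return "Kruskal-Wallis"                # three or more independent groups
```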

What gets reported. A test statistic (U / W / H), a p-value, an effect size (rank-biserial correlation or Cliff's delta for two groups, epsilon-squared for Kruskal-Wallis), and a Hodges-Lehmann estimate of the location shift with a confidence interval. The HL estimate is the median of all pairwise (or Walsh-average) differences and is the rank-based analogue of a mean difference.

What rank tests do not test. They are not strictly tests of medians unless the two distributions have the same shape. More precisely, Mann-Whitney tests whether one distribution is stochastically larger; Kruskal-Wallis tests whether at least one group is stochastically larger than another. Reject means “the groups differ in location somewhere,” not “the medians differ by exactly X.”

4 tests · raw data only · U / W / H + effect size + HL · Runs in your browser

Try a real-world example: pick a scenario to load sample data, and the calculator returns the test statistic, a plain-language recap, runnable R code to reproduce the result, an interactive jittered rank dot plot (x = rank, colour = group), and the inference summary.
The rank-test math, end to end
R(x) = midrank of x in the pooled sample; ties get the average of the ranks they would have occupied
Step 1: rank. Combine all observations, sort, assign ranks 1..N. When values tie, give every tied observation the mean of the ranks they cover (the so-called midrank or average-rank rule). This is what R's rank(x, ties.method="average") does and what every test below assumes.
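The midrank rule is short enough to write out in full. A minimal pure-Python sketch (the function name is ours), matching R's rank(x, ties.method="average"):

```python
def midranks(values):
    """Ranks 1..N, with tied values sharing the mean of the ranks they cover,
    as in R's rank(x, ties.method="average")."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks
```

For example, midranks([3, 1, 4, 1]) gives [3.0, 1.5, 4.0, 1.5]: the two tied 1s split ranks 1 and 2.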
U₁ = R₁ − n₁(n₁+1)/2;  Z = (U₁ − n₁n₂/2) / √(n₁n₂(N+1)/12 − tie correction)
Mann-Whitney U. Sum the ranks in group 1 (call it R₁) and subtract the minimum rank sum that group could have. The result U₁ is small when group 1 has the small values and large when it has the large values. For n > 20 per group, the standardised statistic Z is approximately N(0,1); we apply a 0.5 continuity correction to the numerator unless you turn it off. The tie correction subtracts a term proportional to Σ(t³-t)/12 to keep the variance honest.
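The whole recipe, ranking through the continuity-corrected normal approximation, fits in one function. A self-contained sketch of the formulas above (function name ours; use a stats library in practice):

```python
import math
from collections import Counter

def mann_whitney(x, y):
    """U1, tie- and continuity-corrected z, and two-sided normal-approx p."""
    n1, n2 = len(x), len(y)
    N = n1 + n2
    pooled = list(x) + list(y)
    # midranks of the pooled sample
    order = sorted(range(N), key=lambda i: pooled[i])
    ranks = [0.0] * N
    i = 0
    while i < N:
        j = i
        while j + 1 < N and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    R1 = sum(ranks[:n1])                       # rank sum of group 1
    U1 = R1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    ties = sum(t**3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((N + 1) - ties / (N * (N - 1)))
    diff = U1 - mu
    cc = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)  # continuity correction
    z = (diff - cc) / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return U1, z, p
```

With completely separated groups, e.g. mann_whitney([1, 2, 3], [4, 5, 6]), U₁ is 0 (group 1 holds all the small values).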
dᵢ = xᵢ − yᵢ (drop dᵢ = 0);  W⁺ = Σ rank(|dᵢ|) for dᵢ > 0;  Z = (W⁺ − n(n+1)/4) / √(n(n+1)(2n+1)/24 − tie correction)
Wilcoxon signed-rank. Compute the differences (or x − μ₀ for a one-sample test), drop zero differences, rank the absolute differences, then sum the ranks of the positive differences. The expected sum under H₀ is n(n+1)/4. Standardise as Z and read p from the normal distribution. The signed-rank test assumes the differences are symmetric about their median; if they are not, prefer the sign test.
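The same steps in code, covering both the paired and the one-sample form. A self-contained sketch of the formulas above (function name ours):

```python
import math
from collections import Counter

def wilcoxon_signed_rank(x, y=None, mu0=0.0):
    """W+, tie- and continuity-corrected z, and two-sided normal-approx p.
    Paired test if y is given; one-sample test against mu0 otherwise."""
    d = [xi - yi for xi, yi in zip(x, y)] if y is not None else [xi - mu0 for xi in x]
    d = [di for di in d if di != 0]            # drop zero differences
    n = len(d)
    absd = [abs(di) for di in d]
    # midranks of |d|
    order = sorted(range(n), key=lambda i: absd[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and absd[order[j + 1]] == absd[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    W = sum(r for r, di in zip(ranks, d) if di > 0)   # sum ranks of positive d
    mu = n * (n + 1) / 4
    ties = sum(t**3 - t for t in Counter(absd).values())
    var = n * (n + 1) * (2 * n + 1) / 24 - ties / 48
    diff = W - mu
    cc = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)  # continuity correction
    z = (diff - cc) / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return W, z, p
```

When every difference is positive, W⁺ hits its maximum n(n+1)/2; for five all-positive differences that is 15.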
H = (12 / (N(N+1))) · Σⱼ Rⱼ²/nⱼ − 3(N+1);  H′ = H / (1 − Σ(t³ − t) / (N³ − N));  p = P(χ²₍k₋₁₎ > H′)
Kruskal-Wallis. Pool, rank, then for each of the k groups compute the rank sum Rⱼ and divide by the group size nⱼ. The H statistic measures how much the group rank-means differ from the overall rank-mean (N+1)/2. Apply the tie correction H′ = H/(1−…) and refer to a chi-squared distribution with k−1 degrees of freedom. A significant H means at least one group is stochastically larger; follow up with pairwise Mann-Whitney U tests plus a Bonferroni correction if you need to know which.
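A self-contained sketch for the common three-group case (so df = 2, where the chi-squared tail has the closed form exp(−H′/2)); for general k, use a chi-squared survival function from a stats library. Function name ours:

```python
import math
from collections import Counter

def kruskal_wallis_3(groups):
    """Tie-corrected H and its p-value for exactly three groups (df = 2)."""
    assert len(groups) == 3
    pooled = [v for g in groups for v in g]
    N = len(pooled)
    # midranks of the pooled sample
    order = sorted(range(N), key=lambda i: pooled[i])
    ranks = [0.0] * N
    i = 0
    while i < N:
        j = i
        while j + 1 < N and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    H, pos = 0.0, 0
    for g in groups:
        Rg = sum(ranks[pos:pos + len(g)])      # rank sum of this group
        H += Rg * Rg / len(g)
        pos += len(g)
    H = 12 / (N * (N + 1)) * H - 3 * (N + 1)
    ties = sum(t**3 - t for t in Counter(pooled).values())
    H_corr = H / (1 - ties / (N**3 - N))
    p = math.exp(-H_corr / 2)                  # chi-squared sf at df = 2
    return H_corr, p
```

Three fully separated groups of three, e.g. [[1,2,3],[4,5,6],[7,8,9]], give H = 7.2 and p ≈ 0.027.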
HL = median{ (dᵢ + dⱼ) / 2 : i ≤ j } (paired / one-sample, Walsh averages of the differences);  HL = median{ xᵢ − yⱼ : i, j } (two-sample);  CI: order the Walsh averages (or pairwise differences) and take the indices set by the Wilcoxon critical values
Hodges-Lehmann. The HL point estimate is the median of all pairwise differences (or Walsh averages for paired data). It is the rank-based analogue of a mean difference and is robust to outliers. The 95% CI is read off the same sorted vector at indices set by the Wilcoxon distribution's 2.5% and 97.5% quantiles, which keeps the CI consistent with the test it accompanies.
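The two-sample version is a few lines. A sketch (function name ours) that uses the large-sample normal approximation to pick the CI indices; exact CIs read the indices off the Wilcoxon critical values instead:

```python
import math
from statistics import median

def hodges_lehmann(x, y, z_crit=1.96):
    """Two-sample HL shift estimate (median of all pairwise differences)
    plus an approximate 95% CI from the sorted differences."""
    diffs = sorted(xi - yj for xi in x for yj in y)
    est = median(diffs)
    n1, n2 = len(x), len(y)
    # normal-approximation index into the sorted pairwise differences
    k = math.floor(n1 * n2 / 2 - z_crit * math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12))
    k = max(k, 0)
    return est, (diffs[k], diffs[len(diffs) - 1 - k])
```

For x = [10, 12, 14] and y = [1, 2, 3], the nine pairwise differences have median 10, so the HL shift estimate is 10.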
r_rb = 2U₁ / (n₁n₂) − 1 (rank-biserial);  δ = (#{x>y} − #{x<y}) / (n₁n₂) (Cliff's δ);  ε² = H / (N − 1) (Kruskal-Wallis effect size)
Effect sizes. The rank-biserial correlation r_rb runs from −1 to +1; |0.10| is small, |0.30| medium, |0.50| large by the usual Cohen-style benchmarks. Cliff's δ is numerically identical when there are no ties. Epsilon-squared is the Kruskal-Wallis analogue of η² and tells you the share of rank variance explained by the grouping; it is bounded above by 1.
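All three effect sizes are one-liners once the test statistics are in hand. A sketch (function names ours):

```python
def cliffs_delta(x, y):
    """delta = (#{x>y} - #{x<y}) / (n1*n2), by direct counting."""
    gt = sum(1 for a in x for b in y if a > b)
    lt = sum(1 for a in x for b in y if a < b)
    return (gt - lt) / (len(x) * len(y))

def rank_biserial(u1, n1, n2):
    """r_rb from the group-1 Mann-Whitney U; numerically equal to
    Cliff's delta in the absence of ties."""
    return 2 * u1 / (n1 * n2) - 1

def epsilon_squared(h, n):
    """Kruskal-Wallis effect size: share of rank variance explained."""
    return h / (n - 1)
```

With fully separated groups (x = [4, 5, 6] vs y = [1, 2, 3]), U₁ = 9 and both δ and r_rb reach their maximum of 1.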
Caveats: when a rank test is the wrong tool
If you have… → use instead:

Two normal-looking groups, large n → the Welch t-test is more powerful and the difference of means is easier to interpret. See the t-Test Calculator.
Three or more groups with normal residuals → one-way ANOVA + Tukey HSD. Kruskal-Wallis loses about 5% efficiency on truly normal data.
Counts or proportions → chi-squared or Fisher's exact for tables; Poisson regression for rates. Rank tests work but are less informative than the count-based GLM.
Repeated measures with three or more conditions → Friedman's test (rank-based, blocked) or a mixed-effects ANOVA on transformed data.
A clear monotone but non-linear relationship → Spearman or Kendall rank correlation, which directly measures monotone association rather than a location shift.
Heavily tied data with only a direction (better/worse) → the Sign test is the right primitive; the signed-rank assumption of symmetric differences is doubtful.
Math: midrank for ties; large-sample normal approximation with optional continuity correction; Kruskal-Wallis tie-corrected; Hodges-Lehmann from sorted Walsh averages / pairwise differences; chi-squared and normal CDFs from the same series used elsewhere in this toolkit.