rr-statistics.co

t-Test Calculator

A t-test asks a simple question: do two groups (or one group versus a known value) have meaningfully different averages, or could the gap be due to chance? Drop in your raw data, or just the means and standard deviations, to get a clear verdict, the size of the difference (Cohen's d), and a confidence interval.

New to t-tests? Read the 4-min primer

What a t-test answers. A t-test compares one or two means against a reference: either a single sample mean against a hypothesised value μ₀, or one group’s mean against another group’s mean. The output is a t statistic, its degrees of freedom, and a p-value: the probability of seeing a gap at least this large if the null hypothesis were true.

How to read t, df, p, and the CI. The t statistic is the gap between the means divided by its standard error: how many standard errors away from the null are we? df measures information available; with small df the t distribution has fatter tails and the same t produces a larger p. The 95% CI for the difference is the range of mean differences the data are compatible with; if it excludes zero, you reject H₀ at α = 0.05.

One-sample, paired, or two-sample? One sample against a fixed reference number: one-sample. The same units measured before and after: paired. Two independent groups: two-sample. Paired and one-sample share the same math: paired = one-sample on the within-pair differences, with df = n − 1.
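A minimal sketch of how the three modes map onto R's `t.test()` — the data here are invented for illustration:

```r
# Invented example data: reaction times in ms
before  <- c(312, 298, 345, 301, 289, 330, 318, 276)
after   <- c(305, 290, 332, 299, 281, 320, 307, 270)
control <- c(322, 310, 351, 308, 295, 340)

one_sample <- t.test(before, mu = 300)              # one mean vs a fixed number
paired     <- t.test(before, after, paired = TRUE)  # same units, before vs after
two_sample <- t.test(before, control)               # two independent groups (Welch)
```

Note that the paired call reports df = n − 1 = 7, exactly what a one-sample test on `before - after` would give.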

Welch vs pooled. Pooled assumes both groups have the same SD. Welch (the R default) drops that assumption and lets each group keep its own SD; it adjusts df with the Satterthwaite formula. Use Welch unless you have a strong reason to believe the variances really are equal. Welch is robust when SDs differ; pooled is slightly more powerful when they really are equal.
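In `t.test()` the choice is the `var.equal` flag; a quick sketch with invented numbers:

```r
x <- c(5.1, 4.9, 6.0, 5.5, 5.8, 4.7)
y <- c(7.2, 6.8, 8.1, 5.9, 9.0, 6.5, 7.7)

welch  <- t.test(x, y)                    # default: var.equal = FALSE
pooled <- t.test(x, y, var.equal = TRUE)  # assumes a common SD

welch$parameter   # fractional Satterthwaite df
pooled$parameter  # exactly n1 + n2 - 2 = 11
```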

5 modes · raw or summary · one-sample · Welch · pooled · paired · Runs in your browser

The t-test math, end to end
t = (x̄ − μ₀) / (s / √n) df = n − 1
One-sample. Standard error is the sample SD divided by √n. Subtract the null mean from the sample mean and scale by SE. Compare against the t distribution with n − 1 df.
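The one-sample formula written out against `t.test()`, with invented data:

```r
x   <- c(9.8, 10.4, 9.5, 10.9, 10.1, 9.7, 10.6)
mu0 <- 10

se     <- sd(x) / sqrt(length(x))   # s / sqrt(n)
t_stat <- (mean(x) - mu0) / se
df     <- length(x) - 1
p      <- 2 * pt(-abs(t_stat), df)  # two-sided p from the t CDF

fit <- t.test(x, mu = mu0)
# fit$statistic, fit$parameter, fit$p.value match t_stat, df, p
```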
t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂) df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁−1) + (s₂²/n₂)²/(n₂−1)]
Welch / Satterthwaite. Each group keeps its own variance. The standard error is the square root of the sum of variance / n across the two groups. The Satterthwaite df is typically fractional: it lands between min(n₁, n₂) − 1 and n₁ + n₂ − 2, depending on how much the variances and sample sizes differ.
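The Welch SE and Satterthwaite df computed by hand, checked against `t.test()` (data invented):

```r
x <- c(14.1, 15.3, 13.8, 16.0, 14.7)
y <- c(12.2, 11.8, 13.1, 12.6, 11.5, 12.9, 13.4)

v1 <- var(x) / length(x)            # variance / n for each group
v2 <- var(y) / length(y)
se <- sqrt(v1 + v2)
t_stat <- (mean(x) - mean(y)) / se
df <- (v1 + v2)^2 /
  (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))  # Satterthwaite

fit <- t.test(x, y)                 # Welch is the default
```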
sₓ² = ((n₁−1)s₁² + (n₂−1)s₂²) / (n₁ + n₂ − 2) SE = sₓ · √(1/n₁ + 1/n₂) df = n₁ + n₂ − 2
Pooled. Equal-variance assumption: build one shared variance estimate sₓ² from both groups, then plug it into a single SE. The df is the sum of within-group df.
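The same check for the pooled version, again with invented numbers:

```r
x <- c(14.1, 15.3, 13.8, 16.0, 14.7)
y <- c(12.2, 11.8, 13.1, 12.6, 11.5, 12.9, 13.4)
n1 <- length(x); n2 <- length(y)

sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)  # shared variance
se  <- sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
t_stat <- (mean(x) - mean(y)) / se
df <- n1 + n2 - 2                                               # sum of within-group df

fit <- t.test(x, y, var.equal = TRUE)
```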
dᵢ = xᵢ₁ − xᵢ₂  t = d̄ / (s_d / √n), df = n − 1
Paired = one-sample on differences. Subtract within each pair to get a single column of differences, then test whether their mean is zero with a one-sample t. The pairing is what makes this more powerful than a two-sample test on the same data.
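The equivalence is easy to see in R: a paired `t.test()` and a one-sample test on the differences return identical numbers (invented data):

```r
x1 <- c(8.2, 7.9, 9.1, 8.5, 7.4, 8.8)
x2 <- c(7.8, 7.5, 8.9, 8.0, 7.5, 8.1)

paired_fit <- t.test(x1, x2, paired = TRUE)
diff_fit   <- t.test(x1 - x2, mu = 0)   # same t, df, and p
```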
Cohen's d = (x̄₁ − x̄₂) / sₓ Hedges' g = d · (1 − 3/(4 · df − 1))
Cohen's d and Hedges' g. d expresses the gap between means in pooled-SD units – 0.2 small, 0.5 medium, 0.8 large by Cohen's benchmarks. Hedges' g multiplies d by the bias-correction factor J; with df > 50 the correction is < 1.5%, so g is mainly a small-sample fix.
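Both effect sizes in a few lines of base R, using the pooled SD and the approximate correction factor from the formula above (data invented):

```r
x <- c(14.1, 15.3, 13.8, 16.0, 14.7)
y <- c(12.2, 11.8, 13.1, 12.6, 11.5, 12.9, 13.4)
n1 <- length(x); n2 <- length(y)

sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))  # pooled SD
d  <- (mean(x) - mean(y)) / sp          # Cohen's d
df <- n1 + n2 - 2
g  <- d * (1 - 3 / (4 * df - 1))        # Hedges' small-sample correction
```

Since the correction factor is below 1, g always shrinks d slightly toward zero.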
Caveats: when this is the wrong tool
Small n (under ~15) with clear outliers or skew. The t distribution leans on near-normality when n is small. Try a non-parametric test (Mann–Whitney for two samples, Wilcoxon signed-rank for paired data). See the Normality Test Picker first.

A binary outcome (success / fail). Don't bend a t-test around proportions. Use a two-proportion z-test or chi-square; for A/B-style tests, the A/B Test Calculator handles it end-to-end.

A ratio of two means or two rates. The log-ratio with the delta method or a bootstrap is more honest than a t-test on the raw ratios. The Confidence Interval Calculator covers Poisson rate ratios.

Count data with low rates. Use a Poisson or negative-binomial test rather than a t-test on counts. With over-dispersion, even Welch will mis-state the SE.

Three or more groups. Don't fish for the lowest pairwise p-value. Run an ANOVA omnibus first, then follow up with a multiple-testing correction; the Multiple Testing Correction tool handles BH and Holm.

A heavy-tailed continuous outcome where you want a p-value without the parametric assumption. A bootstrap p-value is robust to non-normality without leaving the means-difference framing.
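A hedged sketch, in base R, of the alternatives named above — all data invented, and the bootstrap loop is an illustration, not this site's exact implementation:

```r
x <- c(5.1, 4.9, 6.0, 5.5, 5.8, 4.7)
y <- c(7.2, 6.8, 8.1, 5.9, 9.0, 6.5, 7.7)

wilcox.test(x, y)                         # Mann-Whitney for two samples
prop.test(c(45, 60), c(500, 520))         # two proportions (successes, trials)
poisson.test(c(12, 25), T = c(100, 180))  # two Poisson rates (counts, exposures)

# Three+ groups: omnibus ANOVA first, then a multiplicity correction on follow-ups
vals <- c(5, 6, 5, 7, 8, 9, 7, 8, 4, 5, 6, 5)
grp  <- gl(3, 4, labels = c("a", "b", "c"))
summary(aov(vals ~ grp))
p.adjust(c(0.01, 0.04, 0.20), method = "BH")   # Benjamini-Hochberg

# Heavy tails: bootstrap p-value for the mean difference under H0
set.seed(1)
obs <- mean(x) - mean(y)
xc  <- x - mean(x); yc <- y - mean(y)     # centre each group: impose H0
boots <- replicate(4999,
  mean(sample(xc, replace = TRUE)) - mean(sample(yc, replace = TRUE)))
p_boot <- mean(abs(boots) >= abs(obs))
```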
Math: central-t CDF via regularised incomplete beta; Satterthwaite df for Welch; pooled-SD Cohen's d; Hedges' J small-sample correction; Tukey 1.5·IQR for the outlier flag.