t-Test Calculator
A t-test asks a simple question: do two groups (or one group versus a known value) have meaningfully different averages, or could the gap be random chance? Drop in your raw data, or just the means and standard deviations, to get a clear verdict, the size of the difference (Cohen's d), and a confidence interval.
New to t-tests? Read the 4-minute primer below.
What a t-test answers. A t-test compares one or two means against a reference: either a single sample mean against a hypothesised value μ₀, or one group's mean against another group's mean. The output is a t statistic, its degrees of freedom, and a p-value that says how surprised the null hypothesis would be by the data you collected.
How to read t, df, p, and the CI. The t statistic is the gap between the means divided by its standard error: how many standard errors away from the null are we? df measures information available; with small df the t distribution has fatter tails and the same t produces a larger p. The 95% CI for the difference is the range of mean differences the data are compatible with; if it excludes zero, you reject H₀ at α = 0.05.
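The arithmetic above is short enough to show in full. Here is a minimal sketch in Python (the calculator itself runs in the browser; the numbers are hypothetical, chosen for illustration): a one-sample test from summary statistics only, producing t, df, p, and the 95% CI exactly as described.

```python
import math
from scipy import stats

# Hypothetical summary stats: n = 25, sample mean 5.2, SD 0.6, null mu0 = 5.0
n, xbar, s, mu0 = 25, 5.2, 0.6, 5.0

se = s / math.sqrt(n)                  # standard error of the mean
t_stat = (xbar - mu0) / se             # the gap, in standard-error units
df = n - 1
p = 2 * stats.t.sf(abs(t_stat), df)    # two-sided p-value

t_crit = stats.t.ppf(0.975, df)        # 95% CI for the mean
ci = (xbar - t_crit * se, xbar + t_crit * se)
print(f"t = {t_stat:.3f}, df = {df}, p = {p:.3f}, "
      f"CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note how the two readings agree: p is above 0.05, and the CI straddles μ₀ = 5.0.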
One-sample, paired, or two-sample? One sample vs a fixed number: one-sample. The same units measured before vs after: paired. Two independent groups: two-sample. Paired and one-sample share the same math: paired = one-sample on the within-pair differences, with df = n − 1.
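That "paired = one-sample on the differences" identity is easy to verify. A quick sketch with made-up before/after scores (scipy's two functions, not the calculator's own code):

```python
import numpy as np
from scipy import stats

# Hypothetical before/after scores on the same five subjects
before = np.array([12.1, 9.8, 11.4, 10.2, 13.0])
after  = np.array([11.0, 9.1, 10.8, 10.5, 12.2])

t_paired, p_paired = stats.ttest_rel(before, after)            # paired t-test
t_single, p_single = stats.ttest_1samp(before - after, 0.0)    # one-sample on diffs

# Identical t, identical p: the paired test IS a one-sample test
# on the within-pair differences, with df = n - 1 = 4.
print(t_paired, t_single)
```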
Welch vs pooled. Pooled assumes both groups have the same SD. Welch (the R default) drops that assumption and lets each group keep its own SD; it adjusts df with the Satterthwaite formula. Use Welch unless you have a strong reason to believe the variances really are equal. Welch is robust when SDs differ; pooled is slightly more powerful when they really are equal.
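To see what Welch buys you, compare the two variants on groups with deliberately unequal spread. A sketch with simulated data (not the calculator's internals); the Satterthwaite df is computed by hand to show it shrinking well below the pooled df of n₁ + n₂ − 2:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical groups: same mean, very different SDs and sizes
a = rng.normal(10, 1, size=40)
b = rng.normal(10, 5, size=12)

t_welch,  p_welch  = stats.ttest_ind(a, b, equal_var=False)  # Welch (default in R)
t_pooled, p_pooled = stats.ttest_ind(a, b, equal_var=True)   # pooled

# Satterthwaite df by hand, from the per-group variance-of-the-mean terms
v1 = a.var(ddof=1) / len(a)
v2 = b.var(ddof=1) / len(b)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
t_manual = (a.mean() - b.mean()) / np.sqrt(v1 + v2)

print(f"Welch df = {df_welch:.1f}  vs pooled df = {len(a) + len(b) - 2}")
```

The Welch df lands far below 50 here because the small, noisy group dominates the standard error; the pooled test would pretend both groups are equally informative.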
Pick a real-world example to load.
Read more: The t-test math, end to end
Caveats: When this is the wrong tool
| If you have… | Use instead |
| --- | --- |
| Small n (under ~15) with clear outliers or skew | The t distribution leans on near-normality when n is small. Try a non-parametric test (Mann–Whitney for two samples, Wilcoxon for paired). See the Normality Test Picker first. |
| A binary outcome (success / fail) | Don't bend a t-test around proportions. Use a two-proportion z-test or chi-square; for A/B-style tests, the A/B Test Calculator handles it end-to-end. |
| A ratio of two means or two rates | The log-ratio with the delta method or a bootstrap is more honest than a t on the raw ratios. The Confidence Interval Calculator covers Poisson rate ratios. |
| Count data with low rates | Use a Poisson or negative-binomial test rather than a t-test on counts. With over-dispersion, even Welch will mis-state the SE. |
| Three or more groups | Don't fish for the lowest pairwise p-value. Run an ANOVA omnibus first, then follow up with a multiple-testing correction; the Multiple Testing Correction tool handles BH and Holm. |
| Heavy-tailed continuous outcome where you want a p but not the parametric assumption | A bootstrap p-value is robust to non-normality without leaving the means-difference framing. |
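The bootstrap p-value mentioned above can be sketched in a few lines. This is one common recipe (resample each group after recentring both on the grand mean, so the null of equal means holds by construction), not the only one, and the group data here are simulated for illustration:

```python
import numpy as np

def bootstrap_p(a, b, n_boot=5000, seed=0):
    """Two-sided bootstrap p-value for a difference in means.
    Both groups are recentred on the grand mean to impose H0,
    then resampled with replacement."""
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    grand = np.concatenate([a, b]).mean()
    a0 = a - a.mean() + grand            # shift each group so H0 is true
    b0 = b - b.mean() + grand
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        diffs[i] = (rng.choice(a0, len(a)).mean()
                    - rng.choice(b0, len(b)).mean())
    # fraction of null-world differences at least as extreme as observed
    return (np.abs(diffs) >= abs(observed)).mean()

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 30)
y = rng.normal(2, 1, 30)     # clearly shifted group
p_shifted = bootstrap_p(x, y)
```

It stays in the means-difference framing, so the effect size still reads the same way; only the p-value's sampling model changes.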
- t-Tests in R – the long-form tutorial: assumptions, diagnostics, paired vs independent, every base-R idiom.
- Confidence intervals in R – the CI-flavoured companion to a t-test, with the same SE math.
- Statistical power analysis – pick n before you run the test; a t-test with low power and a non-significant p tells you almost nothing.
- Effect-size converter – turn d into r, OR, CLES, or NNT and back.
- Power analysis tool – the in-browser companion that computes n / d / power for the same designs.
Math: central-t CDF via regularised incomplete beta; Satterthwaite df for Welch; pooled-SD Cohen's d; Hedges' J small-sample correction; Tukey 1.5·IQR for the outlier flag.
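Two of the pieces listed above are compact enough to sketch directly. The identity used for the central-t CDF is P(T > t) = ½ · I_x(ν/2, ½) with x = ν/(ν + t²), where I_x is the regularised incomplete beta; Hedges' J uses the common approximation J ≈ 1 − 3/(4·df − 1). A Python sketch (the calculator's own implementation is in-browser and may differ in detail):

```python
import math
from scipy.special import betainc   # regularised incomplete beta I_x(a, b)

def t_cdf(t, df):
    """Central-t CDF via the regularised incomplete beta."""
    x = df / (df + t * t)
    tail = 0.5 * betainc(df / 2.0, 0.5, x)   # P(T > |t|)
    return 1.0 - tail if t >= 0 else tail

def hedges_g(m1, m2, s1, s2, n1, n2):
    """Pooled-SD Cohen's d with Hedges' small-sample correction J."""
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (m1 - m2) / s_pooled
    j = 1.0 - 3.0 / (4.0 * df - 1.0)         # Hedges' J (approximation)
    return j * d
```

J is always slightly below 1, so Hedges' g shrinks d toward zero; the correction matters most when the groups are small.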