
Power Analysis

Statistical power is the probability your study will detect a real effect of a given size; too little power and you waste a study. Pick a design (t-test, ANOVA, proportions, correlation, chi-square), supply three of the four key inputs (effect, alpha, power, n), and the calculator solves for the fourth.

New to power analysis? Read the 4-min primer below.

What power is. Statistical power is the probability of detecting a real effect of a given size at a chosen alpha. Power = 0.80 means that if the effect you are positing actually exists, you have an 80% chance of getting a significant result, and a 20% chance of missing it (a Type II error). Pick the design first, then ask what power you can afford.

The four-way relationship. Sample size n, effect size, alpha, and power form a closed system: pin three, the fourth is forced. Bigger effects need fewer subjects. Tighter alpha (0.01 vs 0.05) costs sample size. Higher power (0.90 vs 0.80) costs sample size. Smaller effects cost the most: halving d roughly quadruples n.
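To see that cost in concrete numbers, here is a minimal sketch with the pwr package (a two-sample t at alpha 0.05 and 80% power; the d values are illustrative):

```r
# Halving d roughly quadruples n (two-sample t, alpha = 0.05, power = 0.80)
library(pwr)

pwr.t.test(d = 0.50, sig.level = 0.05, power = 0.80)$n   # ~63.8 per group
pwr.t.test(d = 0.25, sig.level = 0.05, power = 0.80)$n   # ~252.1 per group, roughly 4x
```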

How to read the result. If you solved for n, the number you get is the smallest sample that reaches the target power. If you solved for power, you get the probability of detecting the effect at the n you can collect. If you solved for the minimum detectable effect, the number is the smallest effect you have a chance of catching given your n, alpha, and target power.
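As a sketch of the three solve modes using pwr::pwr.t.test (n = 40 and d = 0.5 are illustrative values), leave out the quantity you want solved:

```r
library(pwr)

pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80)    # solve for n
pwr.t.test(n = 40, d = 0.5, sig.level = 0.05)          # solve for power at a fixed n
pwr.t.test(n = 40, sig.level = 0.05, power = 0.80)     # solve for the minimum detectable d
```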

Picking the design. Continuous outcome, two groups: two-sample t-test on Cohen's d. Continuous outcome, paired or single-arm: one-sample / paired t. Binary outcome, two arms: two-proportion test on Cohen's h. Multi-arm continuous: one-way ANOVA on Cohen's f. Bivariate Pearson correlation: correlation power on r. Goodness-of-fit or contingency: chi-square on Cohen's w with correct df.
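For reference, each design above has a counterpart in the pwr package. A sketch, with Cohen-style placeholder effect sizes rather than values from any particular study:

```r
library(pwr)

pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80)                   # two-sample t on Cohen's d
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "paired")  # paired / one-sample t
pwr.2p.test(h = ES.h(0.60, 0.45), sig.level = 0.05, power = 0.80)     # two proportions on Cohen's h
pwr.anova.test(k = 3, f = 0.25, sig.level = 0.05, power = 0.80)       # one-way ANOVA on Cohen's f
pwr.r.test(r = 0.30, sig.level = 0.05, power = 0.80)                  # Pearson correlation on r
pwr.chisq.test(w = 0.30, df = 3, sig.level = 0.05, power = 0.80)      # chi-square on Cohen's w
```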

8 designs · 3 solve modes · Cohen-style benchmarks · Runs in your browser

Try a real-world example: click one to load it.

📝 two-sample t (d=0.5)

A typical two-arm trial: continuous outcome, medium effect, 80% power, alpha 0.05.

[Example output: result panel, runnable R code to reproduce, interactive power curve, and inference summary]
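For this example, the reproduction in R would look roughly like the following sketch with the pwr package (the calculator's emitted code may differ in detail):

```r
# Two-arm trial: two-sample t, d = 0.5, alpha = 0.05, power = 0.80; solve for n
library(pwr)

pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")
# n comes back as ~63.8 per group, i.e. 64 per arm after rounding up
```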

Anatomy of a power calculation
power = P( T > t_crit | H1 )
T ~ noncentral t( df, ncp )
ncp = d · sqrt( n1 · n2 / (n1 + n2) )
Noncentral t (one- and two-sample, paired, correlation). Under the alternative, the t-statistic follows a noncentral t with df from the design and a noncentrality parameter (ncp) that grows with sqrt(n) and the standardized effect. Power is the area of that noncentral distribution beyond the critical value of the central t. We use the Sankaran (1959) approximation, accurate to ~3 decimals for df ≥ 4 and faster than series methods.
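A minimal sketch of that calculation for a two-sided, two-sample test in base R (n = 64 per arm, d = 0.5, and alpha = 0.05 are illustrative; this uses R's built-in noncentral t rather than the approximation described above):

```r
n <- 64; d <- 0.5; alpha <- 0.05
df  <- 2 * n - 2
ncp <- d * sqrt(n * n / (n + n))             # = d * sqrt(n / 2) for equal arms
t_crit <- qt(1 - alpha / 2, df)
pt(t_crit, df, ncp, lower.tail = FALSE) +    # upper tail of the noncentral t
  pt(-t_crit, df, ncp)                       # lower-tail term, usually negligible
# ~0.80; compare pwr::pwr.t.test(n = 64, d = 0.5)
```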
power = P( F > F_crit | H1 )
F ~ noncentral F( df1, df2, ncp )
ncp_anova = f² · k · n   (one-way ANOVA)
ncp_reg = f² · (u + v + 1)   (regression)
Noncentral F (ANOVA, regression). The omnibus F under H1 is noncentral F. The Patnaik / Poisson-mixture series we use sums central F CDFs weighted by Poisson(ncp/2) probabilities; convergence is fast around the peak. df1 is the numerator (k-1 for ANOVA, u predictors of interest for regression), df2 is the denominator (k(n-1) or n-u-1).
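A sketch of the same calculation via R's built-in noncentral F (k = 4 groups, n = 20 per group, f = 0.25, and alpha = 0.05 are illustrative values):

```r
k <- 4; n <- 20; f <- 0.25; alpha <- 0.05
df1 <- k - 1                                  # numerator df
df2 <- k * (n - 1)                            # denominator df
ncp <- f^2 * k * n                            # noncentrality for one-way ANOVA
F_crit <- qf(1 - alpha, df1, df2)
1 - pf(F_crit, df1, df2, ncp)                 # compare pwr::pwr.anova.test(k = 4, n = 20, f = 0.25)
```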
h = 2 · ( arcsin sqrt(p2) - arcsin sqrt(p1) )
ncp_z = |h| · sqrt( n / 2 )   (two-prop, equal n per arm)
ncp_z = |h| · sqrt( n )   (one-prop)
power = 1 - Φ( z_crit - ncp_z )
Normal approximation for proportions. Cohen's h is the variance-stabilising arcsine transform applied to each proportion, then differenced. After the transform, the test statistic is approximately normal with unit variance, so power becomes a tail probability of a shifted standard normal. Matches pwr.2p.test and pwr.p.test. Falls apart when expected counts are tiny; switch to an exact test there.
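A sketch of the arcsine route in base R (p1 = 0.60, p2 = 0.45, and n = 150 per arm are illustrative):

```r
p1 <- 0.60; p2 <- 0.45; n <- 150; alpha <- 0.05
h <- 2 * (asin(sqrt(p1)) - asin(sqrt(p2)))    # same value as pwr::ES.h(p1, p2)
z_crit <- qnorm(1 - alpha / 2)
pnorm(abs(h) * sqrt(n / 2) - z_crit)          # two-sided power, ignoring the tiny opposite tail
# compare pwr::pwr.2p.test(h = h, n = 150)
```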
z_r = 0.5 · ln( (1+r) / (1-r) )   (Fisher z)
SE = 1 / sqrt( n - 3 )
ncp via t = r · sqrt( df ) / sqrt( 1 - r² ),  df = n - 2
Correlation power. The Fisher z transform is the textbook route, but for the actual test (H0: rho = 0) the t-statistic r · sqrt(n-2) / sqrt(1-r²) is more accurate, and its noncentral-t distribution gives the closed-form power. We use that direct formulation; results match pwr.r.test to 3+ decimals.
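A sketch of the direct noncentral-t formulation (r = 0.30 and n = 85 are illustrative; treat the result as approximate and compare it against pwr::pwr.r.test):

```r
r <- 0.30; n <- 85; alpha <- 0.05
df  <- n - 2
ncp <- r * sqrt(df) / sqrt(1 - r^2)           # noncentrality from the t-statistic form
t_crit <- qt(1 - alpha / 2, df)
pt(t_crit, df, ncp, lower.tail = FALSE)       # compare pwr::pwr.r.test(n = 85, r = 0.30)
```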
w = sqrt( sum( (p_obs - p_exp)² / p_exp ) )
ncp_chi = w² · n
power = 1 - F_chi( x_crit | df, ncp_chi )
Chi-square noncentrality. Cohen's w is the population effect size for chi-square: a normalized RMS deviation between observed and expected cell probabilities. Multiplied by n it becomes the noncentrality parameter of a chi-square with the design's df (k-1 for goodness-of-fit, (r-1)(c-1) for contingency). Power is the upper tail of that noncentral chi-square beyond the critical value.
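A sketch for a four-cell goodness-of-fit test in base R (the cell probabilities and n = 200 are illustrative):

```r
p_exp <- rep(0.25, 4)                          # expected cell probabilities under H0
p_obs <- c(0.35, 0.25, 0.22, 0.18)             # posited probabilities under H1
w   <- sqrt(sum((p_obs - p_exp)^2 / p_exp))    # Cohen's w
n   <- 200; df <- length(p_exp) - 1; alpha <- 0.05
ncp <- w^2 * n
x_crit <- qchisq(1 - alpha, df)
1 - pchisq(x_crit, df, ncp)                    # compare pwr::pwr.chisq.test(w = w, N = 200, df = 3)
```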
Caveats: when this is the wrong tool

If you have clustered or hierarchical data (students in classes, patients in clinics): the independence assumption fails. Inflate your n by the design effect 1 + (m-1)·ρ, where m is the cluster size and ρ is the intraclass correlation, or run a simulation against the mixed model you actually plan to fit (see the sketch after this list).

If you have a time-to-event / survival outcome: the number of events drives power, not the number of patients. Use Schoenfeld's formula on the log hazard ratio, or simulate against a Cox model with realistic censoring; a dedicated survival power tool is on the roadmap.

If you need a Bayesian a priori sample size: frequentist power asks "what's the long-run rejection rate?" A Bayesian sample-size question (precision of a posterior, Bayes-factor design) is a different paradigm; use simulation against the prior and likelihood you actually plan to use.

If you are sizing a pilot study: pilots are sized for feasibility (recruitment rate, dropout, instrument validity), not for hypothesis power. Cohen's tables don't apply; aim for ~12 per arm or follow a feasibility-specific guide.

If you have a complex model with no closed form (mixed effects, GLMM, nested ANOVA): the closed forms here assume a single fixed effect with a known sampling distribution. For mixed models or non-trivial covariate structures, use simulation: generate datasets under H1, fit the model, count rejections.
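For the clustered-data case, a sketch of the design-effect inflation mentioned above (m = 25 per cluster and ρ = 0.05 are illustrative values, not recommendations):

```r
n_flat <- 64                         # n per arm from an individual-level power calculation
m <- 25; rho <- 0.05                 # cluster size and intraclass correlation (assumed)
deff <- 1 + (m - 1) * rho            # design effect
ceiling(n_flat * deff)               # inflated n per arm for the clustered design: 141
```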
Further reading

Numerical methods: noncentral t via Sankaran (1959); noncentral F and chi-square via Poisson-weighted CDF series; proportions via Cohen's h and the standard normal approximation. Verified against the R pwr package to ~3 decimal places.