r-statistics.co

Type I / II Error Visualizer

Every statistical test has two error types: rejecting a true null hypothesis (Type I, alpha) or missing a real effect (Type II, beta). Their trade-off is hard to picture from formulas. Drag the sliders to watch two sampling distributions slide apart and see how power, alpha, and effect size move together.

New to alpha and beta? Read the 4-min primer

The two errors. A Type I error (false positive) is rejecting the null hypothesis when it is actually true; its rate is alpha and you set it. A Type II error (false negative) is failing to reject the null when an effect really exists; its rate is beta and you compute it. Power equals 1 minus beta: the probability your test catches a real effect.
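Both rates can be checked empirically. Here is a minimal Monte Carlo sketch in Python (SciPy assumed; the effect size, sample size, and repetition count are illustrative, not this page's defaults):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, d, reps = 0.05, 30, 0.5, 20_000  # illustrative values (assumed)

def reject_rate(effect):
    """Fraction of simulated two-sample t-tests that reject H0."""
    a = rng.standard_normal((reps, n))           # group A: mean 0, sd 1
    b = rng.standard_normal((reps, n)) + effect  # group B: mean = effect, sd 1
    p = stats.ttest_ind(a, b, axis=1).pvalue
    return float(np.mean(p < alpha))

type1 = reject_rate(0.0)  # H0 true: rejections are Type I errors, rate near alpha
power = reject_rate(d)    # H1 true: rejection rate is power = 1 - beta
print(type1, power)
```

With the null truly holding, roughly 5% of tests reject; with a real effect of d = 0.5 at n = 30 per group, just under half do, which is exactly the beta the visualizer shades in red.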

How to read the picture. The dashed gray curve is the sampling distribution of the test statistic when the null is true. The solid blue curve is the sampling distribution when the alternative is true. The yellow shaded tails on the gray curve add up to alpha. The red hatched region under the blue curve, on the wrong side of the critical value, is beta.

Why dragging teaches. Move the effect slider; the blue curve shifts away from the gray, beta shrinks, power grows. Move n; both curves narrow, the overlap shrinks, power grows. Move alpha; the critical line slides, the red beta region resizes in the opposite direction. The four quantities (effect, n, alpha, power) form a tight quartet: pin any three, the fourth is determined.

When this is the wrong tool. If you want to solve for the sample size that hits a target power, use the Power Analysis calculator. This page is the picture; that page is the answer.

6 test types · alpha + beta + critical value live · Runs in your browser

Select a real-world example to load.

alpha (set)         0.0500    Type I error rate
beta (computed)     -         Type II error rate
power = 1 - beta    -         prob. of detecting the effect
critical value      -         decision threshold
How we got there
Null vs alternative INTERACTIVE
Null distribution (gray, dashed) and alternative (blue). Yellow tails are alpha. Red hatched region is beta.
Truth table (cells sized by probability)

             H0 true                H1 true
Retain H0    Correct retention      Type II error
             (1 - alpha)            (beta)
Reject H0    Type I error           Correct rejection
             (alpha)                (power)
Power vs n
Power vs effect
Power vs alpha
Reproduce in R RUNNABLE

    ## Reproduce the panel's quantities in base R
    ## (illustrative values assumed: two-sample t, d = 0.5, n = 30 per group, alpha = 0.05)
    alpha <- 0.05; n <- 30; d <- 0.5
    df    <- 2 * n - 2                # degrees of freedom
    ncp   <- d * sqrt(n / 2)          # noncentrality under H1
    crit  <- qt(1 - alpha / 2, df)    # critical value from the null t
    beta  <- pt(crit, df, ncp) - pt(-crit, df, ncp)  # mass inside the acceptance region
    c(critical = crit, beta = beta, power = 1 - beta)

Anatomy of the two distributions
Under H0: test statistic ~ null distribution
Under H1: test statistic ~ alternative (noncentral) distribution
The two curves. Under H0, your test statistic follows a known reference (z, t with df, F with df1/df2, chi-square with df). Under H1, that distribution shifts; for t and F it picks up a noncentrality parameter (ncp) that grows with effect size and sample size. The two-sample t-test uses a noncentral t with df = 2n - 2 and ncp = d * sqrt(n/2).
critical = q_{1 - alpha/2}(null)   (two-sided)
critical = q_{1 - alpha}(null)     (one-sided)
The critical value. Pick alpha. Look up the quantile of the null distribution. Anything past that quantile (in absolute value, for two-sided) is "significant". The chosen alpha pins down the area in the tails of the gray curve.
beta = P(|stat| < critical | H1 true)
power = 1 - beta
Beta and power. Plug the same critical value into the alternative (blue) distribution. The probability mass on the wrong side is beta. Power is everything else. Beta + power = 1, always.
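The three steps above can be sketched with SciPy's central and noncentral t (the concrete d and n values are illustrative assumptions, not the page's defaults):

```python
import numpy as np
from scipy import stats

alpha, n, d = 0.05, 30, 0.5     # illustrative slider values (assumed)
df = 2 * n - 2                  # two-sample t: degrees of freedom
ncp = d * np.sqrt(n / 2)        # noncentrality parameter under H1

# Step 1: critical value from the null (central t) distribution
crit = stats.t.ppf(1 - alpha / 2, df)

# Step 2: beta = alternative-distribution mass on the "wrong side" of crit
beta = stats.nct.cdf(crit, df, ncp) - stats.nct.cdf(-crit, df, ncp)

# Step 3: power is everything else
power = 1 - beta
print(round(crit, 3), round(power, 3))
```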
d (Cohen) = (mu1 - mu0) / sigma
h (Cohen) = 2 * (asin(sqrt(p1)) - asin(sqrt(p0)))
f (Cohen) = sigma_means / sigma_within
r = Pearson correlation
Standardized effect sizes. Each test family uses a different effect-size metric. d for means, h for proportions (variance-stabilized arcsine), f for ANOVA group separation, r for correlation. Standardization lets the same noncentrality story work across designs.
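Two of these metrics as one-liners, following the formulas above (a hedged sketch; the example means and proportions are made up):

```python
import numpy as np

def cohens_d(mu1, mu0, sigma):
    """Cohen's d: standardized mean difference."""
    return (mu1 - mu0) / sigma

def cohens_h(p1, p0):
    """Cohen's h: arcsine-stabilized difference of proportions."""
    return 2 * (np.arcsin(np.sqrt(p1)) - np.arcsin(np.sqrt(p0)))

print(cohens_d(105, 100, 10))          # 0.5, "medium" on Cohen's scale
print(round(cohens_h(0.65, 0.50), 3))
```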
Pin any 3 of {effect, n, alpha, power} and the fourth is determined.
The four-way relationship. Larger effect or larger n shifts the alternative further from the null and shrinks beta. A stricter alpha pulls the critical value outward and inflates beta. Looser alpha shrinks beta but inflates Type I risk. There is no free lunch; the visualizer makes the trade explicit.
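Pinning three quantities and solving for the fourth can be sketched numerically (SciPy assumed; the targets are the conventional d = 0.5 at 80% power example, not anything specific to this page):

```python
import numpy as np
from scipy import stats, optimize

def power_two_sample_t(n, d, alpha):
    """Two-sided power of a two-sample t-test with n per group."""
    df = 2 * n - 2
    ncp = d * np.sqrt(n / 2)
    crit = stats.t.ppf(1 - alpha / 2, df)
    return 1 - (stats.nct.cdf(crit, df, ncp) - stats.nct.cdf(-crit, df, ncp))

# Pin effect, alpha, and power; the per-group n is determined.
d, alpha, target = 0.5, 0.05, 0.80
n = optimize.brentq(lambda n: power_two_sample_t(n, d, alpha) - target, 3, 1e6)
print(int(np.ceil(n)))  # -> 64, the textbook answer for d = 0.5 at 80% power
```

Root-finding over a continuous n works because SciPy's t accepts non-integer degrees of freedom; rounding up at the end gives the usable sample size.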
Caveats: When this is the wrong tool
If you want to…                                    Use instead
Solve for the n that hits a target power           The Power Analysis tool. This page is the picture; that one is the calculator.
Check equivalence (TOST), not difference           Equivalence framing has different shading (two one-sided tests). Coming as a separate equivalence tool.
Use a Bayesian framing                             Bayesian posterior overlap is a different framing; alpha and beta are frequentist constructs. Try a brms or rstanarm tutorial.
Plan a sequential / group-sequential trial         Sequential boundaries reshape the rejection region; out of scope here. See the A/B Test Calculator's sequential tab.
Compute "observed power" from a finished study     Don't. Post-hoc power is a one-to-one transform of the p-value (Hoenig & Heisey, 2001) and not informative.
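The last caveat is easy to demonstrate: for a two-sided z-test, "observed power" is computable from the p-value alone, so it adds no information. A sketch under that assumed z-test framing:

```python
from scipy import stats

def observed_power_z(p, alpha=0.05):
    """Post-hoc power of a two-sided z-test: a function of p alone."""
    z = stats.norm.ppf(1 - p / 2)          # |z| recovered from the p-value
    crit = stats.norm.ppf(1 - alpha / 2)
    # Treat the observed |z| as the true noncentrality and recompute power
    return (1 - stats.norm.cdf(crit - z)) + stats.norm.cdf(-crit - z)

# A result exactly at p = alpha always shows ~50% observed power,
# no matter the study: the number carries no new evidence.
print(round(observed_power_z(0.05), 3))
```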

Numerical accuracy: noncentral t uses Cornish-Fisher; noncentral F and chi-square use Poisson-mixture series; results match R's pwr package to ~3 decimals across the calibrated test cases.
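The Poisson-mixture series mentioned for the noncentral chi-square is short to state: weight central chi-square CDFs by Poisson(lambda/2) probabilities. A sketch cross-checked against SciPy (this is the textbook series, not necessarily this page's exact implementation):

```python
import math
from scipy import stats

def ncx2_cdf_series(x, df, lam, terms=50):
    """Noncentral chi-square CDF as a Poisson mixture of central chi-squares."""
    w = math.exp(-lam / 2)   # Poisson(lam/2) weight at j = 0
    total = 0.0
    for j in range(terms):
        total += w * stats.chi2.cdf(x, df + 2 * j)  # central chi2 with df + 2j
        w *= (lam / 2) / (j + 1)                    # recurrence for next weight
    return total

x, df, lam = 12.0, 4, 3.0
print(abs(ncx2_cdf_series(x, df, lam) - stats.ncx2.cdf(x, df, lam)))  # tiny
```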