r-statistics.co

Type I / II Error Visualizer

Every statistical test has two error types: rejecting a true null hypothesis (Type I, alpha) or missing a real effect (Type II, beta). Their trade-off is hard to picture from formulas. Drag the sliders to watch two sampling distributions slide apart and see how power, alpha, and effect size move together.

New to alpha and beta? Read the 4-min primer

The two errors. A Type I error (false positive) is rejecting the null hypothesis when it is actually true; its rate is alpha and you set it. A Type II error (false negative) is failing to reject the null when an effect really exists; its rate is beta and you compute it. Power equals 1 minus beta: the probability your test catches a real effect.
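Both rates can be checked empirically. Here is a minimal Monte Carlo sketch in Python (SciPy assumed; the effect size, sample size, and repetition count are illustrative, not this page's defaults):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, d, reps = 0.05, 30, 0.5, 20_000  # illustrative values (assumed)

def reject_rate(effect):
    """Fraction of simulated two-sample t-tests that reject H0."""
    a = rng.standard_normal((reps, n))           # group A: mean 0, sd 1
    b = rng.standard_normal((reps, n)) + effect  # group B: mean = effect, sd 1
    p = stats.ttest_ind(a, b, axis=1).pvalue
    return float(np.mean(p < alpha))

type1 = reject_rate(0.0)  # H0 true: rejections are Type I errors, rate near alpha
power = reject_rate(d)    # H1 true: rejection rate is power = 1 - beta
print(type1, power)
```

With the null truly holding, roughly 5% of tests reject; with a real effect of d = 0.5 at n = 30 per group, just under half do, which is exactly the beta the visualizer shades in red.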

How to read the picture. The dashed gray curve is the sampling distribution of the test statistic when the null is true. The solid blue curve is the sampling distribution when the alternative is true. The yellow shaded tails on the gray curve add up to alpha. The red hatched region under the blue curve, on the wrong side of the critical value, is beta.

Why dragging teaches. Move the effect slider; the blue curve shifts away from the gray, beta shrinks, power grows. Move n; both curves narrow, the overlap shrinks, power grows. Move alpha; the critical line slides, the red beta region resizes in the opposite direction. The four quantities (effect, n, alpha, power) form a tight quartet: pin any three, the fourth is determined.

When this is the wrong tool. If you want to solve for the sample size that hits a target power, use the Power Analysis calculator. This page is the picture; that page is the answer.

6 test types · alpha + beta + critical value live · Runs in your browser

Select a real-world example to load.

alpha (set)         0.0500    Type I error rate
beta (computed)     -         Type II error rate
power = 1 - beta    -         prob. of detecting the effect
critical value      -         decision threshold
How we got there
Null vs alternative INTERACTIVE
Null distribution (gray, dashed) and alternative (blue). Yellow tails are alpha. Red hatched region is beta.
Truth table (cells sized by probability)

             H0 true                H1 true
Retain H0    Correct retention      Type II error
             (1 - alpha)            (beta)
Reject H0    Type I error           Correct rejection
             (alpha)                (power)
Power vs n
Power vs effect
Power vs alpha
Reproduce in R RUNNABLE

    ## Reproduce the panel's quantities in base R
    ## (illustrative values assumed: two-sample t, d = 0.5, n = 30 per group, alpha = 0.05)
    alpha <- 0.05; n <- 30; d <- 0.5
    df    <- 2 * n - 2                # degrees of freedom
    ncp   <- d * sqrt(n / 2)          # noncentrality under H1
    crit  <- qt(1 - alpha / 2, df)    # critical value from the null t
    beta  <- pt(crit, df, ncp) - pt(-crit, df, ncp)  # mass inside the acceptance region
    c(critical = crit, beta = beta, power = 1 - beta)

Anatomy of the two distributions
Under H0: test statistic ~ null distribution
Under H1: test statistic ~ alternative (noncentral) distribution
The two curves. Under H0, your test statistic follows a known reference (z, t with df, F with df1/df2, chi-square with df). Under H1, that distribution shifts; for t and F it picks up a noncentrality parameter (ncp) that grows with effect size and sample size. The two-sample t-test uses a noncentral t with df = 2n - 2 and ncp = d * sqrt(n/2).
critical = q_{1 - alpha/2}(null)   (two-sided)
critical = q_{1 - alpha}(null)     (one-sided)
The critical value. Pick alpha. Look up the quantile of the null distribution. Anything past that quantile (in absolute value, for two-sided) is "significant". The chosen alpha pins down the area in the tails of the gray curve.
beta = P(|stat| < critical | H1 true)
power = 1 - beta
Beta and power. Plug the same critical value into the alternative (blue) distribution. The probability mass on the wrong side is beta. Power is everything else. Beta + power = 1, always.
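The three steps above can be sketched with SciPy's central and noncentral t (the concrete d and n values are illustrative assumptions, not the page's defaults):

```python
import numpy as np
from scipy import stats

alpha, n, d = 0.05, 30, 0.5     # illustrative slider values (assumed)
df = 2 * n - 2                  # two-sample t: degrees of freedom
ncp = d * np.sqrt(n / 2)        # noncentrality parameter under H1

# Step 1: critical value from the null (central t) distribution
crit = stats.t.ppf(1 - alpha / 2, df)

# Step 2: beta = alternative-distribution mass on the "wrong side" of crit
beta = stats.nct.cdf(crit, df, ncp) - stats.nct.cdf(-crit, df, ncp)

# Step 3: power is everything else
power = 1 - beta
print(round(crit, 3), round(power, 3))
```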
d (Cohen) = (mu1 - mu0) / sigma
h (Cohen) = 2 * (asin(sqrt(p1)) - asin(sqrt(p0)))
f (Cohen) = sigma_means / sigma_within
r = Pearson correlation
Standardized effect sizes. Each test family uses a different effect-size metric. d for means, h for proportions (variance-stabilized arcsine), f for ANOVA group separation, r for correlation. Standardization lets the same noncentrality story work across designs.
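Two of these metrics as one-liners, following the formulas above (a hedged sketch; the example means and proportions are made up):

```python
import numpy as np

def cohens_d(mu1, mu0, sigma):
    """Cohen's d: standardized mean difference."""
    return (mu1 - mu0) / sigma

def cohens_h(p1, p0):
    """Cohen's h: arcsine-stabilized difference of proportions."""
    return 2 * (np.arcsin(np.sqrt(p1)) - np.arcsin(np.sqrt(p0)))

print(cohens_d(105, 100, 10))          # 0.5, "medium" on Cohen's scale
print(round(cohens_h(0.65, 0.50), 3))
```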
Pin any 3 of {effect, n, alpha, power} and the fourth is determined.
The four-way relationship. Larger effect or larger n shifts the alternative further from the null and shrinks beta. A stricter alpha pulls the critical value outward and inflates beta. Looser alpha shrinks beta but inflates Type I risk. There is no free lunch; the visualizer makes the trade explicit.
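Pinning three quantities and solving for the fourth can be sketched numerically (SciPy assumed; the targets are the conventional d = 0.5 at 80% power example, not anything specific to this page):

```python
import numpy as np
from scipy import stats, optimize

def power_two_sample_t(n, d, alpha):
    """Two-sided power of a two-sample t-test with n per group."""
    df = 2 * n - 2
    ncp = d * np.sqrt(n / 2)
    crit = stats.t.ppf(1 - alpha / 2, df)
    return 1 - (stats.nct.cdf(crit, df, ncp) - stats.nct.cdf(-crit, df, ncp))

# Pin effect, alpha, and power; the per-group n is determined.
d, alpha, target = 0.5, 0.05, 0.80
n = optimize.brentq(lambda n: power_two_sample_t(n, d, alpha) - target, 3, 1e6)
print(int(np.ceil(n)))  # -> 64, the textbook answer for d = 0.5 at 80% power
```

Root-finding over a continuous n works because SciPy's t accepts non-integer degrees of freedom; rounding up at the end gives the usable sample size.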
Caveats: When this is the wrong tool
If you want to…                                    Use instead
Solve for the n that hits a target power           The Power Analysis tool. This page is the picture; that one is the calculator.
Check equivalence (TOST), not difference           Equivalence framing has different shading (two one-sided tests). Coming as a separate equivalence tool.
Use a Bayesian framing                             Bayesian posterior overlap is a different framing; alpha and beta are frequentist constructs. Try a brms or rstanarm tutorial.
Plan a sequential / group-sequential trial         Sequential boundaries reshape the rejection region; out of scope here. See the A/B Test Calculator's sequential tab.
Compute "observed power" from a finished study     Don't. Post-hoc power is a one-to-one transform of the p-value (Hoenig & Heisey, 2001) and not informative.
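The last caveat is easy to demonstrate: for a two-sided z-test, "observed power" is computable from the p-value alone, so it adds no information. A sketch under that assumed z-test framing:

```python
from scipy import stats

def observed_power_z(p, alpha=0.05):
    """Post-hoc power of a two-sided z-test: a function of p alone."""
    z = stats.norm.ppf(1 - p / 2)          # |z| recovered from the p-value
    crit = stats.norm.ppf(1 - alpha / 2)
    # Treat the observed |z| as the true noncentrality and recompute power
    return (1 - stats.norm.cdf(crit - z)) + stats.norm.cdf(-crit - z)

# A result exactly at p = alpha always shows ~50% observed power,
# no matter the study: the number carries no new evidence.
print(round(observed_power_z(0.05), 3))
```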

Numerical accuracy: noncentral t uses Cornish-Fisher; noncentral F and chi-square use Poisson-mixture series; results match R's pwr package to ~3 decimals across the calibrated test cases.
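The Poisson-mixture series mentioned for the noncentral chi-square is short to state: weight central chi-square CDFs by Poisson(lambda/2) probabilities. A sketch cross-checked against SciPy (this is the textbook series, not necessarily this page's exact implementation):

```python
import math
from scipy import stats

def ncx2_cdf_series(x, df, lam, terms=50):
    """Noncentral chi-square CDF as a Poisson mixture of central chi-squares."""
    w = math.exp(-lam / 2)   # Poisson(lam/2) weight at j = 0
    total = 0.0
    for j in range(terms):
        total += w * stats.chi2.cdf(x, df + 2 * j)  # central chi2 with df + 2j
        w *= (lam / 2) / (j + 1)                    # recurrence for next weight
    return total

x, df, lam = 12.0, 4, 3.0
print(abs(ncx2_cdf_series(x, df, lam) - stats.ncx2.cdf(x, df, lam)))  # tiny
```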