rr-statistics.co

Bayes' Theorem Calculator

Bayes' theorem updates what you believe when new evidence arrives. It is the math behind why a 99% accurate test for a rare disease can still produce mostly false alarms. Plug in your starting probability and how good your test is to get the updated probability, with a per-10,000 walkthrough that makes the surprise visible.

New to Bayes' rule? Read the 4-min primer.

What Bayes' rule says. Start with a belief, observe some evidence, and update. The update is mechanical: multiply your prior belief by the likelihood of the evidence under that belief, then renormalise across all the ways the evidence could have arisen. The output is the posterior probability that the belief is true given what you just saw.

The base-rate fallacy. A test that is 99% accurate sounds airtight, but if only 1 in 1,000 people have the disease, a positive result is overwhelmingly a false alarm. Most people skip the prior and treat 99% accurate as 99% chance of disease. Bayes forces the prior back into the answer, which is why low-prevalence screening intuition is so often wrong.

Medical screening intuition. Sensitivity is how often the test catches a sick person; specificity is how often it correctly clears a healthy one. The positive predictive value (PPV) is what you actually want: given a positive test, what is the chance you are sick? PPV depends on prevalence as much as on the test, which is why a great test in a rare disease still produces mostly false alarms.

Picking which mode. Use medical screening mode if you have prevalence, sensitivity, and specificity. Use the false-positive paradox mode if you want the "out of 10,000 people" walkthrough. Use spam mode if you have base rates and word likelihoods. Use generic Bayes if you have raw P(D|H), P(D|~H), and a prior. Two-test mode chains a second test from the first test's posterior.

5 modes · one engine · PPV · NPV · LR+ · LR- · mosaic chart · Runs in your browser

Try a real-world example:

🧬 HIV screening

A general-population HIV ELISA: prevalence about 0.1%, sensitivity 99%, specificity 95%. A random positive result - how worried should you be?

Pick a mode and enter inputs.
Mosaic of 10,000 tested (interactive). Disease columns vs. test rows; green = true positive. Default inputs: prevalence 0.001, sensitivity 0.99, specificity 0.95.

Anatomy of Bayes' rule
Why low prevalence breaks intuition
A "99% accurate" test sounds decisive, but at 0.1% prevalence the false-alarm pool (5% of 9,990 healthy people = 499.5) dwarfs the true-positive pool (99% of 10 sick people = 9.9). Out of 509.4 positive results, only 9.9 are real. PPV = 9.9 / 509.4 = 1.94%. The math is simple; the intuition is the hard part.
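The per-10,000 arithmetic above can be reproduced in a few lines. A minimal Python sketch (the site's own snippets are in R; this is just the same arithmetic, not the site's code):

```python
# Per-10,000 walkthrough of the "99% accurate" screening example.
# Numbers match the text: prevalence 0.1%, sensitivity 99%, specificity 95%.
cohort = 10_000
prevalence, sensitivity, specificity = 0.001, 0.99, 0.95

sick = cohort * prevalence                     # 10 people
healthy = cohort - sick                        # 9,990 people
true_positives = sick * sensitivity            # 9.9
false_positives = healthy * (1 - specificity)  # 499.5
ppv = true_positives / (true_positives + false_positives)

print(f"{true_positives:.1f} real out of "
      f"{true_positives + false_positives:.1f} positives")
print(f"PPV = {ppv:.2%}")  # ≈ 1.94%
```

The false-alarm pool is fifty times the true-positive pool, which is the whole surprise.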
P(H|D) = P(D|H) · P(H) / [P(D|H) · P(H) + P(D|~H) · P(~H)]
Generic Bayes. The numerator is the probability of seeing the evidence in worlds where the hypothesis is true. The denominator sums that over both worlds (true + false). Renormalising gives the posterior.
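The generic rule translates directly into code. A hedged Python sketch (not the site's own implementation):

```python
def bayes_posterior(prior, p_d_given_h, p_d_given_not_h):
    """P(H|D): multiply prior by likelihood, renormalise over both worlds."""
    numerator = p_d_given_h * prior
    denominator = numerator + p_d_given_not_h * (1 - prior)
    return numerator / denominator

# HIV example from above: prior 0.001, P(D|H) = 0.99, P(D|~H) = 1 - 0.95
print(bayes_posterior(0.001, 0.99, 0.05))  # ≈ 0.0194
```

A sanity check: with a 50/50 prior and evidence four times likelier under H, the posterior lands at 0.8.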
PPV = sens · prev / [sens · prev + (1-spec) · (1-prev)]
NPV = spec · (1-prev) / [spec · (1-prev) + (1-sens) · prev]
Medical screening. Same skeleton, renamed: prior is prevalence, P(D|H) is sensitivity, 1 − specificity is the false-positive rate. PPV is what you want when the test came back positive; NPV when it came back negative.
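Both predictive values are the same Bayes skeleton with renamed inputs. A small Python sketch of the two formulas as stated (illustrative, not the site's code):

```python
def ppv(prev, sens, spec):
    """P(sick | positive): Bayes with prior = prevalence."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(prev, sens, spec):
    """P(healthy | negative): same skeleton, flipped evidence."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# Defaults from the HIV example: PPV is dismal, NPV is excellent,
# both driven by the 0.1% prevalence as much as by the test.
print(ppv(0.001, 0.99, 0.95))
print(npv(0.001, 0.99, 0.95))
```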
LR+ = sens / (1 - spec)
LR- = (1 - sens) / spec
post-test odds = pre-test odds × LR
posterior = odds / (1 + odds)
Likelihood ratios. Cleaner algebra: convert the prior to odds, multiply by LR, convert back. LR+ above ~10 is "strong evidence to rule in"; LR- below ~0.1 is "strong evidence to rule out". This formulation chains cleanly across multiple tests.
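The odds form is easy to verify against the direct formula. A Python sketch of the conversion round-trip (assumed helper names, not the site's code):

```python
def lr_update(pretest_prob, lr):
    """Odds form of Bayes: probability -> odds, multiply by LR, back."""
    pre_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

sens, spec, prev = 0.99, 0.95, 0.001
lr_pos = sens / (1 - spec)   # 19.8: strong-ish, but below the ~10x-per-test bar? No: above 10, "rule in"
lr_neg = (1 - sens) / spec   # ≈ 0.0105: below 0.1, "rule out"
print(lr_update(prev, lr_pos))  # same ≈ 1.94% as the direct PPV formula
```

Because the update is a single multiplication in odds space, chaining a second test is just a second multiplication.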
P(spam|word) = P(word|spam) · P(spam) / [P(word|spam) · P(spam) + P(word|ham) · P(ham)]
Spam classifier. Identical math, swapped names. Naive Bayes spam filters generalise this by assuming words are conditionally independent given the class, then multiplying many such updates together (in log space, to avoid underflow).
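The log-space trick is worth seeing concretely. A toy Python sketch of a naive Bayes word update; the base rate and per-word likelihoods below are invented for illustration only:

```python
import math

p_spam = 0.4  # assumed base rate, for illustration
word_probs = {            # (P(word|spam), P(word|ham)) - made-up numbers
    "free":    (0.20, 0.02),
    "meeting": (0.005, 0.05),
    "winner":  (0.10, 0.01),
}

def spam_posterior(words):
    """Multiply per-word updates in log space to avoid underflow."""
    log_spam = math.log(p_spam)
    log_ham = math.log(1 - p_spam)
    for w in words:
        p_ws, p_wh = word_probs[w]   # conditional independence assumed
        log_spam += math.log(p_ws)
        log_ham += math.log(p_wh)
    # renormalise: subtract the max before exponentiating for stability
    m = max(log_spam, log_ham)
    num = math.exp(log_spam - m)
    return num / (num + math.exp(log_ham - m))

print(spam_posterior(["free", "winner"]))  # pushed toward spam
print(spam_posterior(["meeting"]))         # pushed toward ham
```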
posterior_1 = Bayes(prior, sens_1, spec_1)
posterior_2 = Bayes(posterior_1, sens_2, spec_2)
Two-test chaining. Use the first test's posterior as the prior for the second. Strictly correct only if the two tests are conditionally independent given the true state - a strong assumption that is often violated for tests measuring related features.
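A Python sketch of the chain, using the ELISA numbers from the example; the second test's operating characteristics are invented for illustration, and the chain is only valid under the conditional-independence assumption just stated:

```python
def bayes(prior, sens, spec):
    num = sens * prior
    return num / (num + (1 - spec) * (1 - prior))

# First positive: the screening ELISA from the example.
p1 = bayes(0.001, 0.99, 0.95)   # ≈ 0.0194

# Second positive: a hypothetical confirmatory test (illustrative numbers),
# chained by feeding the first posterior in as the new prior.
p2 = bayes(p1, 0.997, 0.999)

print(p1, p2)  # two positives move a 0.1% prior to roughly 95%
```

Note how the second, more specific test does the heavy lifting: it is the low false-positive rate, not the sensitivity, that rescues the posterior.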
Caveats: when this is the wrong tool
If you have... → Use instead
Bayes factors (model evidence, not single events) → Model comparison rather than belief updating. A future Bayes factor calculator covers t-tests, proportions, correlation with JZS priors.
Continuous likelihood ratios (e.g. biomarker level) → The slope of the LR with biomarker value matters and needs a different UI. Out of scope here.
Prevalence estimated from the same sample → You have a calibration / Bayesian latent-class problem; treat prevalence as uncertain rather than known.
Imperfect gold standard → Sensitivity / specificity assume the truth is known. If your "truth" is itself a test, use latent-class methods.
Two tests that are not conditionally independent → Chaining understates uncertainty. Either model the dependence explicitly or treat the pair as one combined test with empirically measured operating characteristics.
Further reading

Numerical accuracy: closed-form arithmetic. Stable down to prior = 1e-9; near the boundary the posterior is ~prior × LR+ for very low prevalence.
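The low-prevalence approximation is easy to check numerically. A Python sketch (assumed function name, illustrative only):

```python
def posterior(prior, sens, spec):
    num = sens * prior
    return num / (num + (1 - spec) * (1 - prior))

sens, spec = 0.99, 0.95
prior = 1e-9                           # near the stated stability boundary
exact = posterior(prior, sens, spec)
approx = prior * sens / (1 - spec)     # posterior ≈ prior × LR+
print(exact, approx)                   # essentially indistinguishable here
```

At this prevalence the denominator is dominated by the false-positive term, so the closed-form posterior collapses to prior × LR+ as the text claims.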