yardstick ppv() in R: Prevalence-Adjusted Predictive Value

The yardstick ppv() function in R computes positive predictive value, the share of predicted positives that are truly positive, with an optional prevalence argument that lets you reweight the score for a screening population whose base rate differs from your test data.

By Selva Prabhakaran · Published May 22, 2026 · Last updated May 22, 2026

⚡ Quick Answer

ppv(df, truth, estimate)                                  # basic two-class call
ppv(df, truth = obs, estimate = pred)                     # named arguments
ppv(df, class, .pred_class, prevalence = 0.01)            # adjust for 1% prevalence
df |> group_by(fold) |> ppv(class, .pred_class)           # by resample
ppv(df, class, .pred_class, estimator = "macro")          # multiclass macro
ppv(df, class, .pred_class, event_level = "second")       # flip positive class
ppv_vec(truth_vec, pred_vec, prevalence = 0.05)           # vector interface

Need explanation? Read on for examples and pitfalls.

What ppv() measures

ppv() answers the diagnostic question: when the model says positive, how likely is the case truly positive? The function takes a data frame with observed and predicted class columns and returns a one-row tibble with .metric, .estimator, and .estimate. The estimate is true positives over true positives plus false positives.

Positive predictive value is the metric of choice in clinical screening, fraud triage, and any setting where a positive flag triggers expensive follow-up. Unlike accuracy, the unadjusted ppv() score uses only the rows the model predicted positive, so true negatives and false negatives never enter it. Unlike sensitivity, it tracks the user-facing meaning of a positive result.

The arithmetic of ppv() matches precision() on two-class data. The reason yardstick ships both names is the audience: ML teams reach for precision(), while clinicians and epidemiologists prefer ppv() because it exposes a prevalence argument that precision() does not.
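As a quick check of that equivalence, the sketch below counts true and false positives by hand on the two_class_example data that ships with yardstick and compares the ratio with ppv_vec() and precision_vec(). The variable names (tp, fp, manual_ppv) are illustrative only.

```r
library(yardstick)

# two_class_example ships with yardstick: factor columns `truth` and `predicted`
data("two_class_example")

# "Class1" is the first factor level, so it is the positive class by default
is_pred_pos <- two_class_example$predicted == "Class1"
tp <- sum(is_pred_pos & two_class_example$truth == "Class1")  # true positives
fp <- sum(is_pred_pos & two_class_example$truth == "Class2")  # false positives
manual_ppv <- tp / (tp + fp)

ppv_est  <- ppv_vec(two_class_example$truth, two_class_example$predicted)
prec_est <- precision_vec(two_class_example$truth, two_class_example$predicted)

all.equal(manual_ppv, ppv_est)  # hand count agrees with ppv_vec()
all.equal(ppv_est, prec_est)    # ppv and precision agree with no prevalence set
```

Both comparisons should return TRUE as long as no prevalence argument is supplied.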

Key Insight

PPV depends on prevalence, not just on the model. A test with 99 percent sensitivity and 99 percent specificity has a PPV of only 50 percent when the disease affects 1 percent of the population. yardstick lets you compute that adjusted score by passing the prevalence argument.
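The 50 percent figure can be reproduced in a few lines of base R. This is a minimal sketch: ppv_from_rates() is a hypothetical helper written for this illustration, not a yardstick function.

```r
# Hypothetical helper (not part of yardstick): PPV from sensitivity,
# specificity, and prevalence via Bayes' rule
ppv_from_rates <- function(sens, spec, prev) {
  true_pos  <- sens * prev              # P(flagged and truly positive)
  false_pos <- (1 - spec) * (1 - prev)  # P(flagged but truly negative)
  true_pos / (true_pos + false_pos)
}

key_insight_ppv <- ppv_from_rates(sens = 0.99, spec = 0.99, prev = 0.01)
key_insight_ppv
#> [1] 0.5
```

At 1 percent prevalence the true positives (0.99 x 0.01) and the false positives (0.01 x 0.99) are equally common, which is why only half of the flagged cases are real.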

ppv() syntax and arguments

The signature matches the yardstick class-metric family, with one extra argument: prevalence.

ppv generic signature

ppv(data, truth, estimate, prevalence = NULL, estimator = NULL, na_rm = TRUE, case_weights = NULL, event_level = "first", ...)

Argument	Description
`data`	A data frame with truth and estimate columns.
`truth`	Unquoted column name of observed class labels (a factor).
`estimate`	Unquoted column name of predicted class labels (factor with matching levels).
`prevalence`	Number between 0 and 1. Reweights using sens and spec from the data plus this prevalence. When `NULL`, ppv() equals precision() on two-class data.
`estimator`	Multiclass mode: `"binary"`, `"macro"`, `"macro_weighted"`, or `"micro"`.
`na_rm`	If `TRUE`, drop rows with missing values before scoring.
`case_weights`	Optional unquoted column name of numeric case weights.
`event_level`	`"first"` or `"second"`; which factor level is positive.

Both columns must be factors with matching levels. The common error is passing class probabilities; feed the .pred_class column from parsnip::augment(), not .pred_<level>.
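A minimal sketch of preparing matching factors from raw labels, and of what event_level changes. The label vectors here are made up for illustration; only ppv_vec() and its event_level argument come from yardstick.

```r
library(yardstick)

# Hypothetical raw labels, e.g. character columns read from a CSV
truth_chr <- c("yes", "yes", "no", "no",  "no", "yes", "no")
pred_chr  <- c("yes", "no",  "no", "yes", "no", "yes", "no")

# Share one level vector so both factors match; the first level is the event
lv <- c("yes", "no")
truth_fct <- factor(truth_chr, levels = lv)
pred_fct  <- factor(pred_chr,  levels = lv)

# "yes" as the positive class: 2 of the 3 rows predicted "yes" are truly "yes"
ppv_yes <- ppv_vec(truth_fct, pred_fct)

# "no" as the positive class: 3 of the 4 rows predicted "no" are truly "no"
ppv_no <- ppv_vec(truth_fct, pred_fct, event_level = "second")

c(ppv_yes = ppv_yes, ppv_no = ppv_no)
```

Setting the levels explicitly on both columns avoids relying on alphabetical ordering to decide which class counts as positive.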

Score classifiers: four worked examples

The examples build a two-class prediction frame, then score it under several settings.

Two-class predictions tibble

library(yardstick)
library(dplyr)

set.seed(101)
two_class <- tibble(
  obs = factor(
    sample(c("disease", "healthy"), 200, replace = TRUE, prob = c(0.30, 0.70)),
    levels = c("disease", "healthy")
  ),
  pred = factor(
    sample(c("disease", "healthy"), 200, replace = TRUE, prob = c(0.28, 0.72)),
    levels = c("disease", "healthy")
  )
)
head(two_class, 4)
#> # A tibble: 4 x 2
#>   obs     pred
#>   <fct>   <fct>
#> 1 healthy healthy
#> 2 healthy disease
#> 3 disease healthy
#> 4 healthy healthy

Example 1 calls ppv() with positional arguments. "disease" is the first factor level, so yardstick treats it as the positive class.

Two-class ppv score

ppv(two_class, obs, pred)
#> # A tibble: 1 x 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 ppv     binary         0.339

A 0.339 estimate means roughly 34 percent of rows the model labeled disease were truly diseased. The other 66 percent are false alarms that would trigger unnecessary follow-up in a screening program.

Example 2 adjusts for a different population prevalence. The test-set prevalence is around 30 percent, but a real screening population might run closer to 1 percent.

Adjust ppv for screening prevalence

ppv(two_class, obs, pred, prevalence = 0.01)
#> # A tibble: 1 x 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 ppv     binary         0.011

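At 1 percent prevalence the same classifier delivers only about 1 percent PPV, barely above the base rate. That is expected here: pred was sampled independently of obs, so this toy classifier is essentially random and its PPV tracks whatever prevalence you supply.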

Example 3 groups scoring by resample fold. Cross-validated predictions pair with group_by() to give one score per group.

Per-fold ppv

folded <- two_class |>
  mutate(fold = rep(paste0("fold", 1:5), each = 40))

folded |>
  group_by(fold) |>
  ppv(truth = obs, estimate = pred)
#> # A tibble: 5 x 4
#>   fold  .metric .estimator .estimate
#>   <chr> <chr>   <chr>          <dbl>
#> 1 fold1 ppv     binary         0.273
#> 2 fold2 ppv     binary         0.417
#> 3 fold3 ppv     binary         0.333
#> 4 fold4 ppv     binary         0.385
#> 5 fold5 ppv     binary         0.273

Example 4 uses the vector interface for ad-hoc checks. ppv_vec() accepts two factors and returns a plain numeric scalar instead of a tibble.

Vector interface for quick checks

ppv_vec(two_class$obs, two_class$pred, prevalence = 0.05)
#> [1] 0.05411765

Tip

Bind ppv with sensitivity and specificity for a screening report. metric_set(ppv, npv, sens, spec) builds a reusable function that returns all four diagnostic metrics from one pass over your predictions.
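A minimal sketch of that report, run on yardstick's bundled two_class_example data. The name screen_metrics is an illustrative choice, and the scores use the empirical class balance because no prevalence is supplied.

```r
library(yardstick)
data("two_class_example")

# Build the four-metric screening report once; reuse it on any prediction frame
screen_metrics <- metric_set(ppv, npv, sens, spec)

report <- screen_metrics(two_class_example, truth = truth, estimate = predicted)
report
```

The result is a tibble with one row per metric, in the same .metric, .estimator, .estimate layout that ppv() returns on its own.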

Adjusting PPV for population prevalence

The prevalence argument turns a test-set score into a population score. Without it, ppv() reports the PPV that holds at whatever class balance sits in your data. Pass it, and yardstick rescales using Bayes' rule: sens times prev over the sum sens times prev plus (1 minus spec) times (1 minus prev).
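The rescaling can be checked by hand. The sketch below estimates sensitivity and specificity with sens_vec() and spec_vec() on yardstick's two_class_example data, applies the formula above at an assumed 2 percent prevalence (a value chosen only for illustration), and compares the result with ppv_vec().

```r
library(yardstick)
data("two_class_example")

truth <- two_class_example$truth
pred  <- two_class_example$predicted

# Sensitivity and specificity estimated from the scored data
se <- sens_vec(truth, pred)
sp <- spec_vec(truth, pred)

# Assumed deployment prevalence, chosen only for illustration
p <- 0.02

manual   <- (se * p) / (se * p + (1 - sp) * (1 - p))
adjusted <- ppv_vec(truth, pred, prevalence = p)

all.equal(manual, adjusted)  # the hand calculation matches ppv_vec()
```

Because sens and spec are properties of the classifier rather than of the class balance, only p changes when you move the same model to a different population.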

Show ppv across a prevalence sweep

prev_grid <- c(0.001, 0.01, 0.05, 0.10, 0.30, 0.50)
round(sapply(prev_grid, \(p) ppv_vec(two_class$obs, two_class$pred, prevalence = p)), 3)
#> [1] 0.001 0.011 0.054 0.108 0.318 0.521

Identical sens and spec, very different real-world PPV. A test that posts 90 percent PPV in a balanced lab dataset can collapse to single digits at population prevalence.

Prevalence	PPV (sens=0.99, spec=0.99)	What it means
50% (balanced study)	0.990	Lab-grade
10% (high-risk clinic)	0.917	Strong
1% (general population)	0.500	Half the positives are false
0.1% (mass screening)	0.090	One in eleven flagged is real

Report ppv() at the prevalence of the deployment population, not the training data.

ppv compared with precision and related metrics

ppv() is most useful next to its diagnostic siblings.

Metric	Best use case	Limitation
`ppv()`	Screening, rare-disease tests, prevalence reweighting	Needs a meaningful prevalence to interpret
`precision()`	Same arithmetic, ML-team conventions	No prevalence argument
`npv()`	Trust in negative results, rule-out tests	Hides false-negative cost
`sens()`	Find every true positive	Says nothing about false alarms
`spec()`	Avoid false alarms	Says nothing about missed positives
`bal_accuracy()`	Single imbalance-aware score	Treats both error types as equal

A full diagnostic report cites ppv, npv, sens, and spec together with the assumed prevalence stated.

Note

Coming from epidemiology textbooks? yardstick's ppv() matches the standard 2x2-table formula: TP divided by (TP plus FP). The prevalence argument implements the Bayes-theorem rescaling that most textbooks present as a separate calculation step.

Common pitfalls

Three mistakes account for most ppv() errors.

The first is passing probabilities instead of class labels. yardstick errors on numeric input, and coercing a probability column to a factor on the fly does not help: the resulting levels are probability values rather than class labels, so they no longer match the truth column and cannot yield a meaningful score. Always pull .pred_class from augment(), not .pred_<level>.

Fix: use the class column, not the probability column

preds <- tibble(
  obs = factor(c("yes", "yes", "no", "no"), levels = c("yes", "no")),
  .pred_yes = c(0.9, 0.4, 0.3, 0.2),
  .pred_class = factor(c("yes", "no", "no", "no"), levels = c("yes", "no"))
)

ppv(preds, obs, .pred_class)
#> # A tibble: 1 x 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 ppv     binary             1
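If all you have is a probability column, one option is to threshold it yourself and build a factor whose levels match the truth column. The sketch below reuses the toy values from the block above; the 0.5 cutoff and the names prob_yes and pred_class are assumptions made for illustration.

```r
library(yardstick)

# Toy data: observed class plus a predicted probability of "yes"
obs      <- factor(c("yes", "yes", "no", "no"), levels = c("yes", "no"))
prob_yes <- c(0.9, 0.4, 0.3, 0.2)

# Threshold the probability (0.5 is an assumed cutoff), then build a factor
# with exactly the same levels as the truth column
pred_class <- factor(ifelse(prob_yes >= 0.5, "yes", "no"), levels = levels(obs))

toy_ppv <- ppv_vec(obs, pred_class)
toy_ppv
#> [1] 1
```

Only the first row clears the cutoff and it is truly "yes", so the single positive prediction is correct and the score is 1.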

The second is forgetting that prevalence is the population prevalence, not the test-set prevalence. Passing the prevalence already present in the scored data is a no-op: it simply reproduces the unadjusted score.
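A short sketch of that no-op, using yardstick's two_class_example data: supplying the sample's own positive-class rate returns the same number as leaving prevalence unset.

```r
library(yardstick)
data("two_class_example")

truth <- two_class_example$truth
pred  <- two_class_example$predicted

# Prevalence of the positive class ("Class1") in the scored data itself
sample_prev <- mean(truth == "Class1")

default_ppv <- ppv_vec(truth, pred)
same_ppv    <- ppv_vec(truth, pred, prevalence = sample_prev)

all.equal(default_ppv, same_ppv)  # the argument changed nothing
```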

The third is reporting ppv() with no prevalence stated. A 0.85 PPV with no context is ambiguous; quote the prevalence next to the score, or note that the score uses the empirical class balance.

Warning

A single confident prediction can hit 1.0 ppv. If the model labels only one row positive and that row is truly positive, ppv equals 1 regardless of how many positives it missed. Always pair ppv with sens() and inspect conf_mat() before celebrating.
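The warning is easy to reproduce with a made-up prediction vector in which the model flags a single row. The data below is hypothetical and exists only to show the effect.

```r
library(yardstick)

lv <- c("disease", "healthy")

# Hypothetical predictions: 5 diseased cases, but the model flags only one row
truth <- factor(c(rep("disease", 5), rep("healthy", 5)), levels = lv)
pred  <- factor(c("disease", rep("healthy", 9)), levels = lv)

ppv_vec(truth, pred)   # 1 of 1 flagged rows is truly diseased
#> [1] 1
sens_vec(truth, pred)  # only 1 of 5 diseased cases was caught
#> [1] 0.2

# The confusion matrix shows the four missed cases directly
conf_mat(data.frame(truth, pred), truth, pred)
```

A perfect ppv alongside 0.2 sensitivity describes a model that is right when it speaks but almost never speaks.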

Try it yourself

Try it: Use the built-in two_class_example data from yardstick. Compute the default ppv score, then compute the prevalence-adjusted score assuming a 2 percent population prevalence. Save the second result to ex_ppv_adj.

Your turn: adjust ppv for low prevalence

library(yardstick)
data("two_class_example")

# Try it: ppv at 2 percent prevalence
ex_ppv_adj <- # your code here
ex_ppv_adj
#> Expected: one row, .estimator binary, .estimate near 0.080

Solution

library(dplyr)

ex_ppv_adj <- two_class_example |>
  ppv(truth = truth, estimate = predicted, prevalence = 0.02)
ex_ppv_adj
#> # A tibble: 1 x 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 ppv     binary        0.0800

Explanation: The unadjusted ppv on two_class_example is around 0.82 because the test set is roughly balanced. Setting prevalence = 0.02 rescales using Bayes' rule with the data's sensitivity (about 0.88) and specificity (about 0.79), dropping the score to about 0.08, the PPV a clinician would expect in a low-prevalence screening population.

Related yardstick metrics

ppv() is one entry in the yardstick diagnostic-metric family. Reach for these neighbors when ppv alone is not enough:

npv() for the symmetric partner: negative predictive value
sens() and spec() for the true-positive and true-negative rates
bal_accuracy() for the average of sensitivity and specificity, a single score that is robust to class imbalance
precision() and recall() for the ML-team naming of the same arithmetic
kap() and mcc() for chance-corrected agreement scores
conf_mat() to see the full confusion matrix behind every score

For the full set, see the yardstick reference index.

FAQ

Is ppv() the same as precision() in R?

Arithmetically yes, in two-class problems. Both compute true positives over true positives plus false positives. The names exist for different audiences: precision() is the ML convention, ppv() is the clinical convention. The functional difference is that ppv() accepts a prevalence argument for Bayes-rule rescaling. Without that argument the two return identical scores.

What is the difference between ppv and sensitivity?

Sensitivity asks how many actual positives the model caught. ppv asks how many of the model's positive predictions were correct. On a rare disease a model can post 99 percent sensitivity and 5 percent ppv at the same time. Report both, never just one.

Why does ppv() need a prevalence argument when precision() does not?

ppv() targets clinical reporting where the deployment-population prevalence often differs from the test-set prevalence. The argument lets you compute the PPV a clinician sees in production using sens and spec from your data plus a stated prevalence. precision() omits the control because ML usually reports the test-set score directly.

Can ppv() handle multiclass classification?

Yes. yardstick defaults to estimator = "macro" on multiclass data, computing ppv per class against the rest and taking an unweighted mean. Pass "macro_weighted" or "micro" for other rules. The prevalence argument is defined as the rate of the positive class, so its interpretation is clearest in two-class mode.
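A minimal multiclass sketch using hpc_cv, a four-class data set bundled with yardstick whose observed and predicted columns are named obs and pred. The object names macro and micro are illustrative.

```r
library(yardstick)

# hpc_cv ships with yardstick: four-class `obs` and `pred` factor columns
data("hpc_cv")

macro <- ppv(hpc_cv, obs, pred)                       # multiclass default
micro <- ppv(hpc_cv, obs, pred, estimator = "micro")  # pool counts across classes

macro$.estimator
#> [1] "macro"
micro$.estimator
#> [1] "micro"
```

The .estimator column records which averaging rule produced each score, which is worth keeping when results from several rules end up in one table.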
