yardstick pr_curve() in R: Precision-Recall Curve Points
The yardstick pr_curve() function in R sweeps every probability cutoff your classifier produced and returns a tidy tibble of precision and recall pairs, which is the chart Google calls a PR curve and the diagnostic of choice when the positive class is rare.
    pr_curve(df, truth, .pred_class1)                         # basic two-class call
    pr_curve(df, truth = obs, .pred_class1)                   # named truth column
    pr_curve(df, class, .pred_yes, event_level = "second")    # flip positive class
    df |> group_by(fold) |> pr_curve(class, .pred_yes)        # by resample
    pr_curve(df, class, .pred_a, .pred_b, .pred_c)            # multiclass one-vs-all
    autoplot(pr_curve(df, class, .pred_yes))                  # quick ggplot
    pr_curve(df, class, .pred_yes) |>                         # custom ggplot
      ggplot(aes(recall, precision)) + geom_path()
Need explanation? Read on for examples and pitfalls.
What pr_curve() returns
pr_curve() converts probabilities into a precision-recall trade-off, not a single score. You give it a data frame with a factor truth column and one or more probability columns, and you get back a tibble with .threshold, recall, and precision, one row per unique cutoff. Each row reads as "if you label every observation above this probability as positive, here is the precision you keep and the recall you reach."
That tibble is the raw material for any PR chart. Pipe it into autoplot() when you want a ready-made plot, or hand it to ggplot2 when you want full styling. Because the output is tidy, it slots into group_by(), facet_wrap(), and dplyr verbs without reshaping.
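As a minimal sketch of that return value, the snippet below uses two_class_example, a small data set that ships with yardstick (a factor truth column plus Class1 and Class2 probability columns). It is only an illustration of the output shape, not the fraud data used later in this article.

```r
library(yardstick)

# two_class_example ships with yardstick: a factor `truth` plus
# class probabilities in `Class1` and `Class2`.
data(two_class_example)

# One row per cutoff, with the precision and recall reached at that cutoff.
curve <- pr_curve(two_class_example, truth, Class1)

names(curve)   # ".threshold" "recall" "precision"
head(curve, 3)
```

Because the result is an ordinary tibble, anything that works on a data frame works on curve.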
pr_curve() syntax and arguments
The signature is identical to roc_curve(), with the same binary versus multiclass split. Binary problems take one probability column, multiclass problems take one column per class level.
| Argument | Description |
|---|---|
| data | Data frame with the truth column and probability columns. |
| truth | Unquoted column name of the observed labels (must be a factor). |
| ... | Unquoted probability columns. One for binary, one per class for multiclass. |
| na_rm | If TRUE, drop rows where any column is missing before computing. |
| event_level | "first" or "second"; names which factor level counts as positive. |
| case_weights | Optional unquoted column for weighted curve points. |
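By tidymodels convention the probability columns are named .pred_ plus the factor level, but yardstick pairs columns with levels by position, so for multiclass problems pass the columns in the same order as levels(truth). For binary problems, the third argument is the probability column for the positive class, which event_level selects (the first level by default).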
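The interaction between the probability column and event_level can be sketched with yardstick's bundled two_class_example data. The object names curve_first and curve_second are illustrative only.

```r
library(yardstick)

data(two_class_example)

# The first level is the default positive class.
levels(two_class_example$truth)   # "Class1" "Class2"

# Default: "Class1" is the event, so pass its probability column.
curve_first <- pr_curve(two_class_example, truth, Class1)

# Treat "Class2" as the event instead: switch BOTH the column and event_level.
curve_second <- pr_curve(two_class_example, truth, Class2,
                         event_level = "second")
```

Changing only one of the two (the column or event_level) scores the wrong class, which is the first pitfall discussed further down.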
Four worked examples on rare-event data
These examples use an imbalanced fraud frame so the PR curve actually has work to do. When the positive rate is below 10 percent, accuracy and ROC AUC look optimistic; the PR curve is the chart that shows you the truth.
Call pr_curve() to get the threshold sweep. Each row gives you the precision the model keeps at that cutoff and the recall it has reached so far.
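The original fraud frame is not reproduced here, so the sketch below simulates a stand-in: 70 frauds in 1,000 rows (a 7 percent base rate) with Beta-distributed scores. The names fraud, class, and .pred_fraud, and the score distributions, are assumptions made for illustration.

```r
library(yardstick)
library(tibble)

set.seed(123)

# Hypothetical stand-in for the fraud frame: 7 percent positives.
n_fraud <- 70
n_legit <- 930
fraud <- tibble(
  # "fraud" is the first level, so the default event_level = "first" applies.
  class = factor(rep(c("fraud", "legit"), times = c(n_fraud, n_legit)),
                 levels = c("fraud", "legit")),
  # Simulated scores: frauds skew high, legitimate rows skew low.
  .pred_fraud = c(rbeta(n_fraud, 4, 2), rbeta(n_legit, 2, 5))
)

# Threshold sweep: one row per cutoff.
fraud_curve <- pr_curve(fraud, class, .pred_fraud)
fraud_curve
```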
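Rows run from the highest cutoff to the lowest. In current yardstick releases the first row is a boundary point at .threshold = Inf, where nothing is labelled positive, recall is 0, and precision is set to 1 by convention. The last row sits at the lowest cutoff: every observation is labelled positive, so recall is 1 and precision matches the 7 percent base rate.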
Pipe the result into autoplot() for a styled ggplot.
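A minimal sketch of the autoplot() route, again on a simulated stand-in for the fraud frame (the data and the names fraud, class, and .pred_fraud are assumptions, not the article's original data):

```r
library(yardstick)
library(tibble)
library(ggplot2)   # provides the autoplot() generic

set.seed(123)

# Hypothetical fraud data: 70 positives in 1,000 rows.
fraud <- tibble(
  class = factor(rep(c("fraud", "legit"), times = c(70, 930)),
                 levels = c("fraud", "legit")),
  .pred_fraud = c(rbeta(70, 4, 2), rbeta(930, 2, 5))
)

# autoplot() has a method for pr_curve() output and returns a ggplot object.
p_auto <- autoplot(pr_curve(fraud, class, .pred_fraud))
p_auto
```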
Add + labs(title = "Fraud classifier") or + scale_x_continuous(labels = scales::percent) to brand the chart without rebuilding it from the curve tibble.

For full styling control, plot the raw points with ggplot. The PR curve uses recall on x and precision on y, the opposite axis convention from the ROC curve.
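One way to build the custom plot is sketched below, with a dashed reference line at the positive-class rate. The simulated fraud tibble and its column names are assumptions for illustration; the base rate is computed from the data rather than hard-coded.

```r
library(yardstick)
library(tibble)
library(ggplot2)

set.seed(123)

# Hypothetical fraud data: 70 positives in 1,000 rows.
fraud <- tibble(
  class = factor(rep(c("fraud", "legit"), times = c(70, 930)),
                 levels = c("fraud", "legit")),
  .pred_fraud = c(rbeta(70, 4, 2), rbeta(930, 2, 5))
)

fraud_curve <- pr_curve(fraud, class, .pred_fraud)
base_rate   <- mean(fraud$class == "fraud")   # share of positives in the data

p_custom <- ggplot(fraud_curve, aes(x = recall, y = precision)) +
  geom_path() +                                               # connect points in row order
  geom_hline(yintercept = base_rate, linetype = "dashed") +   # random-guess baseline
  labs(title = "Fraud classifier", x = "Recall", y = "Precision")
p_custom
```

geom_path() is used rather than geom_line() because it connects the points in the order they appear in the tibble, which follows the threshold sweep.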
The dashed line at 0.07 marks the base rate, the precision a random classifier reaches. A curve that hugs that line is no better than guessing.
Group by a resample column to overlay one curve per fold.
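The grouped call can be sketched as follows. The fold column here is a hypothetical label assigned in rotation so that every fold contains both classes; in a real workflow it would come from your resampling results.

```r
library(yardstick)
library(tibble)
library(dplyr)
library(ggplot2)

set.seed(123)

# Hypothetical fraud data with an illustrative five-fold label.
fraud <- tibble(
  fold  = rep(paste0("fold", 1:5), length.out = 1000),
  class = factor(rep(c("fraud", "legit"), times = c(70, 930)),
                 levels = c("fraud", "legit")),
  .pred_fraud = c(rbeta(70, 4, 2), rbeta(930, 2, 5))
)

# pr_curve() respects dplyr groups: one curve per fold, stacked in one tibble.
fold_curves <- fraud |>
  group_by(fold) |>
  pr_curve(class, .pred_fraud)

p_folds <- ggplot(fold_curves, aes(recall, precision, colour = fold)) +
  geom_path()
p_folds
```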
pr_curve() vs neighbouring metrics
Pick the function that matches the question you are asking. The table below contrasts pr_curve() with the closest yardstick siblings.
| Function | Returns | When to use |
|---|---|---|
| pr_curve() | Tibble of thresholds with precision and recall | Plot or pick a cutoff on imbalanced data |
| pr_auc() | One row with the area under the PR curve | Single ranking score for rare-event models |
| roc_curve() | Tibble of sensitivity and specificity points | Plot or pick a cutoff on balanced data |
| f_meas() | Single F1 score at one cutoff | You already chose a threshold |
| precision() and recall() | One row each at one cutoff | Report headline numbers after picking a threshold |
If the positive class is below 20 percent of your data, prefer pr_curve() and pr_auc() over the ROC pair. The ROC chart still works, but it tends to look better than the model performs.
Common pitfalls
Three issues show up in nearly every pr_curve() bug report.
First, leaving event_level at "first" when the rare class is the second factor level scores the wrong class, and the curve typically collapses toward the baseline. Always check levels(df$truth) and pass the probability column for the level that event_level selects. yardstick treats that level as the positive class, not whichever class is rarer and not whichever class the column name suggests.
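Second, the curve includes a boundary row at .threshold = Inf, the cutoff where no observation is labelled positive and precision is mathematically 0 / 0. Current yardstick releases put this row first and fill in recall = 0 and precision = 1 by convention; older releases reported a missing precision instead. autoplot() handles the row gracefully, but custom code that scales, bins, or models .threshold can trip on the infinite value. Keep only finite cutoffs with filter(is.finite(.threshold)) before that kind of work, and also drop rows where precision is missing if you are on an old release.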
Third, multiclass calls require one probability column per level. Passing only the positive-class column on a three-level factor stops with an error saying that the number of levels in truth does not match the number of columns supplied. Use tidyselect, for example .pred_a:.pred_c or starts_with(".pred_"), when classes share a prefix.
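The second and third pitfalls can be sketched with two data sets that ship with yardstick: two_class_example for the infinite cutoff, and hpc_cv (truth column obs, probability columns VF, F, M, L) for the multiclass call. This assumes a yardstick release that reports the boundary row at .threshold = Inf.

```r
library(yardstick)
library(dplyr)

data(two_class_example)
data(hpc_cv)

# Pitfall 2: keep only rows with a finite cutoff before threshold-based work.
finite_curve <- pr_curve(two_class_example, truth, Class1) |>
  filter(is.finite(.threshold))

# Pitfall 3: select one probability column per level of `obs`,
# in the same order as levels(hpc_cv$obs).
multi_curve <- pr_curve(hpc_cv, obs, VF:L)

# One one-vs-all curve per class, identified by the .level column.
count(multi_curve, .level)
```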
Try it yourself
Try it: Build the PR curve for the fraud tibble above and find the threshold that keeps precision at or above 0.5 with the highest possible recall. Save that threshold to ex_thr.
Solution:
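One possible solution is sketched below on a simulated stand-in for the fraud tibble (the data and the names fraud, class, and .pred_fraud are assumptions). Only ex_thr comes from the exercise statement.

```r
library(yardstick)
library(tibble)
library(dplyr)

set.seed(123)

# Hypothetical fraud data: 70 positives in 1,000 rows.
fraud <- tibble(
  class = factor(rep(c("fraud", "legit"), times = c(70, 930)),
                 levels = c("fraud", "legit")),
  .pred_fraud = c(rbeta(70, 4, 2), rbeta(930, 2, 5))
)

ex_thr <- pr_curve(fraud, class, .pred_fraud) |>
  filter(precision >= 0.5) |>                     # enforce the precision floor
  slice_max(recall, n = 1, with_ties = FALSE) |>  # highest recall among those rows
  pull(.threshold)

ex_thr
```

with_ties = FALSE keeps a single row when several cutoffs reach the same recall; which of the tied rows you prefer is a design choice the exercise leaves open.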
Explanation: pr_curve() lists every threshold the classifier produced, so picking a cutoff is a dplyr operation. filter(precision >= 0.5) keeps only rows where the precision floor is met, then slice_max(recall) picks the row that still recovers the most positives.
Related yardstick functions
sklearn.metrics.precision_recall_curve(y_true, y_score) returns three separate arrays (precision, recall, and thresholds), with the thresholds array one element shorter than the other two; yardstick returns one tidy tibble with named columns instead, which removes the off-by-one length quirk you may have hit in Python.

- pr_auc() for the single-number area under the PR curve.
- roc_curve() and roc_auc() for the sensitivity-specificity variant on balanced data.
- gain_curve() and lift_curve() for marketing-style cumulative charts.
- f_meas(), precision(), and recall() once you have committed to a threshold.
- metric_set() to bundle pr_auc() with calibration metrics like brier_class() in a last_fit() workflow.
See the yardstick reference for pr_curve() on tidymodels.org for the full argument list.
FAQ
What is the difference between pr_curve() and pr_auc() in yardstick?
pr_curve() returns a tibble of thresholds, precision, and recall, which is the data you plot. pr_auc() reduces that curve to a single number, the area under it; yardstick also provides average_precision() as a closely related summary. Both pr_curve() and pr_auc() consume the same inputs, a truth factor and a probability column, so you typically call them side by side: one for the chart, one for the leaderboard score on imbalanced problems.
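A short sketch of the side-by-side call, using yardstick's bundled two_class_example data:

```r
library(yardstick)

data(two_class_example)

# Same inputs, two outputs: the points to plot and the one-row summary.
curve <- pr_curve(two_class_example, truth, Class1)
score <- pr_auc(two_class_example, truth, Class1)

score             # one row with .metric, .estimator, .estimate
score$.estimate   # the single number for a leaderboard
```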
When should I use pr_curve() instead of roc_curve()?
Use pr_curve() when the positive class is rare, typically below 20 percent of the data. ROC curves normalise each axis within its own class, so a large pool of true negatives keeps the false positive rate low and the curve stays optimistic on imbalanced data. PR curves anchor precision against the base rate, so a curve that hugs the bottom is honest about a weak model. For balanced binary problems, either chart works.
How do I plot a precision-recall curve from yardstick output?
Call autoplot() on the pr_curve() result for a ready-made ggplot. For custom styling, map recall to x and precision to y in ggplot2, then add geom_path(). Add a dashed horizontal reference line at the positive class rate so readers see the random-guess baseline; a useful chart compares the curve to that line, not to the top of the plot.
Does pr_curve() handle multiclass problems?
Yes. Pass one probability column per class using tidyselect, for example .pred_setosa:.pred_virginica. yardstick computes a one-vs-all PR curve per level and returns a tibble with a .level column. Pipe it to autoplot() for one panel per class, or filter the .level column to focus on a single class.
Why does my PR curve sit close to the bottom of the chart?
Either the model has no signal beyond the base rate, or event_level points to the wrong factor level. Check levels(df$truth) and confirm the .pred_* column refers to the rare class. A curve that sits at the positive class rate horizontally is a coin flip; a curve that climbs toward 1 on the precision axis at low recall is a model worth tuning.