caret prSummary() in R: Precision and Recall Summary Metrics
The prSummary() function in caret is a precision-recall summaryFunction: once you pass it to trainControl(), train() calls it on every resample. It accepts a data frame with obs, pred, and class-probability columns, then returns a named vector of AUC (precision-recall AUC), Precision, Recall, and F (F1 score). Wire it into trainControl() with classProbs = TRUE when positives are rare and ROC AUC overstates performance.
    prSummary(df, lev = levels(df$obs))                            # direct call
    trainControl(summaryFunction = prSummary, classProbs = TRUE)   # wire-in
    train(..., metric = "F", maximize = TRUE)                      # optimise F1
    train(..., metric = "AUC", maximize = TRUE)                    # optimise PR-AUC
    train(..., metric = "Precision")                               # optimise precision
    fit$resample[, c("AUC", "Precision", "Recall", "F")]           # per-fold metrics
    levels(y)[1]                                                   # the positive class
Need explanation? Read on for examples and pitfalls.
What prSummary() does in one sentence
prSummary() is caret's precision-recall scoring contract. It is the summaryFunction you pass to trainControl() when the positive class is rare and the precision-recall curve carries more signal than the ROC curve. The body computes four numbers from data$obs, data$pred, and the probability column matching the first factor level: PR-AUC via MLmetrics::PRAUC(), plus Precision, Recall, and the F1 score F from caret's own helpers.
The first factor level is the "positive" class. caret looks up lev[1] to find the matching probability column, so the order of levels in your outcome variable controls which class precision and recall are computed against.
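To make the effect of level order concrete, here is a minimal sketch using caret's exported precision() and recall() helpers on a small hand-made example. The labels "yes" and "no" and the ten observations are hypothetical; the relevant argument stands in for whatever lev[1] would be.

```r
library(caret)

# Hypothetical truth and hard predictions for ten rows (3 "yes", 7 "no").
lvl  <- c("yes", "no")
obs  <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "no"),
               levels = lvl)
pred <- factor(c("yes", "yes", "no", "yes", "no", "no", "no", "no", "no", "no"),
               levels = lvl)

# Scored against the rare class: 2 true positives, 1 false positive, 1 miss.
prec_yes <- precision(data = pred, reference = obs, relevant = "yes")  # 2/3
rec_yes  <- recall(data = pred, reference = obs, relevant = "yes")     # 2/3

# Scored against the majority class, the same predictions look better.
prec_no <- precision(data = pred, reference = obs, relevant = "no")    # 6/7
rec_no  <- recall(data = pred, reference = obs, relevant = "no")       # 6/7
```

The predictions are identical in both cases; only the class treated as "positive" changes, which is why the factor level order matters.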
prSummary() syntax and arguments
The signature matches every caret summaryFunction: prSummary(data, lev = NULL, model = NULL). Of the three arguments, data and lev are used; model is ignored.
The arguments:
- data: a data frame with obs (factor truth), pred (factor predictions), and one column per class named for that level. caret hands the resample frame in unchanged.
- lev: a character vector of factor levels. The first element is the positive class. Pass lev = levels(df$obs) when calling outside train().
- model: the caret training method name. Ignored by prSummary; present only to match the API contract.
The return is a length-four named numeric vector with AUC, Precision, Recall, and F. Any of those four names is valid for metric = in train(). AUC here is precision-recall AUC, not ROC AUC; the column name is reused for backwards compatibility.
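As an illustration of the direct call, the sketch below builds a resample-style frame by hand and passes it to prSummary(). The data values and the "yes"/"no" labels are made up for the example, and it assumes both caret and MLmetrics are installed.

```r
library(caret)  # prSummary() also relies on the MLmetrics package

# Hypothetical resample frame: truth, hard predictions at a 0.5 cut-off,
# and one probability column per class level.
lvl      <- c("yes", "no")
prob_yes <- c(0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05)
df <- data.frame(
  obs  = factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "no"),
                levels = lvl),
  pred = factor(ifelse(prob_yes > 0.5, "yes", "no"), levels = lvl),
  yes  = prob_yes,
  no   = 1 - prob_yes
)

# lev[1] ("yes") is treated as the positive class.
scores <- prSummary(df, lev = levels(df$obs))
names(scores)          # "AUC" "Precision" "Recall" "F"
scores[["Precision"]]  # 2 of the 3 rows predicted "yes" are truly "yes"
```

Calling the function this way is a convenient check that the column names and level order are what train() will later hand in.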
Note: PR-AUC depends on the MLmetrics package. Without it, AUC returns NA while Precision, Recall, and F still compute. Install once with install.packages("MLmetrics").
prSummary() examples by use case
Three patterns cover the common calls: wiring into trainControl, optimising on F1 or PR-AUC, and wrapping for a custom threshold. Each reuses the same scorer with different framing.
classProbs = TRUE is mandatory. Without it train() cannot produce the probability columns that prSummary reads, and the resample call errors before scoring.
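A minimal sketch of the wiring, assuming caret and MLmetrics are installed. The data set is simulated here purely for illustration (the variable names x1, x2, y and the "pos"/"neg" labels are hypothetical), with the rare class placed first in the factor levels.

```r
library(caret)

# Simulated imbalanced data (hypothetical): roughly 15% positives.
set.seed(1)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- plogis(-2.5 + 1.5 * x1 + x2)
y  <- factor(ifelse(runif(n) < p, "pos", "neg"), levels = c("pos", "neg"))
dat <- data.frame(y, x1, x2)

# classProbs = TRUE makes train() add the "pos"/"neg" probability columns
# that prSummary reads on each resample.
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = prSummary)

fit <- train(y ~ x1 + x2, data = dat, method = "glm",
             trControl = ctrl, metric = "F")

# One row per fold, then the cross-validated averages.
fit$resample[, c("AUC", "Precision", "Recall", "F")]
fit$results[, c("AUC", "Precision", "Recall", "F")]
```

Logistic regression has no tuning grid, so metric = "F" only affects what is reported as the headline number here; with a tuned model it also drives model selection.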
metric = "AUC" here means precision-recall AUC because the summaryFunction is prSummary. The same code with summaryFunction = twoClassSummary has no "AUC" column to match, because twoClassSummary names its ROC column "ROC": train() warns that the metric was not in the result set and falls back to the first metric it finds.
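To show metric = "AUC" driving model selection, the following sketch tunes a k-nearest-neighbours model on PR-AUC. The simulated data, variable names, and the choice of knn are assumptions made for the example; it also assumes MLmetrics is installed so the AUC column is populated.

```r
library(caret)

# Simulated imbalanced data (hypothetical), rare class first.
set.seed(2)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- plogis(-2.5 + 1.5 * x1 + x2)
y  <- factor(ifelse(runif(n) < p, "pos", "neg"), levels = c("pos", "neg"))
dat <- data.frame(y, x1, x2)

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = prSummary)

# tuneLength = 3 tries three values of k; the one with the highest
# cross-validated PR-AUC is kept.
fit_auc <- train(y ~ x1 + x2, data = dat, method = "knn",
                 trControl = ctrl, metric = "AUC", tuneLength = 3)

fit_auc$results[, c("k", "AUC", "Precision", "Recall", "F")]
fit_auc$bestTune
```

Because PR-AUC is computed from the probability column rather than from hard predictions, it stays defined even in folds where the model predicts no positives at the default cut-off.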
The wrapper overrides pred from the probability column before calling prSummary, which is the right way to score at a custom threshold. Lower thresholds trade precision for recall; choose by what cost matters for your application.
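One way such a wrapper might look is sketched below. The factory name prSummary_at and the 0.3 threshold are hypothetical; the toy frame is made up to show the effect, and the sketch assumes caret and MLmetrics are installed.

```r
library(caret)

# Hypothetical factory: returns a summaryFunction that re-derives pred
# from the lev[1] probability column at the given threshold.
prSummary_at <- function(threshold = 0.3) {
  function(data, lev = NULL, model = NULL) {
    data$pred <- factor(ifelse(data[, lev[1]] > threshold, lev[1], lev[2]),
                        levels = lev)
    prSummary(data, lev, model)
  }
}

# Toy frame (hypothetical values) to compare thresholds.
lvl      <- c("yes", "no")
prob_yes <- c(0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05)
df <- data.frame(
  obs  = factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "no"),
                levels = lvl),
  pred = factor(ifelse(prob_yes > 0.5, "yes", "no"), levels = lvl),
  yes  = prob_yes,
  no   = 1 - prob_yes
)

at_50 <- prSummary(df, lev = lvl)          # default 0.5 cut-off
at_30 <- prSummary_at(0.3)(df, lev = lvl)  # recall rises to 1, precision is 3/4

# The wrapper plugs into trainControl() like any other summaryFunction.
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = prSummary_at(0.3))
```

AUC is unchanged by the threshold, since it is computed from the probabilities; only Precision, Recall, and F move.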
prSummary uses lev[1] both for the probability column lookup and as the positive target for precision and recall. The default alphabetical ordering rarely matches business intent. Use factor(y, levels = c("positive_class", "negative_class")) to force the order.
prSummary() vs alternatives
caret ships five summaryFunctions plus the postResample helper. Pick prSummary only when classes are imbalanced and the precision-recall curve is the relevant ranking.
| summaryFunction | Outcome type | Returned metrics | Pick when |
|---|---|---|---|
| prSummary | Two-class factor | AUC (PR), Precision, Recall, F | Imbalanced binary, positives rare |
| twoClassSummary | Two-class factor | ROC, Sens, Spec | Roughly balanced binary, ROC is the headline |
| multiClassSummary | Multi-class factor | Accuracy, Kappa, per-class metrics | Three or more classes |
| defaultSummary | Numeric (regression) | RMSE, Rsquared, MAE | Regression |
| mnLogLoss | Two- or multi-class | logLoss | Probability calibration matters most |
| postResample | Either | RMSE/Rsq/MAE or Accuracy/Kappa | Two-vector scoring outside the resample loop |
The deciding test is class balance. Under 10 percent positives, prSummary; between 10 and 40 percent, both are reasonable but twoClassSummary remains the convention; above 40 percent, twoClassSummary because ROC AUC reads more cleanly. For three or more classes, multiClassSummary regardless of balance.
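The rule of thumb above can be written down as a small helper. This is only a sketch: the function name suggest_summary is hypothetical, it is not part of caret, and the 10 percent cut-off is simply the guideline from the paragraph above.

```r
# Hypothetical helper (not part of caret): suggest a summaryFunction name
# from the number of classes and the share of the positive class.
suggest_summary <- function(y, positive = levels(y)[1]) {
  stopifnot(is.factor(y))
  if (nlevels(y) > 2) {
    return("multiClassSummary")  # three or more classes, regardless of balance
  }
  rate <- mean(y == positive)    # share of rows in the positive class
  if (rate < 0.10) "prSummary" else "twoClassSummary"
}

rare     <- factor(rep(c("pos", "neg"), times = c(5, 95)),  levels = c("pos", "neg"))
moderate <- factor(rep(c("pos", "neg"), times = c(30, 70)), levels = c("pos", "neg"))

suggest_summary(rare)      # "prSummary"
suggest_summary(moderate)  # "twoClassSummary"
```

In the 10 to 40 percent band the helper follows the convention and returns twoClassSummary, even though both choices are defensible there.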
Common pitfalls
Three mistakes cause most prSummary failures. Each has a quick fix.
prSummary reads data[, lev[1]] for the probability of the positive class. Without classProbs = TRUE, that column does not exist and train() aborts before scoring the first resample. Always set classProbs = TRUE alongside summaryFunction = prSummary.
If "Good" ends up as the first level, prSummary scores precision and recall against the majority class: the metrics look excellent while the actual problem (detecting "Bad" credit) gets ignored. Order the factor so the rare or business-critical class is first: factor(y, levels = c("Bad", "Good")).
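A short base-R sketch of the fix, using a made-up outcome vector in which "Good" was set as the first level. Only the level order changes; the underlying labels stay the same.

```r
# Hypothetical outcome with the majority class first.
y <- factor(c("Good", "Bad", "Good", "Good", "Bad", "Good"),
            levels = c("Good", "Bad"))
levels(y)[1]        # "Good": the class prSummary would score against

# Put the rare class first without touching the data values.
y_fixed <- factor(y, levels = c("Bad", "Good"))
levels(y_fixed)[1]  # "Bad"

# relevel() is an equivalent shortcut when only the first level matters.
y_alt <- relevel(y, ref = "Bad")
```

Checking levels(y)[1] right before calling train() is a cheap guard against scoring the wrong class.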
The PR-AUC computation calls MLmetrics::PRAUC(). When the package is missing, prSummary returns NA for AUC and the other three metrics still compute. Run install.packages("MLmetrics") once if you tune on metric = "AUC"; otherwise switch the metric to "F".
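A defensive sketch of that choice in base R: check for the package up front and pick the tuning metric accordingly. The variable name tune_metric is hypothetical.

```r
# Pick the tuning metric depending on whether MLmetrics is available.
if (requireNamespace("MLmetrics", quietly = TRUE)) {
  tune_metric <- "AUC"  # precision-recall AUC
} else {
  message("MLmetrics not found; tuning on F instead. ",
          "Run install.packages(\"MLmetrics\") to enable PR-AUC.")
  tune_metric <- "F"
}

# Later: train(..., metric = tune_metric)
tune_metric
```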
AUC is precision-recall AUC, not ROC AUC. Reports that combine output from prSummary and twoClassSummary side by side will mislead readers who assume "AUC" means ROC. Rename the column in your reporting code, or document the metric explicitly in plots and tables.
Try it yourself
Try it: Build a 5-fold CV pipeline on the imbalanced GermanCredit data predicting Class with logistic regression. Wire prSummary into trainControl() with classProbs = TRUE and tune on F. Save the mean F1 across folds to ex_f1.
Explanation: classProbs = TRUE produces the probability columns prSummary reads; ordering levels with c("Bad", "Good") makes the rare "Bad" class the positive target so Precision, Recall, and F measure detection of bad credit. Averaging fit$resample$F gives the cross-validated F1.
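One possible solution is sketched below. It assumes caret and MLmetrics are installed, and, to keep the example quick, it uses only three predictors (Duration, Amount, Age); that subset and the seed are choices made for this sketch, not part of the exercise.

```r
library(caret)
data(GermanCredit)

# Make "Bad" the first level so it is the positive class.
GermanCredit$Class <- factor(GermanCredit$Class, levels = c("Bad", "Good"))

set.seed(123)
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = prSummary)

# Logistic regression on a small, assumed subset of predictors.
fit <- train(Class ~ Duration + Amount + Age,
             data = GermanCredit, method = "glm",
             trControl = ctrl, metric = "F")

# Cross-validated F1: the mean of the per-fold F column.
ex_f1 <- mean(fit$resample$F)
ex_f1
```

Swapping the formula for Class ~ . uses every predictor, at the cost of a slower fit and possible rank-deficiency warnings from glm.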
Related caret functions
The precision-recall stack sits one call away:
- twoClassSummary() for ROC, sensitivity, specificity on balanced binary. See caret twoClassSummary() in R.
- multiClassSummary() for three-class or more outcomes. See caret multiClassSummary() in R.
- defaultSummary() for the regression scorer. See caret defaultSummary() in R.
- confusionMatrix() for the full classification scorecard. See caret confusionMatrix() in R.
- trainControl() for swapping summaryFunctions and setting classProbs. See caret trainControl() in R.
For the upstream reference, see the caret package documentation.
FAQ
What does prSummary() return?
For a two-class problem with classProbs = TRUE, prSummary() returns a length-four named numeric vector: AUC (precision-recall AUC via MLmetrics::PRAUC), Precision, Recall, and F (the F1 score). caret rbinds one row per fold into fit$resample and averages columns into fit$results. Any of the four names is valid for metric = in train().
When should I use prSummary instead of twoClassSummary?
Switch to prSummary when the positive class is roughly under 10 percent of rows. ROC AUC under twoClassSummary is dominated by the abundant negative class, so a model that ranks the rare positives poorly can still report high AUC. Precision-recall AUC is sensitive to ranking on the minority class. For 10 to 40 percent positives, both are reasonable; above 40 percent twoClassSummary is conventional.
Why is AUC always NA in my output?
The AUC column comes from MLmetrics::PRAUC(). If the MLmetrics package is not installed, prSummary() returns NA for AUC and still computes Precision, Recall, and F. Install it with install.packages("MLmetrics") and re-run, or change metric = in train() to "F", "Precision", or "Recall", all of which are computed without MLmetrics.
How do I make prSummary score against the rare class?
Order the outcome factor so the rare class is the first level: factor(y, levels = c("rare_class", "common_class")). caret reads lev[1] as the positive class for the probability column lookup and for precision and recall. Alphabetical ordering rarely matches business intent, so set the levels explicitly when constructing the outcome.
Can I use prSummary with a custom decision threshold?
Yes. Wrap prSummary in a function that overwrites data$pred from the probability column before delegating: data$pred <- factor(ifelse(data[, lev[1]] > 0.3, lev[1], lev[2]), levels = lev); prSummary(data, lev, model). Pass the wrapper as summaryFunction in trainControl(). Lower thresholds raise recall and lower precision; the F column tracks the trade-off.