caret rocSummary in R: ROC AUC With twoClassSummary
caret rocSummary is the function name many R users type when they want ROC AUC as a resampling metric, but the caret package ships no function called rocSummary(). The right function is twoClassSummary() for binary outcomes, prSummary() for precision-recall, and multiClassSummary() for three or more classes. This page maps the search term to the actual function and shows the wiring that scores ROC AUC inside trainControl().
    twoClassSummary(df, lev = levels(df$obs))                            # binary ROC, Sens, Spec
    trainControl(summaryFunction = twoClassSummary, classProbs = TRUE)   # wire-in for train()
    train(..., metric = "ROC")                                           # tune on ROC AUC
    prSummary(df, lev = levels(df$obs))                                  # PR-AUC for imbalanced data
    multiClassSummary(df, lev = levels(df$obs))                          # 3+ classes, averaged one-vs-rest AUC
    pROC::roc(obs, prob)                                                 # raw ROC outside caret
    fit$resample$ROC                                                     # per-fold AUC after train()
Need explanation? Read on for examples and pitfalls.
What is "rocSummary" in caret?
There is no function named rocSummary in the caret package. The search term most often refers to twoClassSummary(), the binary-classification summaryFunction that trainControl() hands to train(); caret calls it once per fold, and it returns ROC AUC together with Sensitivity and Specificity. A few tutorials and custom training scripts define their own rocSummary wrapper, but it is not a built-in caret export. If you typed caret::rocSummary and got an error saying 'rocSummary' is not an exported object from 'namespace:caret', that is the underlying reason.
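A quick way to confirm this on your own machine is to ask the caret namespace what it exports. The sketch below also defines a local rocSummary alias of the kind some scripts carry; the alias and the toy data frame are hypothetical illustrations, not part of caret, and twoClassSummary() needs the pROC package installed.

```r
library(caret)

# Check the installed caret namespace for each name.
"rocSummary" %in% getNamespaceExports("caret")       # FALSE
"twoClassSummary" %in% getNamespaceExports("caret")  # TRUE

# Hypothetical local alias, as found in some user scripts (not part of caret).
rocSummary <- function(data, lev = NULL, model = NULL) {
  caret::twoClassSummary(data, lev = lev, model = model)
}

# Toy resample frame (made-up values): obs, pred, and one probability
# column per class, named after the factor levels.
set.seed(1)
toy <- data.frame(
  obs = factor(rep(c("Yes", "No"), each = 20), levels = c("Yes", "No"))
)
toy$Yes  <- ifelse(toy$obs == "Yes", runif(40, 0.4, 1), runif(40, 0, 0.6))
toy$No   <- 1 - toy$Yes
toy$pred <- factor(ifelse(toy$Yes > 0.5, "Yes", "No"), levels = c("Yes", "No"))

toy_scores <- rocSummary(toy, lev = levels(toy$obs))
toy_scores  # named vector: ROC, Sens, Spec
```

The alias only forwards its arguments, so anything written against rocSummary in such scripts behaves exactly like twoClassSummary.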
twoClassSummary covers any binary outcome, multiClassSummary covers any multinomial outcome, defaultSummary covers regression. ROC AUC sits inside twoClassSummary and multiClassSummary rather than getting its own dedicated function.
Use twoClassSummary() for binary ROC AUC
twoClassSummary() is the binary classifier scoring contract caret runs on every resample. It accepts a data frame with obs, pred, and one probability column per class, then returns a named numeric vector with ROC, Sens, and Spec. To wire it into a train() call, set summaryFunction = twoClassSummary and classProbs = TRUE in trainControl(), then pass metric = "ROC" to train() so tuning optimises AUC.
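The wiring described above can be sketched as follows. The dataset (iris reduced to versicolor vs virginica, stored as binary_iris), the seed, and the choice of 5-fold CV with a plain logistic regression are assumptions made for illustration; glm may warn about fitted probabilities of 0 or 1 because these two classes are nearly separable.

```r
library(caret)

# Assumed example data: two-class subset of iris.
binary_iris <- droplevels(subset(iris, Species != "setosa"))

set.seed(123)
ctrl <- trainControl(
  method          = "cv",
  number          = 5,
  summaryFunction = twoClassSummary,  # score each fold with ROC, Sens, Spec
  classProbs      = TRUE              # required so ROC AUC can be computed
)

fit <- train(
  Species ~ .,
  data      = binary_iris,
  method    = "glm",
  metric    = "ROC",   # select on mean ROC AUC across folds
  trControl = ctrl
)

fit$results  # includes ROC, Sens, Spec and their SD columns
```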
fit$results reports the mean ROC across folds plus the per-fold standard deviation (ROCSD). caret picks the row with the maximum mean ROC as the best tune; here the model has no tuning grid, so a single row is returned.
Without classProbs = TRUE in trainControl(), train() aborts with an error along the lines of "use of ROC codes requires class probabilities."
Tune a caret model on ROC AUC
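Once twoClassSummary is wired in and metric = "ROC" is passed to train(), any tunable model is selected on ROC AUC. Pass a tuneGrid (or tuneLength) and caret scores every (resample, hyperparameter) pair, then averages the folds into one row per candidate in fit$results. The row with the highest mean ROC becomes fit$bestTune. By default fit$resample keeps the per-fold scores for that winning setting only; set returnResamp = "all" in trainControl() to keep them for every candidate.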
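A minimal sketch of that tuning flow, assuming a random forest (method = "rf", which needs the randomForest package) tuned over three mtry values on the same two-class iris subset. The object names binary_iris and fit_rf, the grid, and the seed are illustrative choices.

```r
library(caret)

# Assumed example data: two-class subset of iris.
binary_iris <- droplevels(subset(iris, Species != "setosa"))

set.seed(123)
ctrl <- trainControl(
  method          = "cv",
  number          = 5,
  summaryFunction = twoClassSummary,
  classProbs      = TRUE
)

fit_rf <- train(
  Species ~ .,
  data      = binary_iris,
  method    = "rf",
  metric    = "ROC",
  trControl = ctrl,
  tuneGrid  = data.frame(mtry = 1:3)  # three candidate settings
)

fit_rf$results[, c("mtry", "ROC", "ROCSD")]  # one row per candidate
fit_rf$bestTune                              # mtry with the highest mean ROC
fit_rf$resample$ROC                          # per-fold AUC for the chosen mtry
```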
The per-resample AUCs live in fit_rf$resample$ROC, useful for sanity-checking that the mean is not driven by one outlier fold.
Multi-class ROC AUC with multiClassSummary()
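For three or more classes, multiClassSummary() returns a set of class-averaged metrics rather than a single score. When classProbs = TRUE and the resample frame carries one probability column per factor level, it adds logLoss, AUC (the mean of the one-vs-rest ROC AUCs), and prAUC to Accuracy, Kappa, and class-averaged columns such as Mean_F1, Mean_Precision, and Mean_Recall. Run names(fit$results) to confirm the exact column names in your installed version.
On iris, AUC is the macro-average of three one-vs-rest AUCs (setosa, versicolor, virginica) and is the multi-class analogue of ROC AUC. Tune on it by passing metric = "AUC". In current caret releases the column is named AUC rather than Mean_AUC; if train() warns that the metric was not in the result set, check names(fit$results) and use the name listed there.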
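The multi-class wiring can be sketched on the full iris data. The model choice (lda from MASS), the seed, and the fold count are assumptions for illustration; multiClassSummary() may also ask for helper packages such as pROC and MLmetrics. Because metric column names depend on the installed caret version, the sketch prints them instead of assuming them, and metric = "AUC" is the name I would expect in current releases.

```r
library(caret)

set.seed(123)
ctrl_mc <- trainControl(
  method          = "cv",
  number          = 5,
  summaryFunction = multiClassSummary,
  classProbs      = TRUE   # needed for the AUC-type columns
)

fit_mc <- train(
  Species ~ .,
  data      = iris,
  method    = "lda",
  metric    = "AUC",   # assumed column name; confirm with names() below
  trControl = ctrl_mc
)

names(fit_mc$results)  # list the metric columns actually produced
auc_cols <- grep("AUC", names(fit_mc$results), value = TRUE)
fit_mc$results[, auc_cols, drop = FALSE]
```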
rocSummary vs twoClassSummary vs prSummary
Map the search term to the right built-in by outcome type and class balance. Every function in the table is a valid summaryFunction value; pick the one whose metric drives the tuning decision.
| Search term users type | Actual caret function | Returns | When to use |
|---|---|---|---|
| rocSummary | twoClassSummary | ROC AUC, Sens, Spec | Binary outcome, balanced or moderately imbalanced |
| roc auc multi-class | multiClassSummary | AUC (mean one-vs-rest) plus class-averaged metrics | Three or more classes |
| pr auc summary | prSummary | AUC (PR), Precision, Recall, F | Heavily imbalanced binary (under 10 percent positives) |
| regression rmse | defaultSummary | RMSE, Rsquared, MAE | Numeric outcome |
| log loss caret | mnLogLoss | logLoss | Probability calibration matters |
Inspect fit$resample before trusting the headline metric. Per-fold ROC variance reveals model fragility that the averaged number hides. A mean ROC of 0.92 with a per-fold range of 0.78 to 0.99 is much less reliable than one with mean 0.90 and range 0.88 to 0.92.
Common mistakes when searching for rocSummary
Three patterns account for most of the trouble. Each has a one-line fix.
Calling rocSummary() directly. The function does not exist in caret. Replace rocSummary with twoClassSummary and the call resolves. Keep lev = levels(df$obs) so the positive class is read from the factor.
Leaving out class probabilities. Add classProbs = TRUE to trainControl(). ROC AUC cannot be computed from hard predictions; the function needs a probability column for each factor level.
Relying on the default factor order. caret reads levels(obs)[1] as the event. Alphabetical ordering makes "No" the positive class, which inverts Sensitivity and Specificity from what the analyst expects. Set the order explicitly: factor(x, levels = c("Yes", "No")) so "Yes" is the event.
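The level-ordering fix above uses only base R. The short sketch below runs on a made-up outcome vector and shows the default alphabetical order next to two equivalent ways of putting "Yes" first.

```r
# Made-up outcome vector for illustration.
outcome <- c("No", "Yes", "Yes", "No", "Yes")

# Default: levels are sorted alphabetically, so "No" comes first
# and caret would treat "No" as the event.
f_default <- factor(outcome)
levels(f_default)    # "No"  "Yes"

# Fix 1: state the order when building the factor.
f_fixed <- factor(outcome, levels = c("Yes", "No"))
levels(f_fixed)      # "Yes" "No"

# Fix 2: move the intended event to the front of an existing factor.
f_releveled <- relevel(f_default, ref = "Yes")
levels(f_releveled)  # "Yes" "No"
```

Either form should be applied before the data reach train(), so that the probability columns and Sens/Spec refer to the intended event.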
Try it yourself
Try it: Build a 5-fold CV pipeline on binary_iris (versicolor vs virginica) predicting Species from Sepal.Length and Sepal.Width with a logistic regression. Wire twoClassSummary into trainControl() and tune on ROC. Save the mean ROC across folds to ex_roc.
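One possible solution is sketched below. It assumes binary_iris is the iris data with setosa dropped; the seed and the name ex_fit are arbitrary, and the exact ROC value will vary with the fold split.

```r
library(caret)

# Assumed definition of binary_iris: versicolor vs virginica only.
binary_iris <- droplevels(subset(iris, Species != "setosa"))

set.seed(2024)
ctrl <- trainControl(
  method          = "cv",
  number          = 5,
  summaryFunction = twoClassSummary,
  classProbs      = TRUE
)

ex_fit <- train(
  Species ~ Sepal.Length + Sepal.Width,
  data      = binary_iris,
  method    = "glm",
  metric    = "ROC",
  trControl = ctrl
)

# Mean of the five per-fold AUCs.
ex_roc <- mean(ex_fit$resample$ROC)
ex_roc
```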
Explanation: Sepal length and width alone produce a weaker classifier than the full four-predictor model, so ROC drops from near 1 to about 0.80. Averaging fit$resample$ROC reproduces the mean-of-folds AUC that fit$results$ROC reports.
Related caret functions
The full ROC and resampling toolchain sits one call away:
- twoClassSummary() for the binary ROC AUC contract. See caret twoClassSummary() in R.
- prSummary() for precision-recall AUC on imbalanced binary data. See caret prSummary() in R.
- multiClassSummary() for class-averaged AUC across three or more classes. See caret multiClassSummary() in R.
- defaultSummary() for regression resample scoring. See caret defaultSummary() in R.
- trainControl() for swapping summaryFunctions and configuring resamples. See caret trainControl() in R.
- confusionMatrix() for the full binary classification scorecard. See caret confusionMatrix() in R.
For the upstream reference, see the caret measuring performance guide.
FAQ
Is rocSummary a real function in caret?
No. The caret package does not export a function named rocSummary(). Calling rocSummary(df) returns "could not find function" because the symbol is undefined, and caret::rocSummary fails because rocSummary is not an exported object of the caret namespace. The closest built-in is twoClassSummary(), which returns a named numeric vector with ROC, Sens, and Spec for binary outcomes. A few user-written scripts define a local rocSummary <- function(data, lev, model) ... wrapper, but that is custom code, not part of caret.
Which caret function should I use to score ROC AUC?
Use twoClassSummary() for binary classification (two-level factor outcome) and multiClassSummary() for three or more classes. Both go into trainControl(summaryFunction = ..., classProbs = TRUE), and you tune the model by passing metric = "ROC" (twoClassSummary) or metric = "AUC" (multiClassSummary; confirm the column name with names(fit$results)) to train(). For heavily imbalanced binary data, switch to prSummary() so the metric reflects ranking on the rare positive class.
Why does my caret model give the wrong ROC?
Two checks. First, confirm classProbs = TRUE in trainControl(); without it, caret has no probability column and train() aborts. Second, check levels(your_outcome) and make sure the intended positive class is the FIRST level. caret treats levels(obs)[1] as the event, so an alphabetical factor like c("No", "Yes") puts "No" as positive and inverts Sens/Spec from what most analysts expect.
Can I compute ROC AUC without caret?
Yes. Use pROC::roc(response = obs, predictor = prob) and read the auc element, or PRROC::roc.curve(scores.class0 = prob, weights.class0 = (obs == "yes")) for a different implementation. Both accept raw vectors and do not require the caret resample frame. Inside caret, however, twoClassSummary is preferred because it produces per-fold AUCs that align with the rest of the fit$resample columns.
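As a minimal sketch of the pROC route, the block below simulates hypothetical labels and scores, computes the AUC with pROC, and cross-checks it against the rank-based (Mann-Whitney) formula in base R. The data, seed, and variable names are made up for illustration. The levels and direction arguments are set explicitly so pROC does not guess which class is the case.

```r
library(pROC)

# Simulated data (hypothetical): 50 negatives, 50 positives.
set.seed(42)
obs  <- factor(rep(c("no", "yes"), each = 50), levels = c("no", "yes"))
prob <- c(plogis(rnorm(50, -0.7)), plogis(rnorm(50, 0.7)))

# levels = c(control, case); direction "<" means controls score lower.
roc_obj <- pROC::roc(
  response  = obs,
  predictor = prob,
  levels    = c("no", "yes"),
  direction = "<",
  quiet     = TRUE
)
auc_value <- as.numeric(pROC::auc(roc_obj))

# Cross-check: AUC equals the Mann-Whitney U statistic divided by n1 * n0.
n1 <- sum(obs == "yes")
n0 <- sum(obs == "no")
auc_rank <- (sum(rank(prob)[obs == "yes"]) - n1 * (n1 + 1) / 2) / (n1 * n0)

c(pROC = auc_value, rank_formula = auc_rank)
```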
How do I tune on Sensitivity or Specificity instead of ROC?
Pass metric = "Sens" or metric = "Spec" to train() and keep summaryFunction = twoClassSummary. caret picks the hyperparameter row with the highest mean of the requested column. Tuning on Sens alone is risky because a model that always predicts the positive class scores Sens = 1; pair it with a Specificity floor in a custom summary, or stick with ROC for a balanced objective.
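One way to express a Specificity floor is a small custom summary function. The sketch below is an assumption-laden illustration: sensWithSpecFloor, the SensFloor column, and the 0.80 floor are hypothetical names and values, not part of caret. It wraps twoClassSummary() and reports Sens only for folds where Spec clears the floor, scoring the fold as 0 otherwise.

```r
library(caret)

# Hypothetical helper (not part of caret): Sens counts only when Spec >= floor.
sensWithSpecFloor <- function(data, lev = NULL, model = NULL, spec_floor = 0.80) {
  base <- twoClassSummary(data, lev = lev, model = model)
  floor_met <- isTRUE(unname(base["Spec"]) >= spec_floor)
  c(base, SensFloor = if (floor_met) unname(base["Sens"]) else 0)
}

# Assumed example data: two-class subset of iris.
binary_iris <- droplevels(subset(iris, Species != "setosa"))

set.seed(123)
ctrl_floor <- trainControl(
  method          = "cv",
  number          = 5,
  summaryFunction = sensWithSpecFloor,
  classProbs      = TRUE
)

fit_floor <- train(
  Species ~ Sepal.Length + Sepal.Width,
  data       = binary_iris,
  method     = "knn",
  metric     = "SensFloor",  # tune on the floored sensitivity
  trControl  = ctrl_floor,
  tuneLength = 3
)

fit_floor$results[, c("k", "Sens", "Spec", "SensFloor")]
```

The floor is applied per fold before averaging, so a candidate that misses the floor in some folds is penalised in proportion. Other designs, such as checking the floor on the averaged Spec, are equally reasonable.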