caret specificity() in R: Compute True Negative Rate
The caret specificity() function in R computes the true negative rate of a classifier: the share of actual negatives the model correctly labels negative. It accepts factor vectors or a precomputed contingency table, returns a single numeric value, and by default treats every level except the first as the negative class (the second level in a binary problem) unless you override it.
```r
specificity(pred, ref)                           # default, second level is negative
specificity(pred, ref, negative = "no")          # set negative class
specificity(table(pred, ref))                    # pass a precomputed table
specificity(pred, ref, na.rm = TRUE)             # drop NA before counting
specificity(pred, ref, negative = "versicolor")  # name one class as negative in a multi-class problem
1 - sensitivity(pred, ref, positive = "yes")     # false negative rate
```
Need explanation? Read on for examples and pitfalls.
What specificity() does in one sentence
specificity() is caret's single-number true-negative-rate metric. It counts negatives your classifier correctly labeled, then divides by the total negatives in the reference. The result lies between 0 and 1; 1 means every actual negative was left alone, 0 means none were. The function needs no fitted model, only two aligned vectors of class labels (or a contingency table).
caret defines specificity as TN / (TN + FP), identical to selectivity. For a full scorecard with accuracy, Kappa, sensitivity, and F1 in one call, use confusionMatrix() instead.
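To make the formula concrete, here is a minimal sketch that counts TN and FP by hand and compares the ratio with caret's result. The label vectors are made up for illustration.

```r
library(caret)

# Hypothetical labels: 4 actual "yes" records, 6 actual "no" records
ref  <- factor(c("yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"),
               levels = c("yes", "no"))
pred <- factor(c("yes", "yes", "yes", "no",  "no", "no", "no", "no", "yes", "no"),
               levels = c("yes", "no"))

tn <- sum(pred == "no"  & ref == "no")   # actual negatives left alone
fp <- sum(pred == "yes" & ref == "no")   # actual negatives wrongly flagged

by_hand  <- tn / (tn + fp)
by_caret <- specificity(pred, ref, negative = "no")

c(by_hand = by_hand, by_caret = by_caret)   # both equal 5/6
```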
specificity() syntax and arguments
The function dispatches on the type of its first argument. Pass a factor and it expects a second factor as reference; pass a table and it reads negatives off the rows excluding the first.
The two calling styles are:
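```r
# factor method: predictions and reference as two factors
specificity(data, reference, negative = levels(reference)[-1], na.rm = TRUE, ...)
# table method: a precomputed contingency table
specificity(data, negative = rownames(data)[-1], ...)
```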
- data: predicted class labels (factor) or a precomputed table() of predictions versus reference.
- reference: ground-truth class labels (factor). Same length and levels as data.
- negative: which factor level counts as the negative class. Defaults to every level except the first, which collapses non-positive labels in a multi-class problem.
- na.rm: drop records where either prediction or reference is NA before counting. Default TRUE.
Set the level order explicitly with factor(..., levels = c("yes", "no")) before the call so the negative slot (every level after the first) is the one you intend.
specificity() examples by use case
1. Specificity from two factor vectors
The simplest call passes the prediction vector and the reference vector. caret extracts the negative level from the reference factor.
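A minimal sketch of the two-vector call, using made-up labels in which the model never flags an actual no record:

```r
library(caret)

# Hypothetical labels; "yes" is listed first, so "no" is the default negative
ref  <- factor(c("yes", "yes", "yes", "no", "no", "no", "no"), levels = c("yes", "no"))
pred <- factor(c("yes", "yes", "no",  "no", "no", "no", "no"), levels = c("yes", "no"))

spec <- specificity(pred, ref)   # negative defaults to levels(ref)[-1], i.e. "no"
spec
# [1] 1
```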
A value of 1 means every actual no record was predicted no. On harder problems specificity is usually below 1, often somewhere between 0.7 and 0.95, though the range depends heavily on the data.
2. Specificity from a precomputed table
If you already cross-tabbed predictions and references with table(), hand the matrix in directly. Rows after the first are treated as the negative class unless you override.
The table form bypasses the factor-level coercion that trips up the vector form, and it is the only option when predictions arrive as counts (e.g., aggregated logs).
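As a sketch of the count-based case, the table below is built directly from hypothetical aggregated counts rather than from label vectors. Predictions go in rows and the reference in columns, which is the layout table(pred, ref) produces.

```r
library(caret)

# Hypothetical aggregated counts: rows are predictions, columns are the reference
counts <- as.table(matrix(
  c(40,  5,    # reference "yes": 40 predicted yes, 5 predicted no
    10, 45),   # reference "no":  10 predicted yes, 45 predicted no
  nrow = 2,
  dimnames = list(pred = c("yes", "no"), ref = c("yes", "no"))
))

spec_tab <- specificity(counts)   # default negative is rownames(counts)[-1], i.e. "no"
spec_tab                          # 45 / (45 + 10)
```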
3. Set the negative class explicitly
The default negative is every level except the first. For binary outcomes this is usually fine, but for multi-class problems you almost always want to set negative by hand so the metric matches the question you are asking.
In a binary problem, specificity for one class equals sensitivity for the other. Unless the model is equally good on both classes, the two possible specificity values differ, so the choice of negative changes the number you report. Pick the class whose false positives cost the most: spam flags on legitimate mail, false fraud alerts, false-positive cancer screens.
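The sketch below, on made-up labels, shows both points: switching negative changes the result, and specificity with one class as negative matches sensitivity with that same class as positive.

```r
library(caret)

# Hypothetical labels: 4 actual "yes" records, 6 actual "no" records
ref  <- factor(c("yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"),
               levels = c("yes", "no"))
pred <- factor(c("yes", "yes", "yes", "no",  "no", "no", "no", "no", "yes", "no"),
               levels = c("yes", "no"))

spec_no  <- specificity(pred, ref, negative = "no")    # 5 of 6 actual "no" kept as "no"
spec_yes <- specificity(pred, ref, negative = "yes")   # 3 of 4 actual "yes" kept as "yes"

# Same counts, different name: sensitivity with "no" as the positive class
sens_no <- sensitivity(pred, ref, positive = "no")

c(spec_no = spec_no, spec_yes = spec_yes, sens_no = sens_no)
```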
A model that predicts no for every record scores a perfect specificity of 1.0, because it never raises a false alarm. The same model scores 0 on sensitivity. Always read specificity alongside sensitivity (or balanced accuracy) so the trivial-negative trick cannot fool you.
4. Specificity for each class in a multi-class problem
For three or more classes, specificity is computed one class at a time, treating that class as positive and everything else as negative. Loop over the levels to get a per-class vector of the share of non-class records the model correctly excluded.
Setosa and virginica hit 1.0; versicolor drops to 0.967 because one non-versicolor flower was wrongly flagged as versicolor. This is the specificity column of confusionMatrix()$byClass; the loop form just skips the rest of the report.
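The loop can be sketched as follows. This is not the iris model from the example: pred3 and ref3 here are small hand-written stand-ins, so the numbers differ from those quoted above. Each pass binarises the labels into one class versus "other" and asks for the specificity of "other".

```r
library(caret)

# Hypothetical stand-ins for the multi-class labels (4 records per class)
ref3  <- factor(rep(c("setosa", "versicolor", "virginica"), each = 4))
pred3 <- factor(c(rep("setosa", 4),
                  rep("versicolor", 3), "virginica",
                  rep("virginica", 3), "versicolor"),
                levels = levels(ref3))

per_class_spec <- sapply(levels(ref3), function(cls) {
  # One-vs-rest: the current class is positive, everything else is "other"
  ref_bin  <- factor(ifelse(ref3  == cls, cls, "other"), levels = c(cls, "other"))
  pred_bin <- factor(ifelse(pred3 == cls, cls, "other"), levels = c(cls, "other"))
  specificity(pred_bin, ref_bin, negative = "other")
})

per_class_spec   # setosa 1.000, versicolor 0.875, virginica 0.875
```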
5. Compare sensitivity, specificity, and precision side by side
caret pairs specificity() with sibling functions for the other corners of the confusion matrix. Calling them together gives a three-number summary without building the whole confusionMatrix object.
Specificity (TN / (TN + FP)) answers "of the negatives, how many did we leave alone?" Sensitivity (TP / (TP + FN)) answers "of the positives, how many did we catch?" Precision (TP / (TP + FP)) answers "of the predicted positives, how many were right?" Together they describe the classifier more honestly than accuracy.
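A minimal sketch of the three-number summary on made-up labels, with "yes" as the positive class and "no" as the negative class:

```r
library(caret)

# Hypothetical labels: TP = 3, FN = 1, TN = 5, FP = 1
ref  <- factor(c("yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"),
               levels = c("yes", "no"))
pred <- factor(c("yes", "yes", "yes", "no",  "no", "no", "no", "no", "yes", "no"),
               levels = c("yes", "no"))

summary3 <- c(
  sensitivity = sensitivity(pred, ref, positive = "yes"),   # TP / (TP + FN)
  specificity = specificity(pred, ref, negative = "no"),    # TN / (TN + FP)
  precision   = posPredValue(pred, ref, positive = "yes")   # TP / (TP + FP)
)

round(summary3, 3)
```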
specificity() vs alternatives
caret's specificity() is a one-metric extractor; pick a fuller tool when you want more than the true negative rate. The choice usually comes down to whether you also need sensitivity, accuracy, or F1 in the same call.
| Tool | Returns | Multi-class | Best for |
|---|---|---|---|
| caret::specificity() | Single numeric | One-class-at-a-time via negative= | A single TNR number in scripts |
| caret::confusionMatrix() | List with overall and per-class metrics | Yes, all classes at once | Full classifier scorecard |
| yardstick::spec() | Tibble with .metric, .estimate | Yes, with macro or micro estimator | Tidymodels pipelines |
| MLmetrics::Specificity() | Single numeric | No (binary only) | Drop-in replacement, no caret dependency |
Reach for specificity() when you only need TNR or are comparing it against sensitivity() and posPredValue() side by side. Use confusionMatrix() when you also want accuracy, Kappa, F1, and confidence intervals from the same call. See the official caret reference at topepo.github.io/caret/measuring-performance.html for the full metric family.
Common pitfalls
Pitfall 1: leaving negative on the default. caret picks every level except the first, collapsing labels into one negative bucket in a multi-class problem. Always pass negative explicitly so the metric matches the class you treat as negative.
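Pitfall 2: passing character vectors instead of factors. specificity() expects factors, and current caret versions stop with an error on character input rather than coercing it. If you convert with factor(x) or as.factor(x), the level order is alphabetical, not the order you typed, so the negative class can silently flip. Wrap inputs in factor(x, levels = c("yes", "no")) so that "no" lands in the default negative slot.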
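The effect of level order is easy to see in a small sketch. The labels are hypothetical; the only difference between the two calls is how the factors were built.

```r
library(caret)

# Hypothetical raw labels as they might arrive from a CSV
raw_ref  <- c("yes", "no", "no",  "yes", "no")
raw_pred <- c("yes", "no", "yes", "yes", "no")

# factor() without levels sorts alphabetically: "no" comes first, "yes" second
alpha_levels <- levels(factor(raw_ref))

# Alphabetical levels: the default negative becomes "yes"
spec_alpha <- specificity(factor(raw_pred), factor(raw_ref))

# Explicit levels: "yes" first, so the default negative is "no"
lv <- c("yes", "no")
spec_explicit <- specificity(factor(raw_pred, levels = lv), factor(raw_ref, levels = lv))

c(alphabetical = spec_alpha, explicit = spec_explicit)   # 1 versus 2/3
```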
Pitfall 3: comparing specificity across imbalanced datasets. A 99 percent negative dataset can produce specificity 1.0 from a model that always predicts negative. Read it alongside sensitivity, or use balanced accuracy.
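A sketch of the always-negative model on a hypothetical dataset with 99 negatives and 1 positive. Balanced accuracy is computed here as the plain mean of sensitivity and specificity.

```r
library(caret)

# Hypothetical 99 percent negative dataset
ref  <- factor(c(rep("no", 99), "yes"), levels = c("yes", "no"))
# A "model" that predicts "no" for every record
pred <- factor(rep("no", 100), levels = c("yes", "no"))

spec <- specificity(pred, ref, negative = "no")   # 99 / 99
sens <- sensitivity(pred, ref, positive = "yes")  # 0 / 1
balanced_accuracy <- (sens + spec) / 2

c(specificity = spec, sensitivity = sens, balanced_accuracy = balanced_accuracy)
```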
specificity() does not score probabilities. It needs hard class labels. For probability-based curves (ROC), use pROC::roc(ref, probs) and read specificity off the curve at your chosen threshold, or call twoClassSummary() inside trainControl to get ROC, sensitivity, and specificity per resample.
Try it yourself
Try it: Compute specificity for the versicolor class on the multi-class iris classifier built in example 4. Use the existing pred3 and ref3 objects. Save the result to ex_spec_versicolor.
Explanation: Binarising the labels into versicolor versus other lets caret count true negatives as records that are not versicolor and were not predicted versicolor. The single number is the specificity for versicolor in a one-vs-rest framing.
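One way the binarising step might look is sketched below. Because the objects from example 4 are not reproduced here, pred3 and ref3 are hand-written stand-ins; with the real objects, only the last three statements are needed.

```r
library(caret)

# Hypothetical stand-ins for pred3 and ref3 (4 records per class)
ref3  <- factor(rep(c("setosa", "versicolor", "virginica"), each = 4))
pred3 <- factor(c(rep("setosa", 4),
                  rep("versicolor", 3), "virginica",
                  rep("virginica", 3), "versicolor"),
                levels = levels(ref3))

# Collapse to versicolor versus other, with versicolor as the first (positive) level
bin_levels <- c("versicolor", "other")
ref_bin  <- factor(ifelse(ref3  == "versicolor", "versicolor", "other"), levels = bin_levels)
pred_bin <- factor(ifelse(pred3 == "versicolor", "versicolor", "other"), levels = bin_levels)

ex_spec_versicolor <- specificity(pred_bin, ref_bin, negative = "other")
ex_spec_versicolor   # 7 of 8 non-versicolor records were not flagged: 0.875
```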
Related caret functions
After specificity(), these caret functions round out classification evaluation:
- sensitivity(): true positive rate, the companion metric to specificity
- posPredValue(): positive predictive value (precision); it normalises by predicted positives where sensitivity normalises by actual positives
- negPredValue(): negative predictive value; precision for the negative class
- confusionMatrix(): full classifier scorecard with both overall and per-class metrics
- twoClassSummary(): drop-in summaryFunction for trainControl that returns ROC, sensitivity, and specificity per resample
FAQ
Is caret specificity the same as true negative rate?
Yes. caret defines specificity as TN / (TN + FP), the same formula epidemiologists call true negative rate or selectivity. The numbers are identical; only the field convention differs.
Why does caret specificity return 0 when my model looks fine?
Usually the negative class is set to the wrong level. caret defaults to every level except the first, so with levels c("no", "yes") it treats "yes" as the negative class and scores how many actual "yes" records were predicted "yes". If the model rarely predicts "yes", that number sits near 0. Pass negative = "no" and the number should jump.
How do I compute specificity for each class in a multi-class problem?
Binarise the labels per class, then call specificity(pred_bin, ref_bin, negative = "other"). caret's stock per-class output via confusionMatrix()$byClass[, "Specificity"] does the same thing in one call and is usually faster than the loop.
What is the difference between specificity and precision?
Specificity is TN / (TN + FP): of all actual negatives, how many did we leave alone. Precision is TP / (TP + FP): of all positive predictions, how many were right. Both penalise false positives, but specificity normalises by the negative population while precision normalises by predictions, so they diverge sharply on imbalanced data.
Can I use specificity inside trainControl for model tuning?
Yes. Set summaryFunction = twoClassSummary and classProbs = TRUE in trainControl(), then pick metric = "Spec" in train(). caret will tune hyperparameters to maximise average specificity across resamples.
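A minimal sketch of that setup, assuming a two-class subset of iris and a k-nearest-neighbours model chosen purely for illustration. twoClassSummary() treats the first factor level as the event, so Spec here refers to the second level (virginica).

```r
library(caret)

set.seed(42)

# Two-class problem: drop setosa and its now-unused factor level
two_class <- droplevels(iris[iris$Species != "setosa", ])

ctrl <- trainControl(
  method          = "cv",
  number          = 5,
  classProbs      = TRUE,              # twoClassSummary needs class probabilities
  summaryFunction = twoClassSummary    # reports ROC, Sens, Spec per resample
)

fit <- train(
  Species ~ ., data = two_class,
  method     = "knn",
  metric     = "Spec",                 # pick the k with the highest mean specificity
  trControl  = ctrl,
  tuneLength = 3
)

fit$results[, c("k", "ROC", "Sens", "Spec")]
fit$bestTune
```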