caret pickSizeTolerance() in R: Parsimonious RFE Picks
The caret pickSizeTolerance() function returns the smallest recursive feature elimination subset size whose resampled metric is within a stated percentage of the best size. Use it when a slightly worse but much smaller model beats the absolute top score on interpretability or deployment cost.
pickSizeTolerance(x, metric, tol = 1.5, maximize) # default 1.5 percent slack
pickSizeTolerance(x, "RMSE", tol = 2, maximize = FALSE) # regression, looser tolerance for more parsimony
pickSizeTolerance(x, "ROC", tol = 1, maximize = TRUE) # classification, strict
pickSizeTolerance(rfe_fit$results, "Accuracy", 3, TRUE) # call on the results table of a fitted rfe object
rfeControl(functions = modifyList(rfFuncs, list(selectSize = pickSizeTolerance))) # wire as size selector
identical(rfFuncs$selectSize, pickSizeBest) # TRUE; default is pickSizeBest, not tolerance
caret::pickSizeTolerance # inspect source code
Need explanation? Read on for examples and pitfalls.
What pickSizeTolerance() does in one sentence
pickSizeTolerance() scans an rfe results table and returns the smallest subset size whose resampled metric is within tol percent of the best. It is a stateless integer-returning helper, not a model. You hand it the results data frame from a recursive feature elimination run, name the metric column, set a tolerance percentage, and tell it whether higher scores are better. It returns one integer: the chosen number of predictors.
The parsimony bias is the whole point. Two subsets that score 0.918 and 0.915 ROC are usually indistinguishable within resampling noise, but the smaller one trains faster, ships smaller, and is easier to explain. pickSizeTolerance() formalizes that judgment.
pickSizeTolerance() syntax and arguments
The signature has four arguments and a tolerance default of 1.5 percent.
The full signature is pickSizeTolerance(x, metric, tol = 1.5, maximize).
Arguments:
- x: a data frame of resampled results. Must contain a column named Variables (the subset size tried) and a column whose name matches metric.
- metric: character string. Column name to optimize. Regression: "RMSE", "MAE", "Rsquared". Classification: "Accuracy", "Kappa", "ROC", "Sens", "Spec".
- tol: numeric percentage of slack from the best score. Default is 1.5 (1.5 percent). Larger values pick smaller subsets.
- maximize: logical. TRUE when higher is better (Accuracy, ROC, Rsquared). FALSE when lower is better (RMSE, MAE, logLoss).
The return value is one integer: the Variables value of the smallest qualifying row. Internally caret computes the percent loss per size, keeps the rows whose loss is less than or equal to tol, and returns the minimum Variables value among them. Because the result is a minimum over subset sizes, no tie-breaking is needed.
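The rule described above can be sketched in a few lines of base R. This is a minimal illustration, not caret's source: the helper name pick_size_sketch and the toy accuracy values are assumptions made up for the example.

```r
# Hypothetical helper mirroring the percent-loss rule described above.
# It is NOT caret's implementation, only a sketch of the same idea.
pick_size_sketch <- function(x, metric, tol = 1.5, maximize) {
  scores <- x[[metric]]
  best <- if (maximize) max(scores) else min(scores)
  # Percent loss relative to the best score; zero for the best row
  loss <- if (maximize) {
    (best - scores) / best * 100
  } else {
    (scores - best) / best * 100
  }
  # Smallest subset size among the rows within tolerance
  min(x$Variables[loss <= tol])
}

# Toy table with made-up accuracy values
toy <- data.frame(
  Variables = c(5, 10, 20),
  Accuracy  = c(0.80, 0.89, 0.90)
)

sketch_size <- pick_size_sketch(toy, "Accuracy", tol = 1.5, maximize = TRUE)
sketch_size  # 10: its loss is about 1.1 percent, inside the 1.5 percent slack
```

Size 5 loses about 11 percent against the best score, so it is filtered out before the minimum is taken.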
tol = 1.5 is the default of pickSizeTolerance() itself, not of rfFuncs, whose selectSize slot is pickSizeBest. If you swap pickSizeTolerance into rfeControl, the 1.5 percent slack applies unless you pass a custom wrapper that fixes a different tol.
pickSizeTolerance() examples by use case
The function takes one resampled table; the four examples below vary the metric, the tol, and the call context.
Example 1: regression with default tolerance
Default tol = 1.5 covers the most common regression workflow. Use it when RMSE is the headline metric and you want a sensible parsimony floor without thinking about the number.
Best RMSE is 2.48 at 16 variables. The 1.5 percent slack threshold is 2.48 * 1.015 = 2.517. Size 8 (RMSE 2.50) qualifies, size 4 (RMSE 2.62) does not, so 8 wins. You traded a 0.02 RMSE bump for half the predictors.
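The arithmetic above can be reproduced with a small hand-built table. This is a sketch that assumes caret is installed; the three rows use only the RMSE values quoted in the paragraph, and the object names results_reg and size_reg are illustrative.

```r
library(caret)

# Sizes and RMSE values quoted in the explanation above
results_reg <- data.frame(
  Variables = c(4, 8, 16),
  RMSE      = c(2.62, 2.50, 2.48)
)

# Slack threshold: best RMSE inflated by 1.5 percent
threshold <- min(results_reg$RMSE) * (1 + 1.5 / 100)
threshold  # about 2.517

size_reg <- pickSizeTolerance(results_reg, "RMSE", tol = 1.5, maximize = FALSE)
size_reg   # 8
```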
Example 2: classification with strict tolerance
Tighten tol to 1 when small ROC differences matter for downstream decisions. Clinical scoring, fraud screening, and credit risk often cannot afford to give up the default 1.5 percent.
Best ROC is 0.915 at 20 variables. The 1 percent slack floor is 0.915 * 0.99 = 0.906. Sizes 12 and 20 clear the bar; the smaller (12) wins. Drop tol to 0.5 and only size 20 qualifies.
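A minimal sketch of the same comparison, assuming caret is installed. Only the best ROC (0.915 at size 20) comes from the text above; the ROC values for sizes 6 and 12 are assumed for illustration, chosen so that size 12 clears the 1 percent floor but not the 0.5 percent one.

```r
library(caret)

results_cls <- data.frame(
  Variables = c(6, 12, 20),
  ROC       = c(0.890, 0.908, 0.915)  # 0.890 and 0.908 are assumed values
)

# 1 percent slack: sizes 12 and 20 qualify, the smaller one is returned
size_tol_1 <- pickSizeTolerance(results_cls, "ROC", tol = 1, maximize = TRUE)
size_tol_1    # 12

# 0.5 percent slack: only the best size qualifies
size_tol_05 <- pickSizeTolerance(results_cls, "ROC", tol = 0.5, maximize = TRUE)
size_tol_05   # 20
```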
Example 3: wired into rfe() through rfeControl()
Swap pickSizeTolerance into the selectSize slot of rfFuncs to change rfe behavior end-to-end. The change applies to optsize, optVariables, and any plot drawn from the fitted rfe object.
By copying rfFuncs and overriding only its selectSize slot, rfe() calls pickSizeTolerance() instead of the default pickSizeBest() when picking optsize. The resulting fit reports the parsimonious size in rfe_fit$optsize and stores the chosen variables in rfe_fit$optVariables.
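The wiring described above might look like the following sketch. It assumes caret and randomForest are installed and uses the built-in mtcars data purely for illustration; the object name tol_funcs, the seed, the resampling settings, and the sizes grid are arbitrary choices.

```r
library(caret)  # rfFuncs additionally needs the randomForest package

set.seed(42)

# Copy the stock helper list and override only the size selector
tol_funcs <- rfFuncs
tol_funcs$selectSize <- pickSizeTolerance

ctrl <- rfeControl(functions = tol_funcs, method = "cv", number = 5)

rfe_fit <- rfe(
  x = mtcars[, -1],       # ten candidate predictors
  y = mtcars$mpg,         # numeric outcome, so RMSE is the default metric
  sizes = c(2, 4, 6, 8),
  rfeControl = ctrl
)

rfe_fit$optsize        # size chosen by the tolerance rule
rfe_fit$optVariables   # predictors kept at that size
```

The chosen size depends on the resamples, so no specific value is shown here.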
Example 4: scan multiple tol values to see the trade-off
Sweep tol across a small grid before committing to one value. The output exposes the parsimony curve and gives you something concrete to defend in code review.
A tol sweep is the fastest way to pick a defensible value. The output maps tolerance to chosen size, so you can argue "1 percent slack still costs us 16 features; 3 percent halves it again" instead of guessing.
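A sweep like the one described can be sketched as follows, assuming caret is installed. The accuracy table is made up for illustration and tuned so the outcome matches the quoted argument (16 features at 1 percent, 8 at 3 percent); the names results_sweep and chosen_sizes follow the ones used in this section.

```r
library(caret)

# Made-up resampled accuracy table
results_sweep <- data.frame(
  Variables = c(2, 4, 8, 16, 32),
  Accuracy  = c(0.840, 0.870, 0.885, 0.897, 0.900)
)

tols <- c(0.5, 1, 2, 3, 5)

# One call per tolerance value; each returns a single subset size
chosen_sizes <- sapply(tols, function(t) {
  pickSizeTolerance(results_sweep, "Accuracy", tol = t, maximize = TRUE)
})

sweep_table <- data.frame(tol = tols, size = chosen_sizes)
sweep_table
# sizes come out as 16, 16, 8, 8, 4 for tol 0.5, 1, 2, 3, 5
```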
Print the chosen rows with results_sweep[results_sweep$Variables %in% chosen_sizes, ] so the trade is visible, not hidden inside a single integer.
pickSizeTolerance() vs pickSizeBest() and custom selectors
The two built-in selectors optimize different objectives; custom selectors handle anything else.
| Selector | Returns | Bias | Default tol | When to use |
|---|---|---|---|---|
| pickSizeBest() | Size with absolute best metric | Performance only | n/a | Maximum accuracy matters, model size does not |
| pickSizeTolerance() | Smallest size within tol percent of best | Parsimony with floor | 1.5 | Interpretability or deployment cost matters |
| Custom selectSize | Whatever you write | Anything | Anything | Combine size, score variance, and stability |
pickSizeBest() ignores the gap between sizes; a one-feature difference and a hundred-feature difference look identical if scores are equal. pickSizeTolerance() always prefers the smaller of two near-tied sizes. Custom selectors get the full results frame plus the metric and maximize flag, so you can rank by mean - sd, penalize sizes above a budget, or apply business rules.
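A custom selector along those lines could be sketched in base R as below. Everything here is hypothetical: the function name select_size_budget, the budget argument, and the toy table are invented for the example, and the sketch assumes the results frame stores the resampled standard deviation in a column named after the metric with an SD suffix (for example AccuracySD).

```r
# Hypothetical selector: rank sizes by a pessimistic score (one SD worse
# than the mean) and refuse any size above a fixed feature budget.
select_size_budget <- function(x, metric, maximize, budget = 10) {
  sd_col <- paste0(metric, "SD")  # assumed naming of the SD column
  score <- if (maximize) {
    x[[metric]] - x[[sd_col]]       # higher is better: mean - sd
  } else {
    -(x[[metric]] + x[[sd_col]])    # lower is better: negate mean + sd
  }
  # Sizes over budget can never win
  score[x$Variables > budget] <- -Inf
  x$Variables[which.max(score)]
}

# Toy table with made-up values
toy_custom <- data.frame(
  Variables  = c(4, 8, 16),
  Accuracy   = c(0.86, 0.90, 0.91),
  AccuracySD = c(0.02, 0.02, 0.05)
)

custom_size <- select_size_budget(toy_custom, "Accuracy", maximize = TRUE)
custom_size  # 8: size 16 is over budget, and 0.88 beats 0.84
```

Because budget has a default, the function can still be called with only the results frame, the metric name, and the maximize flag.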
Move tol from 0 (equivalent to pickSizeBest when no sizes tie) to 5 and you slide along the parsimony-versus-performance trade-off without rewriting any model code.
Common pitfalls
The two failure modes are wrong-direction maximize and tol values on the wrong scale.
With maximize = TRUE the function treats 3.5 as the "best" RMSE and picks the size with the highest error. Always set maximize = FALSE for RMSE, MAE, logLoss, and maximize = TRUE for Accuracy, ROC, Rsquared. Read the metric direction once and write it down at the top of the script.
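The wrong-direction failure is easy to reproduce. In this sketch, which assumes caret is installed, only the 3.5 worst-case RMSE comes from the text above; the other rows are made up for illustration.

```r
library(caret)

rmse_tab <- data.frame(
  Variables = c(2, 4, 8, 16),
  RMSE      = c(3.50, 2.90, 2.55, 2.48)  # all but 3.50 are assumed values
)

# Wrong direction: the highest error is treated as the "best" score
wrong_size <- pickSizeTolerance(rmse_tab, "RMSE", tol = 1.5, maximize = TRUE)
wrong_size  # 2, the size with the worst RMSE

# Correct direction for an error metric
right_size <- pickSizeTolerance(rmse_tab, "RMSE", tol = 1.5, maximize = FALSE)
right_size  # 16, because 2.55 is about 2.8 percent above 2.48
```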
The second pitfall is tol scaling. tol is a percentage (1.5 means 1.5 percent), not a raw metric unit. Passing tol = 0.015 thinking it means "1.5 percent" gives you near-zero slack and reduces the call to pickSizeBest().
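A quick check of the scaling issue, assuming caret is installed and reusing the RMSE values quoted in Example 1 (the object name scale_tab is illustrative):

```r
library(caret)

scale_tab <- data.frame(
  Variables = c(4, 8, 16),
  RMSE      = c(2.62, 2.50, 2.48)
)

# tol = 0.015 is read as 0.015 percent, so only the best row qualifies
size_fraction <- pickSizeTolerance(scale_tab, "RMSE", tol = 0.015, maximize = FALSE)
size_fraction  # 16

# tol = 1.5 is the intended 1.5 percent
size_percent <- pickSizeTolerance(scale_tab, "RMSE", tol = 1.5, maximize = FALSE)
size_percent   # 8
```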
rfFuncs$selectSize <- pickSizeTolerance assignments persist for the R session. Reset with rm(rfFuncs), which drops the modified copy in your workspace and re-exposes caret's original, or restart R; otherwise downstream rfe() calls quietly use tolerance selection when you expected best.
Try it yourself
Try it: Given the resampled ROC table below, pick the smallest subset size within 2 percent of the best ROC. Save the chosen size to ex_size.
Explanation: Best ROC is 0.91 at size 15. The percent loss for size 9 is (0.91 - 0.892)/0.91 * 100 = 1.98, which is within tol = 2. Sizes 9, 12 and 15 all qualify, so the smallest (9) wins.
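One way to reproduce that result, assuming caret is installed. The ROC values for sizes 9 and 15 are the ones quoted in the explanation; the values for sizes 6 and 12 and the name ex_results are assumptions added so the table is complete.

```r
library(caret)

ex_results <- data.frame(
  Variables = c(6, 9, 12, 15),
  ROC       = c(0.875, 0.892, 0.905, 0.910)  # 0.875 and 0.905 are assumed
)

ex_size <- pickSizeTolerance(ex_results, "ROC", tol = 2, maximize = TRUE)
ex_size  # 9
```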
Related caret functions
For the rest of the rfe workflow, reach for these companions:
- pickSizeBest(): pick the size with the absolute best resampled metric, no parsimony bias.
- pickVars(): return the list of variables held across resamples at the chosen size.
- rfe(): run the recursive feature elimination loop itself.
- rfeControl(): configure resampling, helper functions, and the size selector.
- tolerance(): the analogue of pickSizeTolerance() for train() tuning grids.
FAQ
What does tol = 1.5 actually mean in pickSizeTolerance()?
tol = 1.5 means a 1.5 percent slack from the best resampled metric. If best accuracy is 0.90, the function accepts any size with accuracy at least 0.90 * (1 - 0.015) = 0.8865. For RMSE, the threshold flips: with best RMSE 2.00 and tol 1.5, any size with RMSE at most 2.00 * 1.015 = 2.03 qualifies. The smallest qualifying size wins.
When should I use pickSizeTolerance() instead of pickSizeBest()?
Pick tolerance when model size has downstream cost: scoring latency on a serving path, interpretability for stakeholders, or a feature-engineering budget. pickSizeBest() returns the score-maximizing size regardless of how many extra features it dragged in. pickSizeTolerance() formalizes the "I'd rather have a slightly worse, much smaller model" trade as a single percentage.
How do I wire pickSizeTolerance() into an rfe() call?
Pass a custom functions list to rfeControl() with selectSize set to pickSizeTolerance. The simplest form is rfeControl(functions = modifyList(rfFuncs, list(selectSize = pickSizeTolerance))), which inherits everything from rfFuncs and overrides only the size selector. Avoid c(rfFuncs, list(selectSize = pickSizeTolerance)): c() appends a second selectSize element instead of replacing the first, so the original selector is still the one found. The tol value defaults to 1.5; set a different value by wrapping the function: selectSize = function(...) pickSizeTolerance(..., tol = 3).
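One caveat when building the functions list: combining two lists with c() appends elements rather than replacing them by name. The sketch below, which assumes caret is installed, checks which selector ends up in the selectSize slot under each approach; the object names appended, overridden, and tol3_funcs are illustrative.

```r
library(caret)

# c() keeps both entries, and $ returns the first match
appended <- c(rfFuncs, list(selectSize = pickSizeTolerance))
n_select <- sum(names(appended) == "selectSize")
n_select                                              # 2
identical(appended$selectSize, rfFuncs$selectSize)    # TRUE: still the original

# modifyList() replaces the entry by name
overridden <- modifyList(rfFuncs, list(selectSize = pickSizeTolerance))
identical(overridden$selectSize, pickSizeTolerance)   # TRUE

# Wrapper that fixes tol at 3 percent while passing everything else through
tol3_funcs <- modifyList(
  rfFuncs,
  list(selectSize = function(...) pickSizeTolerance(..., tol = 3))
)
ctrl_tol3 <- rfeControl(functions = tol3_funcs, method = "cv", number = 5)
```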
Does pickSizeTolerance() work with custom metrics?
Yes, as long as the results data frame has a column named exactly what you pass to metric. If the summary function in your rfeControl() functions list returns metrics named MeanAUC and MeanLogLoss, call pickSizeTolerance(x, "MeanAUC", tol = 2, maximize = TRUE). The function never inspects what the metric means; it only reads the column you name and applies the percent-loss rule.