caret rfe() in R: Recursive Feature Elimination
The caret rfe() function runs recursive feature elimination, a resampling-based search that fits a model, ranks predictors by importance, drops the weakest, and repeats across the candidate subset sizes. It then reports the subset of variables with the best resampled performance.
    rfe(x, y, sizes = c(1,2,3), rfeControl = ctrl)   # core call
    rfeControl(functions = rfFuncs, method = "cv")   # random forest backend
    rfeControl(functions = lmFuncs, number = 5)      # linear model backend
    predictors(rfe_fit)                              # the chosen variables
    rfe_fit$optVariables                             # same, as a vector
    rfe_fit$optsize                                  # size of the best subset
    plot(rfe_fit, type = c("g", "o"))                # performance vs size
Need explanation? Read on for examples and pitfalls.
What rfe() does in one sentence
rfe() is a wrapper method for feature selection. Unlike filter methods that judge each predictor in isolation, recursive feature elimination judges predictors by how a real model performs with them. It fits the model, ranks variables by importance, removes the least useful, and refits on the smaller set.
This loop runs inside a resampling scheme such as cross-validation, so every candidate subset size gets an honest out-of-sample score. The function then picks the subset size with the best resampled performance and reports which variables belong in it. Because the search is driven by the model you actually intend to use, rfe() tends to keep predictors that interact well together, something a one-at-a-time filter cannot see.
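The loop described above can be sketched by hand in base R. This is a simplified illustration, not caret's implementation: it assumes the built-in mtcars data with mpg as the outcome, drops one predictor per step, and ranks predictors by the absolute t statistic from lm(). The names all_vars and eliminate_in_fold are hypothetical.

```r
# Hand-rolled sketch of recursive elimination inside 5-fold CV (base R only).
set.seed(42)
dat      <- mtcars
outcome  <- "mpg"
all_vars <- setdiff(names(dat), outcome)
k        <- 5
folds    <- sample(rep(seq_len(k), length.out = nrow(dat)))

# For one fold: rank and eliminate on the training rows only,
# and score every subset size on the held-out rows.
eliminate_in_fold <- function(fold) {
  train <- dat[folds != fold, ]
  test  <- dat[folds == fold, ]
  vars  <- all_vars
  rmse  <- numeric(length(all_vars))   # rmse[s] = held-out RMSE with s predictors
  while (length(vars) > 0) {
    fit  <- lm(reformulate(vars, response = outcome), data = train)
    pred <- predict(fit, newdata = test)
    rmse[length(vars)] <- sqrt(mean((test[[outcome]] - pred)^2))
    # Drop the predictor with the smallest absolute t statistic.
    ct      <- coef(summary(fit))[-1, , drop = FALSE]
    weakest <- rownames(ct)[which.min(abs(ct[, "t value"]))]
    vars    <- setdiff(vars, weakest)
  }
  rmse
}

per_fold  <- sapply(seq_len(k), eliminate_in_fold)  # rows = subset size, cols = folds
cv_rmse   <- rowMeans(per_fold)                     # average held-out RMSE per size
best_size <- which.min(cv_rmse)                     # size with the lowest CV RMSE
```

Because ranking happens separately inside each fold, the per-size scores are out-of-sample. rfe() additionally refits on the full data at the chosen size to name the final variables; that step is left out of this sketch.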
rfe() syntax and arguments
rfe() needs a predictor set, an outcome, candidate subset sizes, and a control object. The control object decides which model powers the search and how resampling is done.
The arguments that matter most are:
- x: a data frame or matrix of predictors only. The outcome must not be inside it.
- y: the response vector, numeric for regression or a factor for classification.
- sizes: an integer vector of subset sizes to evaluate, for example c(1, 2, 3, 5).
- rfeControl: the object returned by rfeControl(), which sets the backend model and resampling.
- metric: the score used to pick the winner, such as "RMSE" or "Accuracy".
Set a seed before rfe(). Recursive feature elimination resamples the data, so the selected subset can shift between runs. A set.seed() call directly before rfe() makes the chosen variables reproducible.

A worked rfe() example
Build the control object first, then pass it to rfe(). Here lmFuncs powers the search with linear regression, and method = "cv" with number = 5 requests 5-fold cross-validation.
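A minimal sketch of that call sequence follows. The dataset is an assumption (the built-in mtcars data, predicting mpg from the other ten columns), as are the seed and the sizes vector, so the subset sizes and scores it prints need not match the figures discussed in this section.

```r
library(caret)

# Assumed data: mtcars, with mpg as the outcome and every other column as a predictor.
x <- mtcars[, setdiff(names(mtcars), "mpg")]
y <- mtcars$mpg

# Linear-regression backend, 5-fold cross-validation.
ctrl <- rfeControl(functions = lmFuncs, method = "cv", number = 5)

set.seed(123)   # makes the resampling, and so the selection, reproducible
rfe_fit <- rfe(x, y, sizes = c(1, 2, 3, 5), rfeControl = ctrl)

print(rfe_fit)        # resampled performance per subset size
predictors(rfe_fit)   # names of the selected variables
rfe_fit$results       # one row per evaluated subset size
```

The sizes vector lists only the reduced subsets; the full predictor set is evaluated as well, which is why the results table has one more row than sizes has entries.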
The starred row marks the winning subset size. A model with just three predictors beats the full eight-variable model on cross-validated RMSE, so the extra columns were adding noise rather than signal.
The results data frame holds one row per subset size, which is exactly what plot(rfe_fit) draws as a performance curve. Use it to confirm the winner is a clear minimum and not a flat tie.
Choosing the rfeControl backend
The functions argument decides which model ranks and scores predictors. caret ships several ready-made backends, and the right one depends on your outcome type and how nonlinear the relationships are.
| functions | Model used | Best for |
|---|---|---|
| lmFuncs | Linear regression | Numeric outcome, roughly linear effects |
| rfFuncs | Random forest | Mixed predictors, nonlinear effects |
| treebagFuncs | Bagged trees | Robust nonlinear search, few tuning knobs |
| nbFuncs | Naive Bayes | Fast classification baseline |
| caretFuncs | Any train() model | Custom model chosen via the method argument |
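As a sketch of the caretFuncs row, the call below passes a train() method name through rfe(). It assumes mtcars with mpg as the outcome and uses method = "lm" purely for illustration; the object name caret_rfe and the inner trControl settings are likewise assumptions.

```r
library(caret)

# Assumed data: mtcars, predicting mpg.
x <- mtcars[, setdiff(names(mtcars), "mpg")]
y <- mtcars$mpg

# caretFuncs delegates model fitting to train().
ctrl <- rfeControl(functions = caretFuncs, method = "cv", number = 5)

set.seed(10)
caret_rfe <- rfe(
  x, y,
  sizes      = c(2, 4),
  rfeControl = ctrl,
  method     = "lm",                                       # forwarded to train()
  trControl  = trainControl(method = "cv", number = 5)     # inner resampling used by train()
)

predictors(caret_rfe)   # variables kept at the selected size
```

Extra arguments after rfeControl are handed on to train(), so swapping "lm" for another train() method name changes the backend without touching the control object.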
The other rfeControl() arguments tune the resampling: method picks the scheme ("cv", "repeatedcv", or "boot"), number sets the fold or resample count, and repeats applies when you use repeated cross-validation.
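The three resampling schemes can be written out as control objects. The specific counts below (10 folds, 3 repeats, 25 bootstrap resamples) and the object names are illustrative assumptions, not recommendations from this article.

```r
library(caret)

# Plain 10-fold cross-validation.
ctrl_cv <- rfeControl(functions = rfFuncs, method = "cv", number = 10)

# 10-fold cross-validation repeated 3 times; repeats only matters here.
ctrl_rep <- rfeControl(functions = rfFuncs, method = "repeatedcv",
                       number = 10, repeats = 3)

# Bootstrap resampling with 25 resamples.
ctrl_boot <- rfeControl(functions = rfFuncs, method = "boot", number = 25)
```

Any of these objects can be passed as the rfeControl argument of rfe(); more resamples give a smoother performance curve at the cost of run time.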
The closest scikit-learn equivalent is RFECV, which also wraps recursive elimination in cross-validation. The difference is that rfe() lets you swap the backend model through rfeControl() rather than passing the estimator directly.

Common pitfalls
Three mistakes account for most broken rfe() runs. Each one is easy to spot once you know the symptom.
- Leaving the outcome inside x. If the response column is still in the predictor matrix, the model fits it perfectly and rfe() selects only that column. Always build x and y as separate objects.
- Setting sizes larger than the column count. Any value in sizes above ncol(x) is silently ignored, and the full size is always tested anyway. Keep sizes within the real range of predictors.
- Treating the reported RMSE as a final test score. The resampled score guided the selection, so it is mildly optimistic. Estimate true performance on a separate hold-out set after selection.
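One way to guard against all three mistakes is sketched below. It assumes mtcars with mpg as the outcome, a 75/25 split, and a final lm() refit on the selected predictors; these choices and the object names are illustrative only.

```r
library(caret)

set.seed(7)
# Hold out 25% of the rows before any selection happens.
in_train <- createDataPartition(mtcars$mpg, p = 0.75, list = FALSE)
train_df <- mtcars[in_train, ]
test_df  <- mtcars[-in_train, ]

# Pitfall 1: keep the outcome out of x.
x_train <- train_df[, names(train_df) != "mpg"]
y_train <- train_df$mpg

# Pitfall 2: keep candidate sizes within the number of predictors.
sizes <- c(1, 2, 3, 5, 20)
sizes <- sizes[sizes <= ncol(x_train)]

ctrl    <- rfeControl(functions = lmFuncs, method = "cv", number = 5)
rfe_fit <- rfe(x_train, y_train, sizes = sizes, rfeControl = ctrl)

# Pitfall 3: score the selected variables on rows rfe() never saw.
keep         <- predictors(rfe_fit)
final_fit    <- lm(mpg ~ ., data = train_df[, c("mpg", keep)])
holdout_pred <- predict(final_fit, newdata = test_df)
holdout_rmse <- sqrt(mean((test_df$mpg - holdout_pred)^2))
```

Comparing holdout_rmse with the resampled RMSE in rfe_fit$results gives a sense of how optimistic the selection score was on this particular split.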
Do not pre-filter predictors before rfe(). Pre-selecting variables using all the data leaks information into every resampling fold. Let rfe() do the selection inside its own resampling loop instead.

Try it yourself
Try it: Run recursive feature elimination on the four iris measurements to classify Species, using the random forest backend and 5-fold cross-validation. Save the result to ex_rfe.
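One possible solution is sketched here. The seed, the sizes vector, and the names x_iris, y_iris and ctrl are assumptions; the rfFuncs backend also needs the randomForest package to be installed.

```r
library(caret)

# Predictors and outcome kept as separate objects.
x_iris <- iris[, 1:4]
y_iris <- iris$Species

# Random forest backend, 5-fold cross-validation.
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)

set.seed(1)
ex_rfe <- rfe(x_iris, y_iris, sizes = 1:3, rfeControl = ctrl)

ex_rfe               # accuracy and kappa per subset size
predictors(ex_rfe)   # the measurements that were kept
```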
Explanation: rfe() ranks the four iris measurements by random forest importance and finds that the two petal measurements alone match full-model accuracy, so it selects the smallest subset that does not lose accuracy.
Related caret functions
rfe() is one of several feature-selection tools in caret. Reach for a neighbour when the wrapper search is not the right fit:
- sbf(): selection by filter, scores predictors one at a time.
- gafs(): genetic algorithm search over feature subsets.
- safs(): simulated annealing search over feature subsets.
- varImp(): ranks predictor importance without removing anything.
- nearZeroVar(): drops near-constant columns before any search.
FAQ
What is the difference between rfe() and varImp()?
varImp() only ranks predictors by importance for a single fitted model and never removes anything. rfe() uses that ranking inside a resampling loop: it drops the weakest predictors, refits, and scores each subset size on held-out folds. The result is an actual recommended subset, not just a ranking. Use varImp() to understand a model and rfe() to choose which variables to keep.
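For contrast, here is a minimal sketch of varImp() on a single fitted model, assuming mtcars with mpg as the outcome. It returns a ranking only; nothing is removed and nothing is resampled.

```r
library(caret)

# One linear model fitted to all predictors.
fit <- lm(mpg ~ ., data = mtcars)

# Importance scores for that single fit, one row per predictor.
imp <- varImp(fit)

# Sort from most to least important.
imp[order(-imp$Overall), , drop = FALSE]
```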
How does rfe() avoid overfitting during feature selection?
rfe() wraps the entire elimination process inside resampling. For every fold, it ranks and removes predictors using only the training portion, then scores the surviving subset on the held-out portion. Because selection happens separately within each fold, the reported curve reflects out-of-sample performance rather than the training fit, which keeps the chosen subset size honest.
What does the sizes argument do in rfe()?
sizes lists the candidate subset sizes that rfe() should evaluate, such as c(1, 2, 3, 5). For each size, the function keeps that many top-ranked predictors and scores the model through resampling. The full predictor count is always tested too. rfe() then selects the size with the best metric and reports it as optsize.
Can rfe() be used for classification?
Yes. Pass a factor as y and choose a classification backend such as rfFuncs or nbFuncs in rfeControl(). The default metric then becomes "Accuracy" ("Kappa" can be requested through the metric argument instead), and rfe() reports the subset of predictors that maximizes classification performance across the resampling folds.