caret treebag in R: Bagging Decision Trees with train()
The caret treebag method fits a bagged ensemble of unpruned decision trees by drawing many bootstrap samples, growing one rpart tree on each, and averaging the predictions. You invoke it through train(method = "treebag"), which delegates to ipred::bagging() and gives you cross-validation, variable importance, and a single tidy model object in return.
    train(mpg ~ ., data = mtcars, method = "treebag")               # regression
    train(Species ~ ., data = iris, method = "treebag")             # classification
    train(mpg ~ ., data = mtcars, method = "treebag", nbagg = 50)   # 50 trees
    train(mpg ~ ., data = mtcars, method = "treebag",               # 10-fold CV
          trControl = trainControl(method = "cv", number = 10))
    predict(fit, newdata = mtcars[1:5, ])                           # bag-averaged
    varImp(fit)                                                     # importance
    ipred::bagging(mpg ~ ., data = mtcars, nbagg = 25)              # direct call
Need explanation? Read on for examples and pitfalls.
What treebag does in one sentence
treebag is caret's name for bootstrap-aggregated CART trees. When you pass method = "treebag" to train(), caret bootstraps the training rows nbagg times, fits an unpruned rpart tree on each resample, and stores all trees inside one object. Calling predict() later averages the regression outputs or majority-votes the classification labels, cutting the high variance a single tree shows on noisy data.
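To make the bootstrap-then-average idea concrete, here is a minimal hand-rolled sketch using rpart directly. It is an illustration of the technique, not the code ipred runs internally; the tree count, the rpart.control() settings, and the object names (trees, bagged) are assumptions chosen for the example.

```r
library(rpart)

set.seed(1)
n_trees <- 25  # assumed bag size for this sketch

# Grow one deep tree per bootstrap resample of the rows
trees <- lapply(seq_len(n_trees), function(i) {
  idx <- sample(nrow(mtcars), replace = TRUE)
  rpart(mpg ~ ., data = mtcars[idx, ],
        control = rpart.control(minsplit = 2, cp = 0, xval = 0))
})

# One column of predictions per tree, one row per new observation
preds <- sapply(trees, predict, newdata = mtcars[1:5, ])

# Bagged regression prediction = average across trees
bagged <- rowMeans(preds)
bagged
```

For classification the last step would be a majority vote over predicted labels instead of a mean.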
treebag syntax and arguments
There is no treebag() function; you call it through train(). caret routes method = "treebag" to ipred::bagging() under the hood and exposes the standard train() interface for tuning, resampling, and pre-processing.
Formula and matrix shapes both work:
train(formula, data, method = "treebag", trControl, ...)
train(x, y, method = "treebag", trControl, ...)
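As a rough illustration of the two call shapes, the sketch below fits the same data through the formula interface and through x/y. The object names (fit_formula, fit_xy) and the seed are assumptions for the example.

```r
library(caret)

# x/y shape: predictors as a data frame, outcome as a vector
x <- mtcars[, setdiff(names(mtcars), "mpg")]
y <- mtcars$mpg

set.seed(42)
fit_formula <- train(mpg ~ ., data = mtcars, method = "treebag")

set.seed(42)
fit_xy <- train(x = x, y = y, method = "treebag")

# Resampled performance for each call shape
fit_formula$results
fit_xy$results
```

With all-numeric predictors like these, the two shapes see the same columns; with factor predictors the formula interface expands them to dummy variables while x/y passes them through as-is.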
Arguments you actually tune:
- formula or x, y: target on the left, predictors on the right; numeric outcome for regression, factor for classification.
- data: data frame matching the formula.
- method = "treebag": tells caret to use bagged trees.
- trControl: the resampling plan, built with trainControl(). Default is bootstrap with 25 reps.
- nbagg: number of bootstrap trees. Forwarded to ipred::bagging(); default 25, push to 50 to 100 for noisy data.
- keepX: keep the predictor matrices on the fitted object; set FALSE to shrink memory.
- ...: extra args forwarded to ipred::bagging() and then on to rpart(control = rpart.control(...)).
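A short sketch of how the forwarded arguments look in practice: nbagg and control are not train() arguments, so they travel through ... to the bagging code. The specific values (50 trees, minsplit = 5) and the name fit_args are assumptions for illustration.

```r
library(caret)
library(rpart)

set.seed(42)
fit_args <- train(
  mpg ~ ., data = mtcars,
  method  = "treebag",
  nbagg   = 50,                                              # forwarded: bag size
  control = rpart.control(minsplit = 5, cp = 0, xval = 0)    # forwarded: per-tree settings
)

# Number of trees stored on the final model
length(fit_args$finalModel$mtrees)
```

Checking the stored tree count is a quick way to confirm that nbagg actually reached the underlying fit.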
treebag has zero tunable hyperparameters. modelLookup("treebag") returns one row with parameter = "parameter", which is caret's signal that there is nothing to grid-search. Cross-validation still gives you an honest performance estimate; it just won't sweep a tuning grid.
treebag examples by use case
Four worked examples cover the calls you will reach for most. Each one runs in a fresh session and prints the resampled performance summary.
1. Regression on mtcars
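A minimal sketch of the regression case, assuming caret and its treebag dependencies (ipred, plyr, e1071) are installed. The seed and the name fit_reg are choices made for this example.

```r
library(caret)

set.seed(42)  # fixes the bootstrap resamples so the numbers are repeatable
fit_reg <- train(mpg ~ ., data = mtcars, method = "treebag")

# Resampled performance summary (default: 25 bootstrap reps)
print(fit_reg)

# Bag-averaged predictions for the first five cars
predict(fit_reg, newdata = mtcars[1:5, ])
```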
2. Classification on iris
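A minimal sketch of the classification case. Because Species is a factor, caret reports Accuracy and Kappa rather than RMSE; the name fit_cls and the seed are assumptions for the example.

```r
library(caret)

set.seed(42)
fit_cls <- train(Species ~ ., data = iris, method = "treebag")

# Resampled Accuracy and Kappa
fit_cls$results

# Majority-vote class predictions; this table is on the training rows,
# so it is optimistic compared with the resampled accuracy above
pred <- predict(fit_cls, newdata = iris)
table(predicted = pred, actual = iris$Species)
```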
3. Tune the bag size with nbagg
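Since nbagg cannot go in a tuning grid, one way to compare bag sizes is to fit one model per size on identical folds and line them up with resamples(). The bag sizes, the 5-fold plan, and the names (ctrl, fits, cmp) are assumptions for this sketch.

```r
library(caret)

ctrl <- trainControl(method = "cv", number = 5)
bag_sizes <- c(10, 25, 50)

fits <- lapply(bag_sizes, function(n) {
  set.seed(42)  # same seed before each call so every model sees the same folds
  train(mpg ~ ., data = mtcars, method = "treebag",
        trControl = ctrl, nbagg = n)
})
names(fits) <- paste0("nbagg_", bag_sizes)

# Side-by-side resampling distributions of RMSE, Rsquared, MAE
cmp <- resamples(fits)
summary(cmp)
```

If the spread between bag sizes is smaller than the fold-to-fold spread within each one, the extra trees are probably not buying much.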
4. Honest 10-fold cross-validation and variable importance
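A sketch that swaps the default bootstrap for 10-fold cross-validation and then pulls variable importance from the same fit. The names ctrl, fit_cv, and imp are assumptions; note that mtcars has only 32 rows, so each fold is tiny and the fold-level metrics will be noisy.

```r
library(caret)

set.seed(42)
ctrl <- trainControl(method = "cv", number = 10)

fit_cv <- train(mpg ~ ., data = mtcars,
                method = "treebag", trControl = ctrl)

# Cross-validated performance
fit_cv$results

# Importance averaged over the bag, scaled so the top variable is 100
imp <- varImp(fit_cv)
imp
```

Calling plot(imp) on the result draws the importance chart.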
Tip: Match nbagg to your noise level, not your data size. Clean small data plateaus around 25 trees; noisy or imbalanced data keeps improving up to 100 or 200. Beyond that the marginal RMSE drop is usually smaller than the cross-validation standard error.
treebag vs other ensembles
Pick the ensemble whose bias-variance trade-off matches the signal. All four below sit in caret with the same train() call shape, so swapping is a one-line edit.
| Method | What it does | Tuning surface | Best when |
|---|---|---|---|
| treebag | Bagged unpruned CART trees | None (just nbagg) | Quick variance reduction over a single tree |
| rf | Bagged trees plus random feature subsets | mtry | Many correlated predictors |
| gbm | Sequential boosted shallow trees | n.trees, interaction.depth, shrinkage | Maximum predictive power on tabular data |
| bagEarth | Bagged MARS (piecewise linear) | degree, nprune | Smooth nonlinearities, not step functions |
The decision rule is short: start with treebag to baseline what bagging buys over a single tree, then move to rf if you have many predictors and gbm if you have time to tune.
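To see what bagging buys over a single tree, a rough sketch is to fit method = "rpart" and method = "treebag" on the same folds and compare them with resamples(). The 5-fold plan, the seed, and the object names are assumptions for the example; swapping in "rf" or "gbm" follows the same pattern if those packages are installed.

```r
library(caret)

ctrl <- trainControl(method = "cv", number = 5)

set.seed(7)  # same seed before each call keeps the folds identical
fit_tree <- train(Species ~ ., data = iris,
                  method = "rpart", trControl = ctrl)

set.seed(7)
fit_bag <- train(Species ~ ., data = iris,
                 method = "treebag", trControl = ctrl)

# Accuracy and Kappa distributions, single tree vs bagged trees
baseline <- resamples(list(single_tree = fit_tree, treebag = fit_bag))
summary(baseline)
```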
Common pitfalls
A few traps catch most first-time users. Each one shows up as a confusing error or a quietly worse score.
Passing a tuneGrid to tune nbagg. treebag is a no-tuning-grid model, so caret rejects a tuneGrid that names nbagg. To compare bag sizes, fit several models and compare their resampling distributions with resamples().
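A small sketch of how this pitfall surfaces. The call is wrapped in tryCatch() so the script keeps running and the message can be inspected; the exact wording of the message depends on your caret version, so none is quoted here. The grid values and the name res are assumptions.

```r
library(caret)

# One row, parameter = "parameter": nothing to grid-search
modelLookup("treebag")

# Attempt to tune nbagg through tuneGrid and capture whatever caret reports
res <- tryCatch(
  train(mpg ~ ., data = mtcars, method = "treebag",
        tuneGrid = expand.grid(nbagg = c(25, 50))),
  error = function(e) conditionMessage(e)
)
res
```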
Forgetting set.seed(). With the default trControl, each call draws 25 fresh bootstrap resamples and grows different trees, so RMSE shifts from run to run. Wrap every fit in set.seed() if you compare numbers in a report.
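A quick way to check reproducibility is to fit twice with the same seed and compare the resampled RMSE. This sketch assumes sequential execution (no parallel backend registered); the seed value and the names fit_a and fit_b are arbitrary.

```r
library(caret)

set.seed(123)
fit_a <- train(mpg ~ ., data = mtcars, method = "treebag")

set.seed(123)
fit_b <- train(mpg ~ ., data = mtcars, method = "treebag")

# Same seed, same resamples, same trees: the two estimates should agree
all.equal(fit_a$results$RMSE, fit_b$results$RMSE)
```

Dropping either set.seed() call is an easy way to see how much the estimate moves between unseeded runs.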
Try it yourself
Try it: Train a bagged-tree classifier on iris with 60 bootstrap trees and 10-fold cross-validation. Save the fitted model to ex_treebag and print its accuracy from ex_treebag$results.
Solution
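One possible solution, sketched under the assumption that caret and its treebag dependencies are installed. The seed is an arbitrary choice so the accuracy is repeatable.

```r
library(caret)

set.seed(1)
ex_treebag <- train(
  Species ~ ., data = iris,
  method    = "treebag",
  nbagg     = 60,                                         # 60 bootstrap trees
  trControl = trainControl(method = "cv", number = 10)    # 10-fold CV
)

# Cross-validated accuracy
ex_treebag$results$Accuracy
```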
Explanation: method = "treebag" plus nbagg = 60 grows 60 bagged trees per resample; trainControl(method = "cv", number = 10) swaps the default bootstrap for 10-fold CV so you get an honest accuracy estimate.
Related caret functions
- train() is the entry point for every caret model, including treebag. See the train deep-dive.
- trainControl() builds the resampling plan you pass to train().
- varImp() extracts importance scores from the fitted bag.
- bagEarth() is the same bagging idea applied to MARS rather than CART trees.
- predict.train() returns bag-averaged predictions on new data.
FAQ
What is the difference between treebag and randomForest in caret? Both bag trees, but randomForest additionally samples a random subset of predictors at each split, which decorrelates the trees and usually improves accuracy on data with many correlated features. treebag keeps the full predictor set at every split, so its trees look more alike and the variance reduction plateaus sooner. Use treebag as a quick baseline and switch to method = "rf" when you have many predictors.
Can I tune the number of trees in treebag with tuneGrid? No. modelLookup("treebag") shows zero tunable hyperparameters, so caret blocks tuneGrid calls. To compare bag sizes, fit several models with different nbagg values, collect them with caret::resamples(), and compare the resampling distributions. The cost of growing more trees is linear, and most datasets stop improving past 100.
Does treebag handle missing values automatically? Partly. The underlying rpart trees can split on surrogate variables when a row has missing predictors, so a single tree tolerates NAs. ipred::bagging() passes data through unchanged, so the same surrogate behavior applies inside treebag. Through caret's formula interface, however, train() defaults to na.action = na.fail, so NAs stop the call before they reach rpart unless you set na.action = na.pass. Rows with a missing outcome should be removed before calling train() in either case.
How do I extract feature importance from a treebag model? Call varImp(fit) on the fitted train object. caret averages the importance score across the bag of trees, where each tree contributes the sum of goodness-of-split improvements for every variable it used. The result's importance element is a data frame scaled 0 to 100, with the strongest predictor pinned at 100. Plot with plot(varImp(fit)) for a quick chart.
Is treebag suitable for very large datasets? It depends on memory rather than runtime. Each bootstrap fits a full unpruned tree, so the in-memory size grows roughly linearly with nbagg and with the number of predictors. For tens of thousands of rows on a laptop, 25 to 50 trees is usually safe; beyond that, consider ranger (a faster random forest) or xgboost through caret, both of which scale better and tune more aggressively.