parsnip bag_mars() in R: Build Bagged MARS Models
bag_mars() defines a bagged ensemble of MARS (Multivariate Adaptive Regression Splines) models in R. It fits many MARS models on bootstrap samples and averages them, which trims the variance of a single spline fit.
    bag_mars(mode = "regression")                  # regression ensemble
    bag_mars(mode = "classification")              # classification ensemble
    bag_mars() |> set_engine("earth")              # set the engine
    bag_mars() |> set_engine("earth", times = 25)  # 25 bagged members
    bag_mars(prod_degree = 2)                      # allow 2-way interactions
    bag_mars(num_terms = 10)                       # cap retained model terms
    bag_mars(prune_method = "none")                # skip backward pruning
    fit(spec, mpg ~ ., data = mtcars)              # fit the spec
Need explanation? Read on for examples and pitfalls.
What bag_mars() does
bag_mars() wraps MARS in a bagging ensemble. A single MARS model builds piecewise-linear hinge functions to capture nonlinear relationships, but it can be unstable: small changes in the data shift the chosen knots. Bagging reduces that instability by fitting the model on many bootstrap resamples and averaging the predictions.
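To make the idea concrete, here is a hand-rolled sketch of bagging with the earth package directly. It is an illustration of the principle, not how baguette is implemented internally; the ensemble size, seed, and object names are assumptions chosen for the example.

```r
library(earth)

set.seed(123)
n_members <- 11  # hypothetical ensemble size, chosen for illustration

# Fit one MARS model per bootstrap resample of mtcars
members <- lapply(seq_len(n_members), function(i) {
  boot_rows <- sample(nrow(mtcars), replace = TRUE)
  earth(mpg ~ ., data = mtcars[boot_rows, ])
})

# Each column holds one member's predictions for every row of mtcars
member_preds <- sapply(members, function(m) {
  as.numeric(predict(m, newdata = mtcars))
})

# The bagged prediction is the row-wise mean across members
bagged_pred <- rowMeans(member_preds)
head(bagged_pred)
```

Because each member sees a different resample, the knots differ from fit to fit, and averaging smooths out those differences.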
The fitting code comes from the baguette package, a parsnip extension. Recent parsnip releases export bag_mars() itself, but the spec cannot be fit until baguette is loaded. bag_mars() returns a model specification, not a fitted model, so you pair it with set_engine() and fit() like any other parsnip spec. The only engine is earth, the package that implements MARS.
Install the package with install.packages("baguette") and load it with library(baguette) before defining a spec, or parsnip cannot find the model.
bag_mars() syntax and arguments
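The signature has four model arguments plus an engine. The three tuning arguments (num_terms, prod_degree, prune_method) default to NULL, which means the underlying earth engine picks its own value. mode defaults to "unknown" and engine defaults to "earth".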
| Argument | What it controls |
|---|---|
| mode | "regression" or "classification". Required before fitting. |
| num_terms | Number of model terms (hinge functions) kept in each member. |
| prod_degree | Highest interaction degree: 1 is additive, 2 allows pairwise terms. |
| prune_method | Pruning passed to earth, such as "backward" or "none". |
| engine | Always "earth". The number of bagged members is set here. |
The count of bagged models is not a main argument. You pass it to set_engine() as times, which defaults to 11.
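A minimal sketch of a fully specified spec, with the main arguments inside bag_mars() and the ensemble size inside set_engine(). The particular values and the name full_spec are illustrative assumptions, not recommendations.

```r
library(parsnip)
library(baguette)

# Main arguments go in bag_mars(); times is an engine argument
full_spec <- bag_mars(
  mode         = "regression",
  num_terms    = 10,          # cap on retained terms per member
  prod_degree  = 2,           # allow pairwise interactions
  prune_method = "backward"   # pruning method passed to earth
) |>
  set_engine("earth", times = 25)

full_spec
```

Printing the spec lists the main arguments and the engine-specific arguments separately, which is a quick way to confirm that times landed in the right place.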
Fit a bagged MARS model: regression and classification
Build the spec, set the engine, then call fit(). This regression example predicts mpg from every other column of mtcars using a 25-member ensemble.
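The regression fit described above could look like the following sketch. The seed and the object names (reg_spec, reg_fit, reg_preds) are assumptions made for this example.

```r
library(parsnip)
library(baguette)

set.seed(42)  # bootstrap resampling is random, so fix the seed

# 25-member bagged MARS regression spec
reg_spec <- bag_mars(mode = "regression") |>
  set_engine("earth", times = 25)

# Predict mpg from every other column of mtcars
reg_fit <- fit(reg_spec, mpg ~ ., data = mtcars)

# predict() returns a tibble with one averaged prediction per row
reg_preds <- predict(reg_fit, new_data = head(mtcars))
reg_preds
```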
Switching to classification needs only a different mode. The earth engine handles factor outcomes by fitting a generalized linear model on the spline basis.
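As a sketch of the classification case, the example below uses the two_class_dat data set from the modeldata package, which is an assumption of this example rather than something the text above prescribes. It has a two-level factor outcome, Class, and two numeric predictors, A and B.

```r
library(parsnip)
library(baguette)

# Example data with a two-level factor outcome (assumed for illustration)
data(two_class_dat, package = "modeldata")

set.seed(42)

# Same spec as for regression, with a different mode
cls_spec <- bag_mars(mode = "classification") |>
  set_engine("earth", times = 25)

cls_fit <- fit(cls_spec, Class ~ A + B, data = two_class_dat)

# Hard class labels (the default) and class probabilities
cls_class <- predict(cls_fit, new_data = head(two_class_dat))
cls_prob  <- predict(cls_fit, new_data = head(two_class_dat), type = "prob")
cls_prob
```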
Read aggregated variable importance
baguette aggregates importance across every member. The fitted object stores a tidy table at $fit$imp, with the mean importance and its standard error over the bootstrap members.
The used column counts how many of the 25 members kept that predictor. A variable that appears in only a few members carries a weak signal and is a candidate for dropping.
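The importance table described above can be pulled from a fitted model as in this sketch. The model is refit here so the block runs on its own; the seed and object names are assumptions.

```r
library(parsnip)
library(baguette)

set.seed(42)

imp_fit <- bag_mars(mode = "regression") |>
  set_engine("earth", times = 25) |>
  fit(mpg ~ ., data = mtcars)

# The parsnip fit wraps the baguette object in $fit;
# the aggregated importance table sits in its $imp element
imp_tbl <- imp_fit$fit$imp
imp_tbl
```

Sorting or filtering imp_tbl on the used column is a simple way to spot predictors that only a handful of members selected.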
bag_mars() vs mars() vs rand_forest()
Choose bag_mars() when a single MARS fit is too jumpy. All three handle nonlinear relationships, but they trade interpretability for stability differently.
| Model | Captures nonlinearity by | Best when |
|---|---|---|
| mars() | Hinge functions, one model | You want a readable, fast spline model |
| bag_mars() | Bagging many MARS fits | Single MARS is unstable on your data |
| rand_forest() | Bagging many decision trees | Relationships are step-like or high-order |
If a plain mars() model already gives stable cross-validated error, the ensemble adds compute for little gain. Reach for bag_mars() when resampling shows the single fit swinging between folds.
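One way to run that check is to resample both specs on the same folds and compare the metrics. This is a sketch that assumes the tidymodels meta-package is installed; the 5-fold split, seed, and object names are choices made for illustration, and mtcars is small enough that the estimates will be noisy.

```r
library(tidymodels)
library(baguette)

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)  # same folds for both models

single_spec <- mars(mode = "regression") |>
  set_engine("earth")

bagged_spec <- bag_mars(mode = "regression") |>
  set_engine("earth", times = 25)

# Resample each spec with the same formula and folds
single_res <- fit_resamples(single_spec, mpg ~ ., resamples = folds)
bagged_res <- fit_resamples(bagged_spec, mpg ~ ., resamples = folds)

# Compare the mean and std_err columns across the two models
single_metrics <- collect_metrics(single_res)
bagged_metrics <- collect_metrics(bagged_res)
single_metrics
bagged_metrics
```

If the std_err for the single model is already small, the ensemble is unlikely to pay for its extra compute.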
Common pitfalls
Three mistakes account for most bag_mars() errors. Each has a quick fix.
- Forgetting baguette. library(baguette) must run before you fit a bag_mars() spec. Loading only tidymodels is not enough: depending on the parsnip version, you get a "could not find function" error or a message that no implementation is available for the spec.
- Setting times in the wrong place. times is an engine argument. Pass it inside set_engine("earth", times = 25), not inside bag_mars().
- Skipping the mode. A spec left at mode = "unknown" fails at fit(). Set mode = "regression" or mode = "classification" explicitly.
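The mode pitfall is easy to check before fitting. The sketch below inspects the mode stored on a spec and sets it afterwards with set_mode(); the object names are assumptions for illustration.

```r
library(parsnip)
library(baguette)

# Mode was never chosen, so the spec keeps its default
unset_spec <- bag_mars() |>
  set_engine("earth", times = 25)
unset_spec$mode

# set_mode() fixes the spec without rebuilding it
ready_spec <- unset_spec |>
  set_mode("regression")
ready_spec$mode
```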
Tune times on a resample rather than setting it to a large number by reflex.
Try it yourself
Try it: Fit a 30-member bagged MARS regression on mtcars predicting hp from disp, wt, and cyl. Save the fitted model to ex_fit.
Explanation: bag_mars() defines the spec, set_engine("earth", times = 30) requests 30 bagged members, and fit() trains them on the formula. predict() returns one averaged prediction per row.
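One possible solution, written as a sketch: the seed and the intermediate names ex_spec and ex_preds are assumptions, while ex_fit is the name the exercise asks for.

```r
library(parsnip)
library(baguette)

set.seed(7)

# 30-member bagged MARS regression spec
ex_spec <- bag_mars(mode = "regression") |>
  set_engine("earth", times = 30)

# Predict hp from disp, wt, and cyl
ex_fit <- fit(ex_spec, hp ~ disp + wt + cyl, data = mtcars)

# One averaged prediction per row of mtcars
ex_preds <- predict(ex_fit, new_data = mtcars)
head(ex_preds)
```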
Related parsnip functions
- parsnip mars() defines a single, unbagged MARS model.
- parsnip bag_tree() bags decision trees instead of splines.
- parsnip rand_forest() builds a random forest ensemble.
- parsnip boost_tree() fits gradient boosted trees.
- parsnip fit() trains any model spec on a data frame.
See the official baguette bag_mars() reference for the full argument list.
FAQ
What package provides bag_mars()? The earth engine that fits the model is implemented in the baguette package, a parsnip model extension maintained by the tidymodels team. Recent parsnip releases export the bag_mars() function itself, but a spec cannot be fit without baguette. Install it with install.packages("baguette") and load it alongside library(tidymodels). Once loaded, bag_mars() behaves like any built-in parsnip spec and works inside workflows, with or without a recipe.
How many models does bag_mars() bag? The ensemble size is the times engine argument, which defaults to 11. Set it with set_engine("earth", times = 25). More members reduce variance, but the benefit flattens after roughly 25 to 50 fits while training time keeps growing. Tune times on a resample if compute is a concern.
Can bag_mars() do classification? Yes. Set mode = "classification" and pass a factor outcome. The earth engine fits a generalized linear model on the MARS spline basis for class probabilities. Use predict(fit, new_data, type = "prob") for probabilities or the default type = "class" for hard labels.
bag_mars() vs mars(): which should I use? Start with mars(). It is faster and easier to inspect. Move to bag_mars() only when cross-validation shows the single MARS model swinging between resamples. Bagging averages away that knot-selection noise at the cost of extra compute and a less directly interpretable model.