parsnip fit() in R: Train a Model Specification
The parsnip fit() function in R trains a model specification on data and returns a fitted model object. You pass a parsnip spec, a formula, and a data frame, and fit() hands the work to the underlying engine.
    fit(spec, y ~ ., data = df)                # formula interface
    fit(spec, y ~ x1 + x2, data = df)          # choose predictors
    fit_xy(spec, x = predictors, y = outcome)  # matrix interface
    fit(workflow_obj, data = df)               # fit a bundled workflow
    model_fit$fit                              # the raw engine object
    predict(model_fit, new_data = df)          # use the fitted model
    extract_fit_engine(model_fit)              # pull the engine fit
Need explanation? Read on for examples and pitfalls.
What fit() does
fit() is the verb that turns a specification into a trained model. A parsnip spec such as linear_reg() only records your intent. No data touches it until you call fit(), which estimates the parameters and returns a model_fit object you can predict from.
The function takes three core inputs: the model specification, a formula that names the outcome and predictors, and the data frame to learn from. It then translates your spec into a call to the chosen engine, such as stats::lm() or ranger::ranger(), and stores the result.
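One way to see the translation step without fitting anything is parsnip's translate(), which prints the engine call template a spec will use. This is a minimal sketch; the object name spec is chosen here for illustration.

```r
library(parsnip)

# A spec records intent only: model type and engine, no data yet.
spec <- linear_reg() |> set_engine("lm")

# translate() shows the engine call template that fit() later fills in
# with the formula and data.
translate(spec)
```

The printed template should name the lm engine's fitting function, with the formula and data still unfilled.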
Because fit() is a generic, it also accepts a workflow object. That lets one call train a preprocessing recipe and a model together, which is the pattern most tidymodels projects use in production.
Because a spec holds no data, you can call fit() on every resample or dataset without rewriting the model definition.
fit() is a generic that parsnip re-exports, so library(tidymodels) or library(parsnip) makes it available. The method that handles model specs is fit.model_spec(), and a separate fit.workflow() method handles workflows.
fit() syntax and arguments
fit() needs a spec, a formula, and data, with everything else optional. The remaining arguments, case_weights and control, set observation weights and control error handling and verbosity.
The object argument is the spec you built with verbs like set_engine() and set_mode(). The formula argument follows base R rules, so mpg ~ . means predict mpg from every other column. The data argument holds the training rows.
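The three core arguments can be spelled out by name, as in the sketch below. The object names spec and all_fit are illustrative, and the lm engine is assumed because it needs no extra package.

```r
library(parsnip)

spec <- linear_reg() |> set_engine("lm")

# Named arguments make the roles explicit: object, formula, data.
# mpg ~ . uses every other column of mtcars as a predictor.
all_fit <- fit(object = spec, formula = mpg ~ ., data = mtcars)

# One coefficient per predictor plus the intercept.
length(coef(extract_fit_engine(all_fit)))
```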
The control argument changes how fit() behaves on failure. By default a fitting error stops execution. Pass control_parsnip(catch = TRUE) to capture the error inside the result instead, which is useful when fitting many models in a loop.
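A small sketch of the catch behavior follows. The predictor name not_a_column is deliberately invalid and purely hypothetical; it exists only to make the engine call fail so the captured error can be inspected.

```r
library(parsnip)

spec <- linear_reg() |> set_engine("lm")

# not_a_column does not exist in mtcars, so the lm() call fails.
# With catch = TRUE the failure is stored in the result instead of
# stopping execution.
safe_fit <- fit(
  spec,
  mpg ~ not_a_column,
  data = mtcars,
  control = control_parsnip(catch = TRUE)
)

# Check whether the engine slot holds a captured error.
inherits(extract_fit_engine(safe_fit), "try-error")
```

In a loop over many specs or datasets, a check like this lets you skip the failed fits and keep the rest.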
Fit a model: four examples
Every example below uses a built-in R dataset. The mtcars data drives the regression examples and a factor version drives the classification example, so the code runs anywhere with no downloads.
Example 1: Fit a regression model with a formula
Build the spec, then fit it with a formula. The lm engine needs no extra package, so this is the simplest possible fit() call.
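The original listing is not preserved here, so the following is a reconstruction under assumptions: the formula mpg ~ wt + hp is inferred from the discussion of weight and horsepower, and the object names are illustrative.

```r
library(parsnip)

# Specification: ordinary linear regression using base R's lm().
lm_spec <- linear_reg() |> set_engine("lm")

# Train on mtcars: predict fuel economy from weight and horsepower.
lm_fit <- fit(lm_spec, mpg ~ wt + hp, data = mtcars)

lm_fit                              # prints the model_fit object
coef(extract_fit_engine(lm_fit))    # coefficients from the lm object
```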
The printed result is a model_fit object wrapping the lm() output. The coefficients show fuel economy falls as weight and horsepower rise, which matches the data.
Example 2: Fit a classification model
Switch the spec to logistic_reg() and fit() trains a classifier. The outcome column must be a factor, so convert it first.
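A sketch of such a classification fit is shown below. The choice of am as the outcome and wt and hp as predictors is an assumption made for illustration, as are the object names and factor labels.

```r
library(parsnip)

# Copy mtcars and convert the 0/1 transmission column to a factor outcome.
cars_cls <- mtcars
cars_cls$am <- factor(cars_cls$am, levels = c(0, 1),
                      labels = c("automatic", "manual"))

# Logistic regression via base R's glm().
glm_spec <- logistic_reg() |> set_engine("glm")
glm_fit <- fit(glm_spec, am ~ wt + hp, data = cars_cls)

# Class predictions come back as a tibble with a .pred_class column.
predict(glm_fit, new_data = head(cars_cls))
```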
The same fit() call handles classification once the spec and the outcome type agree. Here the glm engine converged, so the model is ready for predict().
Example 3: Fit a workflow object
fit() also accepts a workflow, training preprocessing and model in one call. A workflow bundles a formula or recipe with a model spec.
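As a minimal sketch, the workflow below bundles a formula with a linear regression spec. The formula and object names are assumptions for illustration; the workflows package is loaded explicitly, though library(tidymodels) would also attach it.

```r
library(parsnip)
library(workflows)

lm_spec <- linear_reg() |> set_engine("lm")

# Bundle the preprocessing (here just a formula) with the model spec.
wf <- workflow() |>
  add_formula(mpg ~ wt + hp) |>
  add_model(lm_spec)

# One call trains the whole bundle; no formula argument is needed here.
wf_fit <- fit(wf, data = mtcars)

predict(wf_fit, new_data = head(mtcars))
```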
Calling fit() on the workflow runs the formula step and the model fit together. The result is a fitted workflow, not a bare model_fit, but you predict from it the same way.
Example 4: Inspect the model_fit object
fit() returns a structured object, not just the engine output. The model_fit object stores the spec, timing, and the raw engine result side by side.
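The sketch below pokes at a fitted object to show that structure. The model and object names are illustrative, and the exact set of element names may differ between parsnip versions.

```r
library(parsnip)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)

class(lm_fit)   # includes "model_fit"
names(lm_fit)   # element names; the exact set can vary by version
lm_fit$spec     # the original specification

# Preferred accessor for the raw engine object.
engine_obj <- extract_fit_engine(lm_fit)
summary(engine_obj)
```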
The fit element holds the underlying lm object, and extract_fit_engine() is the safe way to reach it. Use that helper rather than digging into $fit directly, since the internal layout can change.
When you need summary(), diagnostics, or other native methods, extract_fit_engine(model_fit) returns the raw engine object cleanly. It works the same whether you fit a spec or a workflow.
Compare fit() with fit_xy() and last_fit()
fit() is one of three training verbs in tidymodels. Each one trains a model but expects different inputs and produces a different result.
| Function | Inputs | Returns | Use when |
|---|---|---|---|
| fit() | spec, formula, data | model_fit | Standard training with a formula |
| fit_xy() | spec, x, y | model_fit | Predictors and outcome are separate objects |
| last_fit() | spec, rsplit | results tibble | Final fit on train, scored on test |
The decision rule is simple. Use fit() for everyday training with a formula, reach for fit_xy() when your predictors are already a matrix, and call last_fit() only for the one final evaluation on held-out test data. If you come from caret, parsnip fit() plus predict() covers the training and prediction role of caret::train(), while resampling and tuning move to the tune package.
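The two alternatives can be sketched side by side. The column choices, split proportion, seed, and object names below are assumptions for illustration; last_fit() comes from the tune package and initial_split() from rsample, both attached by library(tidymodels).

```r
library(tidymodels)

lm_spec <- linear_reg() |> set_engine("lm")

# fit_xy(): predictors and outcome passed as separate objects.
xy_fit <- fit_xy(lm_spec, x = mtcars[, c("wt", "hp")], y = mtcars$mpg)

# last_fit(): train on the training part of a split, score on the test part.
set.seed(123)
car_split <- initial_split(mtcars, prop = 0.75)
final_res <- last_fit(lm_spec, mpg ~ wt + hp, split = car_split)

collect_metrics(final_res)   # test-set metrics as a tibble
```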
Common pitfalls
Three mistakes catch most newcomers to fit(). Each one below shows the problem and the fix.
The most common is misplacing or omitting the formula. fit() expects the formula second and the data third, so passing the data frame in the formula's position, or supplying data = with no formula, fails.
The second pitfall is a missing mode. Models that can both classify and regress, such as decision_tree(), need set_mode() before fit() can dispatch. The third is a character outcome for classification, since parsnip requires the response column to be a factor, not plain text.
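The three pitfalls can be reproduced safely with a small error-trapping helper. The helper fails() is hypothetical and defined here only for illustration, as are the data and object names; the decision tree example assumes the rpart package is installed.

```r
library(parsnip)

# Hypothetical helper: TRUE if evaluating the expression raises an error.
fails <- function(expr) inherits(tryCatch(expr, error = function(e) e), "error")

lm_spec <- linear_reg() |> set_engine("lm")

# Pitfall 1: data supplied, formula missing.
fails(fit(lm_spec, data = mtcars))
# Fix: formula second, data third.
ok1 <- fit(lm_spec, mpg ~ wt, data = mtcars)

# Pitfall 2: dual-mode model with no mode set.
fails(fit(decision_tree(), mpg ~ wt, data = mtcars))
# Fix: set the mode before fitting.
ok2 <- decision_tree() |> set_mode("regression") |> fit(mpg ~ wt, data = mtcars)

# Pitfall 3: character outcome in a classification model.
cars_chr <- mtcars
cars_chr$am <- ifelse(mtcars$am == 1, "manual", "automatic")
fails(fit(logistic_reg(), am ~ wt, data = cars_chr))
# Fix: convert the outcome to a factor first.
cars_fct <- cars_chr
cars_fct$am <- factor(cars_fct$am)
ok3 <- fit(logistic_reg(), am ~ wt, data = cars_fct)
```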
Convert a character outcome with factor() or as.factor() before fitting, otherwise fit() cannot learn the class levels.
Try it yourself
Try it: Fit a linear_reg() model with the lm engine that predicts mpg from disp and cyl on mtcars. Save the fitted model to ex_fit.
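One possible solution, sketched from the exercise statement:

```r
library(parsnip)

# Spec: linear regression with the lm engine, then fit on mtcars.
ex_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ disp + cyl, data = mtcars)

ex_fit
```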
Explanation: The spec sets the model type and engine, then fit() estimates the coefficients from the formula and data. The result is a model_fit object holding the trained lm model.
Related parsnip functions
fit() works alongside the rest of the parsnip workflow. These functions cover the neighboring steps in a tidymodels project.
- fit_xy() trains a spec from separate predictor and outcome objects.
- predict() generates predictions from a fitted model_fit.
- set_engine() chooses the computational backend before fitting.
- set_mode() sets classification or regression for dual-mode models.
- extract_fit_engine() pulls the raw engine object out of a fit.
FAQ
What package is fit() in?
The fit() generic is defined in the generics package and re-exported by parsnip, so library(parsnip) or library(tidymodels) makes it available. The method that handles parsnip specifications is fit.model_spec(), and fit.workflow() handles workflow objects. You never call those methods by name; R dispatches the right one based on the object you pass.
What is the difference between fit() and fit_xy()?
Both train a parsnip spec, but they take different inputs. fit() uses a formula and a data frame, so it can apply factor encoding and other formula-driven preprocessing. fit_xy() takes predictors and outcome as separate objects and skips formula handling, which is faster when your data is already a numeric matrix. Most projects use fit() because the formula interface is clearer.
What does fit() return in R?
fit() returns a model_fit object. It is a structured list that stores the original specification, the elapsed fitting time, the outcome levels, and the raw engine result in the fit element. Reach the engine object with extract_fit_engine() rather than indexing $fit directly. When you call fit() on a workflow, the result is a fitted workflow instead.
Why does fit() say the mode is missing?
Some model types, such as decision_tree() and rand_forest(), can predict either a class or a number. parsnip cannot guess which you want, so fit() stops until you call set_mode("classification") or set_mode("regression"). Single-mode models like linear_reg() set the mode automatically, so the error only appears for dual-mode specifications.
How do I fit a model across cross-validation folds?
Use fit_resamples() from the tune package instead of fit(). It takes the spec and a resampling object such as vfold_cv(), fits the model on each fold, and collects the metrics. Plain fit() trains one model on one dataset, so it is the wrong tool for resampling. Use fit() for the final model once tuning is done.
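A minimal sketch of that resampling pattern is below. The fold count, seed, formula, and object names are assumptions for illustration; fit_resamples() and collect_metrics() come from tune and vfold_cv() from rsample, all attached by library(tidymodels).

```r
library(tidymodels)

lm_spec <- linear_reg() |> set_engine("lm")

# Five cross-validation folds of mtcars.
set.seed(42)
folds <- vfold_cv(mtcars, v = 5)

# Fit the same spec on every fold and collect the averaged metrics.
cv_res <- fit_resamples(lm_spec, mpg ~ wt + hp, resamples = folds)
collect_metrics(cv_res)
```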
For the full argument reference, see the parsnip fit() documentation.