workflows workflow() in R: Bundle Preprocessor and Model
The workflows workflow() function in R creates a tidymodels workflow object that bundles a preprocessor (formula, recipe, or variable set) with a parsnip model specification. Fitting the workflow trains the preprocessor and the model together, so the same object can be passed to predict(), tune_grid(), or last_fit() without rewiring the pipeline.
    workflow()                                      # empty workflow scaffold
    workflow() |> add_model(spec)                   # attach a parsnip model
    workflow() |> add_formula(mpg ~ wt + hp)        # formula preprocessor
    workflow() |> add_recipe(rec)                   # recipes preprocessor
    workflow() |> add_variables(mpg, c(wt, hp))     # variables preprocessor
    wf |> fit(data = mtcars)                        # train the workflow
    predict(wf_fit, new_data = new_rows)            # predict from fitted workflow
    extract_fit_parsnip(wf_fit)                     # pull the underlying parsnip fit
Need explanation? Read on for examples and pitfalls.
What workflow() does
workflow() is a container, not a model. It records two slots, one for a preprocessor and one for a model specification, and refuses to fit until both are filled. The container guarantees the preprocessor is estimated on each resample and applied identically at prediction time; getting either step wrong is a common source of leakage when you wire preprocessing by hand.
A workflow holds the preprocessor, the parsnip model spec, and an optional case-weights column. When fit() runs, parsnip trains the model on the preprocessor's output; the trained preprocessor travels with the fitted object so predict() re-applies it at scoring time. The main tuning and resampling functions (tune_grid(), fit_resamples(), last_fit()) accept a workflow directly, and workflow_set() builds many workflows from lists of preprocessors and models.
workflow() syntax and arguments
workflow() takes two optional arguments, preprocessor and spec, and is usually extended with four setup verbs: add_formula(), add_recipe(), add_variables(), and add_model(). The arguments let you build a workflow in one call; the verbs let you build it one piece at a time.
The preprocessor argument accepts a formula, a recipe() object, or a workflow_variables() object. The spec argument accepts any parsnip model specification, such as one created by linear_reg() or logistic_reg(). Case weights are not a constructor argument; add_case_weights() names the column holding frequency or importance weights.
A workflow allows exactly one preprocessor. Stacking add_recipe() and add_formula() errors. Pick a recipe for transformations, a formula for one-line interfaces, and variables when the engine should receive the columns as supplied, without formula expansion (a common choice for XGBoost).
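As a minimal sketch of the two construction styles described above, the block below builds the same workflow once through the constructor arguments and once through the setup verbs. The object names (lm_spec, wf_one, wf_verbs) are illustrative, and the lm engine is an assumption made to keep the example dependency-free.

```r
library(tidymodels)

# A parsnip model specification (not yet fitted)
lm_spec <- linear_reg() |> set_engine("lm")

# Style 1: fill both slots in one call via the constructor arguments
wf_one <- workflow(preprocessor = mpg ~ wt + hp, spec = lm_spec)

# Style 2: start empty and fill one slot at a time with the setup verbs
wf_verbs <- workflow() |>
  add_formula(mpg ~ wt + hp) |>
  add_model(lm_spec)

wf_one
wf_verbs
```

Both objects should print the same preprocessor and model; which style to use is mostly a matter of readability.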
Build a workflow: four examples
Every example below uses the built-in mtcars dataset. The outcome is mpg, and wt, hp, and cyl are the predictors, which keeps the focus on the workflow object rather than on the data.
Example 1: Formula preprocessor and lm model
Wire a formula and a parsnip spec, then fit. This is the lightest possible workflow and the right choice when no real preprocessing is needed.
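A minimal sketch of this setup, assuming the lm engine and the illustrative names lm_spec and wf:

```r
library(tidymodels)

# Model specification: ordinary least squares via the lm engine
lm_spec <- linear_reg() |> set_engine("lm")

# Bundle the formula preprocessor with the model specification
wf <- workflow() |>
  add_formula(mpg ~ wt + hp + cyl) |>
  add_model(lm_spec)

# Printing shows the preprocessor and model slots
wf
```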
The print method shows the workflow is not yet trained. Calling fit() flips the header to [trained].
Example 2: Fit and predict from the workflow
fit() trains, predict() scores; both go through the workflow object. The fitted workflow holds the model and the preprocessor, so prediction stays consistent with training.
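The following sketch fits a formula workflow and scores a few rows. It reuses three rows of mtcars as new_rows purely for illustration; in practice these would be held-out data.

```r
library(tidymodels)

wf <- workflow() |>
  add_formula(mpg ~ wt + hp + cyl) |>
  add_model(linear_reg() |> set_engine("lm"))

# Train preprocessor and model together
wf_fit <- fit(wf, data = mtcars)

# Illustrative "new" data: the first three rows of mtcars
new_rows <- mtcars[1:3, ]

# Predict through the fitted workflow, then attach predictions to the inputs
preds <- predict(wf_fit, new_data = new_rows)
bind_cols(new_rows, preds)
```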
Predictions arrive in a tibble with a single .pred column, ready to bind_cols() back onto the input. Naming matches a bare parsnip fit because the workflow delegates to parsnip.
Example 3: Recipe preprocessor with feature engineering
Swap the formula for a recipe to add transformations. A recipe is the right choice when you need scaling, dummy coding, imputation, or any step that should be re-estimated on each resample.
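One way to sketch this, assuming a recipe that normalizes all numeric predictors (the names rec, wf_rec, and wf_rec_fit are illustrative):

```r
library(tidymodels)

# Recipe: declare roles with a formula, then normalize the predictors
rec <- recipe(mpg ~ wt + hp + cyl, data = mtcars) |>
  step_normalize(all_numeric_predictors())

wf_rec <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg() |> set_engine("lm"))

# fit() estimates the normalization parameters and the model in one step
wf_rec_fit <- fit(wf_rec, data = mtcars)

# Coefficients on the normalized scale
wf_rec_fit |>
  extract_fit_parsnip() |>
  tidy()
```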
step_normalize() mean-centers and scales each predictor on the training data, so the coefficients are comparable. The centering and scaling parameters live inside the workflow and re-apply automatically at predict() time.
Example 4: Variables preprocessor for matrix engines
add_variables() bypasses a formula entirely. Use it when the engine expects raw column vectors, such as XGBoost or some keras setups, and you do not want R to materialize a dummy-coded model matrix.
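A small sketch of the variables preprocessor follows. It uses the lm engine instead of XGBoost only to avoid an extra package dependency; the add_variables() call would look the same with another engine.

```r
library(tidymodels)

wf_vars <- workflow() |>
  add_variables(outcomes = mpg, predictors = c(wt, hp, cyl)) |>
  add_model(linear_reg() |> set_engine("lm"))

# The selected columns are passed to the engine as supplied
wf_vars_fit <- fit(wf_vars, data = mtcars)
wf_vars_fit
```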
Outcomes and predictors use tidyselect, so c(), starts_with(), and where() all work. No interactions or polynomial terms are created; the engine sees the columns as supplied.
workflow() vs raw fit() and recipes
Pick a workflow when any step beyond a single model call is involved. The table below covers the three common alternatives.
| Approach | When to use | Trade-off |
|---|---|---|
| `workflow()` | preprocessing + model, tuning, resampling | one extra object to track |
| `spec \|> fit(formula, data)` | quick one-shot model, no transforms | no leak-safe preprocessing |
| `recipe() \|> prep() \|> bake()` | feature engineering only, no model | manual rewiring at prediction |
| `workflow_set()` | comparing many preprocessor + model combos | heavier, returns a tibble of workflows |
A bare parsnip fit is fine for a quick check, but hand-rolled preprocessing is not re-estimated on resamples and silently leaks. The workflow is the smallest object that fixes that.
Common pitfalls
Three mistakes trip up most newcomers to workflow(). Each one below shows the broken pattern and the fix.
The first is mixing preprocessors. A workflow accepts one of add_formula(), add_recipe(), or add_variables(); calling two of them throws an error. If a recipe already encodes the formula, do not also call add_formula().
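A sketch of the broken pattern and one possible fix. The error is captured with tryCatch() so the script keeps running; the exact wording of the message may vary across workflows versions.

```r
library(tidymodels)

wf <- workflow() |>
  add_formula(mpg ~ wt + hp)

rec <- recipe(mpg ~ wt + hp, data = mtcars)

# Broken: a second preprocessor on top of the formula raises an error
msg <- tryCatch(
  add_recipe(wf, rec),
  error = function(e) conditionMessage(e)
)
msg

# Fix: remove the formula first, then add the recipe
wf_fixed <- wf |>
  remove_formula() |>
  add_recipe(rec)
```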
The second is predicting on raw test data with the extracted model instead of through the workflow. Skipping the workflow skips the recipe, so the model sees unscaled inputs. Always call predict() on the fitted workflow and pass new_data = ... to it.
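To illustrate, the sketch below fits a workflow with a normalizing recipe and compares the two prediction paths. The raw rows in test_raw are an illustrative stand-in for real test data.

```r
library(tidymodels)

rec <- recipe(mpg ~ wt + hp + cyl, data = mtcars) |>
  step_normalize(all_numeric_predictors())

wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg() |> set_engine("lm")) |>
  fit(data = mtcars)

# Illustrative raw (unscaled) predictor rows
test_raw <- mtcars[1:3, c("wt", "hp", "cyl")]

# Broken: the extracted model never sees the recipe, so inputs stay unscaled
wrong <- predict(extract_fit_parsnip(wf_fit), new_data = test_raw)

# Fix: predict through the workflow, which applies the trained recipe first
right <- predict(wf_fit, new_data = test_raw)

# The two sets of predictions do not agree
all.equal(wrong$.pred, right$.pred)
```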
The third is reaching for the inner model too early. Use extract_fit_parsnip() or extract_recipe(); touching wf$fit$fit$fit is brittle and breaks across versions.
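A short sketch of the extractor functions on a fitted recipe workflow (object names are illustrative):

```r
library(tidymodels)

rec <- recipe(mpg ~ wt + hp + cyl, data = mtcars) |>
  step_normalize(all_numeric_predictors())

wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg() |> set_engine("lm")) |>
  fit(data = mtcars)

# The parsnip model_fit object
parsnip_fit <- extract_fit_parsnip(wf_fit)

# The raw engine object (an lm fit for this engine)
engine_fit <- extract_fit_engine(wf_fit)
coef(engine_fit)

# The trained recipe, including the estimated normalization parameters
trained_rec <- extract_recipe(wf_fit)
```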
To save a fitted workflow, use butcher::butcher() to strip unused environments, then save with qs::qsave() or the standard saveRDS(). Re-loading restores the workflow with full fit and prediction support.
Try it yourself
Try it: Build a workflow that uses a recipe to log-transform hp, fits a linear regression on mpg ~ wt + hp + cyl, then predicts mpg for the 15th row of mtcars. Save the prediction to ex_pred.
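One possible solution, as a sketch; apart from ex_pred, the object names (ex_rec, ex_wf, ex_fit) are illustrative.

```r
library(tidymodels)

# Recipe: natural-log transform of hp
ex_rec <- recipe(mpg ~ wt + hp + cyl, data = mtcars) |>
  step_log(hp)

ex_wf <- workflow() |>
  add_recipe(ex_rec) |>
  add_model(linear_reg() |> set_engine("lm"))

ex_fit <- fit(ex_wf, data = mtcars)

# Predict mpg for the 15th row of mtcars
ex_pred <- predict(ex_fit, new_data = mtcars[15, ])
ex_pred
```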
Explanation: step_log() replaces the hp column with its natural log before the model sees it, so the workflow trains a linear regression on log(hp) instead of the raw value. The recipe's log transform is re-applied at prediction time, so row 15 (a heavy Cadillac with high hp) yields a slightly different mpg estimate than the untransformed model.
Related workflows functions
workflow() sits at the center of the workflows package family. These helpers cover the neighboring tasks.
- add_recipe() attaches a recipes preprocessor.
- add_model() attaches a parsnip model specification.
- add_formula() attaches a one-line formula preprocessor.
- add_variables() attaches a tidyselect variable selector for matrix engines.
- update_recipe() and update_model() swap a single slot inside an existing workflow.
- extract_fit_parsnip() pulls the trained parsnip model out of a fitted workflow.
- workflow_set() builds a tibble of multiple workflows for batch comparison.
FAQ
What package is workflow() in?
workflow() ships in the workflows package and loads automatically with library(tidymodels). The function returns a workflow S3 object; methods for fit(), predict(), tune_grid(), and print() are defined in workflows and tune. Loading workflows on its own is only useful for a slimmer dependency set.
What is the difference between workflow() and recipe()?
A recipe describes feature engineering steps (centering, dummy coding, imputation) but has no model attached. A workflow binds a recipe to a parsnip spec so they fit and predict together. A bare recipe forces you to call prep(), bake(), and fit() manually, which is the bookkeeping the workflow eliminates.
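The bookkeeping difference can be sketched side by side. The train/test split below is an arbitrary illustration, and the object names are hypothetical.

```r
library(tidymodels)

# Arbitrary illustrative split of mtcars
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

rec <- recipe(mpg ~ wt + hp + cyl, data = train) |>
  step_normalize(all_numeric_predictors())
lm_spec <- linear_reg() |> set_engine("lm")

# Manual route: prep, bake, fit, bake again, predict
prepped     <- prep(rec, training = train)
train_baked <- bake(prepped, new_data = NULL)
manual_fit  <- fit(lm_spec, mpg ~ ., data = train_baked)
test_baked  <- bake(prepped, new_data = test)
manual_pred <- predict(manual_fit, new_data = test_baked)

# Workflow route: one fit, one predict
wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(lm_spec) |>
  fit(data = train)
wf_pred <- predict(wf_fit, new_data = test)

# Both routes should give matching predictions
all.equal(as.numeric(manual_pred$.pred), as.numeric(wf_pred$.pred))
```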
Can a workflow hold both a formula and a recipe?
No. A workflow has one preprocessor slot, filled by add_formula(), add_recipe(), or add_variables(). Calling two of them errors. A recipe can express any formula already, so you rarely need both.
How do I tune hyperparameters inside a workflow?
Set tunable arguments to tune() in the model spec or recipe, then pass the workflow to tune_grid() with a resampling object such as vfold_cv(). The tuner returns a tibble of resampling results that collect_metrics() summarizes; select_best() and finalize_workflow() lock in the winning values for the final fit.
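A compact sketch of that loop, assuming a tunable polynomial degree in the recipe, which avoids any extra engine package. The grid values, fold count, seed, and object names are illustrative choices.

```r
library(tidymodels)

set.seed(123)

# Recipe with a tunable polynomial degree for hp
rec_tune <- recipe(mpg ~ wt + hp + cyl, data = mtcars) |>
  step_poly(hp, degree = tune())

wf_tune <- workflow() |>
  add_recipe(rec_tune) |>
  add_model(linear_reg() |> set_engine("lm"))

# Resamples and a small candidate grid
folds <- vfold_cv(mtcars, v = 5)
grid  <- data.frame(degree = 1:3)

res <- tune_grid(wf_tune, resamples = folds, grid = grid)
collect_metrics(res)

# Lock in the best degree by RMSE and refit on the full data
best      <- select_best(res, metric = "rmse")
final_wf  <- finalize_workflow(wf_tune, best)
final_fit <- fit(final_wf, data = mtcars)
```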
How do I extract the underlying lm or glmnet model from a fitted workflow?
Use extract_fit_parsnip() for the parsnip fit, or extract_fit_engine() for the raw engine object such as the lm or glmnet fit. The extractors stay stable across workflows versions; indexing the nested list manually does not.
For the full argument reference, see the workflows workflow() docs.