workflows add_recipe() in R: Attach a Preprocessing Recipe
The workflows add_recipe() function in R attaches a recipes object to a tidymodels workflow as the preprocessor. The recipe stays with the workflow, so resampling and tuning re-estimate the same preprocessing steps on each fold, prediction re-applies the estimated steps to new data, and data leakage stays out of the pipeline.
add_recipe(wf, rec)                               # attach a recipe to a workflow
workflow() |> add_recipe(rec) |> add_model(spec)  # full workflow in one pipe
add_recipe(wf, rec, blueprint = bp)               # custom hardhat blueprint
update_recipe(wf, new_rec)                        # swap the recipe in place
remove_recipe(wf)                                 # detach the recipe
extract_recipe(wf_fit)                            # pull the trained recipe back
extract_recipe(wf_fit, estimated = FALSE)         # pull the untrained recipe
Need explanation? Read on for examples and pitfalls.
What add_recipe() does
add_recipe() registers a recipes object as the preprocessor of a workflow. It does not run the recipe. It only records the recipe in the workflow's preprocessor slot, the same way add_model() records a parsnip spec in the model slot. The actual prep() and bake() calls happen later, when you call fit(), fit_resamples(), or tune_grid() on the workflow.
This is the plumbing trick at the centre of tidymodels. Because the recipe lives inside the workflow, each resample re-estimates the recipe on its training portion and re-applies it to the held-out portion. Means, standard deviations, dummy levels, and PCA loadings are learned from training rows only, never from validation rows. Hand-rolled pipelines that call prep() once on the full data leak summary statistics into every fold; add_recipe() removes that risk by construction.
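A minimal sketch of that behaviour, assuming a hand-made train/test split of mtcars (the row indices and object names are illustrative): the recipe inside the fitted workflow is estimated on the training rows only, so the baked training predictor has a mean of zero while the held-out rows generally do not.

```r
library(recipes)
library(parsnip)
library(workflows)

# Illustrative split: first 24 rows for training, last 8 rows held out
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

rec <- recipe(mpg ~ wt + hp, data = train) |>
  step_normalize(all_numeric_predictors())

# fit() preps the recipe on `train` and then fits the model
wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg() |> set_engine("lm")) |>
  fit(data = train)

# Apply the trained recipe to both sets with the training statistics
trained_rec <- extract_recipe(wf_fit)
baked_train <- bake(trained_rec, new_data = train)
baked_test  <- bake(trained_rec, new_data = test)

mean(baked_train$wt)  # effectively zero: centred with the training mean
mean(baked_test$wt)   # not zero: held-out rows never influenced the mean
```

Resampling functions repeat the same fit-on-training, apply-to-held-out pattern once per fold.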
You do not need to bake() new data before predict(); the workflow does it for you.
add_recipe() syntax and arguments
add_recipe() takes a workflow, a recipe, and an optional blueprint. The signature is short, and most users never touch the blueprint argument.
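For reference, the signature can be printed from an installed copy of workflows. The argument list in the comment reflects recent releases and is an assumption for older ones.

```r
library(workflows)

# Print the formal arguments of add_recipe()
args(add_recipe)
# function (x, recipe, ..., blueprint = NULL)
```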
The x argument must be a workflow object, usually piped in from workflow(). The recipe argument must be an unprepped recipes object; pass the raw output of recipe() plus any step_*() calls, not a prepped recipe. The blueprint argument takes a hardhat blueprint, typically built with hardhat::default_recipe_blueprint(), that controls details such as whether an intercept column is added and whether novel factor levels are allowed at prediction time; the default usually does the right thing.
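A hedged sketch of passing a non-default blueprint. allow_novel_levels is a documented argument of hardhat::default_recipe_blueprint(); enabling it here is purely illustrative, and the object names are assumptions.

```r
library(recipes)
library(parsnip)
library(workflows)

# Illustrative choice: tolerate factor levels at prediction time
# that were not seen during training
bp <- hardhat::default_recipe_blueprint(allow_novel_levels = TRUE)

rec <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())

# The blueprint travels with the recipe inside the workflow
wf_bp <- workflow() |>
  add_recipe(rec, blueprint = bp) |>
  add_model(linear_reg() |> set_engine("lm"))

fit_bp <- fit(wf_bp, data = mtcars)
```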
add_recipe() returns a new workflow. The function is pure: it does not mutate its input. You always need to assign the result back to a variable or chain it into another pipe step.
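A small sketch of what that means in practice, with illustrative object names: calling add_recipe() without assigning the result leaves the original workflow empty.

```r
library(recipes)
library(workflows)

rec      <- recipe(mpg ~ wt + hp, data = mtcars)
wf_empty <- workflow()

# Returns a new workflow; wf_empty itself is untouched
add_recipe(wf_empty, rec)

# Assign the result to keep the recipe
wf_rec <- add_recipe(wf_empty, rec)

# extract_preprocessor() succeeds on wf_rec but errors on wf_empty
extract_preprocessor(wf_rec)
still_empty <- tryCatch(
  { extract_preprocessor(wf_empty); FALSE },
  error = function(e) TRUE
)
still_empty
```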
add_recipe() is cheap because nothing is computed. The recipe is held alongside the model spec until the workflow is fit. If you want to inspect what the recipe will do before training, call prep() and bake() on the recipe directly, outside the workflow.
add_recipe() is one of three preprocessor verbs in the workflows package. Knowing when to reach for each one is half the battle.
| Verb | Preprocessor input | When to use |
|---|---|---|
| add_recipe() | A recipe() object with steps | Anything beyond a one-line formula (impute, scale, encode, PCA) |
| add_formula() | A model formula like y ~ x1 + x2 | Lightweight workflow, no transformation needed |
| add_variables() | Outcome and predictor column names | Engines that refuse formulas (XGBoost matrix input) |
Pick add_recipe() whenever the preprocessing must be re-estimated on each resample, which is most production cases.
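The three verbs side by side, as a sketch on mtcars. The predictor choice and object names are illustrative; only one preprocessor verb is used per workflow.

```r
library(recipes)
library(parsnip)
library(workflows)

spec <- linear_reg() |> set_engine("lm")

# Formula preprocessor: no transformation steps
wf_formula <- workflow() |>
  add_formula(mpg ~ wt + hp) |>
  add_model(spec)

# Variables preprocessor: raw columns selected by name
wf_vars <- workflow() |>
  add_variables(outcomes = mpg, predictors = c(wt, hp)) |>
  add_model(spec)

# Recipe preprocessor: steps that are re-estimated on every fit
wf_recipe <- workflow() |>
  add_recipe(
    recipe(mpg ~ wt + hp, data = mtcars) |>
      step_normalize(all_numeric_predictors())
  ) |>
  add_model(spec)

# All three fit through the same fit() call
fits <- lapply(
  list(formula = wf_formula, variables = wf_vars, recipe = wf_recipe),
  fit,
  data = mtcars
)
```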
Build workflows with add_recipe(): four examples
Every example below uses the built-in mtcars and airquality datasets so the focus stays on the workflow rather than on the data.
Example 1: Normalize numeric predictors
Centering and scaling predictors is the canonical use case for a recipe. It takes two recipe steps and one add_recipe() call.
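A minimal sketch of such a workflow, assuming wt, hp, and disp as predictors and the lm engine. The recipe and spec names are illustrative; step_normalize() would combine the two steps into one.

```r
library(recipes)
library(parsnip)
library(workflows)

# Two steps: subtract the mean, then divide by the standard deviation
rec_norm <- recipe(mpg ~ wt + hp + disp, data = mtcars) |>
  step_center(all_numeric_predictors()) |>
  step_scale(all_numeric_predictors())

spec_lm <- linear_reg() |> set_engine("lm")

# Nothing is estimated yet; the workflow only stores both pieces
wf_norm <- workflow() |>
  add_recipe(rec_norm) |>
  add_model(spec_lm)

wf_norm
```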
The workflow now stores both the recipe and the spec. Calling fit(wf_norm, data = mtcars) first estimates the means and standard deviations of wt, hp, and disp, applies the standardization, then fits the linear regression on the transformed columns.
Example 2: One-hot encode a categorical predictor
Recipes shine when you mix transformation types. Here step_dummy() converts the factor cyl into indicator columns before the model sees them.
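A sketch under the assumption that mt is a copy of mtcars with cyl converted to a factor. The other object names are illustrative.

```r
library(recipes)
library(parsnip)
library(workflows)

# Assumed setup: mtcars with cyl stored as a factor
mt <- mtcars
mt$cyl <- factor(mt$cyl)

rec_dummy <- recipe(mpg ~ wt + hp + cyl, data = mt) |>
  step_dummy(cyl)

wf_dummy <- workflow() |>
  add_recipe(rec_dummy) |>
  add_model(linear_reg() |> set_engine("lm"))

fit_dummy <- fit(wf_dummy, data = mt)

# New data goes in raw; the workflow applies step_dummy() itself
preds <- predict(fit_dummy, new_data = head(mt))
preds
```

By default step_dummy() creates one fewer indicator column than the factor has levels; passing one_hot = TRUE keeps a column for every level.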
Notice that predict() works on the raw mt data (mtcars with cyl stored as a factor), with cyl still a factor. The workflow re-applies step_dummy() automatically; you never call bake() by hand.
Example 3: Impute missing values, then dummy encode
Recipe steps run in the order you declare them. This matters for imputation: impute before any step that fails on NA, then encode.
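A sketch on airquality, assuming Ozone as the outcome and Month treated as a factor. Rows with a missing outcome are dropped up front so only the predictor Solar.R needs imputing; the object names are illustrative.

```r
library(recipes)
library(parsnip)
library(workflows)

# Assumed setup: keep rows with an observed outcome, make Month categorical
aq <- airquality[!is.na(airquality$Ozone), ]
aq$Month <- factor(aq$Month)

# Impute first, then encode
rec_imp <- recipe(Ozone ~ Solar.R + Wind + Temp + Month, data = aq) |>
  step_impute_median(Solar.R) |>
  step_dummy(Month)

fit_imp <- workflow() |>
  add_recipe(rec_imp) |>
  add_model(linear_reg() |> set_engine("lm")) |>
  fit(data = aq)

# Inspect the first step of the trained recipe
learned <- tidy(extract_recipe(fit_imp), number = 1)
learned
```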
extract_recipe() pulls the trained recipe back out of the fitted workflow, and tidy() shows the learned medians. Those medians were computed only from rows passed to fit(), so they are safe to use under cross-validation.
Example 4: add_recipe() vs the workflow constructor
The constructor argument and the verb form behave identically. Prefer the verb form because it reads like a pipeline.
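A sketch of the two forms, assuming a version of workflows whose workflow() constructor accepts preprocessor and spec arguments. The object names are illustrative.

```r
library(recipes)
library(parsnip)
library(workflows)

rec  <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())
spec <- linear_reg() |> set_engine("lm")

# Verb form
wf_verb <- workflow() |>
  add_recipe(rec) |>
  add_model(spec)

# Constructor form
wf_ctor <- workflow(preprocessor = rec, spec = spec)

pred_verb <- predict(fit(wf_verb, data = mtcars), new_data = mtcars)
pred_ctor <- predict(fit(wf_ctor, data = mtcars), new_data = mtcars)
```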
Both objects fit and predict the same way. The verb form is the one most tidymodels documentation uses, and it pairs naturally with update_recipe() or update_model() later.
Common pitfalls
A workflow holds exactly one preprocessor, and a recipe must be untrained when it is attached. These are the three errors you will hit while learning add_recipe().
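A minimal sketch that reproduces two of these failures, a second preprocessor and a prepped recipe. The helper and object names are hypothetical, and the exact error wording depends on the installed version of workflows.

```r
library(recipes)
library(parsnip)
library(workflows)

rec <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())
wf <- workflow() |> add_recipe(rec)

# Hypothetical helper: return the error message instead of stopping the script
error_message <- function(expr) {
  tryCatch({ expr; NA_character_ }, error = function(e) conditionMessage(e))
}

# A second preprocessor on a workflow that already holds a recipe
msg_second <- error_message(add_formula(wf, mpg ~ wt))

# A recipe that has already been prepped
msg_prepped <- error_message(add_recipe(workflow(), prep(rec)))

msg_second
msg_prepped
```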
The workflow estimates the recipe itself inside fit() on each resample. Passing a prepped recipe defeats leakage protection and is rejected outright by modern versions of workflows.
Try it yourself
Try it: Using the built-in iris data, build a workflow whose recipe applies step_log() to Sepal.Length and dummy-encodes Species, then fit a linear regression of Petal.Length. Save the fitted workflow to ex_fit.
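One possible solution, sketched with the lm engine. The recipe name is illustrative, and other step orderings or selectors would also satisfy the exercise.

```r
library(recipes)
library(parsnip)
library(workflows)

rec_ex <- recipe(Petal.Length ~ Sepal.Length + Species, data = iris) |>
  step_log(Sepal.Length) |>  # natural log by default
  step_dummy(Species)        # indicator columns for the factor

ex_fit <- workflow() |>
  add_recipe(rec_ex) |>
  add_model(linear_reg() |> set_engine("lm")) |>
  fit(data = iris)

predict(ex_fit, new_data = head(iris))
```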
Explanation: step_log() applies a natural-log transform to the named column. step_dummy() then converts Species into indicator columns, so the model receives only numeric predictors. Attaching the recipe with add_recipe() ensures predict() re-applies both steps to new rows automatically.
Related tidymodels functions
add_recipe() rarely appears alone. These are the functions you will use alongside it.
- workflow() creates the empty workflow that add_recipe() updates.
- recipe() builds the recipes object that add_recipe() expects.
- add_model() attaches the parsnip spec that pairs with the recipe.
- update_recipe() swaps the recipe inside an existing workflow.
- extract_recipe() pulls the trained recipe out of a fitted workflow.
- remove_recipe() detaches the recipe entirely.
See the workflows package reference for the full verb family.
FAQ
Does add_recipe() prep the recipe?
No. add_recipe() only stores the recipe in the workflow's preprocessor slot. Estimation happens inside fit(), fit_resamples(), or tune_grid(), which call prep() internally on the training data of each split. This is intentional: it is what gives the workflow its leakage-safe behaviour across resamples and why the workflow object is the right unit of work for cross-validation.
Can I use add_recipe() and add_formula() in the same workflow?
No. A workflow can hold only one preprocessor. Stacking add_recipe() and add_formula() raises an error. Choose add_recipe() when you want named transformation steps and want preprocessing to be re-estimated on each resample. Choose add_formula() when a one-line model formula is enough and you need no preprocessing.
How do I inspect what the recipe did inside the fitted workflow?
Call extract_recipe(wf_fit) on the fitted workflow. The returned recipe is trained, so tidy() on individual steps reveals the learned values, such as means from step_normalize() or medians from step_impute_median(). Pass estimated = FALSE if you want the original, untrained recipe back instead of the fitted one.
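A short sketch of both calls on a fitted workflow. The object names are illustrative, and recipes::fully_trained() is used only to show which version came back.

```r
library(recipes)
library(parsnip)
library(workflows)

rec <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())

wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg() |> set_engine("lm")) |>
  fit(data = mtcars)

trained   <- extract_recipe(wf_fit)                     # estimated steps
untrained <- extract_recipe(wf_fit, estimated = FALSE)  # recipe as supplied

tidy(trained, number = 1)  # learned statistics of the first step

fully_trained(trained)     # TRUE
fully_trained(untrained)   # FALSE
```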
Is add_recipe() different from recipes::recipe()?
Yes. recipe() constructs the recipes object. add_recipe() attaches that object to a workflow. The two functions live in different packages (recipes and workflows) and serve different roles. You always build the recipe first, then attach it to a workflow with add_recipe() before adding a model spec.
How do I update a recipe attached to a workflow?
Use update_recipe(wf, new_recipe). This replaces the existing recipe without touching the model spec. If the workflow had already been fit, the fit is discarded and you need to call fit() again. Calling add_recipe() a second time on the same workflow fails because the preprocessor slot is already filled; update_recipe() is the explicit replacement verb.
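A sketch of the swap, with illustrative recipe names: the first recipe has no steps and the replacement adds normalization.

```r
library(recipes)
library(parsnip)
library(workflows)

rec_a <- recipe(mpg ~ wt + hp, data = mtcars)
rec_b <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())

wf <- workflow() |>
  add_recipe(rec_a) |>
  add_model(linear_reg() |> set_engine("lm"))

# Replace the recipe; the model spec is carried over unchanged
wf_new <- update_recipe(wf, rec_b)

length(extract_preprocessor(wf)$steps)      # steps in the original recipe
length(extract_preprocessor(wf_new)$steps)  # steps in the replacement
```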