workflows add_variables() in R: Bare Columns, No Formula
The workflows add_variables() function in R attaches outcome and predictor columns to a tidymodels workflow using bare column names and tidy-select helpers. No formula is built, no dummy expansion is forced, so the engine sees the columns exactly as they sit in the data frame.
workflow() |> add_variables(mpg, c(wt, hp))              # bare outcome and predictors
workflow() |> add_variables(mpg, c(wt, hp, cyl))         # multiple predictors as vector
workflow() |> add_variables(mpg, everything())           # all other columns as predictors
workflow() |> add_variables(mpg, starts_with("disp"))    # tidy-select helper
workflow() |> add_variables(c(mpg, qsec), c(wt, hp))     # multiple outcomes
add_variables(wf, mpg, c(wt, hp), blueprint = bp)        # custom hardhat blueprint
update_variables(wf, mpg, c(wt, hp, disp))               # swap selection in place
remove_variables(wf)                                     # detach the selection
Need explanation? Read on for examples and pitfalls.
What add_variables() does
add_variables() registers a pair of column selections as the preprocessor of a workflow. It stores one tidy-select expression for the outcome and another for the predictors in the workflow's preprocessor slot. Nothing is computed. The actual column lookup runs inside fit(), fit_resamples(), or tune_grid(), when hardhat::mold() resolves the selections against the training data.
The selections are held as quosures, so anything tidyselect accepts works on either side: bare names, vectors, and helpers like starts_with(), where(), and everything(). Factors stay as factors, character columns stay as character, and no model.matrix() expansion runs unless the engine itself asks for one.
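The deferred lookup can be seen with a minimal sketch, assuming the workflows and parsnip packages are installed. The column name not_a_column is hypothetical and deliberately absent from mtcars: building the workflow succeeds, and the failure only surfaces once fit() resolves the selection.

```r
library(workflows)
library(parsnip)

# Building the workflow only stores the selections; nothing is looked up yet.
# `not_a_column` is a hypothetical name that does not exist in mtcars.
wf_lazy <- workflow() |>
  add_variables(outcomes = mpg, predictors = c(wt, not_a_column)) |>
  add_model(linear_reg() |> set_engine("lm"))

# The selection is resolved against the data here, so the error appears now.
attempt <- try(fit(wf_lazy, data = mtcars), silent = TRUE)
failed_at_fit <- inherits(attempt, "try-error")
failed_at_fit
```

Wrapping the call in try() is only there to keep the sketch running; in a real script the error would simply stop the fit.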
add_variables() syntax and arguments
add_variables() takes a workflow, an outcome selection, a predictor selection, and an optional blueprint. Both selections are evaluated with tidyselect, so the second and third arguments accept anything dplyr::select() would accept.
The x argument is a workflow object, usually piped in from workflow(). The outcomes argument accepts a bare name like mpg or a vector like c(mpg, qsec) for multi-outcome specs. The predictors argument accepts any tidy-select expression: c(wt, hp), everything(), starts_with("v_"), and where(is.numeric) all work. The blueprint argument takes a hardhat blueprint, by default hardhat::default_xy_blueprint(), which controls how the selected columns are processed for the model (for example, whether an intercept column is added and whether novel factor levels are allowed at prediction time); the default rarely needs overriding.
add_variables() returns a new workflow. Like every workflows verb, it is pure and does not mutate its input. Assign the result back to a variable or chain it into the next pipe step.
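A short sketch of the arguments and the return value, assuming workflows and hardhat are installed. The blueprint setting allow_novel_levels = TRUE is chosen purely for illustration, and the names wf0, wf1, and bp are placeholders.

```r
library(workflows)

wf0 <- workflow()

# An explicit blueprint; allow_novel_levels = TRUE is an illustrative choice.
bp <- hardhat::default_xy_blueprint(allow_novel_levels = TRUE)

# add_variables() returns a new workflow and leaves wf0 as it was.
wf1 <- add_variables(wf0, outcomes = mpg, predictors = c(wt, hp), blueprint = bp)

# wf0 still has no preprocessor, so extracting one fails.
wf0_is_empty <- inherits(try(extract_preprocessor(wf0), silent = TRUE), "try-error")

# wf1 carries the stored selections.
class(extract_preprocessor(wf1))
```

Because the input is never modified, forgetting to assign the result is a silent no-op rather than an error.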
workflows::add_variables() is cheap. The two quosures are held alongside the parsnip spec until the workflow is fit. To preview which columns the selection will pick, evaluate the expressions with tidyselect::eval_select() on the data, outside the workflow.
add_variables() is one of three preprocessor verbs in the workflows package. The right verb depends on how much expansion you want.
| Verb | Preprocessor input | When to use |
|---|---|---|
| add_variables() | Outcome and predictor selections | Engine handles encoding, or you want raw columns |
| add_formula() | A model formula y ~ x1 + x2 | Need base R formula expansion (*, poly, .) |
| add_recipe() | A recipe() with explicit steps | Need imputation, scaling, or custom transforms |
Reach for add_variables() when no transformation is needed and you want the engine to receive the columns exactly as they appear in the source data.
Build workflows with add_variables(): four examples
Every example below uses the built-in mtcars and iris datasets so the focus stays on the variable-selection plumbing rather than on the data.
Example 1: Bare outcome and predictor names
The simplest call lists the outcome first and the predictors as a vector. No quotes, no formula, just column names.
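A minimal sketch of that call, assuming the workflows and parsnip packages and a linear regression spec on the lm engine. The object name wf_var matches the text that follows; fit_var is a placeholder.

```r
library(workflows)
library(parsnip)

# Outcome first, predictors as a vector of bare names.
wf_var <- workflow() |>
  add_variables(outcomes = mpg, predictors = c(wt, hp)) |>
  add_model(linear_reg() |> set_engine("lm"))

wf_var  # the printed header reports the preprocessor type

# Fitting resolves the selections against mtcars and trains the model.
fit_var <- fit(wf_var, data = mtcars)
coef(extract_fit_engine(fit_var))
```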
The workflow prints Preprocessor: Variables rather than Formula or Recipe, signalling that no formula was built. Calling fit(wf_var, data = mtcars) hands mpg, wt, and hp to lm() as plain columns; lm() then builds its own internal formula because that is what the engine needs, but the workflow itself stays formula-free.
Example 2: Tidy-select helpers for the predictor set
Predictor selection accepts every tidyselect helper. That makes column-pattern selection trivial without listing names by hand.
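A sketch of a helper-based selection, assuming the lm engine and the mtcars data. It relies on the documented workflows behaviour that outcome columns are removed from the data before the predictor selection is evaluated, so this version does not negate mpg explicitly.

```r
library(workflows)
library(parsnip)
library(tidyselect)

wf_sel <- workflow() |>
  add_variables(
    outcomes   = mpg,
    predictors = c(starts_with("d"), where(is.numeric))
  ) |>
  add_model(linear_reg() |> set_engine("lm"))

fit_sel <- fit(wf_sel, data = mtcars)

# The mold records which predictor columns the selection resolved to.
selected <- names(extract_mold(fit_sel)$predictors)
selected
```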
The starts_with("d") part grabs disp and drat. The where(is.numeric) part adds every other numeric column. The outcome needs no explicit exclusion such as & !mpg: workflows evaluates the outcome selection first and removes those columns from the data before it evaluates the predictors. The two halves combine into one predictor set at fit time. Because the selection is stored as a quosure, it re-evaluates inside each resample, so adding or renaming columns upstream is picked up automatically.
Example 3: An engine that prefers raw columns
Some engines gain nothing from formula input and work best with bare columns. XGBoost is the canonical case: it expects a numeric matrix and handles its own categorical encoding, so a model.matrix() expansion gets in the way.
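A sketch of the XGBoost setup, assuming parsnip's boost_tree() spec with the xgboost engine. The value trees = 50 is an arbitrary illustrative choice, and the fit is guarded because the xgboost package may not be installed.

```r
library(workflows)
library(parsnip)

# trees = 50 is an illustrative value, not a recommendation.
xgb_spec <- boost_tree(trees = 50) |>
  set_engine("xgboost") |>
  set_mode("regression")

wf_xgb <- workflow() |>
  add_variables(outcomes = mpg, predictors = c(wt, hp, disp, cyl)) |>
  add_model(xgb_spec)

# Fit only when the engine package is available.
if (requireNamespace("xgboost", quietly = TRUE)) {
  fit_xgb <- fit(wf_xgb, data = mtcars)
}
```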
With add_variables(), XGBoost receives the four numeric columns directly and never sees a formula. Swapping in add_formula(mpg ~ wt + hp + disp + cyl) would also work for this engine, but the formula path adds an intercept column and pulls factors through model.matrix(), which is wasted work when the engine already handles encoding internally.
Example 4: Multiple outcomes in one workflow
add_variables() accepts more than one outcome column. That is unusual for add_formula(), where the left side is typically a single response, but tidyselect lets you list a vector.
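A sketch of the workflow side only, assuming mtcars with mpg and qsec as outcomes. It uses .fit_pre(), a low-level helper exported by workflows that runs the preprocessing stage without training the model, so the engine's handling of two outcomes is left out of the picture.

```r
library(workflows)
library(parsnip)

wf_multi <- workflow() |>
  add_variables(outcomes = c(mpg, qsec), predictors = c(wt, hp)) |>
  add_model(linear_reg() |> set_engine("lm"))

# Run only the preprocessing stage, then inspect what was recorded.
pre_only <- .fit_pre(wf_multi, mtcars)
outcome_names <- names(extract_mold(pre_only)$outcomes)
outcome_names
```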
Whether the model engine fits multi-outcome regression depends on the engine, not the workflow. Engine support varies, so confirm how your engine treats a multi-column outcome, for example by inspecting the fitted object, before relying on it. The workflow itself happily records both outcomes and passes them along, leaving the rest to the engine.
Common pitfalls
A workflow holds exactly one preprocessor. These four errors show up most often when picking up add_variables() for the first time.
A helper such as starts_with("v_") may match five columns in the training data while the prediction frame carries only three of them, which raises a hardhat::forge() error because every column chosen at fit time is required at predict time. Pin the exact column names with c(...) when the column set is stable; reserve helpers for cases where you control both the train and predict frames.
Try it yourself
Try it: Build a workflow with workflows::add_variables() that uses Petal.Length as the outcome and the two sepal columns as predictors in the iris dataset, pairs it with a linear regression engine, and fits it. Save the fitted workflow to ex_fit.
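One possible solution, sketched under the assumption that workflows and parsnip are installed:

```r
library(workflows)
library(parsnip)

# Petal.Length as outcome, the two sepal columns as predictors, lm engine.
ex_fit <- workflow() |>
  add_variables(
    outcomes   = Petal.Length,
    predictors = c(Sepal.Length, Sepal.Width)
  ) |>
  add_model(linear_reg() |> set_engine("lm")) |>
  fit(data = iris)

coef(extract_fit_engine(ex_fit))
```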
Explanation: add_variables() records the two sepal columns as predictors and Petal.Length as the outcome without building a formula. When fit() runs, hardhat::mold() resolves the selections against the iris data and hands plain columns to the lm engine. No interaction or expansion is added, so the linear regression uses exactly two predictors.
Related workflows functions
add_variables() is one verb in a small family. These functions sit beside it in almost every tidymodels script.
- workflow() creates the empty workflow that add_variables() updates.
- add_model() attaches the parsnip spec that pairs with the variable selection.
- add_formula() swaps the bare selection for a base R formula preprocessor.
- add_recipe() swaps the bare selection for a full recipes preprocessor.
- update_variables() replaces the selection in place without rebuilding the workflow.
- remove_variables() detaches the selection entirely.
See the workflows package reference for the full verb family.
FAQ
When should I use add_variables() instead of add_formula()?
Reach for workflows::add_variables() whenever a formula would force expansion you do not want. Tree-based engines like XGBoost and LightGBM accept categorical predictors natively, so a model.matrix() path wastes work. Use add_variables() when you only need to name columns; pick add_formula() when you need interactions, polynomial terms, or the . shorthand.
Does add_variables() support tidyselect helpers?
Yes. Both outcomes and predictors accept any tidyselect expression: starts_with(), ends_with(), contains(), matches(), where(), everything(), and the &, |, ! operators all work. The selections are stored as quosures and resolved at fit time; the columns chosen then are required again at predict time. Pin column names with c(...) when the column set must not drift between train and predict.
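The drift problem can be sketched with a small made-up data frame. The columns y, v_a, and v_b and their values are hypothetical, invented only for this illustration.

```r
library(workflows)
library(parsnip)
library(tidyselect)

# Hypothetical training data with two columns matching the "v_" prefix.
train <- data.frame(
  y   = c(1.2, 2.3, 3.1, 4.8, 5.2, 6.9),
  v_a = c(1, 2, 3, 4, 5, 6),
  v_b = c(2, 1, 4, 3, 6, 5)
)

fit_v <- workflow() |>
  add_variables(outcomes = y, predictors = starts_with("v_")) |>
  add_model(linear_reg() |> set_engine("lm")) |>
  fit(data = train)

# Prediction works when every column chosen at fit time is present.
pred_ok <- predict(fit_v, new_data = train[c("v_a", "v_b")])

# Dropping v_b from the new data makes prediction fail.
attempt <- try(predict(fit_v, new_data = train["v_a"]), silent = TRUE)
missing_column_fails <- inherits(attempt, "try-error")
```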
Can add_variables() handle multiple outcomes?
Yes, on the workflow side. Passing outcomes = c(y1, y2) records two outcome columns. Whether the engine fits a multi-outcome model depends on the engine, so check how your engine treats a multi-column outcome before relying on the result.
How is add_variables() different from add_recipe()?
workflows::add_variables() records bare column selections and runs no transformations. add_recipe() records a recipe() with explicit step_*() operations, all of which are re-estimated on each resample. Use add_variables() when no preprocessing is needed; use add_recipe() when imputation, scaling, or other learned steps must stay leak-free across folds.
Can I swap the selection later?
Yes. Use update_variables(wf, new_outcome, new_predictors). It replaces the selection in the existing workflow without touching the model spec; refit the workflow afterwards so the model reflects the new columns. Calling add_variables() twice on the same workflow fails because the preprocessor slot is already filled.
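A brief sketch of both behaviours, assuming workflows is installed; the object names are placeholders.

```r
library(workflows)

wf <- workflow() |>
  add_variables(outcomes = mpg, predictors = c(wt, hp))

# update_variables() swaps the stored selection for a new one.
wf_updated <- update_variables(wf, outcomes = mpg, predictors = c(wt, hp, disp))
new_predictors <- rlang::as_label(extract_preprocessor(wf_updated)$predictors)
new_predictors

# A second add_variables() on the same workflow is rejected.
second_add <- try(add_variables(wf, outcomes = mpg, predictors = disp), silent = TRUE)
second_add_fails <- inherits(second_add, "try-error")
```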