parsnip fit_xy() in R: Train Models With X/Y Matrices
The parsnip fit_xy() function trains a tidymodels model from a predictor matrix x and an outcome y, skipping the formula interface that fit() uses.
fit_xy(spec, x = preds, y = outcome)          # core matrix-interface call
fit_xy(spec, x = df[-1], y = df$target)       # split predictors from a frame
fit_xy(spec, x = as.matrix(preds), y = y)     # pass a numeric matrix
fit_xy(spec, x = preds, y = factor(labels))   # classification outcome
fit_xy(spec, x, y, case_weights = w)          # weighted fit
predict(model_fit, new_data = preds)          # score new rows
Need explanation? Read on for examples and pitfalls.
What fit_xy() does
fit_xy() is the matrix interface to model fitting in parsnip. You hand it a model specification, a data frame or matrix of predictor columns (x), and a vector of outcome values (y). It returns a fitted model_fit object, exactly the kind that fit() returns. Only the data handoff changes.
The function exists because not every workflow starts with a formula. Sometimes the predictors already sit in a clean numeric matrix, or arrive as the output of a recipe or another feature-engineering step. Writing a formula just so parsnip can take it apart again is wasted work. fit_xy() lets you skip that round trip.
fit() derives predictors from a formula, while fit_xy() takes them directly. Pick the interface that matches the shape of the data you already have.
The example below splits mtcars into a predictor frame and an outcome vector, the two pieces fit_xy() expects.
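A minimal sketch of that split. The original column choice is not shown, so hp, wt, and disp as predictors and mpg as the outcome are illustrative assumptions, as are the object names preds and outcome.

```r
# Predictor frame: three columns, no outcome (column choice is an assumption)
preds <- mtcars[, c("hp", "wt", "disp")]

# Outcome vector: miles per gallon
outcome <- mtcars$mpg

str(preds)       # inspect the predictor columns
length(outcome)  # one outcome value per row of preds
```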
The predictor frame holds three columns and no outcome. The outcome lives in its own vector. That separation is the whole idea behind the matrix interface.
fit_xy() syntax and arguments
The signature is short, and its leading arguments are usually passed by position: fit_xy(object, x, y, case_weights = NULL, control = control_parsnip(), ...). The first three arguments carry all the information the model needs.
| Argument | What it does |
|---|---|
| object | A parsnip model specification, such as linear_reg() or rand_forest(). |
| x | A data frame or matrix of predictor columns, with no outcome column. |
| y | A vector or one-column data frame of outcomes. A factor for classification. |
| case_weights | Optional vector of per-row weights, classed as hardhat case weights (for example importance_weights()), for engines that support them. |
| control | A control_parsnip() object that sets verbosity and whether fitting errors are caught. |
Predictors go in x; the outcome goes in y.
With the data already split, fitting is one call. The block below fits an ordinary linear regression with the lm engine.
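A sketch of that one-call fit, assuming the same illustrative mtcars split (hp, wt, and disp predicting mpg). The names spec, preds, outcome, and lm_fit are placeholders, not names from the original article.

```r
library(parsnip)

# Illustrative split: three predictors, mpg as the outcome
preds <- mtcars[, c("hp", "wt", "disp")]
outcome <- mtcars$mpg

# Model specification: ordinary linear regression via stats::lm()
spec <- linear_reg() |> set_engine("lm")

# Fit through the x/y interface; no formula is written by the user
lm_fit <- fit_xy(spec, x = preds, y = outcome)
lm_fit
```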
parsnip wraps the predictors and outcome into an internal formula (..y ~ .) and passes them to lm. The fitted object is a model_fit, ready for predict().
Fit models with fit_xy(): two more examples
fit_xy() works for classification, not just regression. Switch the specification, set the mode, and pass a factor outcome. The block below trains a random forest on iris and scores three rows.
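A sketch of the classification case. It assumes the ranger package is installed and uses it as the engine; the tree count, the seed, and the three rows chosen for scoring are arbitrary illustrative choices.

```r
library(parsnip)

# Random forest specification with the mode set explicitly
# (assumes the ranger package is installed)
rf_spec <- rand_forest(trees = 100) |>
  set_engine("ranger") |>
  set_mode("classification")

# Four numeric predictors and a factor outcome
iris_x <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
iris_y <- iris$Species

set.seed(123)  # random forests are stochastic; fix the seed for repeatability
rf_fit <- fit_xy(rf_spec, x = iris_x, y = iris_y)

# Score three rows, one taken from each species block
rf_pred <- predict(rf_fit, new_data = iris_x[c(1, 51, 101), ])
rf_pred
```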
A specification such as linear_reg() |> set_engine("lm") is just a blueprint for a model. You can pass the same object to fit() or fit_xy() without redefining it.
The outcome y is the Species factor, so parsnip reads the class levels from it. predict() then returns a tibble with a .pred_class column, one prediction per supplied row.
fit_xy() vs fit(): the formula difference
The split comes down to formula preprocessing. fit() reads a formula, so it can build dummy variables, interactions, and inline transforms like log(hp). fit_xy() runs no formula, so it takes the predictor columns exactly as you supply them.
| Aspect | fit() | fit_xy() |
|---|---|---|
| Input | formula plus data | x predictors plus y outcome |
| Formula preprocessing | Applied (dummies, log(), interactions) | None; predictors used as-is |
| Dummy variables | Created from factors via the formula | Left to the engine's default encoding |
| Best for | Transforms expressed in a formula | Predictors already in matrix form |
When no formula transforms are involved, both interfaces produce identical coefficients. The check below confirms it.
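One way to run that check, sketched with an lm engine and the illustrative hp, wt, and disp predictors. The object names are placeholders. If the two interfaces agree, all.equal() returns TRUE.

```r
library(parsnip)

spec <- linear_reg() |> set_engine("lm")
preds <- mtcars[, c("hp", "wt", "disp")]

# Same model through both interfaces, with no transforms in the formula
form_fit <- fit(spec, mpg ~ hp + wt + disp, data = mtcars)
xy_fit <- fit_xy(spec, x = preds, y = mtcars$mpg)

# Compare the coefficients of the underlying lm objects
same_coefs <- all.equal(
  coef(extract_fit_engine(form_fit)),
  coef(extract_fit_engine(xy_fit))
)
same_coefs
```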
See the parsnip reference for fit_xy() for the full argument list and engine notes.
Common pitfalls
Forgetting to set the mode. A bare rand_forest() has no mode, and fit_xy() cannot reliably guess it from the outcome. Call set_mode("classification") or set_mode("regression") on the specification before fitting, or the call errors.
Expecting a formula to apply. fit_xy() ignores formulas entirely. A predictor you intend to transform, such as log(hp), must be built as a real column before the call. Use fit() instead when you want formula transforms computed for you.
Mismatched columns at predict time. predict() expects new_data to carry the same predictor column names that appeared in x. Rename or reorder new data to match the training predictors, or prediction fails with a column error.
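The second and third pitfalls can be sketched together. The column name log_hp and the two new cars are hypothetical; the point is that the transform is computed as a real column, and the new data repeats the same column names.

```r
library(parsnip)

spec <- linear_reg() |> set_engine("lm")

# Build the transformed predictor as a real column before fitting
train_x <- data.frame(log_hp = log(mtcars$hp), wt = mtcars$wt)
fit_log <- fit_xy(spec, x = train_x, y = mtcars$mpg)

# Hypothetical new cars, given on the raw scale
new_cars <- data.frame(hp = c(110, 150), wt = c(2.6, 3.2))

# Apply the same transform and keep the same column names as train_x
new_x <- data.frame(log_hp = log(new_cars$hp), wt = new_cars$wt)
new_pred <- predict(fit_log, new_data = new_x)
new_pred
```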
Try it yourself
Try it: Use fit_xy() to fit a linear_reg() model that predicts qsec from hp and wt in mtcars. Save the fitted model to ex_fit.
Solution:
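One possible solution, sketched with the lm engine. The original solution code is not shown, so the exact form may have differed.

```r
library(parsnip)

# Predictors hp and wt go in x; the qsec outcome goes in y
ex_fit <- linear_reg() |>
  set_engine("lm") |>
  fit_xy(x = mtcars[, c("hp", "wt")], y = mtcars$qsec)

ex_fit
```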
Explanation: fit_xy() takes the predictor columns as x and the outcome vector as y, so no formula is needed. The fitted lm has three coefficients: an intercept plus one per predictor.
Related parsnip functions
- fit() is the formula interface counterpart of fit_xy().
- predict() scores new data from a fitted model.
- set_engine() chooses the computational engine.
- set_mode() declares regression or classification.
- extract_fit_engine() reaches the underlying engine object.
FAQ
What is the difference between fit() and fit_xy() in parsnip?
fit() takes a formula and a data frame, reading the outcome and predictors from the formula and applying any preprocessing the formula describes, such as dummy variables and interactions. fit_xy() takes the predictors and outcome as separate objects, x and y, and applies no formula preprocessing. Both return the same model_fit object. Choose fit() when transformations live in a formula, and fit_xy() when your predictors are already a clean matrix or data frame.
Does fit_xy() create dummy variables for factor predictors?
Not on its own. Because fit_xy() runs no formula, factor columns are passed straight to the modeling engine. Some engines, such as ranger, accept factors directly. Others, such as glmnet or xgboost, need a fully numeric matrix and will error. The safe approach is to convert factors to indicator columns before calling fit_xy(), or to use a recipe with step_dummy() inside a workflow so the encoding is explicit and reproducible.
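As a rough sketch of the manual route, base R's model.matrix() can expand a factor into indicator columns before the call. The lm engine is used here only to keep the example free of extra packages, and the iris columns are an illustrative choice.

```r
library(parsnip)

# Predictor frame containing one factor column (Species)
raw_x <- iris[, c("Sepal.Width", "Petal.Length", "Species")]

# Expand Species into indicator columns, then drop the intercept column
num_x <- as.data.frame(model.matrix(~ ., data = raw_x)[, -1])
names(num_x)

# Every column is now numeric, so the encoding is explicit before fitting
enc_fit <- linear_reg() |>
  set_engine("lm") |>
  fit_xy(x = num_x, y = iris$Sepal.Length)
```

Whatever encoding is applied to the training predictors has to be applied the same way to new data before predict().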
Can I use fit_xy() with a recipe or workflow?
Recipes and workflows use their own fitting path. A workflow built with add_recipe() and add_model() is fitted with fit(), not fit_xy(); the recipe handles preprocessing and produces the predictor matrix internally. Use fit_xy() for the simpler case of a standalone model specification with predictors you have already prepared yourself.
Why does fit_xy() fail for some models?
fit_xy() relies on the engine supporting a non-formula interface. A few model and engine combinations only register a formula method, so parsnip raises an error stating that fit_xy() is not available. The fix is to call fit() with a formula instead. You can also check parsnip's engine documentation before committing to an interface; show_engines() lists the engines and modes registered for a model type, though not the interface each one uses.
Should x be a matrix or a data frame in fit_xy()?
Either works for most engines. A data frame keeps column names, which makes predict() and model summaries easier to read, so it is the better default. Use a matrix only when the engine specifically expects one, such as glmnet. parsnip converts between the two as the engine requires, but starting from a named data frame gives the clearest output and the safest predict() behavior later.