parsnip augment() in R: Add Predictions to Data

The parsnip augment() function in R adds model predictions and residuals directly to your data frame, returning every original column plus new prediction columns in a single call.

By Selva Prabhakaran · Published May 18, 2026 · Last updated May 18, 2026

⚡ Quick Answer

```r
augment(fit, new_data = df)            # append .pred column to df
augment(lm_fit, new_data = test)       # regression: adds .pred, plus .resid if test has the outcome
augment(cls_fit, new_data = test)      # classification: adds .pred_class + probs
augment(fit, new_data = df[, -1])      # outcome (here column 1) dropped: no .resid
augment(fit, new_data = df)$.pred      # pull only the predictions vector
augment(fit, new_data = df)$.resid     # pull residuals for diagnostics
```

Need explanation? Read on for examples and pitfalls.


What augment() does

augment() attaches predictions to your data instead of returning them separately. You pass a fitted parsnip model and a data frame, and augment() runs the model on that data, then column-binds the prediction columns onto the original rows. The output is a tibble with the same number of rows as the input.

This is the key difference from predict(), which returns only the new columns. With augment() you never have to manually bind_cols() predictions back to your data, so the result is ready for plotting, error analysis, or export.

The columns added depend on the model's mode. A regression model gains a .pred column, and a .resid column when the true outcome is present. A classification model gains a .pred_class column plus one probability column per class.
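A quick way to see which columns each mode adds is to compare column names before and after augmenting. The sketch below assumes parsnip and the rpart package are installed; `reg_fit`, `cls_fit` and the other object names are only illustrative.

```r
library(parsnip)

# Regression: linear model on mtcars (the outcome mpg is present in new_data)
reg_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)
reg_aug <- augment(reg_fit, new_data = mtcars)
setdiff(names(reg_aug), names(mtcars))   # columns added by augment()
nrow(reg_aug) == nrow(mtcars)            # row count is unchanged

# Classification: rpart decision tree on iris
cls_fit <- decision_tree(mode = "classification") |>
  set_engine("rpart") |>
  fit(Species ~ ., data = iris)
cls_aug <- augment(cls_fit, new_data = iris)
setdiff(names(cls_aug), names(iris))     # .pred_class plus one .pred_<level> per class
```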

Key Insight

augment() is predict() plus your original data. Reach for it whenever the next step needs predictions sitting next to the predictors that produced them, such as residual plots or grouped accuracy checks.

augment() syntax and arguments

The signature is short because most behavior is inferred from the model. augment() is an S3 generic; the method for parsnip is augment.model_fit().
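Because dispatch depends on the object's class, you can check that a fit will reach the parsnip method before calling augment(). A minimal sketch; `utils::getS3method()` is base R and simply looks up the registered method for a generic and class.

```r
library(parsnip)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)

# parsnip fits carry the "model_fit" class, which augment() dispatches on
inherits(lm_fit, "model_fit")
#> [1] TRUE

# The registered S3 method that will handle this object
method <- utils::getS3method("augment", "model_fit")
is.function(method)
#> [1] TRUE
```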


augment() signature for parsnip models

```r
augment(x, new_data, eval_time = NULL, ...)
```

Argument	Description
`x`	A fitted `model_fit` object produced by `fit()` or `fit_xy()`.
`new_data`	A data frame or matrix of predictors to generate predictions for.
`eval_time`	Evaluation times, used only for censored regression (survival) models.
`...`	Not currently used by the parsnip method; call `predict()` directly if you need a specific prediction `type`.

The argument is named new_data, with an underscore. Passing data = instead is a frequent mistake, covered in the pitfalls section below.

Augment a model: four examples

Each example below uses a built-in dataset so you can run it as-is. First, fit a linear regression model with the parsnip interface.

Fit a linear model with parsnip

```r
library(parsnip)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)
lm_fit
#> parsnip model object
#>
#> Call:
#> stats::lm(formula = mpg ~ wt + hp, data = data)
#>
#> Coefficients:
#> (Intercept)           wt           hp
#>    37.22727     -3.87783     -0.03177
```

Example 1 augments regression data and adds residuals. Because mtcars still contains the mpg outcome, augment() returns both .pred and .resid.

Augment regression predictions

```r
aug <- augment(lm_fit, new_data = mtcars)
aug[1:4, c("mpg", "wt", "hp", ".pred", ".resid")]
#> # A tibble: 4 x 5
#>     mpg    wt    hp .pred .resid
#>   <dbl> <dbl> <dbl> <dbl>  <dbl>
#> 1  21    2.62   110  23.6 -2.57 
#> 2  21    2.88   110  22.6 -1.58 
#> 3  22.8  2.32    93  25.3 -2.48 
#> 4  21.4  3.22   110  21.3  0.135
```

Example 2 augments a classification model. A fitted classifier adds .pred_class and one probability column per outcome level.

Augment classification predictions

```r
# Classification tree on iris (needs the rpart package installed)
tree_fit <- decision_tree(mode = "classification") |>
  set_engine("rpart") |>
  fit(Species ~ ., data = iris)

aug_tree <- augment(tree_fit, new_data = iris)
# Row 120 is a virginica flower that falls in the mostly-versicolor leaf
aug_tree[c(1, 60, 120), c("Species", ".pred_class", ".pred_setosa", ".pred_versicolor")]
#> # A tibble: 3 x 4
#>   Species    .pred_class .pred_setosa .pred_versicolor
#>   <fct>      <fct>              <dbl>            <dbl>
#> 1 setosa     setosa                 1            0    
#> 2 versicolor versicolor             0            0.907
#> 3 virginica  versicolor             0            0.907
```

Example 3 augments fresh data with no outcome column. When new_data lacks the response variable, augment() can still predict but omits .resid.

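Augment new data without the outcome

```r
# Predictors only: the mpg outcome is not included
new_cars <- mtcars[1:3, c("wt", "hp")]
aug_new <- augment(lm_fit, new_data = new_cars)
aug_new[, c("wt", "hp", ".pred")]
#> # A tibble: 3 x 3
#>      wt    hp .pred
#>   <dbl> <dbl> <dbl>
#> 1  2.62   110  23.6
#> 2  2.88   110  22.6
#> 3  2.32    93  25.3
".resid" %in% names(aug_new)
#> [1] FALSE
```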

Example 4 turns the augmented frame into a residual diagnostic. Because predictions and residuals share one tibble, plotting them takes a single step.

Plot residuals from augmented data

```r
aug <- augment(lm_fit, new_data = mtcars)

# Residuals against predictions; the dashed line marks zero error
plot(aug$.pred, aug$.resid, xlab = "Predicted mpg", ylab = "Residual")
abline(h = 0, lty = 2)
```

augment() vs predict(): which to use

Use augment() when you want context, and predict() when you want only the numbers. Both call the same underlying model, so predictions are identical; they differ in what comes back.

Aspect	`augment()`	`predict()`
Returns	Original data plus prediction columns	Only prediction columns
Row count	Same as `new_data`	Same as `new_data`
Residuals	Adds `.resid` when outcome is present	Never
Best for	Diagnostics, plotting, grouped error	Feeding predictions into another step

The decision rule is simple. If your next line of code needs the predictors or the true outcome alongside .pred, call augment(). If you only need a clean predictions tibble to bind elsewhere, predict() is leaner.
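To see that the two functions return the same numbers, compare predict() plus a manual bind with augment(). A short sketch, assuming dplyr (a parsnip dependency) is available for `bind_cols()`.

```r
library(parsnip)
library(dplyr)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)

# predict() returns only the prediction column
preds <- predict(lm_fit, new_data = mtcars)
names(preds)
#> [1] ".pred"

# Binding it back by hand reproduces what augment() attaches
manual <- bind_cols(mtcars, preds)
aug <- augment(lm_fit, new_data = mtcars)
all.equal(manual$.pred, aug$.pred)
#> [1] TRUE
```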

Tip

Chain augment() straight into yardstick. Since the augmented tibble holds both the truth and .pred, you can pipe it directly into metric functions like rmse(aug, truth = mpg, estimate = .pred) with no extra joins.
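A sketch of that chain, assuming the yardstick package is installed. `metric_set()` bundles several yardstick metrics so one call scores the augmented tibble on all of them.

```r
library(parsnip)
library(yardstick)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)
aug <- augment(lm_fit, new_data = mtcars)

# truth and estimate are both columns of the augmented tibble
rmse_res <- rmse(aug, truth = mpg, estimate = .pred)
rmse_res

# Several metrics in one call
reg_metrics <- metric_set(rmse, rsq, mae)
reg_metrics(aug, truth = mpg, estimate = .pred)
```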

Common pitfalls

Three mistakes come up repeatedly with augment(). Each is described below along with its fix.

The first is using data = instead of new_data =. The parsnip method only recognizes new_data, so data = is never matched to it and the call fails with an error instead of returning predictions.

Wrong argument name fails

```r
# Wrong: data = is not matched to new_data, so this call errors
# augment(lm_fit, data = mtcars)

# Right: name the argument new_data
augment(lm_fit, new_data = mtcars)
```

The second is expecting .resid on data that has no outcome column. Residuals need the true value, so scoring data without the response returns predictions only. Include the outcome column in new_data if you need residuals.
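A minimal sketch of the fix, using the same mtcars model as the examples above; the column subsets stand in for a real scoring set.

```r
library(parsnip)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)

# Predictors only: augment() predicts but has no observed value to subtract
no_truth <- augment(lm_fit, new_data = mtcars[, c("wt", "hp")])
".resid" %in% names(no_truth)
#> [1] FALSE

# Same rows with the outcome column kept: .resid is added
with_truth <- augment(lm_fit, new_data = mtcars[, c("mpg", "wt", "hp")])
".resid" %in% names(with_truth)
#> [1] TRUE
```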

The third is calling augment() on a raw engine object instead of a parsnip model_fit. A plain lm() result dispatches to broom's augment.lm() method (when broom is loaded), which names its prediction column .fitted rather than .pred and takes data = and newdata = arguments. Always augment the object returned by parsnip::fit().
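The contrast is easy to see by augmenting both the parsnip fit and the lm object stored inside it. This sketch assumes broom is installed; `extract_fit_engine()` pulls the underlying engine fit out of a parsnip model.

```r
library(parsnip)
library(broom)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)

# parsnip method: prediction column is .pred
parsnip_cols <- names(augment(lm_fit, new_data = mtcars))

# Raw stats::lm object: broom's augment.lm() runs instead, using .fitted
raw_lm <- extract_fit_engine(lm_fit)
broom_cols <- names(augment(raw_lm))

".pred" %in% parsnip_cols
#> [1] TRUE
".fitted" %in% broom_cols
#> [1] TRUE
```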

Warning

Column names can collide. If new_data already contains a column called .pred or .resid, augment() overwrites it without warning. Rename pre-existing dotted columns before augmenting.
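One way to do the renaming is to prefix every leading-dot prediction or residual column before augmenting. The `.pred` column set to 0 below is a hypothetical stand-in for predictions from an older model, and the `old_` prefix is an arbitrary choice.

```r
library(parsnip)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)

# Hypothetical scoring data that already carries a .pred column
scored <- mtcars
scored$.pred <- 0

# Rename any pre-existing .pred* or .resid columns (".pred" becomes "old_pred")
dotted <- grepl("^\\.(pred|resid)", names(scored))
names(scored)[dotted] <- sub("^\\.", "old_", names(scored)[dotted])

aug <- augment(lm_fit, new_data = scored)
c("old_pred", ".pred") %in% names(aug)
#> [1] TRUE TRUE
```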

Try it yourself

Try it: Fit a linear_reg() model of mpg on disp using the lm engine, then augment mtcars and save the result to ex_aug.

Your turn: augment mtcars

```r
# Try it: fit, then augment
ex_fit <- # your code here
ex_aug <- # your code here
ncol(ex_aug)
#> Expected: 13 columns (11 original + .pred + .resid)
```


Solution

```r
ex_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ disp, data = mtcars)
ex_aug <- augment(ex_fit, new_data = mtcars)
ncol(ex_aug)
#> [1] 13
```

Explanation: mtcars has 11 columns, and augment() appends .pred plus .resid because the mpg outcome is present, giving 13 columns total.

Related parsnip functions

augment() works alongside the rest of the parsnip prediction toolkit. These functions cover the steps before and after augmenting.

predict() returns only the prediction columns, without the original data.
fit() trains a model specification and returns the model_fit object that augment() consumes.
fit_xy() trains a model from separate predictor and outcome objects.
set_engine() chooses the computational engine behind the model.
translate() shows the exact engine call parsnip will run.

See the official augment.model_fit reference for engine-specific notes.

FAQ

What is the difference between augment() and predict() in parsnip?

Both functions generate the same predictions from a fitted model. predict() returns a tibble containing only the new prediction columns, so you must bind it back to your data yourself. augment() does that bind for you, returning the full new_data with .pred (and .resid when the outcome is present) appended. Use augment() for diagnostics and plotting, and predict() when you only need the prediction values.

Why does augment() not return a .resid column?

The .resid column appears only when new_data includes the true outcome variable. Residuals are computed as the observed value minus the predicted value, so without the observed value there is nothing to subtract. If you augment holdout or scoring data that lacks the response column, you get .pred but no .resid. Add the outcome column to new_data to get residuals back.
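The subtraction is easy to confirm on the augmented tibble itself. A minimal check with the mtcars model used throughout this page:

```r
library(parsnip)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)
aug <- augment(lm_fit, new_data = mtcars)

# .resid is observed minus predicted
all.equal(aug$.resid, aug$mpg - aug$.pred)
#> [1] TRUE
```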

Can I use augment() on a workflow object?

Yes. The workflows package provides its own augment.workflow() method that behaves like the parsnip one, applying any recipe preprocessing first. Calling augment(wf_fit, new_data = df) on a fitted workflow returns the original data with prediction columns added, and recent versions also return .resid under the same conditions as augment.model_fit().
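A minimal sketch of a fitted workflow being augmented, assuming the workflows package is installed. The preprocessor here is a plain formula added with `add_formula()`; a recipe would slot in the same way via `add_recipe()`.

```r
library(parsnip)
library(workflows)

wf_fit <- workflow() |>
  add_formula(mpg ~ wt + hp) |>
  add_model(linear_reg() |> set_engine("lm")) |>
  fit(data = mtcars)

wf_aug <- augment(wf_fit, new_data = mtcars)
".pred" %in% names(wf_aug)
#> [1] TRUE
```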

What columns does augment() add for classification models?

A classification model_fit gains a .pred_class column holding the predicted label. It also adds one probability column per outcome level, named .pred_<level>, such as .pred_setosa and .pred_versicolor. The probability columns sum to 1 across each row, and .pred_class is the level with the highest probability.
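Both properties can be checked directly on an augmented tibble. This sketch uses the rpart tree from the examples (rpart must be installed); other engines should behave the same way, but the check is only run here for rpart.

```r
library(parsnip)

tree_fit <- decision_tree(mode = "classification") |>
  set_engine("rpart") |>
  fit(Species ~ ., data = iris)
aug <- augment(tree_fit, new_data = iris)

prob_cols <- paste0(".pred_", levels(iris$Species))

# Each row's class probabilities add up to 1
row_totals <- rowSums(as.matrix(aug[prob_cols]))
isTRUE(all.equal(unname(row_totals), rep(1, nrow(aug))))
#> [1] TRUE

# .pred_class matches the level with the largest probability
top_level <- levels(iris$Species)[max.col(as.matrix(aug[prob_cols]), ties.method = "first")]
all(as.character(aug$.pred_class) == top_level)
#> [1] TRUE
```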
