tune collect_predictions() in R: Inspect Tuning Predictions
The tune collect_predictions() function in R extracts out-of-fold predictions from a tune_results object, returning a tidy tibble with one row per resampled observation (per candidate) so you can plot residuals, compute custom metrics, or build ensembles.
collect_predictions(res)                      # all candidates, one row per held-out prediction (default)
collect_predictions(res, summarize = TRUE)    # averaged across repeated resamples
collect_predictions(res, parameters = best)   # filter to one candidate
collect_predictions(last_fit_obj)             # test set predictions
collect_predictions(res) |> filter(.config == "Preprocessor1_Model1")
collect_predictions(res) |> mutate(resid = Sale_Price - .pred)
collect_predictions(cls_res) |> select(.pred_class, .pred_yes, truth)
Need explanation? Read on for examples and pitfalls.
What collect_predictions() does in one sentence
collect_predictions() unpacks the prediction column of a tune_results tibble into a flat, joinable table of held-out predictions. When tune_grid(), fit_resamples(), tune_bayes(), or last_fit() runs with save_pred = TRUE, each row of the returned object stores a nested list column called .predictions. collect_predictions() walks that column, optionally averages predictions across resamples for repeats, and returns a tibble with .pred (or .pred_class for classification), .row, .config, the candidate parameter columns, and the response variable.
You reach for this function any time you need predictions, not metrics. Residual plots, calibration curves, custom yardstick metrics, stacking and ensembling, and per-observation error analysis all start here.
Save predictions during tuning
collect_predictions() has nothing to collect unless predictions were saved at tune time. The save flag lives on the control object you pass to tune_grid(). The running example in this article tunes an elastic net on Ames housing with predictions retained.
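A minimal sketch of such a tuning run follows. The split proportion, the predictors in the recipe, and the 3 x 2 regular grid are assumptions chosen for illustration, so row counts may differ from the figures quoted later in this article. It also assumes the glmnet package is installed.

```r
library(tidymodels)  # attaches tune, rsample, recipes, parsnip, workflows, dials

data(ames, package = "modeldata")
set.seed(123)

# Assumed split and resampling scheme (not taken from the original article).
split <- initial_split(ames, prop = 0.8, strata = Sale_Price)
train <- training(split)
folds <- vfold_cv(train, v = 5)

# Illustrative recipe with a handful of numeric predictors.
rec <- recipe(
  Sale_Price ~ Gr_Liv_Area + Year_Built + Lot_Area + Total_Bsmt_SF,
  data = train
) |>
  step_normalize(all_numeric_predictors())

# Elastic net: penalty and mixture are both marked for tuning.
spec <- linear_reg(penalty = tune(), mixture = tune()) |>
  set_engine("glmnet")

wf <- workflow() |>
  add_recipe(rec) |>
  add_model(spec)

# 3 penalty values x 2 mixture values = 6 candidates.
grid <- grid_regular(penalty(), mixture(), levels = c(3, 2))

res <- tune_grid(
  wf,
  resamples = folds,
  grid = grid,
  control = control_grid(save_pred = TRUE)  # keep held-out predictions
)

# The nested list column that collect_predictions() unpacks.
".predictions" %in% names(res)
```

With five folds and six candidates, every training row is held out once per candidate, so collect_predictions(res) should come back with six rows per training observation.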
Without control_grid(save_pred = TRUE), no .predictions column is stored and collect_predictions(res) has nothing to return; current tune releases stop with an error asking you to refit with save_pred = TRUE. The same flag is control_resamples(save_pred = TRUE) for fit_resamples() and control_race(save_pred = TRUE) for the finetune racing engines.
Note: save_pred = TRUE stores one prediction per assessment-set row, for every candidate, in every resample, which can balloon memory on large grids. Enable save_pred deliberately on small-to-medium grids, or drop columns from the recipe before tuning if memory is tight.
collect_predictions() syntax and arguments
The signature is short, collect_predictions(x, ..., summarize = FALSE, parameters = NULL), but the parameters argument is the one most people miss.
x is the tuning result. summarize defaults to FALSE; setting it to TRUE averages predictions when the same observation appears in multiple repeats of a repeated CV scheme, while with simple v-fold CV it leaves the predicted values unchanged because each row appears in exactly one held-out fold. parameters accepts a one-row tibble (typically from select_best()) and keeps only predictions from that candidate, which is how you turn a 6-candidate, 5-fold result into a single 1873-row tibble of one model's out-of-fold predictions.
The four input shapes return different column sets. The table below shows what to expect.
| Input type | Returns | Key columns |
|---|---|---|
| tune_grid / tune_bayes (regression) | one row per held-out obs per candidate | .pred, .row, params, .config, response |
| tune_grid / tune_bayes (classification) | one row per held-out obs per candidate | .pred_class, .pred_<level>, .row, params, .config, response |
| fit_resamples | one row per held-out obs | .pred, .row, .config, response |
| last_fit | one row per test-set obs | .pred (or .pred_class), .row, response |
Filter to one candidate with parameters
select_best() plus the parameters argument is the standard handoff. Pick the winning candidate on a metric, then ask collect_predictions() to return only that candidate's held-out predictions.
Without parameters, you get predictions for all six candidates stacked together, which is 6 x 1873 rows. Always filter when you only care about one model.
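The handoff can be sketched as follows. The compact setup at the top is an assumption standing in for this article's res (three predictors, a six-candidate automatic grid, glmnet installed), so the row count is tied to this split rather than to the 1873 rows quoted above.

```r
library(tidymodels)

# Assumed setup standing in for the article's `res`; needs the glmnet package.
data(ames, package = "modeldata")
set.seed(123)
train <- training(initial_split(ames, prop = 0.8))
folds <- vfold_cv(train, v = 5)
wf <- workflow() |>
  add_formula(Sale_Price ~ Gr_Liv_Area + Year_Built + Lot_Area) |>
  add_model(linear_reg(penalty = tune(), mixture = tune()) |> set_engine("glmnet"))
res <- tune_grid(wf, resamples = folds, grid = 6,
                 control = control_grid(save_pred = TRUE))

# Pick the winning candidate, then keep only its held-out predictions.
best <- select_best(res, metric = "rmse")
preds_best <- collect_predictions(res, parameters = best)

nrow(preds_best)              # one row per training observation
unique(preds_best$.config)    # a single candidate

# Pooled out-of-fold RMSE for that candidate.
rmse(preds_best, truth = Sale_Price, estimate = .pred)
```

Because preds_best is an ordinary tibble, any yardstick metric or residual plot can be computed from it in the same way.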
Note: each .pred value is the prediction made when that row was in the assessment fold, not the analysis fold. That makes yardstick::rmse(preds_best, Sale_Price, .pred) closely comparable to what collect_metrics() reported (the pooled value can differ slightly, because collect_metrics() averages per-fold estimates), with the bonus that you can swap in any custom metric or residual diagnostic.
Aggregated vs per-resample output
summarize = TRUE averages predictions across repeats; with simple v-fold CV the output is identical to FALSE. The difference shows up only when you use vfold_cv(repeats = 5) or a Monte Carlo scheme that revisits the same row.
Both return 11,238 rows here (1,873 obs x 6 candidates) because every observation lands in exactly one held-out fold. Switch to vfold_cv(repeats = 3) and the FALSE form triples while the TRUE form stays at 11,238.
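The repeated-CV case can be checked with a short sketch. The setup is an assumption (two repeats instead of three and a three-candidate grid to keep the run quick, glmnet installed), so the counts differ from the 11,238 quoted above; only the ratio between the two forms matters here.

```r
library(tidymodels)

# Assumed setup; needs the glmnet package.
data(ames, package = "modeldata")
set.seed(123)
train <- training(initial_split(ames, prop = 0.8))
folds_rep <- vfold_cv(train, v = 5, repeats = 2)

wf <- workflow() |>
  add_formula(Sale_Price ~ Gr_Liv_Area + Year_Built + Lot_Area) |>
  add_model(linear_reg(penalty = tune(), mixture = tune()) |> set_engine("glmnet"))

res_rep <- tune_grid(wf, resamples = folds_rep, grid = 3,
                     control = control_grid(save_pred = TRUE))

per_resample <- collect_predictions(res_rep, summarize = FALSE)
averaged     <- collect_predictions(res_rep, summarize = TRUE)

# Each row is held out once per repeat, so the unsummarized table is
# `repeats` times as long as the averaged one.
nrow(per_resample) / nrow(averaged)

# After averaging, each (.row, .config) pair appears exactly once.
anyDuplicated(averaged[, c(".row", ".config")])
```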
Note: join back to the original data on .row. The .row column indexes into the data you passed to tune_grid(). train |> mutate(.row = row_number()) |> inner_join(preds_best, by = ".row") gives you every original column alongside the prediction, which is what you need for grouped error analysis or to plot residuals against an unmodeled covariate.
Common pitfalls
Three failure modes account for most of the friction.
- Forgetting save_pred = TRUE. Without it, no predictions are stored and collect_predictions() has nothing to return. The fix is at tune time, not at collect time; re-tune with control_grid(save_pred = TRUE).
- Skipping parameters and getting all candidates. Calling collect_predictions(res) on a 50-candidate grid gives a table 50 times as long; downstream code that expects one row per observation silently averages or duplicates. Pass parameters = select_best(res, metric = "rmse") whenever you want a single model.
- Confusing .row with the row position in the assessment fold. .row indexes the FULL training tibble, not the fold. Use it to join back to train, not to assessment(folds$splits[[1]]).
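The third pitfall is easiest to see by doing the join. The sketch below assumes the same compact stand-in setup for res used elsewhere in this article (glmnet installed). The grouping by Neighborhood is only an illustration of an unmodeled covariate.

```r
library(tidymodels)

# Assumed setup standing in for the article's `res`; needs the glmnet package.
data(ames, package = "modeldata")
set.seed(123)
train <- training(initial_split(ames, prop = 0.8))
folds <- vfold_cv(train, v = 5)
wf <- workflow() |>
  add_formula(Sale_Price ~ Gr_Liv_Area + Year_Built + Lot_Area) |>
  add_model(linear_reg(penalty = tune(), mixture = tune()) |> set_engine("glmnet"))
res <- tune_grid(wf, resamples = folds, grid = 6,
                 control = control_grid(save_pred = TRUE))

preds_best <- collect_predictions(
  res,
  parameters = select_best(res, metric = "rmse")
)

# .row refers to positions in `train`, so add the same index there and join.
joined <- train |>
  mutate(.row = row_number()) |>
  inner_join(select(preds_best, .row, .pred), by = ".row")

# Grouped error analysis on a column the model never used.
joined |>
  group_by(Neighborhood) |>
  summarise(rmse = sqrt(mean((Sale_Price - .pred)^2)), n = n()) |>
  arrange(desc(rmse))
```

If .row were a position within each fold, the outcome values on the two sides of the join would disagree; here they line up row for row.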
Try it yourself
Try it: Pull the held-out predictions for the best RMSE candidate in res, compute residuals, and save the result to ex_resid. The tibble should include Sale_Price, .pred, and a new column resid = Sale_Price - .pred.
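One possible solution is sketched below. The setup lines are an assumption standing in for this article's res (glmnet installed); the last three statements are the part that answers the exercise.

```r
library(tidymodels)

# Assumed setup standing in for the article's `res`; needs the glmnet package.
data(ames, package = "modeldata")
set.seed(123)
train <- training(initial_split(ames, prop = 0.8))
folds <- vfold_cv(train, v = 5)
wf <- workflow() |>
  add_formula(Sale_Price ~ Gr_Liv_Area + Year_Built + Lot_Area) |>
  add_model(linear_reg(penalty = tune(), mixture = tune()) |> set_engine("glmnet"))
res <- tune_grid(wf, resamples = folds, grid = 6,
                 control = control_grid(save_pred = TRUE))

# Exercise: best-RMSE candidate, residuals, three columns.
best <- select_best(res, metric = "rmse")

ex_resid <- collect_predictions(res, parameters = best) |>
  mutate(resid = Sale_Price - .pred) |>
  select(Sale_Price, .pred, resid)

head(ex_resid)
```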
Explanation: select_best() returns the one-row tibble of winning hyperparameters; passing it to parameters filters collect_predictions() to that single model. mutate() adds the residual column, and select() trims to the three columns the prompt asked for.
Related tune functions
- collect_metrics() returns summarized metrics from the same tune_results object.
- select_best() returns the winning candidate as a one-row tibble, perfect for parameters.
- show_best() returns the top-N candidates ranked on one metric.
- last_fit() fits a finalized workflow on the training set, evaluates it on the test set, and returns a tune_results-shaped object with test predictions.
- augment() on a fitted workflow appends .pred columns to new data, the post-tuning analog.
For the official reference, see the tune package documentation.
FAQ
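Why does collect_predictions() fail or return no predictions?
The results were tuned without save_pred = TRUE, so there are no stored predictions to collect; current tune releases report this as an error that the .predictions column does not exist. Re-run tune_grid() with control = control_grid(save_pred = TRUE). The flag is opt-in to keep memory predictable on large grids; for a regression under simple v-fold CV the .pred values alone take roughly nrow(train) * n_candidates * 8 bytes, and the .row, outcome, and parameter columns add to that.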
What is the difference between collect_predictions() and predict()?
collect_predictions() gives you held-out predictions from cross-validation, made on data the model never saw during fitting. predict() gives you predictions from a single fitted model on new data, which can be train, test, or arbitrary input. Use collect_predictions() to assess generalization during tuning; use predict() (or augment()) on the final fitted workflow for deployment.
Can I use collect_predictions() with classification models?
Yes. The returned tibble includes .pred_class plus one .pred_<level> column per class containing predicted probabilities. Pass it to yardstick metrics like roc_auc(preds, truth, .pred_yes) for AUC or conf_mat(preds, truth, .pred_class) for a confusion matrix.
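A minimal classification sketch follows. It is an assumption-laden stand-in: it uses the two_class_dat data from modeldata and a plain logistic regression via fit_resamples(), so the outcome is Class and the probability columns are .pred_Class1 and .pred_Class2 rather than the truth and .pred_yes names used above.

```r
library(tidymodels)

# Assumed example data: two numeric predictors (A, B) and a two-level outcome.
data(two_class_dat, package = "modeldata")
set.seed(123)
cls_folds <- vfold_cv(two_class_dat, v = 5)

cls_res <- fit_resamples(
  logistic_reg(),            # default glm engine, nothing to tune
  Class ~ A + B,
  resamples = cls_folds,
  control = control_resamples(save_pred = TRUE)
)

cls_preds <- collect_predictions(cls_res)

# Hard class predictions plus one probability column per level.
names(cls_preds)

# AUC uses the probability of the first factor level by default.
roc_auc(cls_preds, truth = Class, .pred_Class1)

# Confusion matrix from the hard class predictions.
conf_mat(cls_preds, truth = Class, estimate = .pred_class)
```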
Does the parameters argument accept multiple candidates?
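Yes. parameters acts as a filter, so a tibble with several rows keeps several candidates. The tibble should carry a column for every tuning parameter rather than only some of them, which is why the output of select_best() can be passed directly. Multiple rows are useful for comparing the top three models or for building a stacking ensemble across the top-K rows of show_best().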