workflowsets rank_results() in R: Rank Tuned Workflows
The workflowsets rank_results() function in R sorts every workflow in a populated workflow_set by a chosen performance metric and returns a tibble with one row per workflow, tuning configuration, and metric, so you can read off the winner without writing a custom collect_metrics() filter.
    rank_results(res)                                           # rank by the first metric, all configs
    rank_results(res, rank_metric = "rmse")                     # rank by a named metric
    rank_results(res, rank_metric = "roc_auc")                  # rank by a classification metric
    rank_results(res, select_best = TRUE)                       # one best row per workflow
    rank_results(res, rank_metric = "rmse", select_best = TRUE) # best per workflow by rmse
    rank_results(res, eval_time = 1.0)                          # survival models, pick eval time
    rank_results(res) |> filter(.metric == "rmse")              # drop the other metrics manually
Need explanation? Read on for examples and pitfalls.
What rank_results() does
rank_results() turns a fitted workflow_set into a leaderboard. It walks the result column that workflow_map() populated, collects every (.metric, .config) pair from every workflow, sorts the chosen metric in the direction that signals "better" (smaller for rmse, larger for roc_auc), and stamps a rank column on the output. The same tibble also carries wflow_id, .metric, mean, std_err, n, preprocessor, and model, so one glance tells you which model and which preprocessor sit at rank 1.
The mental model is simple: workflow set in, long tibble out, one row per metric per tuning configuration per workflow. Ranking uses the mean of the rank metric only; std_err is reported alongside it but should not be relied on to settle exact ties, so re-sort with arrange() if you need a deterministic tiebreaker.
If the result column is empty because you forgot workflow_map(), rank_results() errors out before it can rank anything. The fit happens upstream; the ranking is the cheap part.
rank_results() syntax and arguments
rank_results() takes the populated workflow set and a metric to rank by. Every other argument is a switch that trims the output.
The x argument is the workflow set returned by workflow_map(). Pass an empty workflow set and rank_results() throws an "object has not been fit" error. The rank_metric argument names one of the metrics you computed during tuning; the default picks the first metric in the tune_results metric_set. The select_best argument collapses every workflow down to its single best tuning row, which is what you want when comparing models head-to-head rather than scanning all tuning configurations. The eval_time argument only applies to censored regression (survival) models where metrics are measured at multiple time horizons.
rank_results() returns a tibble with one row per (workflow, .config, .metric). When select_best = TRUE, the row count drops to (workflows times metrics computed).
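The argument list described above can be checked against the installed package. This is a minimal sketch: `res` in the commented call is a placeholder for a workflow set already populated by workflow_map(), and the eval_time argument is only present in more recent workflowsets releases.

```r
library(workflowsets)

# Print the argument list of the installed version.
args(rank_results)

# Spelled-out call with the documented defaults. `res` is a placeholder for a
# workflow set that workflow_map() has already populated.
# rank_results(res, rank_metric = NULL, eval_time = NULL, select_best = FALSE)
```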
Three rank_results() examples
rank_results() pays off once you have a workflow set with two or more tuned workflows. Each example below stays small and fast by using mtcars and a small grid.
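A minimal sketch of the first example, a regression leaderboard ranked by rmse. The recipe names (basic, norm) and model names (lm, knn) are chosen so the workflow ids match those discussed in the text. The seed, the three folds, and the grid size are assumptions, and the kknn package is assumed to be installed. Exact numbers will vary with the folds.

```r
library(tidymodels)   # attaches recipes, parsnip, rsample, tune, workflowsets, dplyr

set.seed(123)
cv <- vfold_cv(mtcars, v = 3)

# Two preprocessors: raw predictors and normalized predictors.
basic_rec <- recipe(mpg ~ ., data = mtcars)
norm_rec  <- step_normalize(basic_rec, all_numeric_predictors())

# One untuned model and one model with a tunable parameter.
lm_spec  <- linear_reg() |> set_engine("lm")
knn_spec <- nearest_neighbor(neighbors = tune()) |>
  set_engine("kknn") |>
  set_mode("regression")

# Crossing 2 recipes with 2 models gives basic_lm, basic_knn, norm_lm, norm_knn.
ws <- workflow_set(
  preproc = list(basic = basic_rec, norm = norm_rec),
  models  = list(lm = lm_spec, knn = knn_spec)
)

# Populate the result column, then rank every configuration by rmse.
res <- workflow_map(ws, "tune_grid", resamples = cv, grid = 3, seed = 123)

leaderboard <- rank_results(res, rank_metric = "rmse")
leaderboard
```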
The leaderboard is sorted by mean rmse ascending. The basic_lm workflow takes rank 1 and contributes a single configuration, because linear_reg() has no tunable parameters here. The norm_knn workflow appears at multiple ranks because each neighbors value is its own configuration.
select_best = TRUE collapses tuning configurations to one row per workflow. This is the right shape for a model-versus-model comparison.
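A sketch of the head-to-head shape, under the same assumptions as before (mtcars, three folds, a grid of three candidates, kknn installed). The final filter is an extra step that keeps only the rmse rows, so the printed table has exactly one row per workflow.

```r
library(tidymodels)

set.seed(123)
cv <- vfold_cv(mtcars, v = 3)

basic_rec <- recipe(mpg ~ ., data = mtcars)
norm_rec  <- step_normalize(basic_rec, all_numeric_predictors())

res <- workflow_set(
  preproc = list(basic = basic_rec, norm = norm_rec),
  models  = list(
    lm  = linear_reg(),
    knn = nearest_neighbor(neighbors = tune()) |> set_mode("regression")
  )
) |>
  workflow_map("tune_grid", resamples = cv, grid = 3, seed = 123)

# Keep only the best configuration of each workflow.
best <- rank_results(res, rank_metric = "rmse", select_best = TRUE)

# Drop the non-ranking metric rows for a compact comparison table.
best_rmse <- best |>
  filter(.metric == "rmse") |>
  select(wflow_id, .config, mean, std_err, rank)
best_rmse
```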
Notice basic_lm and norm_lm produce identical rmse: centering and scaling the predictors does not change the predictions of an ordinary least squares fit, so the normalization step has no effect on linear_reg() with the lm engine. rank_results() still keeps both rows so you see the duplication rather than hiding it.
For classification, swap the metric name and the direction flips. rank_results() knows roc_auc is "larger is better" and sorts descending without any extra argument.
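A sketch of the classification case. The outcome (am recoded as a factor), the three predictors, the stratified folds, and the tree model are assumptions made for illustration, so the winner in your run depends on the folds. glm may warn about fitted probabilities of 0 or 1 on data this small.

```r
library(tidymodels)

# mtcars stores transmission as 0/1, so convert it to a factor outcome.
cars_cls <- mtcars |>
  mutate(am = factor(am, labels = c("automatic", "manual")))

set.seed(123)
cv_cls <- vfold_cv(cars_cls, v = 3, strata = am)

cls_res <- workflow_set(
  preproc = list(basic = recipe(am ~ wt + hp + qsec, data = cars_cls)),
  models  = list(
    logit = logistic_reg(),
    tree  = decision_tree(cost_complexity = tune()) |> set_mode("classification")
  )
) |>
  workflow_map("tune_grid", resamples = cv_cls, grid = 3,
               metrics = metric_set(roc_auc, accuracy), seed = 123)

# No direction argument: roc_auc is ranked with larger values first.
cls_rank <- rank_results(cls_res, rank_metric = "roc_auc")
cls_rank
```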
The output keeps every metric row (you asked tune_grid for two), but rank is computed off roc_auc alone. Logit wins.
rank_results() compared with siblings
Three workflowsets helpers read the same result column; pick by output shape.
| Helper | Output shape | When to reach for it |
|---|---|---|
| rank_results() | Long tibble sorted with a rank column | You want a leaderboard you can sort, filter, or print |
| collect_metrics() | Long tibble, all configs, no rank | You want raw metric numbers to plot or aggregate yourself |
| autoplot() | ggplot showing metric vs config per workflow | You want a visual comparison rather than a table |
Use rank_results() when the question is "which workflow won?" Use collect_metrics() when the question is "what do all the numbers look like?" Use autoplot() when you need the answer in a slide.
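Pipe the ranked output into slice_head() for a one-line top-N table. rank_results(res, rank_metric = "rmse", select_best = TRUE) |> slice_head(n = 3) gives you the first three rows of the leaderboard in print-ready order, ready for gt() or kable(). Filter to a single .metric first if you want those rows to be three distinct workflows.
Common pitfalls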
rank_results() fails fast when the workflow set is unfit. Three traps catch most beginners.
- Forgetting workflow_map() before rank_results(). The set's result column is empty until you map a tune function across it. The error reads "object has not been fit" and points at the unfit row. Fix: pipe ws |> workflow_map("tune_grid", resamples = cv) first.
- Asking for a metric you did not compute. If your workflow_map() call used the default metric set (rmse + rsq for regression), passing rank_metric = "mae" errors out. Fix: pass a metric_set() to workflow_map() if you want a non-default ranking metric: workflow_map(ws, "tune_grid", resamples = cv, metrics = metric_set(mae, rmse)).
- Trusting rank 1 on rmse alone without checking std_err. Two workflows can sit one decimal apart on mean but overlap fully on standard error, which means the resamples cannot tell them apart. Always print std_err and prefer the simpler model when intervals overlap.
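The second and third fixes can be sketched together: request mae explicitly when mapping, rank by it, and print std_err next to mean. The data set, folds, and models here are assumptions for illustration only.

```r
library(tidymodels)

set.seed(42)
cv <- vfold_cv(mtcars, v = 3)

ws <- workflow_set(
  preproc = list(basic = recipe(mpg ~ ., data = mtcars)),
  models  = list(
    lm   = linear_reg(),
    tree = decision_tree(cost_complexity = tune()) |> set_mode("regression")
  )
)

# Ask for mae up front so it can be used as the ranking metric later.
res <- workflow_map(ws, "tune_grid", resamples = cv, grid = 3,
                    metrics = metric_set(mae, rmse), seed = 42)

# Rank by mae and keep std_err in view before trusting rank 1.
lb_mae <- rank_results(res, rank_metric = "mae", select_best = TRUE)

lb_mae |>
  filter(.metric == "mae") |>
  select(wflow_id, mean, std_err, rank)
```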
rank_results() drops the .iter column from tune_bayes() output by design. The Bayesian tuner records one row per iteration, but the leaderboard does not keep the iteration index. If you want the per-iteration trace, call collect_metrics() instead.
Try it yourself
Try it: Build a workflow_set with one recipe and two regression models, tune across 3 folds, then rank by rmse keeping only the best config per workflow. Save the leaderboard to ex_leader.
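One possible solution, as a sketch. Naming the recipe base and the models lm and tree produces the workflow ids base_lm and base_tree. Restricting the metric set to rmse is an assumption that keeps the leaderboard at one row per workflow, and the seed and grid size are also assumptions.

```r
library(tidymodels)

set.seed(2024)
folds <- vfold_cv(mtcars, v = 3)

ex_ws <- workflow_set(
  preproc = list(base = recipe(mpg ~ ., data = mtcars)),
  models  = list(
    lm   = linear_reg(),
    tree = decision_tree(cost_complexity = tune()) |> set_mode("regression")
  )
)

# Tune across the 3 folds; the untuned lm workflow is simply resampled.
ex_res <- workflow_map(ex_ws, "tune_grid", resamples = folds, grid = 3,
                       metrics = metric_set(rmse), seed = 2024)

# Best configuration per workflow, ranked by rmse.
ex_leader <- rank_results(ex_res, rank_metric = "rmse", select_best = TRUE)
ex_leader
```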
Explanation: workflow_map("tune_grid", ...) populates the result column for both workflows. Passing select_best = TRUE collapses the three tuning configurations from base_tree down to its single best row, giving you a two-row leaderboard with linear_reg() on top.
Related workflowsets functions
rank_results() sits at the read end of the workflowsets pipeline. These helpers wrap around it:
- workflow_set() builds the unfit candidate tibble that rank_results() eventually scores.
- workflow_map() fills the result column that rank_results() reads from.
- collect_metrics() returns the same metric rows without sorting or a rank column.
- autoplot() plots the leaderboard as facets of metric vs tuning parameter per workflow.
- extract_workflow_set_result() pulls one workflow's tune_results back out so you can finalize it.
In caret, the closest equivalent is resamples() plus summary() on a list of train objects. Workflowsets handles both the fitting bookkeeping (workflow_map) and the leaderboard sort (rank_results) in one tibble; caret splits those across resamples() and bwplot().
FAQ
Why does rank_results() return more rows than I expected?
By default, rank_results() returns one row per (workflow, tuning config, metric). A workflow set with 2 workflows tuned over 5 grid points with 2 metrics returns 20 rows. Pass select_best = TRUE to collapse to one config per workflow per metric. Filter by .metric to keep one row per (workflow, config). Both operations are safe to chain with dplyr::filter().
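The row counts in this answer are plain multiplication. A small sketch using the hypothetical sizes from the example (2 workflows, 5 grid points, 2 metrics):

```r
# Hypothetical sizes taken from the answer above.
n_workflows <- 2
n_configs   <- 5
n_metrics   <- 2

rows_default     <- n_workflows * n_configs * n_metrics  # every config, every metric
rows_select_best <- n_workflows * n_metrics              # best config only
rows_one_metric  <- n_workflows * n_configs              # after filtering to one .metric

c(default = rows_default, select_best = rows_select_best, one_metric = rows_one_metric)
# default = 20, select_best = 4, one_metric = 10
```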
Can rank_results() rank workflows tuned with different metrics?
No. Every workflow in the set must share the same metric_set, because rank_results() needs a common column to sort on. If you tuned workflow A with rmse and workflow B with mae, the columns will not line up and ranking is undefined. Re-run workflow_map() with a unified metrics = metric_set(...) call so every row carries identical metrics.
How does rank_results() break ties on the rank metric?
rank_results() ranks on mean alone, so do not count on std_err, or on the order you supplied to workflow_set(), to settle exact ties. If you need a deterministic tiebreaker, such as preferring the workflow with the tighter resample variance, sort the output manually with arrange() after ranking.
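A manual tiebreaker with arrange() might look like the sketch below. The tibble is hypothetical hand-typed data shaped like a slice of rank_results() output, not real tuning results.

```r
library(dplyr)

# Hypothetical leaderboard rows with an exact tie on mean.
leader <- tibble(
  wflow_id = c("norm_knn", "basic_lm", "norm_lm"),
  .metric  = "rmse",
  mean     = c(3.10, 2.95, 2.95),
  std_err  = c(0.40, 0.30, 0.25)
)

# Sort by mean first, then by std_err, and rebuild the rank column.
leader_fixed <- leader |>
  arrange(mean, std_err) |>
  mutate(rank = row_number())

leader_fixed$wflow_id
# "norm_lm" "basic_lm" "norm_knn"
```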
Do I need select_best = TRUE for untuned workflows?
No. select_best = TRUE only makes sense when each workflow had multiple tuning configurations to compare. For untuned workflows (fit_resamples calls), every workflow contributes a single row already, so select_best is a no-op. Leave it FALSE (the default) for those sets.
Does rank_results() respect a custom metric direction?
Yes. If you wrote a custom yardstick metric and registered it with direction = "maximize" or "minimize", rank_results() reads that attribute and sorts in the right direction. Stock metrics like rmse, mae, roc_auc, and accuracy already declare their direction, so you never set it by hand.
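The direction that a stock metric declares can be inspected directly. A short sketch using yardstick only; the metrics chosen are just examples.

```r
library(yardstick)

# Each yardstick metric function carries its direction as an attribute.
attr(rmse, "direction")     # "minimize"
attr(roc_auc, "direction")  # "maximize"

# A metric set can be converted to a tibble that lists each metric
# with its class and direction.
tibble::as_tibble(metric_set(rmse, rsq, mae))
```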