finetune tune_race_win_loss() in R: Win-Loss Racing
The tune_race_win_loss() function in R, from the finetune package, races tidymodels hyperparameter candidates with pairwise head-to-head matchups across resamples and drops losers using a Bradley-Terry model of the win/loss tally instead of an ANOVA fit.
    tune_race_win_loss(wf, resamples = folds)                       # default win/loss racing
    tune_race_win_loss(wf, resamples = folds, grid = 30)            # larger candidate grid
    tune_race_win_loss(wf, resamples = folds, grid = my_grid)       # explicit candidate tibble
    tune_race_win_loss(wf, resamples = folds, metrics = mset)       # custom metric set
    tune_race_win_loss(wf, resamples = folds, control = ctrl)       # control_race(): burn_in, alpha
    tune_race_win_loss(wf, resamples = folds, param_info = params)  # custom parameter ranges
    tune_race_win_loss(spec, recipe, resamples = folds)             # spec + recipe shortcut
    plot_race(res)                                                  # visualize candidate eliminations
Need explanation? Read on for examples and pitfalls.
What tune_race_win_loss() does in one sentence
tune_race_win_loss() runs a pairwise tournament between hyperparameter candidates and drops losers. You hand it a workflow with at least one tune() placeholder, a resample object, and a grid. The function scores every candidate on a burn-in set of folds, then after each new fold tallies head-to-head wins, losses, and ties for every pair of candidates, with a tie counting as half a win for each side. A Bradley-Terry model fitted to those tallies estimates each candidate's ability to win overall; candidates whose ability falls significantly behind the current leader's are dropped, and the next fold only fits the survivors. The number of model fits drops sharply when one parameter combination clearly dominates.
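As a rough illustration of the tallying step, the base R sketch below counts pairwise wins from a made-up matrix of per-fold accuracies. The scores and candidate names are hypothetical, and the sketch stops at the tally: finetune goes on to fit a statistical model to these counts, which is not reproduced here.

```r
# Hypothetical per-fold accuracy for three candidates (rows = folds).
scores <- matrix(
  c(0.90, 0.85, 0.80,
    0.92, 0.88, 0.88,
    0.89, 0.91, 0.79),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("fold", 1:3), c("cand_a", "cand_b", "cand_c"))
)

# wins[i, j] = number of folds where candidate i beat candidate j.
# A tie counts as half a win for each side.
n_cand <- ncol(scores)
wins <- matrix(
  0, n_cand, n_cand,
  dimnames = list(colnames(scores), colnames(scores))
)
for (i in seq_len(n_cand)) {
  for (j in seq_len(n_cand)) {
    if (i != j) {
      wins[i, j] <- sum(scores[, i] > scores[, j]) +
        0.5 * sum(scores[, i] == scores[, j])
    }
  }
}

# Total wins per candidate across all opponents and folds.
total_wins <- rowSums(wins)
print(wins)
print(total_wins)
```

A candidate whose total sits far below the leader's is the kind of candidate racing would consider dropping once the burn-in folds are done.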
The function lives in the finetune package, shipped separately from the tidymodels metapackage. Install it once with install.packages("finetune") and load it alongside tidymodels.
When to choose win-loss over ANOVA racing
Pick win-loss racing when your metric distribution is skewed, bounded, or non-normal. The companion function tune_race_anova() runs a repeated-measures ANOVA on per-resample metric values, so it leans on roughly normal residuals. Win/loss racing only needs to count which candidate beat which, so it is robust to:
- Bounded metrics near the ceiling (accuracy at 0.98 across folds, ROC AUC near 1.0).
- Heavy-tailed loss metrics where one bad fold skews ANOVA badly.
- Classification metrics that move in discrete steps rather than on a smooth scale.
Set up a tunable workflow
Build the same three pieces you would for tune_grid(): spec, recipe, resamples. Racing changes how fits are scheduled, not what they are.
Racing methods benefit from more resamples, not fewer. Use 10 folds minimum so the win/loss tournament has enough matchups to detect losers early.
Three tunable parameters give a candidate grid wide enough that racing actually saves work. Two-parameter grids rarely benefit, because the burn-in already covers most candidates.
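A minimal sketch of such a setup follows. It assumes iris_bin is a two-class copy of iris (virginica versus everything else) and that the ranger package is installed; the object names rf_spec, rec, wf, and folds are illustrative choices, not something the finetune API requires.

```r
library(tidymodels)
library(finetune)

set.seed(123)

# Assumption: iris_bin is a two-class copy of iris (virginica vs. other).
iris_bin <- iris
iris_bin$Species <- factor(
  ifelse(iris$Species == "virginica", "virginica", "other")
)

# Model spec with three tune() placeholders.
rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
  set_engine("ranger") |>
  set_mode("classification")

# Recipe with no preprocessing steps; all columns predict Species.
rec <- recipe(Species ~ ., data = iris_bin)

# Bundle recipe and model into one tunable workflow.
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(rf_spec)

# Ten stratified folds give the race enough matchups to work with.
folds <- vfold_cv(iris_bin, v = 10, strata = Species)
```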
tune_race_win_loss() syntax and arguments
The signature mirrors tune_grid() with one extra control knob.
Key arguments:
| Argument | What it controls |
|---|---|
| grid | Integer asks dials to build a space-filling grid; a tibble lets you pass an explicit candidate set. |
| metrics | A metric_set(); the first metric ranks candidates. |
| control | A control_race() object: burn_in, alpha, num_ties, randomize, verbose_elim. |
| param_info | Override default dials ranges for any tune() placeholder. |
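burn_in sets how many resamples every candidate is evaluated on before elimination starts, alpha is the significance cutoff for elimination, and num_ties sets how many further resamples to run once only two candidates remain before the current best is declared the winner.
Examples by use case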
Race a default grid with default settings first. Get a baseline timing before tuning the control object.
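show_best() displays only the survivors. Candidates dropped during racing have fewer than 10 resamples in their n column; in recent finetune versions, pass all_configs = TRUE to collect_metrics() to include them, since the default returns only candidates that finished the race.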
The plot_race() chart is the fastest way to see how aggressive the elimination was. A wall of short stumps means racing eliminated most candidates early; many long lines means the candidates were close in performance.
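The baseline race and its inspection steps might look like the sketch below. The data set iris_bin (a two-class copy of iris) and the random forest workflow are assumptions made for illustration, and ranger must be installed; only the calls to tune_race_win_loss(), show_best(), and plot_race() come from finetune and tune.

```r
library(tidymodels)
library(finetune)

set.seed(123)

# Assumption: iris_bin is a two-class copy of iris (virginica vs. other).
iris_bin <- iris
iris_bin$Species <- factor(
  ifelse(iris$Species == "virginica", "virginica", "other")
)

wf <- workflow() |>
  add_recipe(recipe(Species ~ ., data = iris_bin)) |>
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
      set_engine("ranger") |>
      set_mode("classification")
  )

folds <- vfold_cv(iris_bin, v = 10, strata = Species)

# Default race: a generated grid and control_race() defaults.
res <- tune_race_win_loss(wf, resamples = folds)

# Candidates that survived every fold, ranked by ROC AUC.
show_best(res, metric = "roc_auc")

# One line per candidate; lines that stop early were eliminated.
plot_race(res)
```

Wrapping the tune_race_win_loss() call in system.time() is one simple way to record the baseline timing before changing any control settings.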
alpha = 0.10 is more aggressive than the default 0.05, so weaker candidates drop sooner once the burn-in folds are done. The verbose_elim flag logs the elimination count per fold, which is helpful when debugging why racing finished too fast.
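A sketch of a race with a custom control object is shown here. As before, iris_bin and the random forest workflow are illustrative assumptions; the control_race() arguments are the ones named in the text.

```r
library(tidymodels)
library(finetune)

set.seed(123)

# Assumption: iris_bin is a two-class copy of iris (virginica vs. other).
iris_bin <- iris
iris_bin$Species <- factor(
  ifelse(iris$Species == "virginica", "virginica", "other")
)

wf <- workflow() |>
  add_recipe(recipe(Species ~ ., data = iris_bin)) |>
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
      set_engine("ranger") |>
      set_mode("classification")
  )

folds <- vfold_cv(iris_bin, v = 10, strata = Species)

# Looser cutoff than the 0.05 default, plus per-fold elimination logging.
ctrl <- control_race(burn_in = 3, alpha = 0.10, verbose_elim = TRUE)

res_fast <- tune_race_win_loss(
  wf,
  resamples = folds,
  grid = 20,
  control = ctrl
)

show_best(res_fast, metric = "roc_auc")
```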
The end-to-end flow is identical to tune_grid(): select, finalize, last_fit. Only the search step changed.
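The select, finalize, and last_fit steps could be sketched as follows. The train/test split, the iris_bin data, and the object names are assumptions for illustration; select_best(), finalize_workflow(), and last_fit() are the standard tune functions.

```r
library(tidymodels)
library(finetune)

set.seed(123)

# Assumption: iris_bin is a two-class copy of iris (virginica vs. other).
iris_bin <- iris
iris_bin$Species <- factor(
  ifelse(iris$Species == "virginica", "virginica", "other")
)

# Hold out a test set; race only on the training portion.
split <- initial_split(iris_bin, strata = Species)
train <- training(split)
folds <- vfold_cv(train, v = 10, strata = Species)

wf <- workflow() |>
  add_recipe(recipe(Species ~ ., data = train)) |>
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
      set_engine("ranger") |>
      set_mode("classification")
  )

# Search step: the only part that differs from a tune_grid() flow.
res <- tune_race_win_loss(wf, resamples = folds)

# Select the best surviving candidate and plug it into the workflow.
best <- select_best(res, metric = "roc_auc")
final_wf <- finalize_workflow(wf, best)

# Fit on the training set and evaluate once on the held-out test set.
final_fit <- last_fit(final_wf, split)
collect_metrics(final_fit)
```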
tune_race_win_loss() versus alternatives
| Function | Mechanism | Best for | Avoid when |
|---|---|---|---|
| tune_race_win_loss() | Pairwise win/loss tally per fold, Bradley-Terry model | Classification, bounded metrics, skewed losses | Few candidates, very similar performance |
| tune_race_anova() | Repeated-measures ANOVA on per-fold metric | Regression with normal residuals | Heavy-tailed or bounded metrics |
| tune_grid() | Score every candidate on every fold | Small grids, when you want full diagnostics | Large grids and expensive fits |
| tune_bayes() | Gaussian-process surrogate, iterative | Continuous params, expensive single fits | Mostly-categorical hyperparameters |
| tune_sim_anneal() | Simulated annealing search | Bumpy loss surfaces, slow models | Cheap fits where a grid is fine |
Use tune_bayes() or tune_sim_anneal() when you need to search a continuous space, then race a fine grid around the best region.
Common pitfalls
Shrinking the burn-in. control_race() requires burn_in to be at least 2, and setting it that low lets a couple of lucky folds eliminate strong candidates. Three to five burn-in folds keep elimination decisions stable.
Treating a small number of survivors as a bug. If only two candidates remain after racing, racing did its job. Inspect the per-fold results with collect_metrics(race, summarize = FALSE), adding all_configs = TRUE in recent finetune versions so eliminated candidates are included, to confirm losers had clear evidence against them.
Forgetting parallel registration. Racing is embarrassingly parallel across folds and survivors. Without a registered backend, win/loss racing on 10 folds runs single-threaded.
Use tune_grid() for grids of size two.
Try it yourself
Try it: Race a decision_tree(cost_complexity = tune(), tree_depth = tune()) workflow against iris_bin with grid = 15 and an aggressive alpha = 0.10. Save the result to ex_race.
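One possible solution is sketched below. It assumes iris_bin is a two-class copy of iris, uses the rpart engine (an assumption, since the exercise does not name one), and reuses 10 stratified folds; tree_wf is a hypothetical object name.

```r
library(tidymodels)
library(finetune)

set.seed(123)

# Assumption: iris_bin is a two-class copy of iris (virginica vs. other).
iris_bin <- iris
iris_bin$Species <- factor(
  ifelse(iris$Species == "virginica", "virginica", "other")
)

folds <- vfold_cv(iris_bin, v = 10, strata = Species)

# Decision tree workflow with the two tune() placeholders from the exercise.
tree_wf <- workflow() |>
  add_recipe(recipe(Species ~ ., data = iris_bin)) |>
  add_model(
    decision_tree(cost_complexity = tune(), tree_depth = tune()) |>
      set_engine("rpart") |>
      set_mode("classification")
  )

# Race 15 candidates with a looser elimination cutoff.
ex_race <- tune_race_win_loss(
  tree_wf,
  resamples = folds,
  grid = 15,
  control = control_race(alpha = 0.10)
)

show_best(ex_race, metric = "roc_auc")
```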
Explanation: The call mirrors the random forest race. The decision_tree() spec carries the two tune() placeholders; grid = 15 builds a space-filling design across both, and the aggressive alpha = 0.10 prunes losers quickly. show_best() returns only the survivors, with their metric averaged across folds.
Related tidymodels functions
- tune_race_anova(): ANOVA-based racing for regression and well-behaved metrics.
- tune_grid(): exhaustive grid search; the baseline you race against.
- tune_bayes(): Bayesian optimization for continuous hyperparameters.
- control_race(): burn-in, alpha, and tie controls for racing methods.
- plot_race(): visualize per-fold candidate survival.
Official finetune racing reference
FAQ
When should I use tune_race_win_loss() instead of tune_race_anova()?
Use win/loss when the metric is bounded, skewed, or categorical-like: classification accuracy, ROC AUC near 1.0, and Brier scores all qualify. ANOVA wants approximately normal per-resample residuals, which regression RMSE usually meets but classification accuracy rarely does. Empirically, win/loss eliminates fewer false losers when metrics cluster near a ceiling.
Does tune_race_win_loss() need a parallel backend?
It runs single-threaded by default. Racing scales well across cores because each fold fits independent models for each survivor. Register a backend with doParallel::registerDoParallel() before the call to use multiple cores. On a 10-fold race with 20 candidates, parallel typically cuts wall time by 60 to 80 percent.
Why does plot_race() show some lines stopping mid-axis?
Each line traces one candidate's per-fold metric. A line that stops mid-axis means racing eliminated that candidate at the corresponding fold; only survivors run the full length. The visualization is the fastest way to confirm racing eliminated candidates aggressively rather than dragging losers through every fold.
Can I race a custom grid instead of a generated one?
Yes. Pass a tibble of candidate values to the grid argument; the column names must match the tune() placeholders. This is useful when you have prior knowledge of a sensible parameter range and want to skip the dials space-filling design.
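A hedged sketch of racing an explicit grid follows. The parameter values in my_grid are arbitrary illustrations, and iris_bin and the random forest workflow are assumptions carried over for the example; the point is that the tibble's column names match the tune() placeholders.

```r
library(tidymodels)
library(finetune)

set.seed(123)

# Assumption: iris_bin is a two-class copy of iris (virginica vs. other).
iris_bin <- iris
iris_bin$Species <- factor(
  ifelse(iris$Species == "virginica", "virginica", "other")
)

wf <- workflow() |>
  add_recipe(recipe(Species ~ ., data = iris_bin)) |>
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
      set_engine("ranger") |>
      set_mode("classification")
  )

folds <- vfold_cv(iris_bin, v = 10, strata = Species)

# Explicit candidates: every combination of the listed values.
# Column names must match the tune() placeholders in the spec.
my_grid <- tidyr::crossing(
  mtry  = c(1, 2, 3),
  min_n = c(2, 10, 20),
  trees = c(250, 500)
)

res_grid <- tune_race_win_loss(wf, resamples = folds, grid = my_grid)
show_best(res_grid, metric = "roc_auc")
```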
Does tune_race_win_loss() support multi-metric scoring?
It accepts a metric_set() with multiple metrics, but only the first metric drives elimination decisions. The remaining metrics are recorded for survivors and shown by collect_metrics(). Order the set deliberately: the metric you put first is the one racing optimizes.
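A small sketch of building an ordered metric set is given below. The choice of roc_auc and accuracy is only an example, and wf and folds in the commented call are hypothetical objects from an earlier setup step.

```r
library(yardstick)

# roc_auc is listed first, so it would drive elimination;
# accuracy is recorded alongside it.
mset <- metric_set(roc_auc, accuracy)

# Printing the set lists its metrics in the order given.
print(mset)

# Hypothetical usage, assuming wf and folds already exist:
# res <- finetune::tune_race_win_loss(wf, resamples = folds, metrics = mset)
```

Swapping the order to metric_set(accuracy, roc_auc) would make accuracy the metric the race eliminates on.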