finetune tune_race_anova() in R: ANOVA Hyperparameter Racing
The tune_race_anova() function in R, from the finetune package, runs a tidymodels grid against an initial set of resamples, then uses repeated-measures ANOVA to drop losing candidates so the remaining folds only score the survivors.
    tune_race_anova(wf, resamples = folds)                       # default racing with grid = 10
    tune_race_anova(wf, resamples = folds, grid = 25)            # larger grid to race down
    tune_race_anova(wf, resamples = folds, grid = my_grid)       # explicit candidate tibble
    tune_race_anova(wf, resamples = folds, metrics = mset)       # custom metric set
    tune_race_anova(wf, resamples = folds, control = ctrl)       # control_race(): burn_in, alpha
    tune_race_anova(wf, resamples = folds, param_info = params)  # custom parameter ranges
    tune_race_anova(model_spec, recipe, resamples = folds)       # spec + recipe shortcut
    plot_race(res)                                               # visualize candidate survival
Need explanation? Read on for examples and pitfalls.
What tune_race_anova() does in one sentence
tune_race_anova() prunes losing hyperparameter candidates after a burn-in set of resamples. You hand it a workflow with at least one tune() placeholder, a resample object, and a grid. The function scores every candidate on the first few folds (the burn-in), then fits a repeated-measures ANOVA on the per-resample metric. Any candidate that is significantly worse than the current leader, judged by a one-sided confidence interval on the difference between the two, gets dropped, and the next fold scores only the survivors. "Worse" follows the metric's direction: higher for RMSE, lower for accuracy. The total number of model fits drops sharply when the grid contains clear winners and losers.
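The elimination step described above can be illustrated with a small base-R simulation. This is a simplified sketch, not finetune's internal code: the candidate names, the "true" RMSE values, and the noise levels are all invented, and an ordinary lm() with a fold term stands in for the repeated-measures model.

```r
# Simplified illustration of one elimination round (NOT finetune's internals).
set.seed(123)
burn_in    <- 3
candidates <- paste0("cand_", 1:5)
fold_ids   <- paste0("fold_", seq_len(burn_in))
true_rmse  <- c(1.00, 1.05, 1.40, 1.80, 2.50)   # hypothetical candidate quality

# One row per candidate x fold, with a shared fold effect plus small noise.
scores <- expand.grid(candidate = candidates, fold = fold_ids)
fold_effect <- rnorm(burn_in, sd = 0.2)
scores$rmse <- true_rmse[match(scores$candidate, candidates)] +
  fold_effect[match(scores$fold, fold_ids)] +
  rnorm(nrow(scores), sd = 0.05)

# The leader is the candidate with the lowest mean RMSE so far.
means  <- tapply(scores$rmse, scores$candidate, mean)
leader <- names(which.min(means))
scores$candidate <- relevel(factor(scores$candidate), ref = leader)

# Coefficients are now differences from the leader, adjusted for fold.
fit <- lm(rmse ~ candidate + fold, data = scores)

# A one-sided test at alpha = 0.05 uses the lower bound of a 90% interval.
ci    <- confint(fit, level = 0.90)
lower <- ci[grep("^candidate", rownames(ci)), 1]

# Drop candidates whose RMSE is credibly above the leader's.
eliminated <- sub("^candidate", "", names(lower)[lower > 0])
survivors  <- setdiff(candidates, eliminated)
survivors
```

Only the survivors would be scored on the next fold, after which the same test would run again with one more resample of evidence.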
The function lives in the finetune package, which is part of the tidymodels family but shipped separately. Install it once with install.packages("finetune") and load alongside tidymodels.
Set up a tunable workflow
You need the same three pieces as tune_grid(): spec, recipe or formula, and resamples. Racing only changes how fits are scheduled, not what they are.
Racing benefits from more folds, not fewer. Use 10-fold or higher so the ANOVA has degrees of freedom to detect losers early.
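A minimal setup sketch for the three pieces named above. The mtcars data, the glmnet engine, the normalization step, and the object names (folds, rec, spec, wf) are assumptions chosen for illustration; any tunable workflow is set up the same way.

```r
library(tidymodels)  # parsnip, recipes, rsample, workflows, tune, yardstick
library(finetune)    # racing functions live here, not in tidymodels

# Resamples: 10 folds so the ANOVA has enough resamples to work with.
set.seed(2024)
folds <- vfold_cv(mtcars, v = 10)

# Preprocessor: glmnet expects predictors on a common scale.
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())

# Model spec with two tune() placeholders.
spec <- linear_reg(penalty = tune(), mixture = tune()) |>
  set_engine("glmnet")

wf <- workflow() |>
  add_recipe(rec) |>
  add_model(spec)

# List the parameters that are marked for tuning.
extract_parameter_set_dials(wf)
```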
tune_race_anova() syntax and arguments
Most arguments mirror tune_grid(); the racing knobs live in control_race().
| Argument | Description |
|---|---|
| object | A workflow or a model spec. If a spec, pass preprocessor next. |
| resamples | An rset such as vfold_cv(). Need at least burn_in + 2 folds. |
| grid | Integer = space-filling design; tibble = explicit candidates. Larger grids benefit racing the most. |
| metrics | A metric_set(); the FIRST metric in the set drives the race. |
| control | A control_race() object with burn_in, num_ties, alpha, verbose_elim. |
metric_set(rmse, rsq, mae) races on RMSE; the others ride along for diagnostics. If you want to race on R-squared, list it first.

The four control_race() knobs are worth knowing by name:

- burn_in = 3: how many resamples score every candidate before testing begins.
- num_ties = 10: the tie-breaker. Once only two candidates remain, this many further resamples are evaluated before the one with the better current results is kept.
- alpha = 0.05: significance level for the elimination test.
- verbose_elim = FALSE: set to TRUE to log each elimination round.
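A short sketch of the control object and of metric ordering. The object names (ctrl, mset_rmse, mset_rsq) are placeholders for illustration; the argument values shown are the defaults listed above, apart from verbose_elim.

```r
library(tidymodels)
library(finetune)

# Racing controls, written out explicitly.
ctrl <- control_race(
  burn_in      = 3,     # resamples scored by every candidate
  num_ties     = 10,    # tie-breaking threshold
  alpha        = 0.05,  # significance level for elimination
  verbose_elim = TRUE   # log each elimination round
)

# The first metric in the set is the one the race is decided on.
mset_rmse <- metric_set(rmse, rsq, mae)  # races on RMSE
mset_rsq  <- metric_set(rsq, rmse, mae)  # races on R-squared

# Inspect the order of a metric set.
as_tibble(mset_rsq)$metric
```

Both objects are then passed to tune_race_anova() through its control and metrics arguments.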
Examples by use case
Start with a moderately large grid; racing pays off as the grid grows. The following races 25 glmnet candidates across 10 folds.
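A sketch of such a race. The dataset (mtcars), the seeds, and the object names are assumptions for illustration, and the glmnet and lme4 packages are assumed to be installed alongside finetune.

```r
library(tidymodels)
library(finetune)

set.seed(2024)
folds <- vfold_cv(mtcars, v = 10)

wf <- workflow() |>
  add_recipe(
    recipe(mpg ~ ., data = mtcars) |>
      step_normalize(all_numeric_predictors())
  ) |>
  add_model(
    linear_reg(penalty = tune(), mixture = tune()) |>
      set_engine("glmnet")
  )

# grid = 25 asks for a 25-point space-filling design over penalty and mixture.
set.seed(2025)
res <- tune_race_anova(
  wf,
  resamples = folds,
  grid      = 25,
  metrics   = metric_set(rmse, mae),   # RMSE is first, so it drives the race
  control   = control_race(burn_in = 3, verbose_elim = TRUE)
)

res
```

With verbose_elim = TRUE the console shows how many candidates were dropped after each resample, which is the quickest way to see whether racing is saving work on a given problem.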
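After racing, show_best(), select_best(), and collect_metrics() are called the same way as for tune_grid(). By default they consider only the candidates that finished the race; pass all_configs = TRUE to collect_metrics() to include the eliminated candidates, which simply have fewer fold entries.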
For a visual sense of who survived how long, pipe results into plot_race() from finetune.
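A sketch of inspecting a finished race. It repeats a small setup so it runs on its own; the data, the grid size of 15, and the object names are assumptions, and glmnet and lme4 are assumed to be installed.

```r
library(tidymodels)
library(finetune)

set.seed(2024)
folds <- vfold_cv(mtcars, v = 10)

wf <- workflow() |>
  add_recipe(
    recipe(mpg ~ ., data = mtcars) |>
      step_normalize(all_numeric_predictors())
  ) |>
  add_model(
    linear_reg(penalty = tune(), mixture = tune()) |>
      set_engine("glmnet")
  )

set.seed(2025)
res <- tune_race_anova(wf, resamples = folds, grid = 15,
                       metrics = metric_set(rmse, mae))

# Top candidates and the single best one, ranked on RMSE.
top5 <- show_best(res, metric = "rmse", n = 5)
best <- select_best(res, metric = "rmse")

# Summarized metrics; the n column counts resamples per candidate.
all_metrics <- collect_metrics(res)

# One line per candidate, ending at the resample where it was eliminated.
plot_race(res)
```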
tune_race_anova() versus alternatives
Pick by how candidates compare across folds, not by raw speed.
| Function | When to reach for it |
|---|---|
| tune_race_anova() | Many candidates, ANOVA-friendly metric, runtime per fit matters. Drops losers using a parametric test. |
| finetune::tune_race_win_loss() | Same racing idea, non-parametric. Robust when metrics are skewed or candidates are highly correlated. |
| tune_grid() | Small grids (<= 10), or when you want every candidate on every fold for downstream stacking. |
| tune_bayes() | Continuous parameters, expensive fits, no need to score a fixed candidate list. |
Racing is a wrapper around the same fitting machinery as tune_grid(); the return object is interchangeable with downstream helpers like select_best(), finalize_workflow(), and last_fit().
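A sketch of carrying a race result through the downstream helpers. The train/test split, the seeds, and the object names are assumptions for illustration; glmnet and lme4 are assumed to be installed.

```r
library(tidymodels)
library(finetune)

# Hold out a test set, then resample only the training portion.
set.seed(11)
split <- initial_split(mtcars, prop = 0.8)
train <- training(split)
folds <- vfold_cv(train, v = 10)

wf <- workflow() |>
  add_recipe(
    recipe(mpg ~ ., data = train) |>
      step_normalize(all_numeric_predictors())
  ) |>
  add_model(
    linear_reg(penalty = tune(), mixture = tune()) |>
      set_engine("glmnet")
  )

set.seed(12)
res <- tune_race_anova(wf, resamples = folds, grid = 15,
                       metrics = metric_set(rmse))

# Same downstream steps as after tune_grid().
best      <- select_best(res, metric = "rmse")
final_wf  <- finalize_workflow(wf, best)
final_fit <- last_fit(final_wf, split)   # fit on train, evaluate on test

collect_metrics(final_fit)
```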
Common pitfalls
Three issues account for most failed races.
- Too few folds. Racing needs burn_in + 2 folds minimum, and elimination is unreliable below 10. Bump to vfold_cv(v = 10) rather than v = 5 when you adopt this function.
- Tiny grid. A 5-candidate grid rarely produces statistically distinct losers. Racing only pays off above roughly 15 candidates, so size up the grid when switching from tune_grid().
- Wrong leading metric. Racing eliminates against the FIRST metric in metric_set(). Reorder when the metric you care about is not first; otherwise the wrong candidates survive.

Without a set.seed() call ahead of tune_race_anova(), two runs can produce different survivors even on the same data and grid.

Try it yourself
Try it: Race a 12-point knn grid on iris using 10-fold cross-validation and accuracy. Use control_race(burn_in = 3) and pick the best k.
Explanation: Twelve neighbor values across 10 folds would be 120 fits with tune_grid(). Racing drops the obviously bad k values (1, 25) after the burn-in, leaving a handful of mid-range survivors. The accuracy metric leads the race because it is first in the metric set.
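One possible solution to the exercise, as a sketch. The specific twelve k values, the stratified folds, the seeds, and the object names are assumptions; the kknn and lme4 packages are assumed to be installed. Which candidates get eliminated depends on the seed and the folds.

```r
library(tidymodels)
library(finetune)

set.seed(42)
folds <- vfold_cv(iris, v = 10, strata = Species)

knn_wf <- workflow() |>
  add_formula(Species ~ .) |>
  add_model(
    nearest_neighbor(neighbors = tune()) |>
      set_engine("kknn") |>
      set_mode("classification")
  )

# Explicit 12-point grid of neighbor counts (an assumed choice of values).
k_grid <- tibble(
  neighbors = c(1L, 3L, 5L, 7L, 9L, 11L, 13L, 15L, 17L, 19L, 21L, 25L)
)

set.seed(43)
knn_res <- tune_race_anova(
  knn_wf,
  resamples = folds,
  grid      = k_grid,
  metrics   = metric_set(accuracy),
  control   = control_race(burn_in = 3)
)

best_k <- select_best(knn_res, metric = "accuracy")
best_k
```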
Related tidymodels functions
Racing sits next to a short stack of helpers.
- tune_grid() for the unraced baseline.
- tune_bayes() for surrogate-driven search when fits are expensive.
- finetune::tune_race_win_loss() for the non-parametric racing variant.
- finetune::tune_sim_anneal() for iterative simulated annealing search.
- control_race() to set burn_in, alpha, num_ties, and verbose_elim.
- plot_race() for the candidate-survival ggplot.
- last_fit() and finalize_workflow() to lock in the survivor on full train and test.
External reference: the finetune package documentation at finetune.tidymodels.org.
FAQ
How is tune_race_anova() different from tune_grid()?
tune_grid() scores every candidate on every resample, so its total work is candidates x folds fits. tune_race_anova() scores every candidate on the first burn_in resamples, then runs a repeated-measures ANOVA after each subsequent fold. Candidates that are significantly worse than the current leader, by a one-sided confidence interval on the difference, are eliminated; remaining folds only score the survivors. On grids of 20 or more with a clearly varying metric, racing typically does 30 to 60 percent of the fits and arrives at the same winning candidate.
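The fit-count comparison is simple arithmetic, sketched here in base R. The survivor schedule in alive is invented purely for illustration; a real race produces its own schedule depending on the data, metric, and alpha.

```r
n_candidates <- 25
n_folds      <- 10
burn_in      <- 3

# tune_grid(): every candidate on every fold.
grid_fits <- n_candidates * n_folds

# Hypothetical number of candidates still alive when each fold is scored.
# The first burn_in folds always score the full grid.
alive <- c(25, 25, 25, 12, 8, 5, 3, 2, 2, 2)

race_fits <- sum(alive)

# Share of the full-grid work that the race actually performs.
race_fits / grid_fits
```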
When should I use tune_race_anova() versus tune_race_win_loss()?
tune_race_anova() is parametric; it assumes the metric distribution across resamples is roughly normal. That usually holds well enough for RMSE, R-squared, log loss, and accuracy on moderate-size folds. tune_race_win_loss() works on pairwise win/loss outcomes instead of raw metric values: for each pair of candidates it tallies which one did better on each resample, then models those tallies to estimate how likely each candidate is to win overall. Prefer it when the metric is skewed, when fold sample sizes vary a lot, or when ANOVA assumptions are obviously violated.
Does racing change the winning candidate compared to tune_grid()?
In well-conditioned grids, no. Racing is designed to drop candidates that are statistically unlikely to beat the leader, so the survivor matches what tune_grid() would have chosen. Near-tied candidates occasionally swap rank because they are eliminated before the late folds add evidence. Lower alpha (for example alpha = 0.01) to be more conservative, or raise num_ties so the final two candidates are compared over more resamples before the tie is broken.
What control_race() defaults should I tune first?
burn_in is the highest-leverage knob. A burn-in of 3 works on standard benchmark data; bump to 5 when the metric is noisy across folds. alpha = 0.05 is sensible for exploration; tighten to 0.01 when you want to be slow to eliminate. num_ties only matters once two candidates remain, so leave the default of 10 alone unless the final pair is routinely too close to call, in which case raise it to give the tie-breaker more resamples to work with.
Can I parallelize tune_race_anova()?
Yes, and it benefits more than tune_grid() because surviving candidates dominate later folds. Register a parallel backend with library(doFuture); plan(multisession, workers = 4) before the call. Set control_race(parallel_over = "everything") for grids large enough that fold-level parallelism alone leaves cores idle. Racing detects the backend automatically.