tune::tune_bayes() in R: Bayesian Hyperparameter Search
The tune_bayes() function from the tune package runs iterative Bayesian optimization over a tidymodels workflow. It fits a Gaussian process surrogate to past results and uses it to propose smarter candidates each round, instead of scoring a fixed grid.
```r
tune_bayes(wf, resamples = folds)                            # default: 5-point initial design + 10 iterations
tune_bayes(wf, resamples = folds, iter = 25)                 # longer search
tune_bayes(wf, resamples = folds, initial = 15)              # bigger initial design
tune_bayes(wf, resamples = folds, initial = init_res)        # reuse a prior tune_grid() result
tune_bayes(wf, resamples = folds, metrics = mset)            # custom metric set
tune_bayes(wf, resamples = folds, objective = exp_improve()) # expected improvement
tune_bayes(wf, resamples = folds, param_info = params)       # custom ranges
tune_bayes(wf, resamples = folds, control = ctrl_bayes)      # verbose, no_improve, parallel
```
Need explanation? Read on for examples and pitfalls.
What tune_bayes() does in one sentence
tune_bayes() learns where the metric is best, then samples there. You give it a workflow whose model spec contains tune() placeholders, a resample object, and an iteration budget. The function first scores a small space-filling initial design, fits a Gaussian process to the metric values, then repeatedly maximizes an acquisition function (expected improvement by default) to pick the next candidate. Each iteration adds one row to the surrogate and refits it, so the search concentrates on promising regions instead of treating every grid point equally.
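To make the acquisition step concrete, here is a small base-R sketch of the textbook expected-improvement formula for a metric to maximize. It is not tune's internal code: the function name expected_improvement() and the two candidates are hypothetical. The trade_off argument plays a similar role to the one in exp_improve(trade_off = 0), where a positive value demands a larger gain before a candidate counts as an improvement.

```r
# Expected improvement (EI) for a metric we want to MAXIMIZE, such as accuracy.
# mu, sigma: surrogate (e.g. Gaussian process) mean and standard deviation
# best:      best metric value observed so far
expected_improvement <- function(mu, sigma, best, trade_off = 0) {
  gap <- mu - best - trade_off
  ei <- pmax(gap, 0)            # with zero uncertainty the gain is just the gap
  ok <- sigma > 0
  z <- gap[ok] / sigma[ok]
  ei[ok] <- gap[ok] * pnorm(z) + sigma[ok] * dnorm(z)
  ei
}

# Two hypothetical candidates, current best accuracy 0.90:
#   A: predicted slightly better, surrogate very confident
#   B: predicted slightly worse, surrogate uncertain
mu    <- c(A = 0.91, B = 0.88)
sigma <- c(A = 0.001, B = 0.05)
ei <- expected_improvement(mu, sigma, best = 0.90)
ei
names(mu)[which.max(ei)]  # "B": uncertainty can outweigh a lower predicted mean
```

For a metric to minimize, such as RMSE, the same idea applies with best - mu in place of mu - best.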
The return value is a tune_results object with essentially the same shape as a tune_grid() result, so collect_metrics(), show_best(), select_best(), and finalize_workflow() work without changes. The fit budget can even be identical: a 15-point grid scored on 10 folds is 150 fits, and a tune_bayes() call with 5 initial points and 10 iterations is also (5 + 10) * 10 = 150 fits. The difference is placement: the Bayesian run spends those fits where the surrogate expects the metric to be best.
Set up a tunable workflow
You need three pieces before calling tune_bayes(): a model spec with tune() placeholders, a recipe or formula, and a resample object that splits the training data. Continuous parameters with broad ranges benefit most from Bayesian search.
The example below tunes a random forest via the ranger engine, with three knobs varying at once: mtry, min_n, and trees. With nine possible values of mtry, ten of min_n, and a wide integer range for trees, an exhaustive grid quickly explodes; Bayesian search stays tractable.
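A minimal sketch of the three pieces, assuming mtcars as a stand-in dataset with mpg as the outcome (the article's own data is not shown here, so its mtry range will differ). Nothing is fit yet; the last line lists the tunable parameters, where mtry still has an unknown upper bound.

```r
library(tidymodels)  # attaches tune, dials, parsnip, rsample, workflows, yardstick

# Assumption: mtcars stands in for the article's training data; mpg is the outcome.
set.seed(123)
folds <- vfold_cv(mtcars, v = 5)   # 1. resample object

# 2. Model spec with tune() placeholders for the three knobs
rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
  set_engine("ranger") |>
  set_mode("regression")

# 3. Preprocessor (a formula here) bundled with the spec into a workflow
wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(rf_spec)

# Three tunable parameters; mtry's range depends on the data and needs finalizing
extract_parameter_set_dials(wf)
```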
Note: mtry's upper bound depends on the number of predictors, so run extract_parameter_set_dials(wf) |> finalize(train) or pass param_info so ranges are concrete before search begins.
tune_bayes() syntax and arguments
The signature is short, but a few arguments do most of the work.
| Argument | Description |
|---|---|
| iter | Bayesian iterations after the initial design. Each iteration adds one candidate. |
| initial | Integer = size of the initial space-filling design; a prior tune_grid() result reuses its rows as starting data. |
| objective | Acquisition rule. exp_improve() balances exploration and exploitation; conf_bound(kappa) tilts toward exploration. |
| param_info | A parameters() object with finalized ranges. Required when any range depends on the data, such as mtry. |
| control | control_bayes() options. Notable ones: no_improve, verbose_iter, save_pred, parallel_over. |
Examples by use case
Start with the simplest Bayesian call, then refine. Finalize the parameter set first so mtry has a concrete upper bound.
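A minimal sketch of that first call, assuming mtcars as a stand-in dataset with mpg as the outcome. The narrower min_n and trees ranges are illustrative choices that keep this small example fast, not recommended defaults. The parameter set is finalized on the predictor columns only, so the outcome is not counted toward mtry's upper bound.

```r
library(tidymodels)  # attaches tune, dials, parsnip, rsample, workflows, yardstick, dplyr

# Assumption: mtcars stands in for the article's data; mpg is the outcome.
set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
  set_engine("ranger") |>
  set_mode("regression")

wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(rf_spec)

params <- extract_parameter_set_dials(wf) |>
  finalize(select(mtcars, -mpg)) |>           # mtry upper bound = number of predictors
  update(min_n = min_n(c(2L, 10L)),           # small data, so cap min_n
         trees = trees(c(100L, 1000L)))       # illustrative range to keep the demo fast

res <- tune_bayes(
  wf,
  resamples  = folds,
  param_info = params,
  initial    = 5,                             # space-filling starting design
  iter       = 10,                            # Bayesian iterations after the design
  metrics    = metric_set(rmse),
  control    = control_bayes(seed = 99)
)

res
```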
Tip: in addition to set.seed(), pass control_bayes(seed = 99) so the acquisition optimizer is also deterministic across machines.
Once results are back, the helpers behave exactly as with grid search.
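A short sketch of those helpers on a Bayesian run, again assuming mtcars as a stand-in dataset and a deliberately small search (5 initial points, 5 iterations) so it finishes quickly.

```r
library(tidymodels)

# Assumption: mtcars stands in for the article's data; mpg is the outcome.
set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
      set_engine("ranger") |>
      set_mode("regression")
  )

params <- extract_parameter_set_dials(wf) |>
  finalize(select(mtcars, -mpg)) |>
  update(min_n = min_n(c(2L, 10L)), trees = trees(c(100L, 1000L)))

res <- tune_bayes(wf, resamples = folds, param_info = params,
                  initial = 5, iter = 5, metrics = metric_set(rmse),
                  control = control_bayes(seed = 99))

collect_metrics(res)                    # one row per candidate and metric
show_best(res, metric = "rmse", n = 3)  # top three candidates
best <- select_best(res, metric = "rmse")
final_wf <- finalize_workflow(wf, best) # tune() placeholders replaced by best values
final_wf
```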
Reuse a prior tune_grid() call as the initial design when you already have grid results, which avoids paying for a fresh space-filling sample.
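A minimal warm-start sketch, assuming mtcars as a stand-in dataset and an 8-point grid chosen only for illustration. The metric set is kept identical in both calls so the grid results can serve directly as the starting data.

```r
library(tidymodels)

# Assumption: mtcars stands in for the article's data; mpg is the outcome.
set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
      set_engine("ranger") |>
      set_mode("regression")
  )

params <- extract_parameter_set_dials(wf) |>
  finalize(select(mtcars, -mpg)) |>
  update(min_n = min_n(c(2L, 10L)), trees = trees(c(100L, 1000L)))

# Step 1: an ordinary grid search (these results may already exist)
grid_res <- tune_grid(wf, resamples = folds, grid = 8,
                      param_info = params, metrics = metric_set(rmse))

# Step 2: continue with Bayesian iterations instead of a fresh initial design
bayes_res <- tune_bayes(
  wf,
  resamples  = folds,
  initial    = grid_res,          # grid results become the surrogate's starting data
  iter       = 5,
  param_info = params,
  metrics    = metric_set(rmse),  # same metric set as the grid run
  control    = control_bayes(seed = 99)
)

show_best(bayes_res, metric = "rmse", n = 3)
```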
Switch the acquisition function when the search keeps revisiting the same neighborhood. conf_bound() with a larger kappa pushes the next pick away from current best estimates.
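A sketch of the switch, assuming the same mtcars stand-in setup; kappa = 2 is an illustrative value, not a tuned recommendation.

```r
library(tidymodels)

# Assumption: mtcars stands in for the article's data; mpg is the outcome.
set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
      set_engine("ranger") |>
      set_mode("regression")
  )

params <- extract_parameter_set_dials(wf) |>
  finalize(select(mtcars, -mpg)) |>
  update(min_n = min_n(c(2L, 10L)), trees = trees(c(100L, 1000L)))

res_cb <- tune_bayes(
  wf,
  resamples  = folds,
  param_info = params,
  initial    = 5,
  iter       = 5,
  objective  = conf_bound(kappa = 2),   # larger kappa rewards uncertain regions
  metrics    = metric_set(rmse),
  control    = control_bayes(seed = 99)
)

show_best(res_cb, metric = "rmse", n = 3)
```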
tune_bayes() versus tune_grid() and tune_sim_anneal()
Pick by parameter type and fit cost, not by sophistication.
| Function | When to reach for it |
|---|---|
| tune_bayes() | Two or more continuous parameters and fits that take seconds or minutes. Surrogate amortizes the search cost. |
| tune_grid() | A handful of discrete settings or when you want every candidate scored for reporting. Easiest to reason about. |
| finetune::tune_sim_anneal() | Mixed continuous and integer parameters where the surface looks rugged. Cheaper per iteration than the GP fit. |
| finetune::tune_race_anova() | Large fixed grids where most candidates are obviously bad. Drops losers after a few folds. |
Bayesian search shines when fits are slow and ranges are wide. With a 50ms fit, tune_grid(grid = 50) beats tune_bayes(iter = 30) on wall clock; with a 30s fit, the calculus flips.
Common pitfalls
Three errors account for most failed Bayesian runs.
- Unfinalized parameter set. Calling tune_bayes() without finalizing mtry or another data-dependent range can stop with an error saying some parameter values are unknown. Pass param_info built with finalize(train) to be safe.
- Too few initial points. A surrogate fit on three candidates is meaningless, and the search wanders. Use at least one initial point per tunable parameter, ideally more.
- Confusing iterations with fits. iter = 25 means 25 candidates added one at a time on top of initial. Total fits is (initial + iter) * folds, not iter * folds. Budget accordingly.
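The budgeting rule can be written as a tiny helper. max_bayes_fits() is hypothetical and not part of tune; it gives an upper bound, because early stopping via no_improve can end the run sooner.

```r
# Upper bound on model fits for one tune_bayes() run: every candidate,
# whether from the initial design or a Bayesian iteration, is fit once per fold.
max_bayes_fits <- function(initial, iter, folds) {
  (initial + iter) * folds
}

max_bayes_fits(initial = 5, iter = 25, folds = 10)  # 300, not 25 * 10 = 250
max_bayes_fits(initial = 5, iter = 10, folds = 10)  # 150, same as a 15-point grid on 10 folds
```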
Note: with control_bayes(no_improve = 10), which is the default, the search exits early once 10 iterations pass without improvement. Inspect collect_notes(res) to confirm the run stopped on the rule, not on a fit error.
Try it yourself
Try it: Tune an xgboost classifier on the iris data with Bayesian search. Vary trees and learn_rate over 10 iterations, score by accuracy, and identify the best candidate.
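One possible solution, sketched under assumptions: stratified 5-fold cross-validation, a learn_rate range of 10^-3 to 10^-1 chosen for illustration, and control_bayes(seed = 99). Apart from ex_wf, the object names are hypothetical. It needs the xgboost package installed.

```r
library(tidymodels)

set.seed(2024)
ex_folds <- vfold_cv(iris, v = 5, strata = Species)

ex_spec <- boost_tree(trees = tune(), learn_rate = tune()) |>
  set_engine("xgboost") |>
  set_mode("classification")

ex_wf <- workflow() |>
  add_formula(Species ~ .) |>
  add_model(ex_spec)

# learn_rate ranges are on the log10 scale: c(-3, -1) means 0.001 to 0.1
ex_params <- extract_parameter_set_dials(ex_wf) |>
  update(learn_rate = learn_rate(c(-3, -1)))

ex_res <- tune_bayes(
  ex_wf,
  resamples  = ex_folds,
  param_info = ex_params,
  initial    = 6,                       # six space-filling starting points
  iter       = 10,                      # ten Bayesian iterations
  metrics    = metric_set(accuracy),
  control    = control_bayes(seed = 99)
)

show_best(ex_res, metric = "accuracy", n = 3)
best <- select_best(ex_res, metric = "accuracy")
best
```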
Explanation: extract_parameter_set_dials(ex_wf) builds the parameter set; update() replaces the default learn_rate range (10^-10 to 10^-1), whose lower end spends much of the search on rates too small to be useful. Six initial points plus ten Bayesian iterations score sixteen candidates against five folds, or 80 model fits.
Related tidymodels functions
Most tune_bayes() runs sit inside a short chain of helpers.
- tune_grid() to score a fixed candidate set or seed a Bayesian warm start.
- fit_resamples() to resample a workflow with no tunable arguments.
- last_fit() to refit the finalized workflow on the full training set and score the test set.
- finalize_workflow() and select_best() to lock in the winning hyperparameters.
- extract_parameter_set_dials() and finalize() to discover and resolve parameter ranges.
- control_bayes() to toggle verbose iteration logs, early stopping, parallel backend, and prediction saving.
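A compact sketch of how these helpers chain together from split to test-set score, assuming mtcars as a stand-in dataset, an 80/20 split, and a small search budget chosen only to keep it quick.

```r
library(tidymodels)

# Assumption: mtcars stands in for the article's data; mpg is the outcome.
set.seed(123)
split <- initial_split(mtcars, prop = 0.8)
train <- training(split)
folds <- vfold_cv(train, v = 5)

wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
      set_engine("ranger") |>
      set_mode("regression")
  )

params <- extract_parameter_set_dials(wf) |>
  finalize(select(train, -mpg)) |>
  update(min_n = min_n(c(2L, 10L)), trees = trees(c(100L, 1000L)))

res <- tune_bayes(wf, resamples = folds, param_info = params,
                  initial = 5, iter = 5, metrics = metric_set(rmse),
                  control = control_bayes(seed = 99))

# Lock in the winner, refit on the full training set, score the test set once
final_fit <- wf |>
  finalize_workflow(select_best(res, metric = "rmse")) |>
  last_fit(split, metrics = metric_set(rmse))

collect_metrics(final_fit)
```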
External reference: the official tune package documentation at tune.tidymodels.org.
FAQ
How many iterations should I set for tune_bayes()?
Budget the initial design first, then add iterations as patience allows. A good rule is five initial points per tunable parameter and fifteen to thirty iterations on top. Because control_bayes() defaults to no_improve = 10, the search exits early when the surrogate stops finding gains, so over-budgeting is mostly free. With one-second fits and ten folds, sixty total candidates take roughly ten minutes when run sequentially (60 * 10 = 600 seconds); scale linearly from there.
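The same back-of-the-envelope estimate as a hypothetical helper. It assumes sequential fits and ignores the time spent fitting the Gaussian process and optimizing the acquisition function, so treat it as a lower bound.

```r
# Rough sequential wall-clock estimate in minutes
est_minutes <- function(n_candidates, folds, secs_per_fit) {
  n_candidates * folds * secs_per_fit / 60
}

n_params <- 3
initial  <- 5 * n_params   # rule of thumb: five initial points per parameter
iter     <- 30             # upper end of the 15 to 30 iteration range

est_minutes(initial + iter, folds = 10, secs_per_fit = 1)  # 7.5
est_minutes(60, folds = 10, secs_per_fit = 1)              # 10, the sixty-candidate case
```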
What is the difference between exp_improve() and conf_bound()?
exp_improve() weights candidates by expected gain over the current best metric. It exploits when the surrogate is confident and explores otherwise. conf_bound(kappa) adds kappa standard deviations to the surrogate mean, so larger kappa forces exploration. Default is exp_improve(); switch to conf_bound(kappa = 2) when the search collapses to a single neighborhood and you suspect a better region exists elsewhere.
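A base-R sketch of the upper-confidence-bound idea (mean plus kappa standard deviations, for a metric to maximize), with two hypothetical candidates. ucb_score() is illustrative and not tune's internal code; it only shows how raising kappa flips the choice toward the uncertain candidate.

```r
# Upper-confidence-bound score for a metric to MAXIMIZE: mean + kappa * sd
ucb_score <- function(mu, sigma, kappa) mu + kappa * sigma

mu    <- c(A = 0.91, B = 0.88)   # surrogate means (hypothetical)
sigma <- c(A = 0.001, B = 0.05)  # surrogate standard deviations (hypothetical)

names(mu)[which.max(ucb_score(mu, sigma, kappa = 0.1))]  # "A": small kappa exploits
names(mu)[which.max(ucb_score(mu, sigma, kappa = 2))]    # "B": large kappa explores
```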
Can I run tune_bayes() in parallel?
Yes, but the iterations themselves are sequential, because each new candidate depends on the results before it. Register a backend before the call (doFuture plus plan(multisession, workers = 4)) and the function parallelizes the resample fits within each iteration, not the iterations themselves. control_bayes(parallel_over = "everything") mainly pays off for the initial design, where several candidates are scored at once; each Bayesian iteration scores a single candidate, so splitting over "resamples" is usually fastest there.
Why does the initial design matter so much?
The surrogate's quality depends on how well the initial points cover the space. Too few points or a clustered design produce a misleading mean surface, and the acquisition function chases noise. Use at least one point per parameter, prefer space-filling designs over random ones, and reuse a previous tune_grid() result via initial = prior_res when you have one.
Does tune_bayes() always beat tune_grid()?
No. With one or two discrete parameters and cheap fits, an exhaustive grid is faster and reports more uniformly. Bayesian search wins when the parameter space is continuous, the dimension is at least three, fits cost seconds or more, and you accept that results are no longer a uniform map of the metric surface. Prototype with tune_grid(grid = 10), then escalate to tune_bayes() once the ranges feel right.