dials learn_rate() in R: Tune Boosting Learning Rate

The dials learn_rate() function in R defines the numeric hyperparameter for the per-iteration shrinkage applied in boosted tree models, and the step size used by neural network models. Its default range is c(-10, -1) on the log10 scale, i.e. 10^-10 to 10^-1 (0.1) on the natural scale. That is far wider than most boosting problems need, so the range is usually narrowed before tuning.

By Selva Prabhakaran · Published May 23, 2026 · Last updated May 23, 2026

⚡ Quick Answer

```r
learn_rate()                                  # default log10 range -10 to -1
learn_rate(range = c(-3, -1))                 # narrower band, still log10
learn_rate(range = c(0.001, 0.3), trans = NULL) # raw scale instead of log
update(params, learn_rate = learn_rate(c(-4, -1)))  # override in param set
grid_regular(learn_rate(c(-3, -1)), levels = 5)     # 5 log-spaced points
boost_tree(trees = tune(), learn_rate = tune())     # tune the pair together
finalize(params, train)                       # no-op for learn_rate, still safe
```

Need explanation? Read on for examples and pitfalls.


What learn_rate() does in one sentence

learn_rate() returns a dials parameter object describing the shrinkage applied at each boosting iteration, or the step size used in gradient descent. It is the parameter dials supplies when you mark learn_rate = tune() inside boost_tree() or mlp(). Smaller values force the model to take many small steps and usually need more trees to compensate. Larger values converge faster but risk overshooting the minimum and overfitting noise in the residuals.

The function belongs to the same dials family as trees(), tree_depth(), and min_n(). What sets it apart from those three is its default log10 transform: the range you pass and the values the model actually sees live on different scales.
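A quick way to see the two scales side by side is to read the stored bounds and back-transform them. This sketch assumes a recent dials release in which the parameter object keeps its bounds in `range` and its transformation in `trans` (a scales transform object that carries an `inverse()` function).

```r
library(dials)

lr <- learn_rate()  # default: range c(-10, -1) on the log10 scale

# Bounds as stored: log10 exponents
log_bounds <- unlist(lr$range)
log_bounds

# Bounds as the model sees them, via the transform's inverse (10^x)
natural_bounds <- lr$trans$inverse(log_bounds)
natural_bounds

# Three evenly spaced candidates, on the transformed and natural scales
value_seq(lr, 3, original = FALSE)  # -10.0, -5.5, -1.0
value_seq(lr, 3)                    # 1e-10, about 3.16e-06, 0.1
```

value_seq() spaces points evenly on the transformed scale, which is why the natural-scale candidates jump by orders of magnitude.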

learn_rate() syntax and arguments

The function takes two arguments, and the default transform is what makes range easy to misread.


learn_rate signature and defaults

```r
library(dials)
learn_rate(range = c(-10, -1), trans = transform_log10())
#> Learning Rate (quantitative)
#> Transformer: log-10 [1e-100, Inf]
#> Range (transformed scale): [-10, -1]
```

Argument	Description
`range`	Two-element numeric vector. Defaults to `c(-10, -1)` on the log10 scale, i.e. 10^-10 to 10^-1 on the natural scale.
`trans`	A transformation from the scales package. Default `transform_log10()`. Pass `NULL` to search on the raw scale instead.

The return value is a quant_param S3 object. Print it to inspect the range, call value_seq() for evenly spaced values (or value_sample() for random ones), or pass it to the grid_*() helpers.
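Because the object is an ordinary S3 value, base R tools work on it. The sketch below uses only exported dials functions: it checks the class and draws random candidates with value_sample(), which samples uniformly on the transformed (log10) scale and returns values on the natural scale.

```r
library(dials)

p <- learn_rate(range = c(-3, -1))
class(p)  # includes "quant_param"

# Random candidates: uniform on the log10 scale, returned on the natural scale
set.seed(123)
draws <- value_sample(p, 5)
draws

# Every draw stays between 10^-3 and 10^-1
all(draws >= 0.001 & draws <= 0.1)
```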

Inspect the parameter object

```r
p <- learn_rate(range = c(-3, -1))
p
#> Learning Rate (quantitative)
#> Transformer: log-10 [1e-100, Inf]
#> Range (transformed scale): [-3, -1]

# Five points, evenly spaced on the log10 scale, returned on the natural scale
value_seq(p, 5)
#> [1] 0.001000000 0.003162278 0.010000000 0.031622777 0.100000000
```

Note

The default range is much wider than it looks. c(-10, -1) expands to [10^-10, 0.1] on the natural scale, nine orders of magnitude. Few boosting problems benefit from learning rates below 0.001, so most practitioners tighten the range to c(-3, -1) (0.001 to 0.1) before tuning.

Examples by use case

Start with a tunable boosted tree, pair learn_rate with trees, then build a small grid.

Tunable xgboost spec with learn_rate

```r
library(tidymodels)
data(ames, package = "modeldata")

# Model sale price on the log10 scale
ames <- ames |> mutate(Sale_Price = log10(Sale_Price))

set.seed(42)
split <- initial_split(ames, prop = 0.8, strata = Sale_Price)
train <- training(split)

# Tune trees and learn_rate together; keep tree depth fixed
xgb_spec <- boost_tree(
  trees = tune(),
  learn_rate = tune(),
  tree_depth = 6
) |>
  set_engine("xgboost") |>
  set_mode("regression")

wf <- workflow() |>
  add_formula(Sale_Price ~ Gr_Liv_Area + Year_Built + Bldg_Type + Neighborhood) |>
  add_model(xgb_spec)
```

Extract the parameter set and tighten the learn_rate range from the wide default to something practical.

Override learn_rate range in a param set

```r
params <- extract_parameter_set_dials(wf) |>
  update(learn_rate = learn_rate(range = c(-3, -1)))
params
#> Collection of 2 parameters for tuning
#>
#>  identifier       type    object
#>       trees      trees nparam[+]
#>  learn_rate learn_rate nparam[+]
```

A regular grid over the pair samples five learn_rate values on a log scale and three tree counts on a linear scale.

Regular grid over trees and learn_rate

```r
xgb_grid <- grid_regular(params, levels = c(trees = 3, learn_rate = 5))
xgb_grid
#> # A tibble: 15 x 2
# Rows cross trees = 1, 1000, 2000 (the default trees() range is 1 to 2000)
# with learn_rate = 0.001, 0.00316, 0.01, 0.0316, 0.1
```

The grid's learn_rate column is on the natural scale, even though we passed the range on the log10 scale. dials handles the back-transform for you.
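The back-transform is easy to confirm on a single parameter: the grid holds natural-scale rates, and their log10 values are the evenly spaced exponents requested. A minimal check, assuming only dials:

```r
library(dials)

lr_grid <- grid_regular(learn_rate(range = c(-3, -1)), levels = 5)
lr_grid$learn_rate          # natural scale, from 0.001 up to 0.1
log10(lr_grid$learn_rate)   # evenly spaced exponents: -3.0 -2.5 -2.0 -1.5 -1.0
```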

Raw-scale variant when you do not want log

```r
# trans = NULL: range is given, and spaced, on the natural scale
lr_raw <- learn_rate(range = c(0.01, 0.3), trans = NULL)
value_seq(lr_raw, 4)
#> [1] 0.0100000 0.1066667 0.2033333 0.3000000
```

Key Insight

learn_rate and trees move in opposite directions. A small learn_rate such as 0.01 often needs 1000 or more trees to fit the same signal that a learn_rate of 0.1 captures in around 200. When you tune both together, expect the best candidates to cluster toward the small-learn_rate, large-trees corner of the grid, then trim the wasteful cells in the next refinement pass.
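The inverse relationship shows up even in a deliberately idealized calculation. Assume, purely for illustration, that every tree fits the current residual exactly, so each boosting step removes a fraction learn_rate of what remains and the residual after m trees is (1 - learn_rate)^m. The hypothetical helper trees_needed() solves for the m that leaves 1% of the signal unexplained; real trees fit residuals imperfectly, so actual counts are higher.

```r
# Hypothetical helper: trees needed until only `remaining` of the signal is
# left, under the idealized assumption residual_m = (1 - learn_rate)^m
trees_needed <- function(learn_rate, remaining = 0.01) {
  ceiling(log(remaining) / log(1 - learn_rate))
}

trees_needed(c(0.3, 0.1, 0.05, 0.01))
#> [1]  13  44  90 459
```

Cutting the rate tenfold, from 0.1 to 0.01, raises the idealized tree count roughly tenfold (44 to 459), which is the trade-off a joint grid over trees and learn_rate has to explore.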

learn_rate() versus the raw boost_tree(learn_rate = 0.1) argument

Pick by whether you are tuning or fitting one model.

Form	What it does	When to use
`learn_rate()`	Returns a parameter object for tune_grid() to sample	Hyperparameter tuning, parameter set construction
`boost_tree(learn_rate = 0.1)`	Fixes the rate at 0.1 for a single fit	Production, after tuning has chosen a value
`boost_tree(learn_rate = tune())`	Marks the slot as tunable, leaves the range to dials	Inside a workflow you will pass to tune_grid()

Behind the scenes, tune() is a placeholder. extract_parameter_set_dials() walks the spec, finds the placeholder, and asks dials for a learn_rate() object. Some engines can supply their own default range at this step, so print the extracted set before assuming it starts from c(-10, -1). You only call learn_rate() directly when you want to override the range or transform.
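The override step itself does not need a workflow. This sketch builds a stand-in parameter set directly from dials objects (a hypothetical substitute for what extract_parameter_set_dials() returns) and swaps in a narrower learn_rate().

```r
library(dials)

# Stand-in for a parameter set extracted from a workflow
params <- parameters(list(trees(), learn_rate()))

# Replace the learn_rate entry; the argument name must match the parameter id
params <- update(params, learn_rate = learn_rate(range = c(-4, -1)))

# Inspect the new bounds (stored on the log10 scale)
lr_obj <- params$object[[which(params$id == "learn_rate")]]
unlist(lr_obj$range)
```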

Common pitfalls

Four mistakes account for most failed first attempts at tuning learn_rate.

Reading the default range as natural-scale. c(-10, -1) looks like a narrow band, but on the natural scale it spans nine orders of magnitude. A 5-point regular grid hits 10^-10, 10^-7.75, 10^-5.5, 10^-3.25, 10^-1; four of those five candidates fall below 0.001, which is too small for almost any real model.
Tuning learn_rate without tuning trees. A fixed trees = 100 plus a small learn_rate yields a model that barely moves off the intercept. Tune the pair together, or fix trees high enough (say 1500) and set boost_tree(stop_iter = ...) so early stopping finds the right ensemble size during the fit.
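Mixing log and raw scales in range. learn_rate(range = c(0.001, 0.1)) does not mean 0.001 to 0.1: with the default log10 transform still in place, dials treats the bounds as exponents and generates rates between 10^0.001 and 10^0.1 (about 1.002 to 1.26). Pass log10 exponents (c(-3, -1)) or set trans = NULL. A hand-built tibble grid, by contrast, is used as-is on the natural scale, so learn_rate = c(0.001, 0.01, 0.1) in a custom grid works with either transform.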
Calling finalize() and expecting it to do something. finalize() only fills in unknown() bounds. learn_rate has both bounds set by default, so finalize is a no-op. Harmless to call, but it is not the missing piece if your tune_grid still errors.
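Mixing scales is easiest to reproduce in range: pass natural-scale bounds while the default log10 transform is still active, and dials reads them as exponents. A minimal sketch using only dials; the two fixes cover the same bounds but space candidates differently.

```r
library(dials)

# Natural-scale bounds with the default log10 transform still active:
# dials treats 0.001 and 0.1 as exponents
wrong <- learn_rate(range = c(0.001, 0.1))
value_seq(wrong, 3)       # about 1.002, 1.123, 1.259: not useful rates

# Fix 1: give the bounds as log10 exponents (geometric spacing)
right_log <- learn_rate(range = log10(c(0.001, 0.1)))
value_seq(right_log, 3)   # 0.001, 0.01, 0.1

# Fix 2: drop the transform and work on the raw scale (linear spacing)
right_raw <- learn_rate(range = c(0.001, 0.1), trans = NULL)
value_seq(right_raw, 3)   # 0.001, 0.0505, 0.1
```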

Warning

Engines accept different learn_rate ranges at the extremes. xgboost documents eta as valid on [0, 1]. LightGBM only requires learning_rate > 0, so it accepts values above 1 even though they rarely help. Other engines, such as h2o, apply their own checks. Confirm the engine's accepted range before pushing the upper bound above -1 on the log10 scale.

Try it yourself

Try it: Build a tunable boost_tree on mtcars (regression on mpg), tighten learn_rate to the practical range 0.005 to 0.2 with log10 transform, and produce a regular grid with 4 learn_rate values and 3 tree counts. Print the grid.

Your turn: tune learn_rate on mtcars

```r
# Try it: tune learn_rate on mtcars
library(tidymodels)

ex_spec <- boost_tree(trees = tune(), learn_rate = tune(), tree_depth = 4) |>
  set_engine("xgboost") |>
  set_mode("regression")

ex_wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(ex_spec)

ex_params <- # your code here
ex_grid <- # your code here
ex_grid
#> Expected: a 12-row tibble with columns trees and learn_rate
```


Solution

```r
# Bounds given as log10 exponents, so the default log10 transform still applies
ex_params <- extract_parameter_set_dials(ex_wf) |>
  update(learn_rate = learn_rate(range = c(log10(0.005), log10(0.2))))
ex_grid <- grid_regular(ex_params, levels = c(trees = 3, learn_rate = 4))
ex_grid
#> # A tibble: 12 x 2
# Rows cross trees = 1, 1000, 2000 (the default trees() range is 1 to 2000)
# with learn_rate = 0.005, 0.0171, 0.0585, 0.2 (a constant ratio of about 3.42)
```

Explanation: Wrapping the raw bounds in log10() keeps the parameter on the log10 scale so the four sampled values are spaced geometrically between 0.005 and 0.2. Crossed with three tree counts, the grid has 12 candidates.
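The geometric spacing can be checked with value_seq() on the same parameter: consecutive candidates share one ratio, (0.2 / 0.005)^(1/3). A minimal check using only dials:

```r
library(dials)

lr <- learn_rate(range = log10(c(0.005, 0.2)))
vals <- value_seq(lr, 4)
signif(vals, 3)      # 0.005 0.0171 0.0585 0.2
vals[-1] / vals[-4]  # same ratio at every step, about 3.42
```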

Related tidymodels functions

learn_rate() rarely gets tuned alone; it usually sits in a small cluster of related functions.

trees() to set the ensemble size that pairs with it. Small learn_rate needs many trees.
tree_depth() to control individual tree complexity in boosting.
stop_iter() to halt boosting once validation stops improving, which protects you from oversized tree counts.
loss_reduction() to tune the minimum gain required to split a node (gamma in xgboost).
extract_parameter_set_dials() to pull every tunable parameter from a workflow at once.
grid_regular(), grid_space_filling(), grid_max_entropy() to expand the parameter set into a candidate tibble.

External reference: the official dials documentation at dials.tidymodels.org.

FAQ

What is a good default learning rate for xgboost in R?

For most tabular regression and classification problems, a learn_rate between 0.01 and 0.05 paired with 500 to 2000 trees is often close to the optimum. Smaller rates like 0.005 can help when the signal is subtle and you can afford the compute. Rates above 0.1 tend to overfit on small datasets and skip past the minimum on noisy data. xgboost's own default (eta) is 0.3, which favors fast training over accuracy, so lowering it is usually the first adjustment to make.
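One way to encode that rule of thumb is a small hand-built parameter set. This is a sketch assuming the 0.01 to 0.05 and 500 to 2000 bands above are the ones you want; treat them as a starting point to refine, not a guarantee.

```r
library(dials)

rule_of_thumb <- parameters(list(
  learn_rate(range = log10(c(0.01, 0.05))),  # 0.01 to 0.05, log-spaced
  trees(range = c(500L, 2000L))
))

starter_grid <- grid_regular(rule_of_thumb, levels = 3)
starter_grid  # 9 rows: learn_rate 0.01, ~0.0224, 0.05 crossed with trees 500, 1250, 2000
```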

Why does learn_rate() use a log scale by default?

Because the practically useful range spans about two and a half orders of magnitude, from roughly 0.001 to 0.3. A linear grid puts most candidates in the upper part of the range, where models tend to overfit or fail to converge. A log10 grid spaces candidates geometrically: over 0.001 to 0.1, a 5-point log grid hits 0.001, 0.00316, 0.01, 0.0316, 0.1, while a 5-point linear grid hits 0.001, 0.02575, 0.0505, 0.07525, 0.1. The log transform keeps the search efficient even when the optimal rate is near the lower end.
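Both kinds of grid can be generated directly. A minimal sketch using the same 0.001 to 0.1 bounds for each, so only the spacing differs:

```r
library(dials)

log_lr <- learn_rate(range = c(-3, -1))                    # log10 scale
lin_lr <- learn_rate(range = c(0.001, 0.1), trans = NULL)  # raw scale

value_seq(log_lr, 5)  # 0.001, 0.00316, 0.01, 0.0316, 0.1
value_seq(lin_lr, 5)  # 0.001, 0.02575, 0.0505, 0.07525, 0.1

# Candidates below 0.02: three on the log grid, one on the linear grid
sum(value_seq(log_lr, 5) < 0.02)
sum(value_seq(lin_lr, 5) < 0.02)
```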

How is dials learn_rate() different from setting learn_rate in xgboost directly?

dials learn_rate() builds a parameter object that tune_grid() can sample across many candidate values. Setting boost_tree(learn_rate = 0.1) fixes the rate at one value for a single fit. The dials version is what you reach for during hyperparameter search. The direct value is what you write into your production spec once tuning has chosen a winner.

Do I need finalize() with learn_rate() like I do with mtry()?

No. learn_rate() ships with both bounds set, so is_unknown() returns FALSE on both endpoints. finalize() only replaces unknown bounds with data-derived values, so calling it on learn_rate is harmless but does nothing. mtry() is different because its upper bound depends on the predictor count, which dials cannot know until you hand it data.
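dials exports has_unknowns() for exactly this check. A minimal sketch contrasting the two parameters, with the mtcars predictors standing in for training data:

```r
library(dials)

has_unknowns(learn_rate())  # FALSE: both bounds are already set
has_unknowns(mtry())        # TRUE: the upper bound is unknown()

# finalize() fills mtry's upper bound from the predictor count
predictors <- mtcars[, -1]  # 10 columns once mpg is dropped
mtry_final <- finalize(mtry(), x = predictors)
unlist(mtry_final$range)    # lower 1, upper 10
```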

Can I tune learn_rate for a neural network instead of boosting?

Yes. mlp(learn_rate = tune()) with set_engine("brulee") accepts the same dials learn_rate() parameter. The default log10 range still applies, though neural networks often want a narrower band such as c(-4, -2). The mechanics are identical: extract the parameter set, optionally update the range, build a grid, and fit with tune_grid().
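A sketch of those steps for a neural network. It assumes the brulee engine, an mlp engine that exposes learn_rate as a tuning argument, and the narrower c(-4, -2) band; the hidden_units and epochs values are arbitrary placeholders, and nothing is fitted here.

```r
library(tidymodels)

nn_spec <- mlp(hidden_units = 8, epochs = 100, learn_rate = tune()) |>
  set_engine("brulee") |>
  set_mode("regression")

# Only learn_rate is marked tune(), so the set holds one parameter
nn_params <- extract_parameter_set_dials(nn_spec) |>
  update(learn_rate = learn_rate(range = c(-4, -2)))

nn_grid <- grid_regular(nn_params, levels = 3)
nn_grid$learn_rate  # 1e-04, 0.001, 0.01
```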
