dials grid_max_entropy() in R: Maximally Spread Tuning Grid
The dials grid_max_entropy() function in R draws a space-filling hyperparameter sample that spreads candidates as far apart as an entropy criterion allows. It returns a tibble of tuning combinations that typically covers the search space more evenly than Latin hypercube or random search at the same row budget.
```r
grid_max_entropy(penalty(), size = 25)                                  # one parameter, 25 candidates
grid_max_entropy(penalty(), mixture(), size = 30)                       # two parameters
grid_max_entropy(penalty(range = c(-4, 0)), size = 20)                  # custom log10 range
grid_max_entropy(extract_parameter_set_dials(wf), size = 50)            # from a workflow
grid_max_entropy(mtry(c(1, 10)), trees(c(100, 500)), size = 25)         # tree-model knobs
grid_max_entropy(pset, size = 40, variogram_range = 0.6)                # tune the spread
grid_max_entropy(pset, size = 40, original = FALSE)                     # transformed-scale draws
set.seed(1); grid_max_entropy(penalty(), size = 10)                     # reproducible sample
```
Need explanation? Read on for examples and pitfalls.
What grid_max_entropy() does in one sentence
grid_max_entropy() picks candidates that maximise an entropy criterion, which in practice keeps the minimum pairwise distance in the parameter space large. You pass dials parameter objects (or a parameters() set) and say how many rows you want. The function returns a size-by-p tibble whose points are pushed as far apart as the bounding box allows. The same size value sets the row count regardless of how many parameters you pass.
It lives in dials, which library(tidymodels) attaches. Reach for it when each model fit is expensive enough that a near-duplicate candidate would hurt. The trade-off versus grid_latin_hypercube() is a few seconds of design work in exchange for tighter joint spread.
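A minimal sketch of the basic call, assuming only that dials is installed. The object names and the seed are illustrative. Both calls ask for 30 rows, so both tibbles have 30 rows, and only the column count follows the number of parameters.

```r
library(dials)

set.seed(1)
# Two elastic-net knobs: 30 rows, 2 columns
two_params <- grid_max_entropy(penalty(), mixture(), size = 30)

# Four boosted-tree knobs: still 30 rows, now 4 columns
four_params <- grid_max_entropy(trees(), min_n(), tree_depth(), learn_rate(), size = 30)

dim(two_params)   # 30 rows, 2 columns
dim(four_params)  # 30 rows, 4 columns
```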
How maximum-entropy sampling works
The algorithm optimises the design as a whole rather than drawing points independently. Internally dials hands the work to the DiceDesign package (dmaxDesign()). That routine iteratively adjusts a starting design to maximise an entropy criterion based on the determinant of the spatial correlation matrix. The result is size points spread as evenly as the box permits.
Compared to grid_latin_hypercube() at the same size, the design has fewer near-collisions and more even spacing. The improvement is modest in 2D and grows with parameter count.
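To check the spacing claim on your own machine, the sketch below compares the smallest pairwise distance of the two designs. It assumes two parameters whose transformed ranges are both one unit wide, so Euclidean distance weights the axes equally. The choices log10(penalty) in [-1, 0], the seed, and the min_dist() helper are illustrative. A single seed proves little, so rerun it with several seeds before drawing conclusions.

```r
library(dials)

# Both axes are one unit wide on the transformed scale:
# log10(penalty) in [-1, 0] and mixture in [0, 1]
pset <- parameters(penalty(range = c(-1, 0)), mixture())

set.seed(2024)
lhs <- grid_latin_hypercube(pset, size = 20, original = FALSE)
set.seed(2024)
me <- grid_max_entropy(pset, size = 20, original = FALSE)

# Smallest distance between any two candidates; larger means fewer near-collisions
min_dist <- function(grid) min(dist(as.matrix(grid)))

c(latin_hypercube = min_dist(lhs), max_entropy = min_dist(me))
```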
grid_max_entropy() syntax and arguments
The signature follows the rest of the dials grid family, with two extra knobs: variogram_range and iter.
| Argument | Description |
|---|---|
| x, ... | Parameter objects such as penalty(), mixture(), mtry(), or a parameters() set. |
| size | How many candidates to return. The optimiser arranges exactly this many points to maximise their joint spread. |
| original | If TRUE, returns natural-scale values. If FALSE, returns transformed-scale values (for example log10 penalty), which are useful for plotting the design geometry. |
| variogram_range | Controls the spatial correlation length used by the entropy criterion. Lower values push points further apart; the default 0.5 works for most designs. |
| iter | Maximum number of optimisation iterations (default 1000). |
Max entropy has no filter argument because dropping rows would undo the optimised spacing. Set parameter ranges first and let the algorithm fill them.
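A short sketch of the original switch, assuming dials is installed. The seed and object names are illustrative. With original = TRUE, penalty comes back on its natural scale. With original = FALSE, it comes back in the log10 units the grid is laid out on. The last call shows the two max-entropy-specific knobs; the values 0.3 and 500 are arbitrary examples, not recommendations.

```r
library(dials)

set.seed(10)
natural <- grid_max_entropy(penalty(), mixture(), size = 12)          # original = TRUE
set.seed(10)
transformed <- grid_max_entropy(penalty(), mixture(), size = 12, original = FALSE)

range(natural$penalty)      # within [1e-10, 1], the natural scale
range(transformed$penalty)  # within [-10, 0], i.e. log10 units

# variogram_range and iter tune the optimiser itself, not the parameter bounds
set.seed(10)
tighter <- grid_max_entropy(penalty(), mixture(), size = 12,
                            variogram_range = 0.3, iter = 500)
```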
Examples by use case
Start with a small two-parameter grid where the spread is visible. A 16-row max-entropy design covers the [10^-4, 1] x [0, 1] box without the diagonal clustering Latin hypercube sometimes produces.
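A sketch of that 16-row design, assuming an arbitrary seed of 123. penalty(range = c(-4, 0)) is on the log10 scale, so penalty lands in [1e-4, 1]. The base-graphics plot is just one way to eyeball the spread, and it plots penalty on the log10 axis that the optimiser spaces points along.

```r
library(dials)

set.seed(123)
en_grid <- grid_max_entropy(
  penalty(range = c(-4, 0)),  # log10 range, i.e. penalty in [1e-4, 1]
  mixture(),                  # default range [0, 1]
  size = 16
)

# Plot on the log10 scale so the spacing is visible
plot(log10(en_grid$penalty), en_grid$mixture,
     xlab = "log10(penalty)", ylab = "mixture", pch = 19)
```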
The 16 candidates typically cover the rectangle with a larger minimum pairwise distance than a Latin hypercube of the same size. That is worth the optimisation overhead when each elastic-net fit takes a minute.
Reading from a workflow keeps parameter identifiers in sync with the model spec.
Pass the tibble straight to tune_grid(); only the geometry differs from a random or LHS call.
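A minimal end-to-end sketch, assuming tidymodels and glmnet are installed. The mtcars data, the mpg ~ . formula, five folds, and the seeds are illustrative choices, not part of the original example. The parameter set comes from the workflow, so the grid's column names match the tune() placeholders in the model spec.

```r
library(tidymodels)  # attaches dials, parsnip, workflows, tune, rsample

# Elastic-net spec with both knobs marked for tuning (glmnet engine)
en_spec <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

wf <- workflow() %>%
  add_model(en_spec) %>%
  add_formula(mpg ~ .)

# Parameter set read from the workflow, so ids stay in sync with the spec
pset <- extract_parameter_set_dials(wf)

set.seed(42)
en_grid <- grid_max_entropy(pset, size = 20)

set.seed(42)
folds <- vfold_cv(mtcars, v = 5)

# The grid tibble goes straight into tune_grid()
res <- tune_grid(wf, resamples = folds, grid = en_grid)
show_best(res, metric = "rmse")
```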
The payoff shows up in higher-dimensional sweeps. With six knobs a levels = 4 regular grid hits 4,096 candidates; a 60-row max-entropy design covers the same box with 68x fewer fits while keeping the minimum distance large.
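The arithmetic can be checked directly. The sketch below assumes six boosted-tree knobs with dials' default ranges; the choice of knobs and the seed are illustrative. A four-level regular grid has 4^6 = 4,096 rows, and 4,096 / 60 is about 68.

```r
library(dials)

# Six boosted-tree knobs, each with its dials default range
six_knobs <- parameters(
  trees(), min_n(), tree_depth(),
  learn_rate(), loss_reduction(), sample_prop()
)

regular <- grid_regular(six_knobs, levels = 4)  # 4^6 candidates

set.seed(99)
spread <- grid_max_entropy(six_knobs, size = 60)

nrow(regular)                  # 4096
nrow(regular) / nrow(spread)   # about 68
```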
grid_max_entropy() vs grid_latin_hypercube() vs grid_random()
Choose by how badly clustered candidates would hurt, not by what sounds rigorous.
| Function | Coverage style | When to reach for it |
|---|---|---|
| grid_random() | Independent uniform draws | Quick prototyping, very high dimensions, no concern about clustering. |
| grid_latin_hypercube() | Stratified, one sample per axis bin | Sensible default for 3 to 6 continuous knobs when fits are cheap. |
| grid_max_entropy() | Post-optimised for maximum joint spread | Expensive fits where every candidate must be far from the others. |
| grid_regular() | Exhaustive cartesian product | Two-parameter sweeps you intend to plot as a heatmap. |
Pick max entropy when the cost of a wasted candidate exceeds the cost of running the optimiser. Elastic-net or k-NN sweeps that fit in seconds rarely need it; boosted trees on a million rows where each fit takes five minutes benefit many times over.
Common pitfalls
Three traps account for most max-entropy surprises.
- Treating size as a free parameter. The optimiser scales with size^2, so a size = 1000 request takes minutes to design. For 3 to 6 parameters, keep size between 25 and 100.
- Unfinalized data-dependent parameters. mtry() has no default upper bound. Pass an explicit range or call finalize(mtry(), train) before sampling, or the function errors.
- Confusing variogram_range with parameter range. variogram_range is the spatial correlation length, not the parameter bounds. Lower values push points apart more aggressively; values outside [0.1, 0.9] destabilise the design.
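A sketch of the mtry() pitfall and its fix. It assumes mtcars stands in for a training set, with the outcome mpg dropped so that only the 10 predictor columns remain. The seed and the trees() range are illustrative.

```r
library(dials)

# Unfinalized: mtry()'s upper bound is unknown, so grid construction fails
bad <- try(grid_max_entropy(mtry(), trees(), size = 10), silent = TRUE)
inherits(bad, "try-error")  # TRUE

# Finalize against the predictors only (mtcars without the outcome column)
predictors <- mtcars[, -1]
mtry_final <- finalize(mtry(), predictors)
mtry_final$range$upper  # 10, one per predictor column

set.seed(5)
tree_grid <- grid_max_entropy(mtry_final, trees(c(100, 500)), size = 20)
```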
Call set.seed() in the same chunk as the grid call so reviewers can reproduce the candidate set.
Newer dials releases also provide grid_space_filling(pset, size = N, type = "max_entropy") as an umbrella that dispatches to max entropy, Latin hypercube, and other space-filling designs. The direct grid_max_entropy() still works and reads more clearly when the design was chosen upfront.
Try it yourself
Try it: Build a max-entropy grid for an elastic-net workflow with penalty on the log10 range [-3, 0] and mixture on [0, 1], drawing 15 candidates with set.seed(7). The result should have 15 rows and 2 columns.
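One possible solution, assuming the two parameters are built directly rather than extracted from a workflow. Inside a workflow you would pass extract_parameter_set_dials(wf) after narrowing the penalty range with update().

```r
library(dials)

set.seed(7)
en_grid <- grid_max_entropy(
  penalty(range = c(-3, 0)),  # log10 scale: penalty in [0.001, 1]
  mixture(range = c(0, 1)),
  size = 15
)

dim(en_grid)  # 15 rows, 2 columns
```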
Explanation: grid_max_entropy() takes one size regardless of parameter count, so the result has 15 rows and two columns. The optimiser places those 15 points as far apart in the [-3, 0] x [0, 1] box (log10 penalty by mixture) as the entropy criterion allows. The seed pins the random starting design and therefore the optimised result, so the tibble is identical across sessions.
Related dials functions
grid_max_entropy() sits inside a family; reach for the right neighbour when maximal spread is the wrong objective.
- grid_latin_hypercube() when stratified one-per-axis coverage is enough and design time matters.
- grid_random() when the budget is large and clustering is acceptable.
- grid_regular() for an exhaustive cartesian product on 2 or 3 parameters you plan to plot.
- grid_space_filling() as the new umbrella in dials 1.2+, with a type argument that selects max entropy, Latin hypercube, or other space-filling designs.
- parameters() and extract_parameter_set_dials() to assemble a tuning specification from a workflow.
- finalize() to pin data-dependent ranges such as mtry() against a training set before grid construction.
- update() to override one parameter's range inside an existing parameter set without rebuilding it.
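A sketch of the update() route from the list above. It assumes a two-parameter elastic-net set; the narrower range c(-3, 0) and the seed are illustrative. Only penalty changes, and mixture keeps its default [0, 1].

```r
library(dials)

pset <- parameters(penalty(), mixture())

# Narrow only the penalty range (log10 scale); mixture is untouched
pset_narrow <- update(pset, penalty = penalty(range = c(-3, 0)))

set.seed(11)
narrow_grid <- grid_max_entropy(pset_narrow, size = 10)
range(narrow_grid$penalty)  # within [0.001, 1]
```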
External reference: the official dials package documentation at dials.tidymodels.org/reference/grid_max_entropy.html.
FAQ
What does "maximum entropy" mean in plain terms?
Entropy here measures how spread out a set of points is. A maximum-entropy grid is the configuration that makes the points as informative as possible about the response surface, given the budget. Dials maximises the determinant of the spatial correlation matrix; intuitively, this pushes candidates apart so no two carry redundant information about the same neighbourhood.
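To see why maximising a determinant rewards spread, here is a toy sketch in base R. It assumes a Gaussian correlation exp(-(d / theta)^2) with theta = 0.5 purely for illustration; that is not necessarily the kernel dials or DiceDesign uses internally. Points that sit far apart give a correlation matrix close to the identity (log-determinant near 0). Points huddled together give nearly identical rows and a strongly negative log-determinant.

```r
# Log-determinant of an illustrative Gaussian correlation matrix for a point set
log_det_corr <- function(points, theta = 0.5) {
  d <- as.matrix(dist(points))   # pairwise Euclidean distances
  corr <- exp(-(d / theta)^2)    # 1 on the diagonal, near 0 for distant pairs
  as.numeric(determinant(corr, logarithm = TRUE)$modulus)
}

# Four points at the corners of the unit square
spread <- matrix(c(0, 0,  1, 0,  0, 1,  1, 1), ncol = 2, byrow = TRUE)
# Four points huddled near one corner
clustered <- matrix(c(0, 0,  0.05, 0,  0, 0.05,  0.05, 0.05), ncol = 2, byrow = TRUE)

c(spread = log_det_corr(spread), clustered = log_det_corr(clustered))
```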
How is grid_max_entropy() different from grid_latin_hypercube()?
Both return a size-by-p tibble for tune_grid(). Latin hypercube guarantees one sample per axis stratum but lets the joint pattern cluster. Max entropy instead optimises the joint placement of all size points under an entropy criterion, which discourages that clustering. Use max entropy when each fit is expensive and Latin hypercube when fits are cheap.
How many candidates should I draw with grid_max_entropy()?
The optimiser scales roughly with size^2, so design time grows fast above size = 200. For 2 to 3 parameters, 20 to 30 is plenty. For 4 to 6 parameters, 50 to 100. Above that, switch to tune_bayes() because model-based search recycles earlier fits more efficiently than any one-shot grid.
Can I tune qualitative parameters with grid_max_entropy()?
Yes, with caveats. Qualitative parameters such as weight_func() enter the entropy calculation through integer encoding, so the design treats them as ordinal. If the levels are unordered, the spread guarantee weakens. A practical fix is to tune qualitative parameters separately with grid_regular() and use max entropy only for the continuous knobs.
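One way to apply that fix, sketched for a k-NN model. It assumes you want every kernel level crossed with every continuous candidate. The neighbors() range, the three kernel names, the seed, and the use of tidyr::crossing() are illustrative choices. The cost is multiplicative, so here 3 kernels times 12 continuous candidates gives 36 fits per resample.

```r
library(dials)
library(tidyr)

# Continuous k-NN knobs get the max-entropy design
set.seed(3)
continuous <- grid_max_entropy(neighbors(range = c(1, 30)), dist_power(), size = 12)

# The unordered qualitative knob gets every level via grid_regular()
kernels <- grid_regular(weight_func(values = c("rectangular", "triangular", "gaussian")))

# Cross them: each kernel is paired with every continuous candidate
knn_grid <- crossing(kernels, continuous)
nrow(knn_grid)  # 3 * 12 = 36
```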
Should I prefer grid_max_entropy() or grid_space_filling()?
Both produce the same sample when grid_space_filling(type = "max_entropy") is used; the umbrella lets one call switch designs by changing an argument. New code comparing designs benefits from the umbrella; code that picked max entropy upfront reads more clearly with the direct call.
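A sketch of the umbrella call. It assumes a dials release that ships grid_space_filling() and accepts type = "max_entropy" and type = "latin_hypercube". The parameter set and seed are illustrative. Switching designs is a one-argument change.

```r
library(dials)

pset <- parameters(penalty(), mixture())

set.seed(8)
me_grid <- grid_space_filling(pset, size = 20, type = "max_entropy")

set.seed(8)
lhs_grid <- grid_space_filling(pset, size = 20, type = "latin_hypercube")
```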