recipes step_range() in R: Scale Predictors to a 0-1 Range
The recipes step_range() function in R rescales each numeric predictor to a fixed interval, 0 to 1 by default, using the minimum and maximum learned from the training set. You add it to a recipe(), estimate the ranges with prep(), and apply them with bake().
    step_range(rec, all_numeric_predictors())                    # rescale all predictors to 0-1
    step_range(rec, mpg, hp)                                     # rescale named columns
    step_range(rec, all_numeric(), min = -1, max = 1)            # custom target range
    step_range(rec, contains("score"))                           # rescale by name pattern
    step_range(rec, all_numeric_predictors(), clipping = FALSE)  # allow out-of-range output
    prep(rec) |> bake(new_data = NULL)                           # estimate ranges, then apply
    tidy(prep(rec), number = 1)                                  # inspect the learned min/max
Need explanation? Read on for examples and pitfalls.
What step_range() does in R
step_range() linearly rescales a column so its smallest value becomes the target minimum and its largest becomes the target maximum. During prep() it records the minimum and maximum of each selected column. During bake() it applies the formula (x - min) / (max - min), then stretches the result to the requested interval. With the defaults, every column lands between 0 and 1.
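The arithmetic can be sketched in base R. The helper name rescale_minmax and its arguments are hypothetical, chosen for illustration; this is not the code recipes runs internally, only the same linear formula applied to a single vector.

```r
# Hypothetical helper: min-max rescaling of one numeric vector.
rescale_minmax <- function(x, new_min = 0, new_max = 1) {
  unit <- (x - min(x)) / (max(x) - min(x))  # map to 0-1 first
  unit * (new_max - new_min) + new_min      # stretch to the target interval
}

rescale_minmax(c(10, 15, 20))                             # 0.0 0.5 1.0
rescale_minmax(c(10, 15, 20), new_min = -1, new_max = 1)  # -1 0 1
```

Unlike this helper, step_range() takes min(x) and max(x) from the training data once and reuses them for every later bake().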
This transformation is often called min-max scaling or normalization. It is useful when a model expects bounded inputs, when you want predictors on a common 0-to-1 footing for plotting, or when an algorithm such as a neural network trains more smoothly on a compact range. Like standardizing, it is a linear transformation, so it preserves the shape of the distribution and simply relabels the axis; what sets it apart is that the endpoints are pinned to fixed values.
step_range() stores the training extremes inside the prepped recipe. When you bake() new data, it reuses those stored values, so test rows are rescaled with training statistics and no information leaks across the split.
step_range() syntax and arguments
step_range() attaches a rescaling operation to a recipe. You pass the recipe first, then a set of columns selected with tidyselect helpers.
The arguments you will actually touch:
| Argument | Purpose |
|---|---|
| recipe | The recipe object the step is added to. |
| ... | Columns to rescale, chosen with selectors like all_numeric_predictors(). |
| min | Lower bound of the target interval. Default 0. |
| max | Upper bound of the target interval. Default 1. |
| clipping | If TRUE (default), new data outside the training range is clamped to [min, max]. |
| ranges | Filled in by prep(); holds the estimated minimum and maximum per column. |
| skip | If TRUE, the step is ignored when baking new data. Leave FALSE for rescaling. |
Rescaling predictors: worked examples
Build the recipe, prep it, then bake. A recipe is just a plan until prep() estimates the statistics from data. The first example rescales every numeric predictor in mtcars to the default 0-to-1 range.
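A minimal sketch of that first example, assuming the recipes package is installed and R 4.1 or later for the native pipe. The formula mpg ~ . and the object names rec, rec_prepped, and scaled are choices made here for illustration.

```r
library(recipes)

# Plan: rescale every numeric predictor; mpg is declared as the outcome.
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_range(all_numeric_predictors())

# prep() learns each selected column's minimum and maximum from the training data.
rec_prepped <- prep(rec, training = mtcars)

# new_data = NULL returns the processed training set.
scaled <- bake(rec_prepped, new_data = NULL)
head(scaled, 3)
```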
The outcome mpg is untouched because all_numeric_predictors() excludes it. To confirm the step worked, check the range of each result column.
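One way to run that check, sketched under the same assumptions (recipes installed, mpg as the outcome). The recipe is rebuilt here so the snippet runs on its own; the name predictors is illustrative.

```r
library(recipes)

scaled <- recipe(mpg ~ ., data = mtcars) |>
  step_range(all_numeric_predictors()) |>
  prep() |>
  bake(new_data = NULL)

# Range of every rescaled predictor; mpg is left out because it was not rescaled.
predictors <- setdiff(names(scaled), "mpg")
sapply(scaled[predictors], range)
```

The result is a two-row matrix, one column per predictor, with the minimum in the first row and the maximum in the second.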
Every column now runs from exactly 0 to 1. To see the learned extremes, call tidy() on the prepped recipe with the step number.
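A short sketch of the tidy() call, again assuming recipes is installed and rebuilding the recipe so the snippet is self-contained. The object name ranges is illustrative.

```r
library(recipes)

rec_prepped <- recipe(mpg ~ ., data = mtcars) |>
  step_range(all_numeric_predictors()) |>
  prep()

# number = 1 selects the first (and here only) step in the recipe.
ranges <- tidy(rec_prepped, number = 1)
ranges
```

Each row names a rescaled column in terms, with its training minimum and maximum in min and max.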
The min and max arguments change the target interval. Setting min = -1 and max = 1 rescales each column into a symmetric range around zero, a layout many neural network and signal-processing pipelines prefer.
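A sketch of the symmetric target range, assuming recipes is installed; the object name scaled_sym is illustrative.

```r
library(recipes)

scaled_sym <- recipe(mpg ~ ., data = mtcars) |>
  step_range(all_numeric_predictors(), min = -1, max = 1) |>
  prep() |>
  bake(new_data = NULL)

# The training minimum of hp maps to -1 and the training maximum to 1.
range(scaled_sym$hp)
```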
The fourth example shows clipping in action. A new observation whose hp exceeds the training maximum of 335 would scale above 1, but with clipping = TRUE the output is clamped to the interval.
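A sketch of the clipped case, assuming a recipes version that supports the clipping argument. The new row is hypothetical: it copies the first mtcars row and sets hp to 400, a value picked here only because it lies above the training maximum.

```r
library(recipes)

rec_prepped <- recipe(mpg ~ ., data = mtcars) |>
  step_range(all_numeric_predictors()) |>  # clipping = TRUE is the default
  prep()

# Hypothetical new observation with hp beyond the training maximum of 335.
new_car <- mtcars[1, ]
new_car$hp <- 400

clipped_hp <- bake(rec_prepped, new_data = new_car)$hp
clipped_hp  # clamped to the upper bound, 1
```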
With clipping = FALSE, the same row would return about 1.23, because the linear formula is applied without a bound. Clipping is the safer default for production scoring, where stray extreme values should not push a predictor outside its trained interval.
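The unclipped case can be sketched the same way. The hp value of 400 is an assumption made here for illustration; it reproduces the figure of about 1.23, since (400 - 52) / (335 - 52) is roughly 1.23.

```r
library(recipes)

rec_unclipped <- recipe(mpg ~ ., data = mtcars) |>
  step_range(all_numeric_predictors(), clipping = FALSE) |>
  prep()

# Hypothetical new observation with hp beyond the training maximum of 335.
new_car <- mtcars[1, ]
new_car$hp <- 400

unclipped_hp <- bake(rec_unclipped, new_data = new_car)$hp
round(unclipped_hp, 2)  # about 1.23
```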
step_range() vs step_normalize() vs step_scale()
Pick the step that matches the transformation you need. Range scaling, normalizing, and scaling are related but distinct, and recipes gives each its own step.
| Step | What it does | Resulting column |
|---|---|---|
| step_range() | Rescales to a fixed interval | Bounded, default 0 to 1 |
| step_normalize() | Centers and scales together | Mean 0, SD 1, unbounded |
| step_scale() | Divides by the standard deviation | SD 1, not centered |
| step_center() | Subtracts the mean | Mean 0, original spread |
Use step_range() when you need predictors inside hard bounds, for example before a model that assumes inputs in [0, 1]. Reach for step_normalize() instead when an algorithm cares about spread relative to the mean, such as regularized regression or principal component analysis. The two are not interchangeable: range scaling is bounded but highly sensitive to outliers, because a single extreme value sets an endpoint, while normalizing is unbounded and somewhat less distorted by one extreme value, although its mean and standard deviation are still affected.
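A side-by-side sketch of the two steps on mtcars, assuming recipes is installed. The object names ranged and normalized are illustrative.

```r
library(recipes)

ranged <- recipe(mpg ~ ., data = mtcars) |>
  step_range(all_numeric_predictors()) |>
  prep() |>
  bake(new_data = NULL)

normalized <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>
  prep() |>
  bake(new_data = NULL)

range(ranged$hp)                                        # bounded: 0 to 1
c(mean = mean(normalized$hp), sd = sd(normalized$hp))   # mean 0, SD 1
range(normalized$hp)                                    # no fixed bounds
```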
Apply step_YeoJohnson() or step_BoxCox() first when predictors are heavily skewed, then rescale. Squeezing a skewed column into 0-to-1 leaves the shape exactly as lopsided as before, with most points bunched near one edge.
Common pitfalls with step_range()
Watch your column selection and your outliers. The most frequent mistakes come from rescaling the wrong columns or ignoring how a single extreme value behaves.
- Rescaling the outcome. all_numeric() includes the response variable. Use all_numeric_predictors() so the model still trains and predicts on the original target scale.
- Forgetting to prep. Calling bake() on a recipe that was never prepped throws an error, because the minimum and maximum have not been estimated yet.
- Outlier domination. One extreme value sets the maximum, so every other observation gets squashed near 0. Range scaling has no defense against outliers the way standardizing partly does; inspect or cap extremes first.
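Outlier domination is easy to see on a toy vector. The values below are made up for illustration, and the formula is written out in base R rather than through a recipe.

```r
# Made-up data: four ordinary values and one extreme value.
x <- c(1, 2, 3, 4, 100)

scaled <- (x - min(x)) / (max(x) - min(x))
round(scaled, 3)
```

The extreme value maps to 1, while the four ordinary values all land below 0.04.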
Make sure prep() uses training data only.
Try it yourself
Try it: Rescale only the hp and wt columns of mtcars to a 0-to-1 range in a recipe, prep it, and save the baked result to ex_ranged.
Solution
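One possible solution, sketched under the assumption that mpg is the outcome (the exercise does not fix the formula) and that recipes is installed:

```r
library(recipes)

ex_ranged <- recipe(mpg ~ ., data = mtcars) |>
  step_range(hp, wt) |>   # bare names: only these two columns are rescaled
  prep() |>
  bake(new_data = NULL)

range(ex_ranged$hp)
range(ex_ranged$wt)
```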
Explanation: Passing bare column names to step_range() limits the step to just hp and wt. After prep() learns their training minimum and maximum and bake() applies the linear formula, the hp column runs from 0 to 1.
Related recipes steps
step_range() is one of several recipes preprocessing steps. These pair naturally with it in a tidymodels workflow:
- step_normalize() centers and scales to mean 0 and SD 1.
- step_scale() divides each column by its standard deviation.
- step_center() subtracts the mean only.
- step_YeoJohnson() reduces skew before rescaling.
- step_zv() drops zero-variance columns that cannot be rescaled.
The closest Python counterpart to step_range() is scikit-learn's MinMaxScaler, or (df - df.min()) / (df.max() - df.min()) in pandas. Unlike the pandas one-liner, the recipes version learns the extremes on training data and reapplies them automatically, with optional clipping for new data.
FAQ
What is the difference between step_range() and step_normalize()?
step_range() rescales a column to a fixed interval, 0 to 1 by default, using the training minimum and maximum. The result is bounded but sensitive to outliers, since one extreme value sets an endpoint. step_normalize() instead subtracts the mean and divides by the standard deviation, leaving the column with mean 0 and SD 1. Its output is unbounded but more robust to a single extreme value. Choose step_range() for hard bounds and step_normalize() when spread relative to the mean matters.
What does the clipping argument do in step_range()?
The clipping argument controls what happens when new data falls outside the range learned from training. With clipping = TRUE, the default, any baked value below min or above max is clamped to that bound, so a predictor never leaves its trained interval. With clipping = FALSE, the linear formula is applied without limits, so out-of-range inputs can produce values below 0 or above 1. Keep clipping on for production scoring, where stray extreme values should not escape the interval.
What range does step_range() scale to by default?
By default step_range() rescales each selected column to the interval from 0 to 1, because min = 0 and max = 1. You can change the target interval by passing different values, for example min = -1, max = 1 for a symmetric range around zero. The transformation is linear: the column minimum maps to min, the column maximum maps to max, and every value in between is placed proportionally.
When should I use step_range() instead of step_scale()?
Use step_range() when a model or downstream step needs inputs inside hard bounds, such as a neural network expecting values in [0, 1] or a plot that compares predictors on a common axis. Use step_scale() when you want unit variance without centering the column, which suits distance-based and penalized models. Range scaling fixes the endpoints; scaling fixes the spread. They answer different needs and should not be swapped casually.