yardstick mase() in R: Scale-Free Forecast Error
The yardstick mase() function in R returns the mean absolute scaled error: your forecast's MAE divided by a naive seasonal forecast's MAE on the same data. A value below 1 means your model beats the baseline, and the score compares cleanly across series of any scale.
```r
mase(df, truth, estimate)                           # basic, lag-1 baseline
mase(df, truth, estimate, m = 12)                   # monthly seasonal lag
mase(df, obs, pred, mae_train = train_mae)          # pin in-sample baseline
mase(df, obs, pred, m = 7)                          # daily data, weekly lag
df |> group_by(series) |> mase(obs, pred, m = 12)   # per series
mase(df, obs, pred, na_rm = TRUE)                   # drop missing rows (the default)
mase_vec(truth_vec, pred_vec, m = 12)               # vector interface
```
Need more explanation? Read on for examples and pitfalls.
What mase() measures
mase() rescales your forecast error by the error of a naive seasonal forecast. You pass a data frame with the observed outcome and the predicted values, plus an integer lag m. The function returns a one-row tibble with .metric, .estimator, and .estimate. The estimate is unit-free: 0.7 means your forecast's MAE is 30 percent lower than the seasonal naive baseline's, and 1.5 means it is 50 percent higher.
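Written out, this is the quantity mase() computes when mae_train is not supplied. Here $y_t$ is the truth, $\hat{y}_t$ the estimate, $n$ the number of rows, and $m$ the lag. When mae_train is supplied, that value replaces the denominator.

$$
\mathrm{MASE} = \frac{\frac{1}{n}\sum_{t=1}^{n} \lvert y_t - \hat{y}_t \rvert}{\frac{1}{n-m}\sum_{t=m+1}^{n} \lvert y_t - y_{t-m} \rvert}
$$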
Because the denominator is a naive forecast on the same series, MASE stays defined when outcomes sit at or cross zero, which breaks MAPE. It also compares cleanly across series with different scales, which MAE and RMSE cannot do. Hyndman and Koehler proposed it in 2006 as a standard measure for comparing forecast accuracy across series.
mase() syntax and arguments
The signature adds two arguments to the standard yardstick numeric pattern. m is the seasonal lag, and mae_train pins the denominator to the training set instead of the test data.
| Argument | Description |
|---|---|
| data | A data frame with the truth and estimate columns. |
| truth | Unquoted column name of the observed numeric outcome. |
| estimate | Unquoted column name of the predicted numeric values. |
| m | Seasonal lag, a single positive integer (default 1L; 1 for non-seasonal, 12 for monthly, 7 for daily). |
| mae_train | Optional in-sample naive MAE, computed from training data (default NULL). |
| na_rm | If TRUE (the default), drop rows where either column is missing before scoring. |
Truth and estimate must both be numeric, and rows must be ordered in time. Without mae_train, mase() computes the naive MAE from the test data. On short holdouts this makes the denominator noisy and the score unstable.
MASE in action: four worked examples
The examples fit a simple linear-trend model to AirPassengers and score the holdout. Load the package and build a train and test pair first.
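A minimal setup sketch follows. It assumes the holdout is the final 12 months (1960) and the trend model is a plain lm() on a month index. The names air, t, passengers, fit_trend, and .pred are illustrative choices, not necessarily the original article's.

```r
library(yardstick)
library(dplyr)

# AirPassengers (base R datasets): monthly totals, Jan 1949 to Dec 1960
air <- tibble(
  t = seq_along(AirPassengers),           # month index 1..144
  passengers = as.numeric(AirPassengers)  # drop the ts class
)

# Time-ordered split: 132 training months, 12 holdout months
train <- filter(air, t <= 132)
test  <- filter(air, t > 132)

# Linear-trend model fit on the training window only
fit_trend <- lm(passengers ~ t, data = train)
test <- mutate(test, .pred = predict(fit_trend, newdata = test))
```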
Example 1 calls mase() with the default lag of 1. With m = 1 the denominator is the random-walk forecast, which predicts each month with the previous month's value.
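Here is a sketch of the Example 1 call. It repeats the assumed AirPassengers setup (illustrative names) so it runs on its own. The printed score depends on the exact split and model, so it may not match the 2.83 discussed below.

```r
library(yardstick)
library(dplyr)

# Assumed setup: 132 training months, 12-month 1960 holdout
air   <- tibble(t = seq_along(AirPassengers), passengers = as.numeric(AirPassengers))
train <- filter(air, t <= 132)
test  <- filter(air, t > 132)
test  <- mutate(test, .pred = predict(lm(passengers ~ t, data = train), newdata = test))

# Default m = 1: denominator is the lag-1 (random-walk) naive MAE on the holdout
res_lag1 <- mase(test, truth = passengers, estimate = .pred)
res_lag1
```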
A MASE of 2.83 reads as "this linear-trend forecast's MAE is about 2.8 times that of predicting the previous month's value". That is expected on highly autocorrelated monthly data, where lag-1 is a hard baseline.
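Example 2 switches to a monthly seasonal lag. Setting m = 12 swaps the denominator for the seasonal naive forecast, the right baseline for monthly data with yearly seasonality. Without mae_train, that denominator is built from lag-12 pairs inside the scored data, so the data must have more than 12 rows.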
A MASE of 1.42 still means the linear-trend model loses to "same month last year"; adding seasonality to the model is the obvious next step.
Example 3 pins the denominator with mae_train. Production-grade reports compute the naive MAE on the training set and pass it in, so the metric stays comparable across holdouts.
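This sketch computes the in-sample seasonal naive MAE from the training months and passes it as mae_train. It uses the same assumed setup and illustrative names as before. One detail of yardstick's implementation matters here: once mae_train is supplied, it is used directly as the denominator and m no longer changes the score. The lag you use to compute train_mae is what counts.

```r
library(yardstick)
library(dplyr)

# Assumed setup: 132 training months, 12-month 1960 holdout
air   <- tibble(t = seq_along(AirPassengers), passengers = as.numeric(AirPassengers))
train <- filter(air, t <= 132)
test  <- filter(air, t > 132)
test  <- mutate(test, .pred = predict(lm(passengers ~ t, data = train), newdata = test))

# In-sample seasonal naive MAE: each training month vs the same month a year earlier
train_mae <- mean(abs(diff(train$passengers, lag = 12)))

res_pinned <- mase(test, truth = passengers, estimate = .pred,
                   m = 12, mae_train = train_mae)
res_pinned
```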
The 1.18 differs from 1.42 because 132 training months make a sturdier denominator than 12 holdout months. For any side-by-side model comparison, always pass mae_train.
Example 4 scores several series in one call. When forecasts for multiple products or regions live in the same tibble, group_by() plus mase() returns one score per series.
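The comparison below follows the text: the linear-trend model against a global-mean forecast. Both forecasts are stacked in one frame and labelled by a model column (an illustrative name). Because both cover the same series, they can share one training-set mae_train. This is a sketch under the same assumed setup.

```r
library(yardstick)
library(dplyr)

# Assumed setup: 132 training months, 12-month 1960 holdout
air   <- tibble(t = seq_along(AirPassengers), passengers = as.numeric(AirPassengers))
train <- filter(air, t <= 132)
test  <- filter(air, t > 132)
train_mae <- mean(abs(diff(train$passengers, lag = 12)))

# Stack two forecasts of the same holdout, labelled by model
preds <- bind_rows(
  mutate(test, model = "linear_trend",
         .pred = predict(lm(passengers ~ t, data = train), newdata = test)),
  mutate(test, model = "global_mean", .pred = mean(train$passengers))
)

# One MASE row per model; both share the training-set denominator
scores <- preds |>
  group_by(model) |>
  mase(truth = passengers, estimate = .pred, m = 12, mae_train = train_mae)
scores
```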
Grouping puts both forecasts in one frame: linear-trend beats the global mean by a wide margin, though both still lose to the seasonal naive baseline.
Set m to match the dominant seasonality, not the forecast horizon. For monthly data with yearly cycles, use m = 12; for daily retail data, m = 7; for quarterly economic series, m = 4. Setting m = 1 on strongly seasonal data benchmarks the model against the wrong baseline. That can distort the score (in Example 1 it rose to 2.83 from 1.42) and hide genuine model gains.
When to pick mase() over its neighbors
MASE is the headline metric for cross-series forecasting reports. The table maps the other yardstick regression metrics to the situations where they outperform MASE.
| Metric | Best use case | Limitation |
|---|---|---|
| mase() | Cross-series, scale-free, baselined to naive forecast | Needs ordered data and a sensible m |
| mape() | Single-percent reading per series | Explodes near zero, asymmetric |
| smape() | Symmetric percentage error | Loses the simple "percent of truth" reading |
| mae() | Outlier-robust error in outcome units | Not comparable across series with different scales |
| rmse() | Penalises large misses harder than small ones | Sensitive to outliers, not scale-free |
| rsq() | Unit-free 0-to-1 goodness-of-fit | Can mask large systematic bias |
The Hyndman heuristic: MASE as the leaderboard, MAE for the engineering view in raw units, RMSE when large misses need extra weight.
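One way to get that split in a single call is sketched below, using the same assumed setup. yardstick's metric_tweak() bakes m and mae_train into a named MASE variant (mase12 is an illustrative name), which metric_set() can then combine with mae() and rmse().

```r
library(yardstick)
library(dplyr)

# Assumed setup: 132 training months, 12-month 1960 holdout
air   <- tibble(t = seq_along(AirPassengers), passengers = as.numeric(AirPassengers))
train <- filter(air, t <= 132)
test  <- filter(air, t > 132)
test  <- mutate(test, .pred = predict(lm(passengers ~ t, data = train), newdata = test))
train_mae <- mean(abs(diff(train$passengers, lag = 12)))

# Bake the MASE options into a named variant so it can join a metric set
mase12    <- metric_tweak("mase12", mase, m = 12, mae_train = train_mae)
scorecard <- metric_set(mase12, mae, rmse)

# One call returns the leaderboard metric plus the raw-unit views
report <- scorecard(test, truth = passengers, estimate = .pred)
report
```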
Common pitfalls
Three small mistakes cover most mase() failures.
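The first is leaving m at its default of 1L on seasonal data. For monthly, weekly, or daily series with strong periodicity, the lag-1 random walk is the wrong benchmark. Its error says little about how well a forecast tracks the season, so the score can mislead in either direction and hide model gains. Set m to match the season length.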
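For data stored as a base R ts object, frequency() already holds the season length, so one option is to derive m from it rather than typing it by hand. This sketch also computes both candidate baselines on the 132 AirPassengers training months, which shows how much the choice of m moves the denominator. The variable names are illustrative.

```r
# The seasonal period is stored on a ts object; reuse it as m
y     <- as.numeric(AirPassengers)
m_air <- frequency(AirPassengers)   # 12 for this monthly series

# Two candidate denominators on the 132 training months
naive_mae_seasonal <- mean(abs(diff(y[1:132], lag = m_air)))  # same month last year
naive_mae_lag1     <- mean(abs(diff(y[1:132], lag = 1)))      # previous month
c(seasonal = naive_mae_seasonal, lag1 = naive_mae_lag1)
```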
The second is computing MASE without mae_train. On a short holdout, the test-set denominator wobbles between resamples, and scores stop being comparable. Compute the naive MAE once on the training series and reuse it.
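This sketch computes the seasonal naive MAE once on the training months, then reuses it for two holdout windows of different length. It uses the same assumed setup and illustrative names as the examples. Because mae_train is supplied, the default m has no effect on these scores.

```r
library(yardstick)
library(dplyr)

# Assumed setup: 132 training months, 12-month 1960 holdout
air   <- tibble(t = seq_along(AirPassengers), passengers = as.numeric(AirPassengers))
train <- filter(air, t <= 132)
test  <- filter(air, t > 132)
test  <- mutate(test, .pred = predict(lm(passengers ~ t, data = train), newdata = test))

# Compute the seasonal naive MAE once, from the training series only
train_mae <- mean(abs(diff(train$passengers, lag = 12)))

# Reuse it for every holdout window so all scores share one denominator
score_6m  <- mase(slice(test, 1:6), passengers, .pred, mae_train = train_mae)
score_12m <- mase(test, passengers, .pred, mae_train = train_mae)
bind_rows(first_6_months = score_6m, full_year = score_12m, .id = "window")
```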
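The third is feeding mase() rows out of order. When mae_train is not supplied, the denominator is built from rows m positions apart, and yardstick does not reorder by date. A tibble shuffled by a join, or sorted with arrange() on a non-time column, therefore produces a meaningless denominator. Sort by time before scoring.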
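This small sketch reuses the assumed AirPassengers setup. It shuffles the holdout rows and scores them with the default out-of-sample denominator. Re-sorting with arrange() on the time index t (an illustrative column name) restores the time-ordered score.

```r
library(yardstick)
library(dplyr)

# Assumed setup: 132 training months, 12-month 1960 holdout
air   <- tibble(t = seq_along(AirPassengers), passengers = as.numeric(AirPassengers))
train <- filter(air, t <= 132)
test  <- filter(air, t > 132)
test  <- mutate(test, .pred = predict(lm(passengers ~ t, data = train), newdata = test))

# Shuffle the holdout rows to mimic a frame that lost its time order
set.seed(42)
shuffled <- test[sample(nrow(test)), ]

# Without mae_train, the lag-1 denominator is computed from row order
score_time_order <- mase(test, passengers, .pred)$.estimate
score_shuffled   <- mase(shuffled, passengers, .pred)$.estimate
score_resorted   <- mase(arrange(shuffled, t), passengers, .pred)$.estimate

c(time_order = score_time_order, shuffled = score_shuffled, resorted = score_resorted)
```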
Try it yourself
Try it: Use the train and test tibbles from above. Fit a second model that adds a sinusoidal yearly term (sin(2*pi*t/12) and cos(2*pi*t/12)), generate predictions on test, and compute MASE with m = 12 and mae_train. Save the result to ex_mase_seasonal.
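One possible solution is sketched below, assuming the same 132/12 month split. The names fit_seasonal and .pred_seasonal are illustrative. The model adds a single yearly harmonic to the linear trend and is scored against the training-set seasonal naive MAE.

```r
library(yardstick)
library(dplyr)

# Assumed setup: 132 training months, 12-month 1960 holdout
air   <- tibble(t = seq_along(AirPassengers), passengers = as.numeric(AirPassengers))
train <- filter(air, t <= 132)
test  <- filter(air, t > 132)

# Trend plus one yearly harmonic (sin/cos with a 12-month period)
fit_seasonal <- lm(passengers ~ t + sin(2 * pi * t / 12) + cos(2 * pi * t / 12),
                   data = train)
test <- mutate(test, .pred_seasonal = predict(fit_seasonal, newdata = test))

# Denominator: seasonal naive MAE on the training months
train_mae <- mean(abs(diff(train$passengers, lag = 12)))

ex_mase_seasonal <- mase(test, truth = passengers, estimate = .pred_seasonal,
                         m = 12, mae_train = train_mae)
ex_mase_seasonal
```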
Explanation: Sin and cos terms encode the yearly cycle the trend-only model missed, so the forecast now beats the seasonal naive baseline by more than half. Any MASE below 1 means "better than the free baseline".
Related yardstick metrics
mase() lives in the yardstick numeric-metric family. Reach for these neighbors when MASE is not the right scorecard:
- mape() for a single-percent error reading per series
- smape() for a symmetric percentage that handles small truth values
- mae() for outlier-robust error in the outcome's original units
- rmse() for a penalty that punishes large misses harder than small ones
- rsq() for a unit-free 0-to-1 goodness-of-fit score
- metrics() to compute several regression scores in a single call
For the full set, see the yardstick reference index.
FAQ
What is a good MASE value?
The hard threshold is 1: below 1 means the model beats the naive seasonal forecast, and above 1 means it loses. Beyond that threshold there is no universal target. How far below 1 a good model lands depends on how predictable the series is, so compare candidate models on the same series. Report MASE next to MAE for context.
How does mase() differ from mean(abs(y - yhat)) / mean(abs(diff(y, lag = m)))?
They return the same number on clean inputs when mae_train is not supplied. yardstick wraps the formula with input validation, NA handling, optional mae_train, case_weights support, and a tidy tibble that integrates with metric_set() and group_by().
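A quick check on a synthetic, time-ordered toy series (hypothetical data) compares the hand-written formula with mase_vec() at the same lag.

```r
library(yardstick)

set.seed(1)
y    <- cumsum(rnorm(30)) + 50   # toy series, already in time order
yhat <- y + rnorm(30, sd = 0.5)  # toy forecasts
m    <- 4

by_hand      <- mean(abs(y - yhat)) / mean(abs(diff(y, lag = m)))
by_yardstick <- mase_vec(y, yhat, m = m)

all.equal(by_hand, by_yardstick)  # TRUE
```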
Should I pass mae_train or let mase() compute it?
Pass mae_train for any comparison that crosses model versions, holdouts, or cross-validation folds. Letting mase() compute the denominator from the test set is a quick exploratory shortcut, but the metric stops being comparable once the holdout window changes size.
Can mase() handle multiple series in one call?
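Yes, with group_by(). Group by the series identifier before piping into mase(), and yardstick returns one score per series. mae_train accepts a single number rather than a column. For series that each need their own training baseline, compute each series's mae_train separately, join it on, and score with mase_vec() inside summarise().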
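This sketch scores two hypothetical series on very different scales, each against its own training baseline. The data, the column names series, obs, and pred, and the lag of 1 are all illustrative assumptions. The key step is passing first(mae_train) so mase_vec() receives one number per group.

```r
library(yardstick)
library(dplyr)

# Hypothetical data: two series with different scales, rows in time order
set.seed(7)
train_df <- tibble(
  series = rep(c("small", "large"), each = 24),
  obs    = c(10 + cumsum(rnorm(24)), 1000 + cumsum(rnorm(24, sd = 50)))
)
test_df <- tibble(
  series = rep(c("small", "large"), each = 6),
  obs    = c(10 + rnorm(6), 1000 + rnorm(6, sd = 50)),
  pred   = c(rep(10, 6), rep(1000, 6))
)

# One in-sample naive MAE per series (lag 1 for this non-seasonal toy data)
baselines <- train_df |>
  group_by(series) |>
  summarise(mae_train = mean(abs(diff(obs, lag = 1))))

# mae_train takes a single number, so score each group with mase_vec()
per_series <- test_df |>
  left_join(baselines, by = "series") |>
  group_by(series) |>
  summarise(mase = mase_vec(obs, pred, mae_train = first(mae_train)))
per_series
```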
Summary
mase() is the cross-series forecasting scorecard in yardstick's regression family. Reach for it when one number has to compare forecasts across products, regions, or scales, set m to match the dominant seasonality, and pin the denominator with mae_train whenever the comparison crosses holdouts or model versions. Pair it with mae() for an engineering view in raw units, and switch to mape() when stakeholders want the headline as a percent.