yardstick smape() in R: Symmetric Percentage Error Metric
The yardstick smape() function in R returns the symmetric mean absolute percentage error of a regression model. It divides each absolute residual by the average of the absolute truth and absolute estimate, so the result is bounded between 0 and 200 percent and stays stable when the outcome sits near zero.
```r
smape(df, truth, estimate)                   # basic call
smape(df, truth = obs, estimate = pred)      # named arguments
smape(df, sales, forecast)                   # forecast columns
df |> group_by(series) |> smape(obs, pred)   # by group or series
smape(df, obs, pred, na_rm = TRUE)           # drop missing rows
smape_vec(truth_vec, pred_vec)               # vector interface
smape(df, obs, pred, case_weights = w)       # weighted SMAPE
```
Need explanation? Read on for examples and pitfalls.
What smape() measures
smape() averages each absolute residual divided by the half-sum of |truth| and |estimate|. You pass a data frame with the observed numeric outcome and the predicted values, and the function returns a one-row tibble with .metric, .estimator, and .estimate columns. The estimate is expressed as a percent and always falls between 0 and 200.
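For reference, here is the same calculation written out, with $y_i$ the truth and $\hat{y}_i$ the estimate for row $i$ of $n$ (a restatement of the description above, not copied from the yardstick source):

$$
\mathrm{SMAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{\lvert \hat{y}_i - y_i \rvert}{\left(\lvert y_i \rvert + \lvert \hat{y}_i \rvert\right)/2}
$$

Each summand lies between 0 and 2, which is where the 0 to 200 percent range comes from. A row reaches 2 when exactly one of $y_i$ and $\hat{y}_i$ is zero or when the two have opposite signs, and the term is undefined when both are zero.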
The denominator is (|truth| + |estimate|) / 2, not |truth| alone. That swap makes SMAPE symmetric in truth and estimate, and a truth near zero paired with a prediction away from zero no longer produces an unbounded percentage: each row contributes at most 200 percent. The trade-off is that the result is no longer "percent of truth", so its business reading drifts away from MAPE's.
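A quick hand check of the two denominators, as a sketch: the truth and estimate values are made up for illustration, and the comparison against smape_vec() and mape_vec() assumes yardstick implements the formulas as described above.

```r
library(yardstick)

# Hypothetical truth/estimate pairs chosen to contrast the two denominators
truth    <- c(0.01, 5, 100)
estimate <- c(0.50, 0, 110)

# Per-row terms, in percent
mape_terms  <- abs(estimate - truth) / abs(truth) * 100
smape_terms <- abs(estimate - truth) / ((abs(truth) + abs(estimate)) / 2) * 100

round(cbind(truth, estimate, mape_terms, smape_terms), 1)

# The hand-computed means should match yardstick's vector helpers
all.equal(mean(smape_terms), smape_vec(truth, estimate))
all.equal(mean(mape_terms),  mape_vec(truth, estimate))
```

In the first row MAPE reaches 4,900 percent while SMAPE stays just under its 200 ceiling. The second row shows the flip side: predicting zero for a truth of 5 costs exactly 100 percent under MAPE but the maximum 200 percent under SMAPE.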
smape() syntax and arguments
The signature matches the rest of yardstick's numeric metrics. Once you know the shape, the same call works for mae(), rmse(), mape(), and the rest of the regression family.
| Argument | Description |
|---|---|
| data | A data frame holding the truth and estimate columns. |
| truth | Unquoted column name of the observed numeric outcome. |
| estimate | Unquoted column name of the predicted numeric values. |
| na_rm | If TRUE (the default), drop rows where either column is missing before scoring. |
| case_weights | Optional column of row weights for survey or importance-weighted data; NULL (no weights) by default. |
Both columns must be numeric. Output is scaled as a percent (22.4 means 22.4 percent). It can exceed 100 when one value in a pair is much larger than the other (more than three times as large when both share a sign), and it never exceeds 200.
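A toy sketch of na_rm and case_weights, assuming yardstick 1.0.0 or later (the release that added case weights). The data frame and its weight column w are hypothetical.

```r
library(yardstick)
library(tibble)

# Hypothetical toy data: one missing truth and a made-up weight column `w`
toy <- tibble(
  actual = c(100, 200, NA, 50),
  pred   = c(110, 180, 75, 50),
  w      = c(1, 3, 1, 1)
)

smape(toy, actual, pred)                    # na_rm = TRUE by default, so the NA row is dropped
smape(toy, actual, pred, na_rm = FALSE)     # the missing value propagates to an NA estimate
smape(toy, actual, pred, case_weights = w)  # weighted mean of the per-row terms
```

With the NA row dropped, the unweighted score is the plain mean of the three remaining row terms; the weighted call gives the (200, 180) row three times the influence of the others.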
SMAPE in action: four worked examples
The examples fit a simple lm on mtcars and score in-sample predictions. Build the prediction frame first.
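A minimal sketch of that prediction frame. The article does not show its model formula, so mpg ~ wt + hp is an assumption, as are the column names actual and pred (chosen to match the exercise further down); the figures quoted in the examples may not reproduce exactly with this fit.

```r
library(yardstick)
library(tibble)

# Assumed model: the article's own formula is not shown, so mpg ~ wt + hp stands in
fit <- lm(mpg ~ wt + hp, data = mtcars)

# In-sample prediction frame; the column names actual and pred are also assumptions
preds <- tibble(
  actual = mtcars$mpg,
  pred   = unname(predict(fit))
)

head(preds)
```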
Example 1 calls smape() with positional arguments. The function locates truth and estimate by position and returns the tidy summary.
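A sketch of the Example 1 call. It rebuilds an assumed frame (mpg ~ wt + hp fit, columns actual and pred) so it runs on its own, and adds mape() on the same data for comparison.

```r
library(yardstick)
library(tibble)

# Assumed setup: in-sample predictions from mpg ~ wt + hp
fit   <- lm(mpg ~ wt + hp, data = mtcars)
preds <- tibble(actual = mtcars$mpg, pred = unname(predict(fit)))

# Positional call: data, then truth, then estimate
ex1 <- smape(preds, actual, pred)
ex1

# MAPE on the same frame, for comparison
mape(preds, actual, pred)
```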
The .estimator is standard because SMAPE has no binary or multiclass variant. The .estimate of 11.7 reads as 11.7 percent and sits close to MAPE on this dataset: truth values stay well away from zero and residuals are small relative to them, the regime where the two metrics agree.
Example 2 shows what happens when truth approaches zero. Adding a single row where truth is 0.01 moves SMAPE only modestly, while MAPE spikes.
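A sketch of Example 2 under an assumed mpg ~ wt + hp fit. The article does not say what the model predicted for the added row, so 0.5 is a hypothetical value; with it, that row alone contributes about 192 percent to SMAPE's average but 4,900 percent to MAPE's.

```r
library(yardstick)
library(tibble)

fit   <- lm(mpg ~ wt + hp, data = mtcars)   # assumed model
preds <- tibble(actual = mtcars$mpg, pred = unname(predict(fit)))

# Hypothetical extra row: truth 0.01, prediction 0.5
preds_small <- add_row(preds, actual = 0.01, pred = 0.5)

smape_before <- smape_vec(preds$actual, preds$pred)
smape_after  <- smape_vec(preds_small$actual, preds_small$pred)
mape_before  <- mape_vec(preds$actual, preds$pred)
mape_after   <- mape_vec(preds_small$actual, preds_small$pred)

data.frame(
  metric = c("smape", "mape"),
  before = c(smape_before, mape_before),
  after  = c(smape_after, mape_after)
)
```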
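The headline moved from 11.7 to 17.4, not to 160 as MAPE did on the same data. That row can contribute at most 200 percent to SMAPE's average, because the denominator averages the tiny truth with the larger estimate instead of dividing by the tiny truth alone.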
Example 3 scores by group, such as folds or product lines. When cross-validation predictions or per-segment forecasts live in one tibble, group_by() followed by smape() returns one percentage per group.
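A sketch of grouped scoring. The article does not name a grouping column for mtcars, so the cylinder count cyl stands in for a fold or segment id, and the model formula mpg ~ wt + hp is assumed.

```r
library(yardstick)
library(tibble)
library(dplyr)

fit   <- lm(mpg ~ wt + hp, data = mtcars)  # assumed model
preds <- tibble(
  cyl    = mtcars$cyl,                     # stand-in for a fold or segment id
  actual = mtcars$mpg,
  pred   = unname(predict(fit))
)

# yardstick metrics respect dplyr groups: one row per cyl value
by_cyl <- preds |>
  group_by(cyl) |>
  smape(actual, pred)

by_cyl
```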
Example 4 uses the vector interface for quick checks. Inside map() calls or unit tests, smape_vec() returns a plain scalar instead of a one-row tibble.
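A sketch of the vector interface. The model formula and the 25 percent threshold are assumptions chosen for illustration; the last line is a two-row case small enough to check by hand.

```r
library(yardstick)

fit    <- lm(mpg ~ wt + hp, data = mtcars)   # assumed model
actual <- mtcars$mpg
pred   <- unname(predict(fit))

score <- smape_vec(actual, pred)  # plain numeric scalar, no tibble wrapper
score

# Scalar use: a threshold check, e.g. inside a unit test (the 25 percent cut-off is arbitrary)
stopifnot(score < 25)

# Hand-checkable case: (10/105 + 20/190) / 2 * 100
smape_vec(c(100, 200), c(110, 180))  # about 10.03
```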
Use the vector form when you need a bare number, such as for a threshold check; otherwise stay with the data-frame form, whose tibble output is easy to bind, group, or plot.
When to pick smape() over its neighbors
SMAPE is the metric to reach for when you want a percent error but some truths sit near zero. Otherwise, use the table to pick the right neighbor.
| Metric | Best use case | Limitation |
|---|---|---|
| smape() | Bounded percent error that stays stable near zero | Less intuitive than MAPE; scores an under-prediction worse than an over-prediction of the same size |
| mape() | Plain percent-of-truth headline when truth is far from zero | Explodes near zero; treats under- and over-prediction asymmetrically |
| mae() | Outlier-robust error in outcome units | Not comparable across targets with different scales |
| rmse() | Punishes large misses harder than small ones | Sensitive to outliers; not scale-free |
| mase() | Scale-free comparison across many time series | Requires a naive baseline forecast |
| rsq() | Unit-free 0-to-1 goodness-of-fit | Can mask large systematic bias |
A common pairing is SMAPE plus MAE: SMAPE for a bounded percent that survives small truths, MAE for raw error in the outcome's own units.
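One way to get that pairing in a single call is yardstick's metric_set(), sketched here with an assumed mpg ~ wt + hp fit and made-up column names actual and pred.

```r
library(yardstick)
library(tibble)

fit   <- lm(mpg ~ wt + hp, data = mtcars)   # assumed model
preds <- tibble(actual = mtcars$mpg, pred = unname(predict(fit)))

# Bundle SMAPE and MAE into one metric function
smape_mae <- metric_set(smape, mae)

report <- smape_mae(preds, truth = actual, estimate = pred)
report
```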
Common pitfalls
Three SMAPE mistakes show up repeatedly in forecasting reports. Each has a one-line fix.
The first is reading SMAPE as "percent of truth". It is not. A SMAPE of 40 percent does not mean predictions are 40 percent off the actual value, because the denominator is the average of truth and estimate. Report SMAPE next to MAE so readers anchor on unit-correct error.
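The second is treating SMAPE as fully symmetric. Swapping truth and estimate gives the same score, but an under-prediction and an over-prediction of the same absolute size still score differently, because the estimate sits in the denominator: against a truth of 100, a prediction of 90 scores about 10.5 percent while 110 scores about 9.5 percent. Pair SMAPE with mean(estimate - truth) to expose direction bias.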
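A small sketch of both points with smape_vec(); the four-row forecast series at the end is hypothetical.

```r
library(yardstick)

# Same absolute miss of 10 against a truth of 100, in each direction
smape_vec(100, 90)    # under-prediction: 10 / 95  * 100, about 10.53
smape_vec(100, 110)   # over-prediction:  10 / 105 * 100, about 9.52

# Swapping truth and estimate leaves the score unchanged
smape_vec(90, 100)    # same value as smape_vec(100, 90)

# Direction check on hypothetical forecasts that run consistently low
truth    <- c(100, 120, 80, 95)
estimate <- c(92, 110, 75, 90)
mean(estimate - truth)  # -7: a negative mean residual flags systematic under-prediction
```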
The third is comparing SMAPE across tools. Some textbooks divide by (|truth| + |estimate|) instead of half that sum, which halves the headline. yardstick uses the half-sum denominator, bounded at 200 percent; confirm the formula before copying values across tools.
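A sketch contrasting the two conventions on made-up data. The full-sum variant is computed by hand, since smape_vec() provides only the half-sum version.

```r
library(yardstick)

truth    <- c(100, 200, 50)
estimate <- c(110, 180, 70)
resid    <- abs(estimate - truth)

# Half-sum denominator (yardstick's convention): each term lies in [0, 2], so 0 to 200 percent
smape_200 <- mean(resid / ((abs(truth) + abs(estimate)) / 2)) * 100

# Full-sum denominator (the alternative textbook convention): each term lies in [0, 1], so 0 to 100 percent
smape_100 <- mean(resid / (abs(truth) + abs(estimate))) * 100

c(yardstick = smape_vec(truth, estimate), half_sum = smape_200, full_sum = smape_100)
```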
When truth and estimate have opposite signs, the residual |truth - estimate| equals |truth| + |estimate|, which is twice the denominator, so that row scores the maximum 200 percent. Inspect sign agreement before trusting a high SMAPE reading.
Try it yourself
Try it: Use the mtcars lm fit from above. Build a small forecast tibble with one row where actual = 0.5 and pred = 0.6, append it to preds, and compute both SMAPE and MAPE. Save the comparison to ex_smape_vs_mape.
Solution
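One possible solution, sketched with an assumed mpg ~ wt + hp fit and actual/pred column names (the article's own setup code is not shown):

```r
library(yardstick)
library(tibble)
library(dplyr)

fit   <- lm(mpg ~ wt + hp, data = mtcars)   # assumed model
preds <- tibble(actual = mtcars$mpg, pred = unname(predict(fit)))

# One small-truth forecast row, appended to the in-sample predictions
small_row   <- tibble(actual = 0.5, pred = 0.6)
preds_small <- bind_rows(preds, small_row)

ex_smape_vs_mape <- tibble(
  data  = c("preds", "preds + small row"),
  smape = c(smape_vec(preds$actual, preds$pred),
            smape_vec(preds_small$actual, preds_small$pred)),
  mape  = c(mape_vec(preds$actual, preds$pred),
            mape_vec(preds_small$actual, preds_small$pred))
)
ex_smape_vs_mape
```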
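Explanation: The new row contributes |0.6 - 0.5| / 0.55 ≈ 18.2 percent to SMAPE's average and 0.1 / 0.5 = 20 percent to MAPE's. One small-truth row barely moves SMAPE (11.7 to 12.0), while MAPE rises faster because it divides by the small truth directly; push the actual closer to zero and the gap widens quickly.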
Related yardstick metrics
smape() sits inside yardstick's regression family. Reach for these neighbors when SMAPE is not the right fit:
- mape() for percent-of-truth when truths stay well away from zero
- mae() for outlier-robust error in the outcome's units
- rmse() to punish large misses harder than small ones
- mase() for scale-free comparison across many series
- rsq() for a unit-free 0-to-1 goodness-of-fit score
- metrics() to compute several scores in one call
For the full set, see the yardstick reference index.
FAQ
What is a good SMAPE value?
Bands are looser than for MAPE because SMAPE is bounded at 200 percent. Roughly: below 10 percent is excellent, 10 to 30 percent is good for noisy targets, 30 to 80 percent is workable, and above 100 percent usually signals sign flips or scale misses. Anchor SMAPE with MAE for unit-correct context.
How is smape() different from mape()?
MAPE divides each absolute residual by the absolute truth, so a small truth blows up the headline and over-prediction stays unbounded. SMAPE divides by the average of |truth| and |estimate|, which bounds the metric and keeps it stable when truth approaches zero. Use SMAPE when truths cross or sit near zero; use MAPE for a cleaner percent-of-truth reading when truths stay well away from zero.
Does yardstick scale smape() to 0 to 100 or 0 to 200?
0 to 200. yardstick divides by the half-sum of |truth| and |estimate|, so every row contributes between 0 and 200 percent. Some textbooks divide by the full sum instead, which halves every score and caps it at 100 percent, so confirm the formula before comparing across tools.
Can smape() be used with negative truth values?
It runs, and the absolute values in the denominator avoid dividing by a signed near-zero quantity. But any pair where truth and estimate sit on opposite sides of zero scores the maximum 200 percent, however small the miss, because the residual then equals |truth| + |estimate|, twice the denominator. For sign-flipping series, prefer mae() for error in raw units.
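A sketch of the sign-flip effect with smape_vec() and mae_vec(); the three-row series is hypothetical.

```r
library(yardstick)

# Opposite signs: |truth - estimate| equals |truth| + |estimate|, twice the SMAPE denominator
smape_vec(0.1, -0.1)   # 200, although the raw miss is only 0.2
mae_vec(0.1, -0.1)     # 0.2, the same miss in outcome units

# Hypothetical series: one sign flip among otherwise perfect forecasts
truth    <- c(0.1, 5, 8)
estimate <- c(-0.1, 5, 8)
smape_vec(truth, estimate)            # (200 + 0 + 0) / 3, about 66.7
mean(sign(truth) == sign(estimate))   # share of rows whose signs agree
```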
Summary
smape() is the bounded, symmetric percentage scorecard in yardstick's regression family. Reach for it when truths sit near or cross zero, pair it with mae() to anchor magnitude, and remember the reading is not "percent of truth" but a stability-friendly cousin. With group_by() it gives per-segment percentages; with metric_set() it slots into a multi-metric report.