yardstick mae() in R: Outlier-Robust Regression Scoring
The yardstick mae() function in R returns the mean absolute error of a regression model. It accepts a data frame with truth and estimate columns and produces a tidy one-row summary in the same units as the outcome. Because residuals are not squared, a few outliers do not inflate the score disproportionately.
    mae(df, truth, estimate)                    # basic call
    mae(df, truth = obs, estimate = pred)       # named arguments
    mae(df, solubility, prediction)             # tidymodels columns
    df |> group_by(fold) |> mae(obs, pred)      # by resample
    mae(df, obs, pred, na_rm = TRUE)            # drop missing rows
    mae_vec(truth_vec, pred_vec)                # vector interface
    mae(df, obs, pred, case_weights = w)        # weighted MAE
Need explanation? Read on for examples and pitfalls.
How mae() scores a regression model
mae() takes the absolute value of each residual and averages them. You pass a data frame with the observed numeric outcome and the predicted values, and the function returns a one-row tibble with .metric, .estimator, and .estimate. The estimate is non-negative and reads in the same units as the outcome, so an MAE of 0.5 on a solubility score means typical predictions miss by half a unit.
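The calculation described above can be traced by hand on a tiny data frame. This is a minimal sketch: the `toy` values are hypothetical, chosen only so the arithmetic is easy to follow.

```r
library(yardstick)

# Toy data (hypothetical values chosen for easy arithmetic)
toy <- data.frame(
  truth    = c(3.0, 5.0, 2.5, 7.0),
  estimate = c(2.0, 7.0, 2.0, 7.0)
)

# Step by step: residuals, absolute values, then the mean
abs_resid  <- abs(toy$truth - toy$estimate)   # 1, 2, 0.5, 0
manual_mae <- mean(abs_resid)                 # 3.5 / 4 = 0.875

# The yardstick call wraps the same number in a one-row tibble
toy_mae <- mae(toy, truth, estimate)
toy_mae
```

The `.estimate` column of `toy_mae` should match `manual_mae`, which makes this a convenient sanity check when first wiring the metric into a pipeline.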
Because absolute value treats a small miss and a large miss in proportion, MAE is less sensitive to outliers than RMSE. That makes it the right choice when a handful of unusual observations should not dominate the metric, and it pairs naturally with quantile regression at the median.
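The link to the median can be illustrated with constant predictions. The sketch below uses a hypothetical outcome vector `y` with one extreme value and scores two constant guesses, the median and the mean; it is an illustration of the idea, not a quantile regression fit.

```r
library(yardstick)

# Hypothetical outcome with one extreme value
y <- c(1, 2, 3, 4, 100)

# Score two constant predictions: the median (3) and the mean (22) of y
mae_at_median <- mae_vec(y, rep(median(y), length(y)))  # 101 / 5 = 20.2
mae_at_mean   <- mae_vec(y, rep(mean(y), length(y)))    # 156 / 5 = 31.2

c(median = mae_at_median, mean = mae_at_mean)
```

The median gives the lower MAE because the extreme value pulls the mean toward it, while the absolute-error criterion is indifferent to how far away that value sits.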
mae() syntax and arguments
The signature matches most other yardstick numeric metrics. Once you know the shape, the same call works for rmse(), rsq(), mape(), and most of the regression family; a few metrics, such as huber_loss() with its delta argument, add extra parameters.
| Argument | Description |
|---|---|
| data | A data frame with the truth and estimate columns. |
| truth | Unquoted column name of the observed numeric outcome. |
| estimate | Unquoted column name of the predicted numeric values. |
| na_rm | If TRUE, drop rows where either column is missing before scoring. |
| case_weights | Optional column of row weights for survey or importance-weighted data. |
Truth and estimate must both be numeric; factor, character, or logical inputs raise an error. If you fitted a classification model, reach for accuracy() or roc_auc() instead.
MAE in action: four worked examples
The examples below use yardstick's built-in solubility_test data, which ships a real regression prediction set. Load the package and inspect the data first.
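A minimal setup sketch, assuming yardstick is installed. The `solubility` and `prediction` columns are the ones used in the calls throughout this article.

```r
library(yardstick)

# solubility_test ships with yardstick: observed values and model predictions
data("solubility_test", package = "yardstick")

str(solubility_test)    # column names and types
head(solubility_test)   # first few rows
```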
Example 1 calls mae() with positional arguments. The function locates truth and estimate by position and returns the tidy summary.
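A sketch of what the Example 1 call could look like. The object name `ex1` is only a label for illustration.

```r
library(yardstick)
data("solubility_test", package = "yardstick")

# Positional call: data first, then truth, then estimate
ex1 <- mae(solubility_test, solubility, prediction)
ex1
```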
The .estimator is standard because MAE has no binary or multiclass variant. The estimate of 0.524 sits in log-solubility units and is smaller than the RMSE of 0.722 on the same data. That ordering is expected: MAE never exceeds RMSE, and the two are equal only when every absolute residual has the same size.
Example 2 contrasts MAE and RMSE under an injected outlier. Adding one large miss inflates RMSE far more than MAE, which is exactly why MAE survives heavy-tailed errors.
MAE rises from 0.524 to 0.556 (about 6 percent), while RMSE jumps from 0.722 to 0.917 (about 27 percent). The same outlier costs RMSE roughly four times what it costs MAE.
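The comparison can be reproduced in spirit with the sketch below. The size and position of the injected miss (+10 on the first prediction) are assumptions made here for illustration, so the exact percentages may differ from the figures quoted above. `rel_change()` is a hypothetical helper, not part of yardstick.

```r
library(yardstick)
data("solubility_test", package = "yardstick")

# Copy the data and push one prediction far from its observed value.
# The +10 shift is an illustrative assumption.
perturbed <- solubility_test
perturbed$prediction[1] <- perturbed$prediction[1] + 10

# Hypothetical helper: relative increase of a metric after the perturbation
rel_change <- function(metric_fn) {
  before <- metric_fn(solubility_test, solubility, prediction)$.estimate
  after  <- metric_fn(perturbed, solubility, prediction)$.estimate
  (after - before) / before
}

mae_change  <- rel_change(mae)
rmse_change <- rel_change(rmse)
c(mae = mae_change, rmse = rmse_change)
```

Expressing both changes as fractions of the original score puts the two metrics on a comparable footing, even though they respond to the outlier through different arithmetic.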
Example 3 groups scoring by resample fold. When cross-validation predictions live in one tibble, group_by() plus mae returns one score per fold for instant per-resample diagnostics.
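A sketch of the grouped call. solubility_test has no resample column, so a hypothetical `fold` id is attached here purely for illustration; in a real workflow the fold labels would come from the resampling step.

```r
library(yardstick)
library(dplyr)
data("solubility_test", package = "yardstick")

# Attach a hypothetical 5-level fold id (illustration only)
cv_preds <- solubility_test |>
  mutate(fold = rep_len(paste0("Fold", 1:5), n()))

# One MAE row per fold
fold_mae <- cv_preds |>
  group_by(fold) |>
  mae(solubility, prediction)
fold_mae
```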
Example 4 uses the vector interface for quick checks. Inside map() calls or unit tests, mae_vec() returns a plain scalar instead of a one-row tibble.
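A sketch of the vector interface on the same data. The 1.0 cutoff in the final line is a hypothetical tolerance chosen only to show how a scalar drops into a check.

```r
library(yardstick)
data("solubility_test", package = "yardstick")

# Vector interface: two numeric vectors in, one number out
mae_scalar <- mae_vec(
  truth    = solubility_test$solubility,
  estimate = solubility_test$prediction
)

# A scalar can be compared directly against a threshold (1.0 is hypothetical)
mae_scalar < 1.0
```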
Use the vector form when you need a scalar for thresholds or unit tests; otherwise stay with the data-frame form so you can bind, group, or plot scores.
When to pick mae() over its neighbors
MAE is the outlier-robust default in the yardstick regression family. The table below picks the metric you want when MAE is not the right fit.
| Metric | Best use case | Limitation |
|---|---|---|
| mae() | Errors in outcome units, robust to outliers | Ignores whether a big error matters more than a small one |
| rmse() | Errors in outcome units, large misses matter | Heavily penalises outliers |
| rsq() | Need a unit-free 0-to-1 goodness-of-fit | Can mask large systematic bias |
| mape() | Communicate relative error to non-technical readers | Explodes when truth is near zero |
| mase() | Compare scale-free across multiple time series | Requires a naive baseline forecast |
| huber_loss() | Want MAE-style robustness with smooth gradient near zero | Adds a delta hyperparameter to tune |
A safe default is MAE as the robust headline, RMSE for outlier sensitivity, and R-squared for cross-model comparison on the same target.
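The three-metric default can be sketched with metric_set(), which bundles several yardstick metrics into one function. The name `reg_metrics` is just a label chosen here.

```r
library(yardstick)
data("solubility_test", package = "yardstick")

# Bundle the three metrics into one callable
reg_metrics <- metric_set(mae, rmse, rsq)

# One call, one row per metric
report <- reg_metrics(solubility_test, truth = solubility, estimate = prediction)
report
```

Because the result keeps the usual three columns, it can be combined with group_by() in the same way as a single-metric call.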
Common pitfalls
Three small mistakes account for most mae() failures. Each one has a one-line fix.
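The first is passing a factor column. yardstick rejects factor inputs because a factor implies classification, not regression. The fix is to cast to numeric before scoring, going through as.character() first so that the factor labels rather than the internal level codes are converted.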
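A sketch of the failure and one possible fix, using a hypothetical frame `bad` in which the outcome was accidentally stored as a factor.

```r
library(yardstick)

# Hypothetical frame where the outcome was stored as a factor by mistake
bad <- data.frame(
  obs  = factor(c("1.5", "2.0", "3.5")),
  pred = c(1.0, 2.5, 3.0)
)

# Scoring the factor column fails; capture the error instead of stopping
factor_attempt <- try(mae(bad, obs, pred), silent = TRUE)
inherits(factor_attempt, "try-error")

# Convert via character so the labels, not the integer level codes, are used
fixed <- transform(bad, obs = as.numeric(as.character(obs)))
fixed_mae <- mae(fixed, obs, pred)
fixed_mae
```

Calling as.numeric() directly on a factor would return the level codes (1, 2, 3 here) and silently produce a wrong score, which is why the conversion goes through as.character().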
The second pitfall is comparing MAE across different target transformations. An MAE of 0.2 on log-price and 30 on raw-price are not on the same scale. Back-transform predictions before scoring, or stick to one target representation.
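The scale mismatch can be seen on a small, made-up example. The `prices` frame and its column names are hypothetical, and the natural log is assumed as the transformation.

```r
library(yardstick)

# Hypothetical prices with predictions made on the natural-log scale
prices <- data.frame(
  price          = c(100, 200, 400),
  pred_log_price = log(c(110, 180, 440))
)

# MAE on the log scale
log_mae <- mae_vec(log(prices$price), prices$pred_log_price)

# Back-transform the predictions, then score in raw price units
raw_mae <- mae_vec(prices$price, exp(prices$pred_log_price))

c(log_scale = log_mae, raw_scale = raw_mae)
```

Both numbers describe the same predictions, yet only the back-transformed one can be compared with a model that was trained on raw prices.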
The third pitfall is silent NA handling. With na_rm = TRUE (the default), yardstick drops rows where either column is missing, which changes the denominator. Pre-filter the prediction frame when comparing models that handle missingness differently.
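A sketch of how the denominator shifts, using a hypothetical frame `with_na` that has one missing prediction.

```r
library(yardstick)

# Hypothetical predictions with one missing estimate
with_na <- data.frame(
  obs  = c(1, 2, 3, 4),
  pred = c(1.5, NA, 2.0, 4.0)
)

# Default na_rm = TRUE: the incomplete row is dropped, so three rows are scored
mae_dropped <- mae(with_na, obs, pred)$.estimate   # (0.5 + 1 + 0) / 3 = 0.5

# na_rm = FALSE: the missing value propagates to the result
mae_kept <- mae(with_na, obs, pred, na_rm = FALSE)$.estimate

# Number of rows actually scored, worth reporting next to the metric
n_scored <- sum(complete.cases(with_na))

c(dropped = mae_dropped, kept = mae_kept, n = n_scored)
```

Reporting `n_scored` alongside the metric makes it obvious when two models were evaluated on different subsets of rows.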
MAE ignores the sign of each residual, so pair it with mean(estimate - truth) to detect bias.
Try it yourself
Try it: Use the built-in solubility_test data. Add +5 to the first prediction, then compute both MAE and RMSE on the perturbed frame. Bind the two results into a single comparison tibble and save it to ex_mae_vs_rmse.
Explanation: A single +5 outlier moves MAE only slightly because absolute errors are averaged linearly, while RMSE jumps because squared residuals amplify the spike. The size of the gap is a quick diagnostic for tail-heavy errors.
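One possible solution to the exercise, sketched under the assumption that dplyr is available for bind_rows(). The `perturbed` name is only a label.

```r
library(yardstick)
library(dplyr)
data("solubility_test", package = "yardstick")

# Add +5 to the first prediction only
perturbed <- solubility_test
perturbed$prediction[1] <- perturbed$prediction[1] + 5

# Score with both metrics and stack the one-row tibbles
ex_mae_vs_rmse <- bind_rows(
  mae(perturbed, solubility, prediction),
  rmse(perturbed, solubility, prediction)
)
ex_mae_vs_rmse
```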
Related yardstick metrics
mae() is one entry in the yardstick numeric-metric family. Reach for these neighbors when MAE alone is not enough:
- rmse() for an error in outcome units that punishes large misses
- mape() for percentage error reporting to non-technical stakeholders
- rsq() for a unit-free 0-to-1 goodness-of-fit
- mase() for scale-free comparison across multiple time series
- huber_loss() for a metric that is squared (RMSE-style) for small residuals and linear (MAE-style) beyond a threshold
- metrics() to compute several regression scores in a single call
For the full set, see the yardstick reference index.
FAQ
What is a good MAE value?
There is no universal threshold. MAE is in the units of your outcome, so a "good" value depends on the spread of that outcome. Compare MAE against the standard deviation or interquartile range of the truth column: if MAE is much smaller, the model is adding signal; if it is close, the model is barely better than predicting the median. Always interpret alongside rsq() and a residual plot.
How is mae() different from mean(abs(y - yhat))?
They return the same number. yardstick wraps the formula in a function that validates inputs, handles missing values, supports case_weights, and returns a tidy tibble that integrates with metrics() and group_by(). Use mae() everywhere for consistency with the tidymodels workflow.
Can mae() handle case weights?
Yes. Pass a case_weights column to mae(). Common uses are survey weights, frequency or importance weights created with hardhat::frequency_weights() or hardhat::importance_weights(), and any case where some rows matter more than others. Weights must be non-negative; the function returns a weighted average of |residual| in the same units as the outcome.
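A sketch of the weighted call next to the same quantity computed by hand. The frame `weighted_df` and its weight column `w` are hypothetical, and a yardstick version that supports the case_weights argument is assumed.

```r
library(yardstick)

# Hypothetical data with a weight column `w`
weighted_df <- data.frame(
  obs  = c(1, 2, 3),
  pred = c(2, 2, 5),
  w    = c(1, 1, 2)
)

# Weighted MAE via the case_weights argument
w_mae <- mae(weighted_df, obs, pred, case_weights = w)$.estimate

# Same quantity by hand: weighted mean of the absolute residuals
manual_w_mae <- weighted.mean(abs(weighted_df$obs - weighted_df$pred), weighted_df$w)

c(yardstick = w_mae, manual = manual_w_mae)   # (1*1 + 0*1 + 2*2) / 4 = 1.25
```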
Why does mae() return a tibble instead of a number?
The tidy return shape is the yardstick convention. Every metric returns the same three columns: .metric, .estimator, .estimate. That uniformity lets you bind_rows() calls or pipe into group_by() with no reshape step. Call mae_vec() when you only need the scalar.
Summary
mae() is the outlier-robust scorecard in yardstick's regression family. Use it as the headline when residuals are heavy-tailed, partner it with rmse() to expose how much outliers move the number, and switch to mae_vec() when you need a scalar. Combined with group_by() it scales to any resampling scheme, and with metrics() it produces a clean multi-metric report.