ggplot2 stat_smooth() in R: Stat-Side Trend Smoothing
The stat_smooth() function in ggplot2 is the stat-side twin of geom_smooth(): same fitted curve, same confidence ribbon, but written from the statistical-transformation angle. Reach for it when you want to pair the smoother with a non-default geom or surface its fitted values via after_stat().
    ggplot(df, aes(x, y)) + stat_smooth()                                   # default loess
    ggplot(df, aes(x, y)) + stat_smooth(method = "lm")                      # linear fit
    ggplot(df, aes(x, y)) + stat_smooth(method = "lm", se = FALSE)          # no CI band
    ggplot(df, aes(x, y)) + stat_smooth(geom = "line")                      # custom geom
    ggplot(df, aes(x, y)) + stat_smooth(geom = "ribbon", alpha = 0.3)       # band only
    ggplot(df, aes(x, y)) + stat_smooth(method = "lm", fullrange = TRUE)
    ggplot(df, aes(x, y)) + stat_smooth(aes(weight = w), method = "lm")
Need explanation? Read on for examples and pitfalls.
What stat_smooth() does in one sentence
stat_smooth() fits a smoother to (x, y) data and returns a fitted line plus a confidence ribbon, identical in output to geom_smooth(). The two functions share the same layer constructor under the hood; they differ only in which side of the stat-geom pairing you name first.
For most exploratory work, geom_smooth() reads more naturally because you are usually thinking "add a trend line". You reach for stat_smooth() in three situations: when you want the statistical transformation visible in the call so reviewers see the model intent, when you swap the default geom (replace line plus ribbon with a step, a ribbon-only band, or fitted points), or when you need to surface the computed columns (y, ymin, ymax, se) through after_stat() for downstream mapping.
ggplot2 documents stat_smooth() and geom_smooth() as a paired layer, both calling StatSmooth$compute_group(). Anything you can pass to one, you can pass to the other. The arguments method, formula, se, span, level, n, fullrange, and na.rm mean the same thing in both calls.
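As a quick check of that equivalence, the sketch below builds the same linear fit both ways and compares the data each layer computes. The mtcars columns are an illustrative choice rather than part of the original examples, and formula = y ~ x is spelled out only to keep the console quiet.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg))

# Same smoother, named from each side of the stat-geom pairing
p_stat <- base + stat_smooth(method = "lm", formula = y ~ x, level = 0.9)
p_geom <- base + geom_smooth(method = "lm", formula = y ~ x, level = 0.9)

# layer_data() returns the data frame a layer draws from
same <- all.equal(layer_data(p_stat), layer_data(p_geom))
same
```

If the two layers really are the same object under different names, the comparison should report no differences.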
Syntax
stat_smooth() takes the same aesthetics as geom_smooth(), plus a geom argument that overrides the default visual.
The full signature:
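Argument lists can shift slightly between ggplot2 releases, so one hedged way to see the signature is to ask the installed package directly. The sketch below prints it and pulls out a few defaults; the selection of arguments shown is arbitrary.

```r
library(ggplot2)

# Print the full argument list for the installed version
args(stat_smooth)

# Pull out a few defaults for a closer look
defaults <- formals(stat_smooth)[c("geom", "se", "n", "span", "fullrange", "level")]
str(defaults)
```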
The geom = "smooth" default produces the familiar line plus ribbon. Override it to redirect the fitted output into any geom that accepts the computed columns.
stat_smooth() and geom_smooth() build the exact same layer. Choose by what reads better at the call site; performance and output are identical.
Examples that suit stat_smooth() specifically
These six patterns lean on the stat-first phrasing, especially the geom argument. The ones that swap the geom are harder to express cleanly with geom_smooth(), which has no geom argument.
1. Default linear fit, stat-first phrasing
Identical visual to geom_smooth(method = "lm"). The stat-first phrasing reads "add a smooth transformation, draw with the default geom" and is the convention in stat-heavy reports where the modeling step is the focus.
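A minimal sketch of this pattern, assuming the built-in mtcars data (wt against mpg) as a stand-in for your own data frame:

```r
library(ggplot2)

# Scatter plus a linear fit drawn with the default "smooth" geom
p_lm <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

p_lm
```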
2. Replace the default geom with a step
Passing geom = "step" reroutes the smoothed predictions through geom_step() instead of geom_smooth()'s default. Useful when the underlying data is naturally stepped (counts, rankings) but you still want a model-based summary on top.
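One way this could look, again on mtcars as an illustrative dataset. The small n is an assumption made here so that the individual steps are visible; se = FALSE is used because a step geom has no band to draw.

```r
library(ggplot2)

# Linear fit evaluated at 10 x positions and drawn as a staircase
p_step <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  stat_smooth(
    method = "lm", formula = y ~ x,
    geom = "step", se = FALSE, n = 10
  )

p_step
```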
3. Ribbon-only confidence band
geom = "ribbon" keeps the confidence band and hides the line. Useful when you overlay multiple bands or when the fitted line is drawn separately for emphasis later in the layer stack.
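A hedged sketch of the band-only layer, with the fitted line added as a separate layer for emphasis. The dataset and the colour are illustrative choices.

```r
library(ggplot2)

p_band <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # Confidence band only: the ribbon geom uses ymin and ymax and draws no line
  stat_smooth(method = "lm", formula = y ~ x, geom = "ribbon", alpha = 0.3) +
  # Fitted line drawn on its own, later in the layer stack
  stat_smooth(method = "lm", formula = y ~ x, geom = "line", se = FALSE,
              colour = "firebrick")

p_band
```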
4. Access fitted values via after_stat()
stat_smooth() exposes computed columns named y, ymin, ymax, se, and flipped_aes. after_stat(y) pulls the fitted value at each predicted x so you can plot it as points, segments, or labels.
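The sketch below shows one possible use: labelling a handful of fitted values. It maps the computed y to the label aesthetic while the y aesthetic itself keeps feeding the raw response into the fit. The dataset, n = 6, and the vjust offset are assumptions made for illustration.

```r
library(ggplot2)

p_lab <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(alpha = 0.4) +
  stat_smooth(method = "lm", formula = y ~ x, geom = "line", se = FALSE) +
  # Text labels showing the fitted value at 6 evenly spaced x positions
  stat_smooth(
    aes(label = after_stat(round(y, 1))),
    method = "lm", formula = y ~ x,
    geom = "text", n = 6, se = FALSE, vjust = -0.8
  )

p_lab
```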
5. Extend the line beyond the data range
fullrange = TRUE projects the fitted line across the full x-axis instead of stopping at the data's x range. Use when you need to show extrapolation explicitly and label it as such.
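A small sketch of the difference, assuming mtcars and an x scale widened with xlim(0, 7). The limits are arbitrary; without something that widens the scale beyond the data, fullrange has nothing extra to cover.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  xlim(0, 7)  # widen the x scale beyond the observed wt values

p_data <- base + stat_smooth(method = "lm", formula = y ~ x)
p_full <- base + stat_smooth(method = "lm", formula = y ~ x, fullrange = TRUE)

# Compare the x range each smoother was evaluated over
r_data <- range(layer_data(p_data, 2)$x)
r_full <- range(layer_data(p_full, 2)$x)
r_data
r_full
```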
6. Weighted regression
Mapping weight = w makes ggplot pass the weight vector through to the underlying lm() call. Heavier rows pull the line harder, matching inverse-variance or sampling-design adjustments.
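A sketch under stated assumptions: the weight column w below is made up from mtcars$hp purely for illustration and has no statistical meaning. The direct lm() fit is included only to compare against what the layer computes.

```r
library(ggplot2)

# Hypothetical weights, purely illustrative
df <- transform(mtcars, w = hp)

p_w <- ggplot(df, aes(wt, mpg)) +
  geom_point(aes(size = w), alpha = 0.5) +
  stat_smooth(aes(weight = w), method = "lm", formula = y ~ x)

# The same weighted model fitted directly, for comparison
fit_w <- lm(mpg ~ wt, data = df, weights = w)
coef(fit_w)
```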
The geom argument is the whole reason stat_smooth() exists as a separate name. Without it, geom_smooth() covers every case. Whenever you want "the fitted values from a smoother, but drawn as something other than a line and ribbon", swap geom_smooth() for stat_smooth(geom = "...") and you are done.
stat_smooth() vs geom_smooth(): a side-by-side
Both calls construct the same ggplot2 layer; the difference is which side of the stat-geom pairing reads more naturally at the call site.
| Aspect | stat_smooth() | geom_smooth() |
|---|---|---|
| Default geom | smooth (line plus ribbon) | smooth (line plus ribbon) |
| Default stat | smooth | smooth |
| Override target | geom argument | stat argument |
| Reads as | "compute a smoother, draw it" | "draw a smoothed line" |
| Method options | identical | identical |
| Computed columns | y, ymin, ymax, se | same, via after_stat() |
| Best for | swapping the visual representation | adding a quick trend line |
If you write a lot of plots with custom geoms (geom_step, geom_ribbon, points on fitted values), stat_smooth() keeps the smoother visible in the function name. If you just want a regression line on a scatter, geom_smooth() reads better.
Common pitfalls
Pitfall 1: assuming stat_smooth() is more powerful than geom_smooth(). It is not. They construct the same layer. The split exists for naming consistency with ggplot2's stat-geom pattern, not for capability.
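Pitfall 2: forgetting to wrap after_stat() around computed names. A bare name inside aes(), such as aes(label = y), refers to the raw data column, not the fitted value. Wrap it, as in aes(label = after_stat(y)) or aes(colour = after_stat(se)), to pull the smoother's computed column. The y aesthetic itself must still supply the raw response to the model, so if you need to remap it, use stage(start = y, after_stat = ...) rather than a lone aes(y = after_stat(y)), which leaves the smoother with no y to fit.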
The geom = "..." argument only works with geoms that can use the computed columns. Sending a smoother's output to geom_bar() or geom_boxplot() may error, warn, or draw something meaningless. Stick to line, point, ribbon, and step, or to segment and text once you also map the extra aesthetics they require (xend and yend, or label).
Pitfall 3: chaining two identical stat_smooth() calls. Two layers with the same method, geom, and aesthetics just draw the line twice. The second call is wasted work; add one only when you change geom, color, or aes().
Try it yourself
Try it: Add a linear stat_smooth() to the mpg dataset (x = displ, y = hwy) but draw the smoother as POINTS at each computed prediction instead of the default line. Save the plot to ex_smooth.
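One possible solution is sketched below. The formula = y ~ x argument is not required by the exercise; it is included here only to avoid the console message about the formula.

```r
library(ggplot2)

# Linear smoother on mpg, drawn as points at each computed prediction
ex_smooth <- ggplot(mpg, aes(displ, hwy)) +
  stat_smooth(method = "lm", formula = y ~ x, geom = "point", se = FALSE)

ex_smooth
```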
Explanation: geom = "point" redirects the smoother's fitted predictions through geom_point(). By default n = 80, so you get 80 evenly spaced points along the fit. se = FALSE skips computing the confidence band, which geom_point() has no way to draw.
Related ggplot2 functions
After stat_smooth(), several stat and geom siblings are worth knowing.
- geom_smooth(): the geom-first twin; identical layer, more natural for trend lines
- stat_summary(): arbitrary summary statistics on top of a scatter
- stat_function(): draw a user-supplied math function across x
- stat_quantile(): quantile regression smoother for non-mean trends
- after_stat() and after_scale(): access computed columns (after_stat) or scaled aesthetic values (after_scale) inside an aesthetic
- The official ggplot2 reference: ggplot2.tidyverse.org/reference/geom_smooth.html
For mixed-effects or non-Gaussian models, fit with lme4::lmer() or mgcv::gam() and use ggeffects::ggpredict() to extract predictions for plotting via geom_line().
FAQ
What is the difference between stat_smooth and geom_smooth?
There is no functional difference. Both construct the same ggplot2 layer with the same default stat (StatSmooth) and default geom (GeomSmooth). stat_smooth() reads more naturally when you override the geom (geom = "step", geom = "ribbon"); geom_smooth() reads more naturally for a plain trend line. Pick whichever phrasing makes the modeling intent clearer at the call site.
How do I access the fitted values from stat_smooth?
stat_smooth() computes columns named y, ymin, ymax, se, and flipped_aes. Reference them inside aes() using after_stat(), for example aes(label = after_stat(y)) or aes(colour = after_stat(se)). To get the raw fitted values into a data frame instead of plotting them, fit the model with lm() or gam() separately and call predict() with a grid of newdata values.
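A minimal sketch of the predict() route, assuming a linear model on the mpg data that ships with ggplot2. The 80-point grid mirrors the smoother's default n but is otherwise an arbitrary choice.

```r
library(ggplot2)  # only for the mpg dataset

fit <- lm(hwy ~ displ, data = mpg)

# Evenly spaced grid across the observed x range
grid <- data.frame(
  displ = seq(min(mpg$displ), max(mpg$displ), length.out = 80)
)

# Fitted values plus a 95% confidence interval at each grid point
pred <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
fitted_df <- cbind(grid, as.data.frame(pred))

head(fitted_df)
```

The resulting data frame holds the grid column plus fit, lwr, and upr, which can be passed to geom_line() and geom_ribbon() or used outside ggplot2 entirely.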
Can I use stat_smooth without geom_point?
Yes. stat_smooth() only needs aes(x, y) to fit and draw. The scatter points are a separate geom_point() layer you add for visual context. The smoother runs over the mapped (x, y) data regardless of whether points are drawn, so a plot with only stat_smooth() shows the fitted line and ribbon alone.
Does stat_smooth work with categorical x?
Not directly. A smoother requires a numeric x to fit a model over. For categorical x, use stat_summary() with fun = mean or fun.data = mean_cl_normal to draw a category-level summary, or convert the categorical to a numeric position before mapping it to x.
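As a sketch of the stat_summary() alternative, the example below draws one mean per category. The choice of mpg, the class column, and the styling are all illustrative.

```r
library(ggplot2)

p_cat <- ggplot(mpg, aes(class, hwy)) +
  geom_point(alpha = 0.2) +
  # One summary point per category: the mean of hwy within each class
  stat_summary(fun = mean, geom = "point", colour = "firebrick", size = 3)

p_cat
```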
Why does stat_smooth print a console message about method?
When method = NULL (the default), ggplot2 picks loess when the largest group has fewer than 1,000 observations and mgcv::gam() otherwise, then prints the chosen method and formula. Pass both method ("lm", "loess", or "gam") and formula explicitly to silence the message and lock the behavior across data sizes, which keeps your plots reproducible as the dataset grows.
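The sketch below is one way to confirm that an explicit call stays quiet: it builds the plot while collecting any messages. The handler code is plain base R and the mpg example is an illustrative choice.

```r
library(ggplot2)

# Method and formula both stated, so nothing is left for ggplot2 to choose
p_quiet <- ggplot(mpg, aes(displ, hwy)) +
  stat_smooth(method = "loess", formula = y ~ x)

# Build the plot and record any messages emitted along the way
msgs <- character()
withCallingHandlers(
  invisible(ggplot_build(p_quiet)),
  message = function(m) {
    msgs <<- c(msgs, conditionMessage(m))
    invokeRestart("muffleMessage")
  }
)

length(msgs)
```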