ggplot2 geom_smooth() in R: Trend Lines With Examples
The geom_smooth() function in ggplot2 fits a smooth trend line through (x, y) data with an optional confidence band. Use method = "lm" for linear regression, "loess" for local smoothing, or "gam" for generalized additive splines.
ggplot(df, aes(x, y)) + geom_point() + geom_smooth()  # default loess
ggplot(df, aes(x, y)) + geom_point() + geom_smooth(method = "lm")  # linear regression
ggplot(df, aes(x, y)) + geom_smooth(method = "lm", se = FALSE)  # no CI band
ggplot(df, aes(x, y, color = group)) + geom_smooth(method = "lm")  # one line per group
ggplot(df, aes(x, y)) + geom_smooth(method = "lm", formula = y ~ poly(x, 2))
ggplot(df, aes(x, y)) + geom_smooth(method = "gam")  # GAM splines
ggplot(df, aes(x, y)) + geom_smooth(method = "loess", span = 0.3)  # tighter loess
Need explanation? Read on for examples and pitfalls.
What geom_smooth() does in one sentence
geom_smooth() fits a trend line to (x, y) data and optionally draws a 95% confidence band around it. The method argument picks the model: linear regression (lm), local polynomial (loess), generalized additive (gam), or any custom modeling function.
For exploratory analysis, geom_smooth() is the fastest way to add a quick statistical summary to a scatter plot. For publication, fit the model separately with lm() or gam(), extract predictions, and plot via geom_line() for full control over confidence levels, prediction intervals, and styling.
Behind the scenes, geom_smooth() calls stat_smooth(), which fits the chosen model to your data, predicts y values across a regular grid of x (80 points by default), and connects those predictions with a line. The confidence band is computed from the model's standard errors. None of this is visible in the API; you just get a clean line on top of your scatter plot.
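To see what the stat actually computes, you can pull the layer's data out of a built plot. The sketch below assumes the mpg dataset that ships with ggplot2 and uses layer_data() to inspect the smooth layer; the variable names p and smooth_df are only illustrative.

```r
library(ggplot2)

# Scatter plot plus a linear smooth; method and formula are spelled out
# so ggplot2 does not have to choose them
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# layer_data() returns the data computed for one layer; layer 2 is the smooth
smooth_df <- layer_data(p, 2)

nrow(smooth_df)                                  # number of grid points
head(smooth_df[, c("x", "y", "ymin", "ymax")])   # fitted line and band limits
```

Each row is one grid point: x is the grid position, y the fitted value, and ymin/ymax the edges of the confidence band.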
The smoothness of the resulting curve depends on the method: lm gives a perfectly straight line; loess and gam adapt to the data shape with parameters you can tune. Default settings work well for exploratory plots; for publication, always specify the method and any tuning parameters explicitly.
Syntax
geom_smooth() requires aes(x, y). Other arguments tune the model.
The full signature:
geom_smooth(mapping = NULL, data = NULL, stat = "smooth", position = "identity",
..., method = NULL, formula = NULL, se = TRUE, na.rm = FALSE,
orientation = NA, show.legend = NA, inherit.aes = TRUE)
method = NULL chooses loess when the largest group has fewer than 1,000 observations and gam (mgcv::gam() with formula = y ~ s(x, bs = "cs")) otherwise. This means the same code can change behavior at scale. For consistent behavior across data sizes, always specify method explicitly.
Six common patterns
1. Default smooth (loess for small data, gam for large)
The simplest case: loess curve fitted through the points, with a 95% confidence band. ggplot prints a console message indicating which method it chose.
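A minimal sketch of this pattern, assuming the mpg dataset bundled with ggplot2 (the name p_default is illustrative):

```r
library(ggplot2)

# method is left unset, so ggplot2 picks it; mpg has 234 rows, so loess is used
p_default <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

p_default  # printing the plot also triggers the console message
```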
2. Linear regression line
method = "lm" fits ordinary least squares and draws the line. The shaded band is the 95% confidence interval for the mean prediction.
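As a sketch, again assuming the mpg dataset; formula = y ~ x is written out only to silence the console message:

```r
library(ggplot2)

# Ordinary least squares line with the default 95% confidence band
p_lm <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

p_lm
```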
3. Linear fit without confidence band
se = FALSE hides the confidence band. Use this when the band is misleading (small samples) or when you want a cleaner visual.
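The same plot without the band might look like this (mpg is assumed as example data):

```r
library(ggplot2)

# se = FALSE drops the shaded ribbon and keeps only the fitted line
p_nose <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_nose
```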
4. One smooth per group
Mapping color = drv automatically groups the smooth. Each drv gets its own regression line. Helpful for comparing trends across categories.
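A sketch of the grouped version, assuming mpg, where drv takes the values "4", "f", and "r":

```r
library(ggplot2)

# color = drv in the top-level aes() applies to both points and smooths,
# so each drive type gets its own fitted line
p_group <- ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

p_group
```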
5. Polynomial fit
formula = y ~ poly(x, 2) fits a degree-2 polynomial. For higher-order, change the integer (poly(x, 3), etc.). Use splines::ns(x, df = 4) for natural splines.
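For illustration, a quadratic fit on mpg could be written as follows (dataset and object name are assumptions for the example):

```r
library(ggplot2)

# Still method = "lm": the curve comes from the polynomial term in the formula
p_poly <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2))

p_poly
```

Inside formula, x and y refer to the aesthetics, not to column names in the data.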
6. Tighter LOESS span
span is the fraction of the data used for each local fit. The default of 0.75 gives a smooth curve; 0.3 is more responsive to local features. Lower span = wigglier fit.
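To compare spans side by side, the sketch below uses simulated data (a noisy sine wave invented for this example, not taken from the article) so the difference is easy to see:

```r
library(ggplot2)

# Simulated data: a sine wave plus noise (illustrative only)
set.seed(1)
sim <- data.frame(x = seq(0, 10, length.out = 200))
sim$y <- sin(sim$x) + rnorm(200, sd = 0.3)

base <- ggplot(sim, aes(x, y)) + geom_point(alpha = 0.4)

# Default span: smoother curve that can flatten local structure
p_span_default <- base + geom_smooth(method = "loess", formula = y ~ x, span = 0.75)

# Smaller span: each local fit uses fewer points, so the curve follows the data more closely
p_span_tight <- base + geom_smooth(method = "loess", formula = y ~ x, span = 0.3)
```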
geom_smooth() is for VISUAL summary, not formal modeling. The fitted line and band are descriptive, not inferential. For p-values, coefficients, model selection, fit your model separately with lm(), glm(), or gam() and then plot the predictions. geom_smooth() hides the model details on purpose.
This separation matters in practice. A team might prototype with geom_smooth(method = "lm") and discover a useful linear trend. The actual analysis then runs lm() separately to get coefficients, residuals, and diagnostics. The final report uses both: a fitted-line plot for the visual story and a coefficient table for the statistical claim. geom_smooth() is the entry point, not the destination.
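That two-step workflow can be sketched as follows, assuming mpg as the example data; the object names are illustrative:

```r
library(ggplot2)

# Step 1: prototype the trend visually
p_proto <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Step 2: fit the same model explicitly to get the numbers behind the line
fit <- lm(hwy ~ displ, data = mpg)
coef(summary(fit))   # estimates, standard errors, t values, p-values
```

The line drawn in step 1 and the model in step 2 are the same fit; only the second gives you something you can report in a table.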
A note on confidence intervals in geom_smooth
The shaded band drawn by default with se = TRUE is the 95% confidence interval for the MEAN PREDICTION at each x. It tells you "the average y at this x is in this band with 95% confidence", assuming the model is correct. It is NOT a prediction interval for individual observations, which would be wider. For prediction intervals, fit the model with lm() and use predict(model, interval = "prediction") to extract the wider band, then plot via geom_ribbon().
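A minimal sketch of the prediction-interval approach, assuming mpg and a 100-point grid chosen for this example:

```r
library(ggplot2)

# Fit the model outside ggplot2
fit <- lm(hwy ~ displ, data = mpg)

# Predict on a regular grid of x; predict() returns columns fit, lwr, upr
grid <- data.frame(displ = seq(min(mpg$displ), max(mpg$displ), length.out = 100))
pred <- cbind(grid, predict(fit, newdata = grid, interval = "prediction", level = 0.95))

# Draw the wider prediction band manually with geom_ribbon()
p_pi <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_ribbon(data = pred, aes(x = displ, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, alpha = 0.2) +
  geom_line(data = pred, aes(x = displ, y = fit), inherit.aes = FALSE)
```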
The CI band's width depends on the model's standard errors at each x, the chosen confidence level (the level argument, default 0.95), and the spread of x values around that point. It is generally narrowest near the center of the x range and widens toward the extremes, which matches how a regression model is most certain near the mean of x.
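The confidence level is set with the level argument, which geom_smooth() passes on to stat_smooth(). As a sketch on mpg, a 99% band is wider than the default 95% band at every x:

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Default 95% band versus an explicit 99% band
p_95 <- base + geom_smooth(method = "lm", formula = y ~ x)
p_99 <- base + geom_smooth(method = "lm", formula = y ~ x, level = 0.99)

# Compare band widths on the computed grid
w95 <- with(layer_data(p_95, 2), ymax - ymin)
w99 <- with(layer_data(p_99, 2), ymax - ymin)
summary(w99 / w95)
```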
geom_smooth() methods comparison
Choose method based on what you want to show.
| Method | What it does | Use for |
|---|---|---|
| "lm" | Ordinary least squares linear regression | Linear relationships, comparing slopes |
| "glm" | Generalized linear model (specify family) | Logistic, Poisson, etc. |
| "loess" | Local polynomial regression | Nonlinear, exploratory |
| "gam" | Generalized additive model with splines | Nonlinear, large data |
| "rlm" (MASS) | Robust linear regression | Outlier-resistant linear |
| Method | Default span/df | Computational cost |
|---|---|---|
| lm | N/A | Very low |
| loess | span = 0.75 | Medium (slow on large data) |
| gam | k = 10 | Low to medium |
When to use which:
- Use lm when you want to show or test a linear trend.
- Use loess for exploratory nonlinear smoothing on small data.
- Use gam for large data or when you need spline-based nonlinearity.
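The "glm" row needs a family, which is passed through method.args. The sketch below is a logistic curve on the built-in mtcars dataset (am is 0/1), chosen here purely as an illustration:

```r
library(ggplot2)

# Logistic regression smooth: method.args forwards family to glm()
p_glm <- ggplot(mtcars, aes(wt, am)) +
  geom_point() +
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = binomial))

p_glm
```

The fitted curve is drawn on the response scale, so it stays between 0 and 1.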
Common pitfalls
Pitfall 1: forgetting that geom_smooth's confidence band is for the MEAN PREDICTION, not for individual points. A new observation at a given x has wider uncertainty than the band shows. The band tells you "the mean of y at this x is in this range with 95% confidence", not "any new point will fall in this range".
Pitfall 2: relying on the default method. The default switches from loess to gam once the largest group reaches 1,000 observations, so adding 50 rows to a dataset near that threshold can change the curve. Always specify method for reproducible plots.
method = "lm" always fits a STRAIGHT LINE unless you change the formula. If your data is curved, geom_smooth(method = "lm") will mislead. Either use loess/gam for nonlinear, or fit a polynomial via formula = y ~ poly(x, k).
Pitfall 3: loess fails on very large data. Loess is computationally expensive. With 100K+ rows, it can be slow or run out of memory. Switch to gam for large data or sample the data first.
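Both workarounds can be sketched on the diamonds dataset that ships with ggplot2 (about 54,000 rows). The example assumes the mgcv package is installed, which method = "gam" requires; the sample size of 2,000 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Option 1: GAM on all rows, with the formula written out explicitly
p_gam <- ggplot(diamonds, aes(carat, price)) +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))

# Option 2: loess on a random subsample (base R sampling, no extra packages)
set.seed(42)
diamonds_small <- diamonds[sample(nrow(diamonds), 2000), ]

p_loess <- ggplot(diamonds_small, aes(carat, price)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", formula = y ~ x)
```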
Try it yourself
Try it: Plot mpg$hwy vs mpg$displ, color points by class, add ONE shared linear regression line (NOT per group), and hide the confidence band. Save to ex_plot.
Explanation: Mapping color = class INSIDE geom_point() (not in the parent aes) limits the color to the points only. The geom_smooth() then fits a single line through all data because it does not inherit the color aesthetic. se = FALSE hides the band; color = "black" overrides any default styling.
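One solution consistent with the explanation above (a sketch; other arrangements of the layers would work too):

```r
library(ggplot2)

# color = class is mapped only inside geom_point(), so the smooth sees no grouping
ex_plot <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black")

ex_plot
```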
Related ggplot2 functions
After mastering geom_smooth(), look at:
- stat_smooth(): alternative spelling; same function
- stat_summary(): custom summary statistics on top of points
- geom_line() plus model predictions: full control over the fitted line
- lm(), glm(), gam(): fit the model separately for full diagnostic control
- ggeffects package: extract and plot model predictions cleanly
- geom_ribbon(): manual confidence bands when you want full control
For mixed-effects models, fit with lme4::lmer() and use ggeffects::ggpredict() to extract predictions for plotting.
FAQ
How do I add a regression line in ggplot2?
Use geom_smooth(method = "lm") after geom_point(). For just the line without the confidence band: geom_smooth(method = "lm", se = FALSE). For a quadratic fit, add formula = y ~ poly(x, 2); raise the degree for higher-order polynomials.
What is the difference between lm and loess in geom_smooth?
method = "lm" fits a straight line via ordinary least squares. method = "loess" fits a smooth local curve that adapts to the data shape. Use lm for linear; loess for exploratory nonlinear.
How do I remove the confidence band in geom_smooth?
Add se = FALSE: geom_smooth(method = "lm", se = FALSE). The band is hidden but the line stays.
Why is my geom_smooth slow?
Loess (the default for small data) is O(N^2) in memory and slows down sharply on large datasets. Solutions: switch to method = "gam" for large data, or sample the data first, for example with dplyr's slice_sample(prop = 0.1).
How do I fit a separate trend line per group in ggplot2?
Map a categorical variable to color, linetype, or group: ggplot(df, aes(x, y, color = grp)) + geom_smooth(method = "lm"). ggplot will fit one regression per unique grp value.