geom_smooth in R: Add Trend Lines and Confidence Bands to Plots
geom_smooth() adds a smooth trend line to a ggplot2 scatter plot — either a LOESS curve (default for n < 1,000), a linear regression line (method = "lm"), or any other statistical model you specify.
Introduction
A scatter plot shows individual data points. geom_smooth() reveals the overall pattern by fitting a line or curve through the noise. The result — a trend line with a shaded confidence band — gives readers two things at once: the direction and shape of the relationship, and how certain you are about that shape.
The key decision is which type of smooth to use. geom_smooth() supports several:
- LOESS (default for small data): a flexible local polynomial that finds curves automatically. Great for exploration, hard to interpret as a model.
- lm: straight linear regression line. Simple, interpretable, assumes linearity.
- Polynomial: curved regression using poly(). Captures curvature while staying interpretable.
- GAM: generalized additive model (default for n ≥ 1,000). Flexible like LOESS but with penalized smoothing.
This post walks through each option with working code, explains the confidence band, and shows common mistakes that send readers to the wrong conclusions.
What does geom_smooth() do by default?
With no arguments, geom_smooth() fits a LOESS curve (locally estimated scatterplot smoothing) for datasets under 1,000 observations.
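A minimal sketch of the default behaviour, assuming the mpg dataset that ships with ggplot2 (234 rows, so LOESS is chosen) and the displ and hwy columns as x and y:

```r
library(ggplot2)

# mpg has fewer than 1,000 rows, so geom_smooth() picks LOESS by itself
p_default <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  geom_smooth()  # prints a message naming the method and formula it chose

p_default
```

The console message names the chosen method and formula, which is a quick way to confirm what was actually fitted.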
The blue line is the LOESS fit, a local polynomial that adapts to the shape of the data. The grey ribbon is the 95% confidence band: a pointwise interval for where the true smooth lies at each x-value, not a range for individual observations.
Try it: Add span = 0.3 inside geom_smooth() to make the LOESS more responsive to local fluctuations (wigglier). Then try span = 1.5 for a smoother curve. The default span = 0.75 is a balance between the two.
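To see the effect of span side by side, here is a small sketch on simulated data. The noisy sine wave and the names sim, p_wiggly and p_smooth are assumptions made for illustration, not part of the original post:

```r
library(ggplot2)

# Simulated data (illustrative assumption): a noisy sine wave
set.seed(1)
sim <- data.frame(x = seq(0, 10, length.out = 200))
sim$y <- sin(sim$x) + rnorm(200, sd = 0.4)

base <- ggplot(sim, aes(x, y)) + geom_point(alpha = 0.4)

# Small span: each local fit uses fewer neighbours, so the curve follows local bumps
p_wiggly <- base + geom_smooth(method = "loess", formula = y ~ x, span = 0.3)

# Large span: each local fit uses all points with gentle weights, so the curve is smoother
p_smooth <- base + geom_smooth(method = "loess", formula = y ~ x, span = 1.5)

p_wiggly
p_smooth
```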
How do you add a linear regression line?
LOESS is flexible but hard to interpret quantitatively. When you want to say "for every 1 unit increase in x, y changes by...", use method = "lm" for a straight regression line.
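A sketch of a linear fit, again assuming mpg with displ and hwy. The separate lm() call is not something geom_smooth() needs; it is there only so you can read off the slope that the plotted line represents:

```r
library(ggplot2)

p_lm <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x)

p_lm

# The same model fitted directly, to get the numbers behind the line
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)  # intercept and slope (change in hwy per 1-unit increase in displ)
```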
The formula = y ~ x argument is optional but good practice — it makes explicit which model you're fitting. For a simple linear regression this is always y ~ x.
Try it: Add se = FALSE inside geom_smooth() to remove the confidence band. This is useful in presentations where the band distracts from the trend, or when you've communicated uncertainty separately.
How do you fit a polynomial (curved) regression line?
When the relationship is clearly curved — like the U-shape in many biological dose-response relationships — a polynomial smooth fits a curve while remaining interpretable as a model.
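A quadratic fit can be sketched like this, assuming mpg with displ and hwy as a stand-in for whatever curved relationship you are plotting:

```r
library(ggplot2)

p_poly <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  # Still a linear model, but with a second-degree polynomial in x
  geom_smooth(method = "lm", formula = y ~ poly(x, 2))

p_poly
```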
poly(x, 2) fits a quadratic (second-degree polynomial). poly(x, 3) would fit a cubic. Keep the degree as low as you can while still capturing the real curvature: higher-degree polynomials tend to overfit, especially at the extremes of the data.
Try it: Compare formula = y ~ poly(x, 2) (quadratic) with formula = y ~ poly(x, 3) (cubic). Does the extra degree buy you anything, or does the curve just wiggle more at the edges?
How do you control the confidence band?
The shaded confidence band has several adjustable properties: its presence, width (confidence level), color, and transparency.
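A sketch that touches each of these properties at once. The dataset (mpg) and the specific colors are illustrative assumptions:

```r
library(ggplot2)

p_band <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  geom_smooth(
    method  = "lm",
    formula = y ~ x,
    se      = TRUE,       # draw the band (set FALSE to hide it)
    level   = 0.99,       # confidence level; default is 0.95
    color   = "darkred",  # line color
    fill    = "salmon",   # band color
    alpha   = 0.3         # band transparency
  )

p_band
```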
The level argument defaults to 0.95. Increasing to 0.99 widens the band — you're claiming higher confidence, which requires a wider interval. Reducing to 0.90 narrows the band.
Try it: Change level = 0.99 to level = 0.50 (50% CI). Notice the band becomes very narrow — not because the model is highly certain, but because you're only claiming 50% confidence. This illustrates that the CI width is driven by your chosen confidence level, not just the data.
How do you draw separate smooth lines per group?
When color or group is mapped to a variable, geom_smooth() automatically fits a separate smooth for each group.
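A sketch of per-group fits, assuming mpg with hwy against displ and drive type (drv) mapped to color. The text mentions drive type and displacement, but the exact variables are an assumption here. The sapply() call is an extra step that prints each group's fitted slope, so you can check what the lines say numerically:

```r
library(ggplot2)

p_groups <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.5) +
  # One linear fit per level of drv, because color is mapped to drv
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_groups

# Slope of hwy on displ within each drive type
slopes <- sapply(split(mpg, mpg$drv), function(d) {
  coef(lm(hwy ~ displ, data = d))[["displ"]]
})
slopes
```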
The slope of each line tells a story: rear-wheel drive cars show a steeper fuel-efficiency penalty as engine displacement increases. Front-wheel drive cars maintain better fuel economy across sizes.
Try it: Change method = "lm" to method = "loess" to use LOESS per group instead of linear fits. Compare the two: do the linear fits miss any important curvature?
Complete Example: Multi-layer Smooth Plot
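One way to combine the pieces from the previous sections is sketched below. It is an illustration under assumptions (mpg data, the chosen colors, labels and theme), not a reproduction of a specific figure: per-group linear fits sit on top of an overall LOESS curve.

```r
library(ggplot2)

p_full <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv), alpha = 0.5) +
  # Overall trend across all cars, with its confidence band
  geom_smooth(method = "loess", formula = y ~ x,
              color = "black", fill = "grey70", linetype = "dashed") +
  # One straight line per drive type, without bands to avoid clutter
  geom_smooth(aes(color = drv), method = "lm", formula = y ~ x,
              se = FALSE, linewidth = 0.8) +
  labs(
    title = "Highway mileage vs engine displacement",
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    color = "Drive type"
  ) +
  theme_minimal()

p_full
```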
Common Mistakes and How to Fix Them
Mistake 1: Using LOESS for extrapolation
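LOESS only smooths within the range of your observed data and cannot reliably extend beyond it. When you need a line that continues past the data, use a parametric fit such as method = "lm" and set fullrange = TRUE so the line is drawn across the full axis range. Treat any extrapolated segment with caution, whatever the method.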
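A sketch of extending a linear fit beyond the data, assuming mpg and axis limits of 1 to 9 chosen purely for illustration. By default geom_smooth() draws only over the observed x-range; fullrange = TRUE extends the fit to the axis limits:

```r
library(ggplot2)

p_extrap <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  # fullrange = TRUE projects the line (and band) across the whole x-axis
  geom_smooth(method = "lm", formula = y ~ x, fullrange = TRUE) +
  xlim(1, 9)  # wider than the observed displ values

p_extrap
```

The band widens as the line moves away from the data, which is a useful visual reminder of how little the model knows out there.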
Mistake 2: Ignoring the message about method selection
When n ≥ 1,000, ggplot2 switches from LOESS to GAM automatically and tells you only through a console message. If you see geom_smooth() using method = 'gam' and formula = 'y ~ s(x, bs = "cs")', you're no longer fitting LOESS. Be explicit: set method = "loess" or method = "gam" yourself.
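A sketch of stating the method explicitly on a large dataset, assuming the diamonds data from ggplot2 (well over 1,000 rows) and that the mgcv package is installed. Spelling out the method and formula produces the same fit as the automatic choice, without relying on the default:

```r
library(ggplot2)

p_gam <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.05) +
  # Explicit GAM with the same smooth term ggplot2 would choose automatically
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))

p_gam
```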
Mistake 3: Treating the CI band as a prediction interval
The confidence band shows where the true smooth/line likely falls — not where individual new observations would fall. A prediction interval would be much wider.
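To make the difference visible, this sketch computes both intervals with predict() on an lm fit and draws the prediction interval as a wider ribbon behind the usual band. The dataset (mpg) and helper names (grid, bands) are assumptions for illustration:

```r
library(ggplot2)

fit  <- lm(hwy ~ displ, data = mpg)
grid <- data.frame(displ = seq(min(mpg$displ), max(mpg$displ), length.out = 50))

# Confidence interval: uncertainty about the fitted line
conf_int <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
# Prediction interval: where a new individual observation might fall
pred_int <- predict(fit, newdata = grid, interval = "prediction", level = 0.95)

bands <- data.frame(
  displ  = grid$displ,
  ci_lwr = conf_int[, "lwr"], ci_upr = conf_int[, "upr"],
  pi_lwr = pred_int[, "lwr"], pi_upr = pred_int[, "upr"]
)

p_intervals <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_ribbon(data = bands, aes(x = displ, ymin = pi_lwr, ymax = pi_upr),
              inherit.aes = FALSE, fill = "grey80", alpha = 0.5) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x)

p_intervals
```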
Mistake 4: Forgetting formula = y ~ x with method = "lm"
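Without the explicit formula, ggplot2 prints a message reporting the formula it assumed (for example, formula = 'y ~ x'). Always pair method = "lm" with formula = y ~ x (or your polynomial formula) so the fitted model is stated in your code.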
Mistake 5: Too many group smooths on one plot
More than 3-4 group-specific smooth lines creates a tangled spaghetti plot. Use facet_wrap() when you have many groups.
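A sketch of the faceted alternative, assuming mpg with the class column (seven vehicle classes) as the grouping variable:

```r
library(ggplot2)

p_facets <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.4) +
  # One fit per panel instead of seven overlapping lines
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ class)

p_facets
```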
Practice Exercises
Exercise 1: LOESS vs linear
Using cars (speed vs dist), create a scatter plot with two overlaid geom_smooth() layers: one LOESS (default), one linear (method = "lm"). Use different colors and remove the CI band from the linear fit for clarity.
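One possible solution, as a sketch. The colors are arbitrary choices:

```r
library(ggplot2)

p_ex1 <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  # LOESS with its confidence band
  geom_smooth(method = "loess", formula = y ~ x, color = "steelblue") +
  # Linear fit without the band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "firebrick")

p_ex1
```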
Exercise 2: Polynomial fit
Using cars, fit a quadratic smooth (formula = y ~ poly(x, 2)) and compare with a cubic (poly(x, 3)). Which better captures the relationship without overfitting?
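One possible solution, as a sketch. The plot overlays both fits, and the anova() and AIC() calls are one reasonable way (an addition here, not prescribed by the exercise) to judge whether the cubic term earns its place:

```r
library(ggplot2)

p_ex2 <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2),
              se = FALSE, color = "steelblue") +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3),
              se = FALSE, color = "firebrick", linetype = "dashed")

p_ex2

# Compare the two nested models directly
fit2 <- lm(dist ~ poly(speed, 2), data = cars)
fit3 <- lm(dist ~ poly(speed, 3), data = cars)
anova(fit2, fit3)  # F-test for the added cubic term
AIC(fit2, fit3)    # lower AIC indicates the better trade-off
```

If the two curves are nearly indistinguishable within the data and the cubic term adds little, the quadratic is the safer choice.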
Summary
| Argument | Effect |
|---|---|
| method = "loess" | Flexible local smooth (default, n < 1,000) |
| method = "lm" | Straight linear regression line |
| method = "gam" | Generalized additive model (default, n ≥ 1,000) |
| formula = y ~ poly(x, 2) | Quadratic (curved) regression |
| se = FALSE | Hide confidence band |
| level = 0.99 | 99% CI instead of default 95% |
| span = 0.5 | LOESS smoothing span (lower = wigglier) |
| color, fill | Line and band color |
| linewidth | Line thickness |
Choosing a smooth type:
- Exploring unknown patterns → LOESS
- Reporting a model (slope, intercept) → method = "lm"
- Clear curvature → method = "lm", formula = y ~ poly(x, 2)
- Per-group patterns → map color or group, ggplot2 fits one smooth per group
FAQ
Why does geom_smooth() switch from loess to gam for large datasets? LOESS scales poorly: its memory use grows as O(n²), so it becomes slow or impractical on large data. When the largest group has 1,000 or more observations, ggplot2 automatically uses a GAM (from the mgcv package) instead. Force LOESS with method = "loess" if you need it.
What is the span parameter in LOESS? span controls how much of the data is used for each local fit. Smaller span (e.g., 0.3) fits more locally (wigglier), larger span (e.g., 1.0) uses more of the data (smoother). The default is 0.75.
Can geom_smooth() fit a log or spline model? Yes. With method = "lm", formula = y ~ log(x) fits a logarithmic curve. For splines, use method = "lm" with formula = y ~ splines::ns(x, df = 4) (natural splines with 4 degrees of freedom).
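Both formulas can be sketched on the same plot. The dataset (mpg) and colors are assumptions for illustration, and both layers use method = "lm" so the formula defines the shape of the curve:

```r
library(ggplot2)

p_forms <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.4) +
  # Logarithmic curve: linear model in log(x)
  geom_smooth(method = "lm", formula = y ~ log(x),
              se = FALSE, color = "steelblue") +
  # Natural cubic spline with 4 degrees of freedom
  geom_smooth(method = "lm", formula = y ~ splines::ns(x, df = 4),
              se = FALSE, color = "firebrick")

p_forms
```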
How do I make geom_smooth() use my own model? Pass a function to method. For example, method = MASS::rlm fits a robust linear model. Any function that works like lm() (takes formula and data) can be used.
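A sketch of the robust-regression case, assuming the MASS package is installed and using mpg as stand-in data. The ordinary lm line is drawn as well so the two fits can be compared:

```r
library(ggplot2)

p_rlm <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  # Ordinary least squares, for reference
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, color = "grey40", linetype = "dashed") +
  # Robust linear model: the function itself is passed, not a string
  geom_smooth(method = MASS::rlm, formula = y ~ x, color = "firebrick")

p_rlm
```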
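Why does my CI band disappear when I add both color and fill mapping? When aes(fill = group) is set, geom_smooth() uses fill for the CI band, so each band takes its group's fill color. With a pale fill scale or a low alpha, the band can become hard to see against the background. If you map only aes(color = group), the lines are colored by group and the bands keep their default grey fill. To tint each band to match its line, map both color and fill to the same variable and adjust alpha if the bands are too faint.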