Polynomial and Spline Regression in R: Model Curvature Without Transforming Variables
Linear regression draws a straight line, but real-world relationships curve. Polynomial and spline regression let you model that curvature inside the familiar lm() workflow, no log-transforms required. This tutorial uses poly(), bs(), ns(), and mgcv::gam() on built-in R data.
Why does linear regression fail on curved relationships?
A linear model forces one straight line through the data. Plot mpg against hp in mtcars and you see a clear bend: low-hp cars lose mileage fast, high-hp cars level off. A linear fit misses that shape. Before we introduce new tools, let's measure how badly linear regression does, then see how adding a single squared term already helps.
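A minimal sketch of that comparison using only base R (object names such as fit_lin are ours, not fixed by the tutorial):

```r
# Straight line vs. quadratic on the same predictor
fit_lin  <- lm(mpg ~ hp, data = mtcars)
fit_quad <- lm(mpg ~ poly(hp, 2), data = mtcars)

# Compare in-sample fit
c(linear    = summary(fit_lin)$r.squared,
  quadratic = summary(fit_quad)$r.squared)
```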
Adding one squared term lifts R-squared from 0.60 to 0.76, a 16-point gain on the same predictor. That gap is the signal that hp relates to mpg through a curve, not a line. The rest of this tutorial is about fitting that curve well without hand-crafting I(hp^2) terms.
Next, let's see the miss in the residuals. A well-specified model leaves residuals scattered randomly around zero. A curved-truth-but-straight-fit leaves a clear pattern.
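One way to draw that plot in base graphics (a sketch; the tutorial's own figure may use different styling):

```r
fit_lin <- lm(mpg ~ hp, data = mtcars)

# Residuals of the straight-line fit against the predictor
plot(mtcars$hp, resid(fit_lin),
     xlab = "hp", ylab = "Residual",
     main = "Residuals of the linear fit")
abline(h = 0, lty = 2)

# Red smoother to expose any leftover pattern
lines(lowess(mtcars$hp, resid(fit_lin)), col = "red", lwd = 2)
```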
The red smoother on that plot curves in a pronounced U. That U is the curvature the straight line couldn't capture: it pushes residuals up at the extremes and down in the middle. A model that fits the true shape would flatten that smoother.
Try it: Repeat the R-squared comparison on mpg ~ wt. Fit a linear model and a degree-2 polynomial, print both R-squared values to see whether wt also bends.
Click to reveal solution
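One possible solution (object names are ours):

```r
# Same comparison as before, now on weight
fit_wt_lin  <- lm(mpg ~ wt, data = mtcars)
fit_wt_quad <- lm(mpg ~ poly(wt, 2), data = mtcars)

c(linear    = summary(fit_wt_lin)$r.squared,
  quadratic = summary(fit_wt_quad)$r.squared)
```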
Explanation: Weight bends mpg too, but less sharply than horsepower. The quadratic term adds about 6 points of R-squared, smaller than the 16-point bump for hp.
How does poly() fit polynomial regression in R?
The poly(x, d) function builds a matrix of d columns from your predictor. Feed it to lm() and R fits one coefficient per column, a standard linear regression on an expanded set of features. The model is linear in its coefficients even though the fit is curved in the original hp.

Figure 1: Basis expansion turns one predictor into many columns so lm() can fit curves.
Let's fit a cubic polynomial and look at what poly() produces.
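A sketch of that fit; summary() prints the coefficient table discussed next:

```r
fit_cubic <- lm(mpg ~ poly(hp, 3), data = mtcars)
summary(fit_cubic)

# Peek at the basis itself: three orthogonal columns built from hp
head(poly(mtcars$hp, 3))
```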
Three things to notice. The degree-1 and degree-2 coefficients are highly significant, so the curve is real. The degree-3 coefficient has p = 0.49, so a cubic is overkill; a quadratic is enough. And the intercept equals the mean of mpg, because poly() builds orthogonal polynomials by default: each column is uncorrelated with every other.
Orthogonal polynomials don't tell you the coefficient on raw hp^2. For that, pass raw = TRUE. Predictions are identical, coefficients are not.
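A sketch of the side-by-side comparison:

```r
fit_orth <- lm(mpg ~ poly(hp, 2), data = mtcars)
fit_raw  <- lm(mpg ~ poly(hp, 2, raw = TRUE), data = mtcars)

coef(fit_orth)   # orthogonal basis coefficients
coef(fit_raw)    # coefficients on hp and hp^2 directly

# Same column space, so the fitted values agree to numerical precision
all.equal(fitted(fit_orth), fitted(fit_raw))
```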
Same curve, very different numbers. The orthogonal version is best for deciding which degrees matter (each coefficient tests one basis piece independently). The raw version is what you want when interpreting the effect of hp^2 directly, say for a report or a manuscript.
hp, hp^2, and hp^3 grow together, so individual coefficients flip signs between models and their standard errors balloon. Never read raw-polynomial t-tests as "is this degree needed" tests; use orthogonal poly() or anova() for that.

A formal way to pick the degree is to nest the models and run a sequential F-test with anova(). Each row compares one model to the previous.
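A sketch of that nested comparison:

```r
# Nested models: each adds one polynomial degree
fit1 <- lm(mpg ~ hp,          data = mtcars)
fit2 <- lm(mpg ~ poly(hp, 2), data = mtcars)
fit3 <- lm(mpg ~ poly(hp, 3), data = mtcars)
fit4 <- lm(mpg ~ poly(hp, 4), data = mtcars)

# Sequential F-tests: row k tests model k against model k-1
anova(fit1, fit2, fit3, fit4)
```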
The jump from linear to quadratic is overwhelming (p ≈ 0.0001), degree 3 adds nothing (p = 0.49), degree 4 adds something small. The sensible choice is degree 2, anything higher risks chasing noise. The degree-4 wiggle is the classic polynomial trap we'll avoid next with splines.
Try it: Fit a degree-5 polynomial and check whether the fifth-degree coefficient is significant. Use the default (orthogonal) poly() so the t-test is clean.
Click to reveal solution
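One possible solution:

```r
# Default poly() is orthogonal, so each t-test is clean
fit5 <- lm(mpg ~ poly(hp, 5), data = mtcars)
summary(fit5)$coefficients
```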
Explanation: Only degrees 1 and 2 are strongly significant. Degree 4 is borderline and probably an artefact, ignore it. Stick with the quadratic.
When should you use splines instead of polynomials?
Polynomials have a problem: they are global. A wiggle at one end of the data distorts the fit everywhere else. That's the Runge phenomenon: high-degree polynomials oscillate near the boundaries even when the truth is flat there.
Splines fix this by being local. A spline chops the predictor's range into pieces at user-chosen knots, fits a separate low-degree polynomial on each piece, and glues the pieces together so the curve and its first two derivatives remain continuous. One piece bending sharply leaves the others alone.
The bs() function from the base splines package builds a B-spline basis, by default a cubic. Pass it to lm() exactly like poly().
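A sketch of such a fit (the knot positions here are our illustrative choice of a two-knot spline):

```r
library(splines)

# Cubic B-spline with two interior knots
fit_bs <- lm(mpg ~ bs(hp, knots = c(120, 200)), data = mtcars)
summary(fit_bs)$r.squared
```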
R-squared is 0.77, slightly better than the degree-2 polynomial, though the spline spends five basis columns where the quadratic spends two. Let's overlay the two fits so you can see where they differ.
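A base-graphics sketch of that overlay (knot positions again ours):

```r
library(splines)

fit_poly <- lm(mpg ~ poly(hp, 2), data = mtcars)
fit_bs   <- lm(mpg ~ bs(hp, knots = c(120, 200)), data = mtcars)

# Dense grid for smooth prediction curves
hp_grid <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp),
                               length.out = 200))

plot(mpg ~ hp, data = mtcars, pch = 16)
lines(hp_grid$hp, predict(fit_bs,   hp_grid), col = "blue", lwd = 2)
lines(hp_grid$hp, predict(fit_poly, hp_grid), col = "red",  lwd = 2, lty = 2)
legend("topright", c("B-spline", "Quadratic"),
       col = c("blue", "red"), lty = c(1, 2))
```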
The spline (solid blue) and the polynomial (dashed red) agree in the bulk of the data where both are well-constrained. They diverge at the extremes where data are sparse. The polynomial flares, the spline bends more gently because its local piece near the boundary doesn't know or care about the other pieces. That's the win.
bs() defaults to cubic (degree 3). Each piece is a cubic polynomial, and the pieces share the value, the slope, and the curvature at every knot. The visible fit is smoother than a quadratic polynomial but uses fewer global parameters.

Try it: Refit the B-spline with a single knot at 120 and compare R-squared to the two-knot fit.
Click to reveal solution
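One possible solution (the two-knot positions are our assumption):

```r
library(splines)

fit_1knot <- lm(mpg ~ bs(hp, knots = 120),         data = mtcars)
fit_2knot <- lm(mpg ~ bs(hp, knots = c(120, 200)), data = mtcars)

c(one_knot = summary(fit_1knot)$r.squared,
  two_knot = summary(fit_2knot)$r.squared)
```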
Explanation: One knot splits the range in two and gives a slightly less flexible fit. The R-squared drop is small, which tells you the second knot didn't add much, many datasets only need one well-placed knot.
How do you choose knots for a spline?
Picking knots by hand is fine when you know the data and suspect a break at a specific value (for example, minimum wage laws kick in at a known pay rate). Most of the time you don't. Two simpler options work well:
- Pass df = k and let bs() place k - 3 knots at equally spaced quantiles.
- Switch to ns() for a natural cubic spline, which adds a constraint that the fit is linear beyond the outer knots. That reduces variance at the boundaries where data are sparse.
Natural splines are the default recommendation for most regression tasks. Here's the equivalent of the two-knot B-spline, but using ns() with df = 4.
ns() places knots at quantiles automatically when you pass df. You can verify this by specifying the same knots manually and checking that the fit matches.
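A sketch of both the fit and the verification:

```r
library(splines)

# df = 4 natural spline; knots chosen at quantiles of hp
fit_ns <- lm(mpg ~ ns(hp, df = 4), data = mtcars)

# Where did ns() put the interior knots?
kn <- attr(ns(mtcars$hp, df = 4), "knots")
kn

# Specifying those knots by hand should reproduce the fit exactly
fit_manual <- lm(mpg ~ ns(hp, knots = kn), data = mtcars)
all.equal(fitted(fit_ns), fitted(fit_manual))
```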
In practice: start with df = 4 (three interior knots) and adjust from there. If df = 3 already fits the data well, use it; fewer degrees of freedom means fewer opportunities to overfit. If df = 5 or 6 clearly beats smaller values on AIC, that's probably a sign you need a more flexible approach like a GAM.
If you find yourself reaching for df = 10, switch to mgcv::gam() and let the penalty decide.

Try it: Grid-search df = 2:7 for ns(hp, df = d) and pick the smallest-AIC model. Lower AIC = better fit after penalising complexity.
Click to reveal solution
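One possible solution (object names are ours):

```r
library(splines)

# AIC for each candidate df
my_aics <- sapply(2:7, function(d)
  AIC(lm(mpg ~ ns(hp, df = d), data = mtcars)))
names(my_aics) <- paste0("df", 2:7)

my_aics
names(which.min(my_aics))   # smallest-AIC model wins
```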
Explanation: AIC minimises at df = 3. Higher df values fit the training data better but pay the penalty, so AIC rises. With only 32 rows in mtcars, three degrees of freedom is plenty.
How do GAMs with mgcv automate smoothness?
Even with AIC grid-search, you're still picking a single integer df. Generalized Additive Models (GAMs) through the mgcv package do one better: they fit a large spline basis and penalise wiggliness during estimation. The data decides the effective smoothness.
In R, you write the same formula you'd write for lm() but wrap the curved predictor in s(...).
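A sketch of that fit:

```r
library(mgcv)

# s(hp) requests a penalised smooth of hp; smoothness is estimated
fit_gam <- gam(mpg ~ s(hp), data = mtcars)
summary(fit_gam)
```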
The important number is edf = 2.68, the effective degrees of freedom. edf = 1 would mean the smooth is equivalent to a straight line (no curvature is needed). edf close to the maximum basis size k (default 10) would mean the basis is too small and the model is fighting the ceiling. 2.7 sits comfortably in the middle: the penalty chose a smoothness about equivalent to a cubic polynomial, and no hand-tuning was required.
Two useful checks live in gam.check(). The k-index column tells you whether the basis size is adequate. A value near 1 and a high p-value mean "you have enough basis functions, the penalty chose the right edf." A low value and small p-value mean "raise k."
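A sketch of running those checks (gam.check() also produces diagnostic plots):

```r
library(mgcv)

fit_gam <- gam(mpg ~ s(hp), data = mtcars)

# Prints convergence info plus the k', edf, k-index, p-value table
gam.check(fit_gam)
```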
k' = 9 is the upper bound (defaulted from k = 10), edf = 2.68 is what the penalty used, k-index = 0.97 and p = 0.39 say "the basis is plenty big." If p had come in at 0.02 we'd refit with k = 20 and check again.
edf tells you how complex the fitted curve is. edf ≈ 1 ⇒ a straight line was sufficient. edf near k' ⇒ the basis is too small. The penalty does the degree-selection work, so you never have to pick an integer.

Try it: Force k = 5 in s(hp, k = 5). Compare the new model's edf and AIC to the default fit.
Click to reveal solution
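One possible solution:

```r
library(mgcv)

fit_default <- gam(mpg ~ s(hp),        data = mtcars)
fit_k5      <- gam(mpg ~ s(hp, k = 5), data = mtcars)

c(edf_default = summary(fit_default)$edf,
  edf_k5      = summary(fit_k5)$edf)
c(AIC_default = AIC(fit_default),
  AIC_k5      = AIC(fit_k5))
```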
Explanation: Cutting k from 10 to 5 barely changes edf or AIC. The penalty was picking a smoothness equivalent to about 2-3 degrees of freedom either way, the excess basis size never got used. That's the beauty of penalised GAMs: they are robust to the choice of k as long as it's big enough.
Practice Exercises
Use these capstone exercises to combine the ideas above. Variables are prefixed with my_ so they don't clash with tutorial state.
Exercise 1: Rank four methods on mtcars by AIC
Fit four models to mtcars: lm(mpg ~ hp), lm(mpg ~ poly(hp, 2)), lm(mpg ~ ns(hp, df = 4)), and gam(mpg ~ s(hp)). Compute AIC for each and store them in a named vector called my_aic. Sort ascending and report the winner.
Click to reveal solution
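One possible solution:

```r
library(splines)
library(mgcv)

my_models <- list(
  linear = lm(mpg ~ hp,              data = mtcars),
  poly2  = lm(mpg ~ poly(hp, 2),     data = mtcars),
  ns4    = lm(mpg ~ ns(hp, df = 4),  data = mtcars),
  gam    = gam(mpg ~ s(hp),          data = mtcars)
)

# AIC() dispatches correctly on both lm and gam objects
my_aic <- sort(sapply(my_models, AIC))
my_aic                 # first entry is the winner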
Explanation: The quadratic polynomial wins here because the true curve in mpg ~ hp is nearly parabolic and the dataset has only 32 rows. The GAM and ns(df=4) are close runners-up. Linear is far behind, exactly what we expected from the residual plot at the start.
Exercise 2: Runge phenomenon on a simulated sine wave
Simulate my_x <- seq(0, 2*pi, length.out = 200) and my_y <- sin(my_x) + rnorm(200, sd = 0.25). Fit two models: a degree-10 polynomial (my_fit_poly) and a B-spline with df = 6 (my_fit_bs). Plot both fits against my_x and look at the behaviour near the boundaries.
Click to reveal solution
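One possible solution (the seed is our addition so the plot is reproducible; any seed shows the same qualitative behaviour):

```r
library(splines)

set.seed(1)  # our choice, for reproducibility
my_x <- seq(0, 2*pi, length.out = 200)
my_y <- sin(my_x) + rnorm(200, sd = 0.25)

my_fit_poly <- lm(my_y ~ poly(my_x, 10))  # global degree-10 polynomial
my_fit_bs   <- lm(my_y ~ bs(my_x, df = 6))  # local cubic B-spline

plot(my_x, my_y, pch = 16, col = "grey")
lines(my_x, fitted(my_fit_poly), col = "red",  lwd = 2)
lines(my_x, fitted(my_fit_bs),   col = "blue", lwd = 2)
```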
Explanation: The degree-10 polynomial (red) typically overshoots near x = 0 and x = 2π because high-degree polynomials amplify small boundary perturbations. The B-spline (blue) tracks the sine wave cleanly because its local pieces near the boundary aren't influenced by the wiggles in the middle. This is the Runge phenomenon in action, and it's why splines are the default choice for general curve fitting.
Complete Example: Ozone vs Temperature in airquality
Let's apply every method to one dataset end to end. airquality contains daily New York measurements, and ozone concentration is famously nonlinear in temperature.
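A sketch of the full comparison (Ozone has missing values, so we drop incomplete rows first):

```r
library(splines)
library(mgcv)

# Keep only complete Ozone/Temp pairs
aq <- na.omit(airquality[, c("Ozone", "Temp")])

fits <- list(
  linear = lm(Ozone ~ Temp,             data = aq),
  poly2  = lm(Ozone ~ poly(Temp, 2),    data = aq),
  ns4    = lm(Ozone ~ ns(Temp, df = 4), data = aq),
  gam    = gam(Ozone ~ s(Temp),         data = aq)
)

sort(sapply(fits, AIC))   # ascending: best model first
```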
The ranking matches mtcars: linear loses by a wide margin, the three curved models agree with each other within about 5 AIC points. The GAM squeaks ahead, mostly because ozone bends sharply at high temperatures and the penalised fit handles that boundary region more gracefully.
Interpretation: ozone grows slowly below 75°F and accelerates above it. The linear model averages those two regimes and misses both. The curved models all agree on the shape, they differ only in where they flex. For reporting, the GAM is the safest choice: no tuning parameters and a built-in diagnostic for whether the basis is big enough.
Summary

Figure 2: How to pick between polynomial, B-spline, natural spline, and GAM.
| Method | When to use | Key argument | Gotcha |
|---|---|---|---|
| poly(x, d) | Smooth global curve, small d | d (degree) | Keep d ≤ 4; boundary flare at higher degrees |
| bs(x, knots) | You know where breaks belong | knots or df | High variance beyond outer knots |
| ns(x, df) | General-purpose: wiggly interior, linear tails | df (3-5 typical) | Choose df by AIC grid-search |
| gam(y ~ s(x)) | Let the data pick smoothness | k (upper bound) | Check edf vs k' after fitting |
Three practical rules of thumb:
- Start with a quadratic polynomial to confirm the curve is real, then go to splines if the fit stays patterned.
- Use ns(df = 3) or ns(df = 4) as your default curved fit for a single continuous predictor.
- Switch to mgcv::gam(y ~ s(x)) when you have more than one curved predictor or when you want smoothness decided by the data, not by you.
poly(), bs(), ns(), and s() all turn one column of raw predictor into many columns that lm() (or gam()) fits with ordinary least squares. The curve is in the columns; the coefficients stay linear.

References
- James, G., Witten, D., Hastie, T., Tibshirani, R., An Introduction to Statistical Learning with Applications in R, 2nd ed. Springer (2021). Chapter 7: Moving Beyond Linearity. Link
- Wood, S. N., Generalized Additive Models: An Introduction with R, 2nd ed. CRC Press (2017). Link
- Perperoglou, A., Sauerbrei, W., Abrahamowicz, M., Schmid, M., "A review of spline function procedures in R." BMC Medical Research Methodology, 19:46 (2019). Link
- R Core Team, ?poly, base stats documentation. Link
- splines package, ?bs, ?ns. Link
- Wood, S. N., mgcv package documentation on CRAN. Link
- Hastie, T., Tibshirani, R., Friedman, J., The Elements of Statistical Learning, 2nd ed. Springer (2009). Chapter 5: Basis Expansions and Regularization. Link
Continue Learning
- Linear Regression in R: the foundation this tutorial builds on; assumptions, fitting, and interpreting lm() output.
- Logistic Regression in R: the same basis-expansion ideas work for classification; swap lm() for glm(family = binomial).
- Regression Diagnostics in R: how to check whether your chosen curve actually fits the data's shape.