Linear Regression Assumptions in R: Test All 5 and Know What to Do When They Fail
Linear regression in R relies on five assumptions: linearity, independence of errors, homoscedasticity, normality of residuals, and no multicollinearity. When any one of them breaks, your coefficients, standard errors, and p-values become unreliable, so every assumption needs both a diagnostic test and a known fix. This post shows all five, using base R so every code block runs in your browser.
Why do the 5 linear regression assumptions matter?
Ordinary least squares (OLS) returns unbiased slopes only when its assumptions hold. Violate linearity and your predictions curve away from the truth; violate homoscedasticity or independence and the standard errors lie, so p-values stop being trustworthy. Violate the no-multicollinearity condition and individual coefficients swing wildly even when overall fit looks fine. The fastest way to inspect all of this at once is the four-plot diagnostic panel R produces when you call plot() on a fitted model.
par(mfrow = c(2, 2)) arranges all four plots in a single 2x2 grid for scanning at a glance.
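A minimal sketch of that dashboard. The exact model formula is not shown in the text, so `mpg ~ wt + hp` is an assumption here; it is the mtcars fit whose R² comes out near the 0.83 quoted below.

```r
# Assumed baseline model: mpg ~ wt + hp on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Arrange the four diagnostic plots in a single 2x2 grid
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))  # reset the plotting grid afterwards
```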
One line of code produced the entire dashboard. The top-left (Residuals vs Fitted) tests linearity, the top-right (Q-Q) tests normality of residuals, the bottom-left (Scale-Location) tests homoscedasticity, and the bottom-right (Residuals vs Leverage) surfaces influential points. Our R² is 0.83, which feels great, but that number tells you nothing about whether the inferences from this model are valid. That question is what the assumptions answer.

Figure 1: Each of the four diagnostic plots tests a different assumption.
Try it: Fit Petal.Length ~ Sepal.Length + Sepal.Width on the built-in iris dataset, then run plot() to see the same 4-panel dashboard.
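One possible solution, using the built-in iris data:

```r
fit_iris <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)

par(mfrow = c(2, 2))
plot(fit_iris)  # same four diagnostic panels for any lm object
par(mfrow = c(1, 1))
```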
Explanation: lm() fits the model and plot() dispatches to plot.lm(), which renders the same four diagnostic plots for any linear model.
How do you test linearity (and what if it fails)?
Linearity means the true relationship between each predictor and the response is a straight line. You don't check this by squinting at scatterplots of y against each x; you check it with the Residuals vs Fitted plot. If the assumption holds, residuals should scatter randomly around zero with no visible trend. A U-shape, arch, or curve means the model missed a nonlinear pattern.
Let's isolate the first diagnostic plot from our fit model. Passing which = 1 asks plot.lm() for just the Residuals vs Fitted panel.
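A sketch, assuming `fit` holds the mtcars model (the formula `mpg ~ wt + hp` is an assumption, not stated in the text):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)  # assumed baseline model
plot(fit, which = 1)  # Residuals vs Fitted panel only
```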
The red LOESS curve should hug the horizontal zero line. In our mtcars model it dips, then rises, hinting that the true relationship between mpg and its predictors curves. That is a linearity violation, and the fix is to let the model bend.
A quadratic term like poly(hp, 2) or a transformation like log(hp) usually flattens it. Refitting with a quadratic term on hp lets the model capture the curvature without abandoning OLS.
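A sketch of that refit, again assuming the baseline model was `mpg ~ wt + hp`:

```r
# Quadratic term on hp captures the curvature within OLS
fit2 <- lm(mpg ~ wt + poly(hp, 2), data = mtcars)

plot(fit2, which = 1)    # the red smoother should now be much flatter
summary(fit2)$r.squared  # compare against the baseline fit's R^2
```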
The smoother is now much flatter, R² climbed from 0.83 to 0.88, and we've addressed the linearity issue without changing the core model family. If a polynomial still leaves a visible pattern, try a log transform on the response or a spline (splines::ns()).
Try it: Fit Temp ~ Ozone on the built-in airquality dataset (use complete cases), inspect plot(..., which = 1), then refit with log(Ozone) and compare.
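One possible solution, keeping only rows where both variables are observed:

```r
aq <- airquality[complete.cases(airquality[, c("Temp", "Ozone")]), ]

fit_raw <- lm(Temp ~ Ozone, data = aq)
plot(fit_raw, which = 1)  # arched smoother: linearity violation

fit_log <- lm(Temp ~ log(Ozone), data = aq)
plot(fit_log, which = 1)  # flatter after the log transform
```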
Explanation: The raw model shows an arch because high Ozone days are hotter but with diminishing returns. Taking log(Ozone) straightens the relationship.
How do you check independence of residuals in R?
Independence means your errors are not correlated with each other. When data is collected in time order (daily sales, weekly temperatures, monthly traffic), residuals often carry autocorrelation: a positive error today is followed by a positive error tomorrow. That breaks OLS's variance formula, so standard errors shrink and p-values look better than they are.
The Durbin-Watson (DW) statistic tests for first-order autocorrelation in residuals:
$$DW = \frac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
Where:
- $e_i$ is the residual for observation $i$
- $n$ is the number of observations
- DW ≈ 2 means no autocorrelation
- DW < 1.5 suggests positive autocorrelation (the usual problem)
- DW > 2.5 suggests negative autocorrelation
We can compute DW directly from the residuals in base R, no extra packages needed.
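The formula translates directly: `diff(e)` gives the consecutive differences $e_i - e_{i-1}$. The model formula `mpg ~ wt + hp` is an assumption here.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)  # assumed baseline model
e <- resid(fit)

dw <- sum(diff(e)^2) / sum(e^2)  # Durbin-Watson statistic
dw  # ~2 means no first-order autocorrelation
```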
A DW of 1.36 on the mtcars model is a mild positive autocorrelation warning, but mtcars is not ordered in time, so the statistic is less meaningful here. The same calculation on a time-ordered dataset would be the real test. What matters is knowing how to get the number and read it.
car::durbinWatsonTest() gives the same statistic plus a p-value and bootstrap CI in local RStudio: car::durbinWatsonTest(fit). The car package is not pre-compiled for WebR, so the manual formula above is your drop-in equivalent here.
If DW flags autocorrelation, the fix is to acknowledge the time structure: add lagged predictors, switch to generalized least squares with nlme::gls(..., correlation = corAR1()), or model the series directly with arima() or the forecast package. Ignoring autocorrelation and reporting OLS p-values is the silent-bug version of this problem.
Try it: Simulate an autocorrelated residual series and confirm the DW statistic drops well below 2.
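One possible solution, simulating an AR(1) series (the coefficient 0.8 is an arbitrary choice):

```r
set.seed(42)
n <- 200
e <- numeric(n)
e[1] <- rnorm(1)
for (i in 2:n) e[i] <- 0.8 * e[i - 1] + rnorm(1)  # AR(1) with phi = 0.8

sum(diff(e)^2) / sum(e^2)  # DW drops well below 2
```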
Explanation: The AR(1) process builds strong positive correlation between consecutive values, so DW drops far from 2, the benchmark for independent errors.
How do you test homoscedasticity (and when do you transform y)?
Homoscedasticity means residual variance is constant across the range of fitted values. Heteroscedasticity, its opposite, shows up as a cone or fan shape in the residual plots: small errors at low fitted values, large errors at high ones (or vice versa). That inflates some standard errors and deflates others, so confidence intervals become meaningless even though the point estimates stay unbiased.
The Scale-Location plot (plot(fit, which = 3)) shows $\sqrt{|standardized~residuals|}$ against fitted values. A flat red smoother indicates constant variance; a sloped or curved smoother signals heteroscedasticity.
For the mtcars model, the red smoother rises toward higher fitted values, a classic heteroscedasticity signal. The fix is usually a transformation of the response that stabilizes variance.
Log-transforming the response is the most common remedy when variance grows with the fitted value.
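A sketch of that refit, assuming the same `mpg ~ wt + hp` predictors as before:

```r
# Model the log of the response to stabilize variance
fit_log <- lm(log(mpg) ~ wt + hp, data = mtcars)

plot(fit_log, which = 3)  # Scale-Location on the log scale
```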
The Scale-Location plot is much flatter after the log transform, so variance is now roughly constant on the log scale. If a transformation isn't appropriate, use weighted least squares (lm(..., weights = ...)) or report heteroscedasticity-consistent standard errors (available via sandwich::vcovHC() in local R).
lmtest::bptest() gives the formal Breusch-Pagan test with a p-value in local RStudio: small p-values reject the null of homoscedasticity. The Scale-Location plot tells you essentially the same story and is WebR-native.
Try it: Fit mpg ~ hp on mtcars, check Scale-Location, then refit as log(mpg) ~ hp and compare.
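One possible solution:

```r
fit_hp <- lm(mpg ~ hp, data = mtcars)
plot(fit_hp, which = 3)      # rising smoother: heteroscedasticity

fit_hp_log <- lm(log(mpg) ~ hp, data = mtcars)
plot(fit_hp_log, which = 3)  # flatter smoother after the transform
```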
Explanation: log(mpg) compresses large values more than small ones, which pulls in the cone shape and flattens the Scale-Location curve.
How do you test normality of residuals in R?
Normality says the residuals are approximately normally distributed. This assumption mainly matters for small-sample inference: confidence intervals and prediction intervals rely on it. Coefficient point estimates stay valid without it, and the Central Limit Theorem often covers large samples anyway.
The Q-Q plot is the standard visual diagnostic. If residuals are normal, the points fall along the diagonal reference line. Heavy tails bow away from the line at the ends.
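Both checks in base R, again assuming `mpg ~ wt + hp` as the model under discussion:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)  # assumed baseline model

plot(fit, which = 2)      # Q-Q plot of standardized residuals
shapiro.test(resid(fit))  # formal Shapiro-Wilk normality test
```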
The Q-Q plot is mostly on the line with mild tail deviation, and Shapiro-Wilk's p-value is 0.087, above the usual 0.05 threshold, so we don't reject normality. That's the calm outcome. When the plot curves away or the p-value is tiny, you have options.
If residuals are visibly non-normal, the most common remedies are: transform the response (log, square root, or Box-Cox via MASS::boxcox()), trim outliers with clear justification, or use bootstrap confidence intervals (boot package) that don't require the normality assumption.
Try it: Run shapiro.test() on the residuals of mpg ~ hp (simpler model) and interpret the p-value.
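One possible solution:

```r
fit_hp <- lm(mpg ~ hp, data = mtcars)
shapiro.test(resid(fit_hp))  # p > 0.05: fail to reject normality
```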
Explanation: p = 0.065 is above 0.05, so we fail to reject normality. With only 32 rows, Shapiro-Wilk has limited power, so the Q-Q plot should always accompany this test.
How do you detect and fix multicollinearity in R?
Multicollinearity happens when two or more predictors are highly correlated, so OLS can't tell their individual effects apart. Coefficient estimates become unstable (big standard errors, signs flipping with small data changes) even when overall R² is fine. The Variance Inflation Factor (VIF) quantifies the damage:
$$VIF_j = \frac{1}{1 - R^2_j}$$
Where:
- $R^2_j$ is the R² from regressing predictor $j$ on all the other predictors
- $VIF_j = 1$ means predictor $j$ is uncorrelated with the others
- $VIF_j > 5$ is cause for investigation
- $VIF_j > 10$ is a serious red flag
We can compute VIF directly in base R by running the auxiliary regressions ourselves.
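A base-R sketch of those auxiliary regressions. The text names disp, cyl, and wt; the fourth predictor (hp) and the exact formula are assumptions here.

```r
fit_mc <- lm(mpg ~ wt + hp + disp + cyl, data = mtcars)  # assumed formula
X <- model.matrix(fit_mc)[, -1]  # predictor matrix, intercept dropped

# Regress each predictor on all the others; VIF_j = 1 / (1 - R^2_j)
vifs <- sapply(colnames(X), function(v) {
  r2 <- summary(lm(X[, v] ~ X[, colnames(X) != v, drop = FALSE]))$r.squared
  1 / (1 - r2)
})
round(vifs, 2)
```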
disp has a VIF of 10.5 (serious), and cyl is at 7.9 (borderline serious). disp and cyl both correlate strongly with each other and with wt, so including all three is almost redundant.
car::vif() runs the same math in a single call in local RStudio: car::vif(fit_mc). The base-R loop above is the WebR-safe equivalent.
The fix is almost always to drop the most redundant predictor, combine predictors into a single index (like engine size instead of disp + cyl), or switch to a method that tolerates correlation (PCA regression, ridge regression via glmnet).
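Applying the simplest fix here means refitting without disp and recomputing the VIFs (the surrounding predictors are assumed to be wt, hp, and cyl):

```r
fit_mc2 <- lm(mpg ~ wt + hp + cyl, data = mtcars)  # disp dropped
X2 <- model.matrix(fit_mc2)[, -1]

sapply(colnames(X2), function(v) {
  r2 <- summary(lm(X2[, v] ~ X2[, colnames(X2) != v, drop = FALSE]))$r.squared
  round(1 / (1 - r2), 2)
})
```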
Dropping disp pulled every VIF well below 10, and cyl is now near the 5 threshold rather than past it. The surviving coefficients are now safe to interpret.
Try it: Compute VIF for wt alone in the full 4-predictor model, then after dropping disp, and see the drop.
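One possible solution, wrapping the auxiliary regression in a small helper (`vif_for` is a hypothetical name, not a base-R function):

```r
# Hypothetical helper: VIF of one predictor in a given mtcars model
vif_for <- function(formula, target) {
  X <- model.matrix(lm(formula, data = mtcars))[, -1]
  r2 <- summary(
    lm(X[, target] ~ X[, colnames(X) != target, drop = FALSE])
  )$r.squared
  1 / (1 - r2)
}

vif_for(mpg ~ wt + hp + disp + cyl, "wt")  # full 4-predictor model
vif_for(mpg ~ wt + hp + cyl, "wt")         # after dropping disp
```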
Explanation: Dropping disp removes a predictor highly correlated with wt, so wt's variance inflation nearly halves.
Practice Exercises
Exercise 1: Full diagnostic workflow on airquality
Fit Ozone ~ Solar.R + Wind + Temp on airquality (complete cases). Run the 4-panel diagnostic and identify which assumption is most visibly violated.
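One possible solution:

```r
aq_data <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
fit_ex1 <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq_data)

par(mfrow = c(2, 2))
plot(fit_ex1)  # scan all four panels for the most visible violation
par(mfrow = c(1, 1))
```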
Explanation: The Scale-Location plot shows a rising red smoother (heteroscedasticity) and the Q-Q plot's upper tail bows away from the line (non-normal residuals). Both point to a skewed response, a clue that log(Ozone) is worth trying.
Exercise 2: Repair heteroscedasticity with a log transform
Using the same aq_data from Exercise 1, refit with log(Ozone) as the response. Show the Scale-Location plot before and after.
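One possible solution, rebuilt self-contained (aq_data recreated rather than reused):

```r
aq_data <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
fit_raw <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq_data)
fit_log <- lm(log(Ozone) ~ Solar.R + Wind + Temp, data = aq_data)

par(mfrow = c(1, 2))
plot(fit_raw, which = 3)  # before: rising smoother (cone pattern)
plot(fit_log, which = 3)  # after: roughly flat smoother
par(mfrow = c(1, 1))
```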
Explanation: The log transform pulls in the cone pattern, so the red smoother is roughly flat on the right-hand panel. Coefficients now have valid standard errors.
Exercise 3: Detect and resolve multicollinearity
Compute VIF for every predictor in the log(Ozone) model. If any VIF > 5, drop the highest-VIF predictor and refit.
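One possible solution, using the same auxiliary-regression loop as earlier in the post:

```r
aq_data <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
fit_log <- lm(log(Ozone) ~ Solar.R + Wind + Temp, data = aq_data)

X <- model.matrix(fit_log)[, -1]
sapply(colnames(X), function(v) {
  r2 <- summary(lm(X[, v] ~ X[, colnames(X) != v, drop = FALSE]))$r.squared
  round(1 / (1 - r2), 2)
})  # check whether any value exceeds 5
```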
Explanation: All VIFs are well below 5, so multicollinearity is not a problem here. No predictor needs to be dropped. If any had exceeded 5, the fix would be to drop that predictor or combine it with a correlated sibling.
Complete Example
Let's run the entire diagnostic and remedy cycle end-to-end on airquality. The idea is to model daily ozone concentration as a function of solar radiation, wind speed, and temperature, then check all five assumptions and apply fixes where needed.
First, fit the baseline model and look at the full diagnostic panel.
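The baseline fit and its full diagnostic panel:

```r
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
fit_base <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq)

par(mfrow = c(2, 2))
plot(fit_base)  # all four diagnostics at once
par(mfrow = c(1, 1))
```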
The Residuals vs Fitted plot shows a curve, the Scale-Location plot slopes upward, and the Q-Q plot's upper tail bows away. That's three assumptions wobbling at once. The common culprit with skewed count-like responses (like ozone) is a right-skewed y, and the standard remedy is a log transform.

Figure 2: The five assumptions of ordinary least squares at a glance.
Now refit on the log scale and rerun every check.
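A self-contained sketch of the full recheck on the log scale, reusing the base-R DW and VIF calculations from earlier sections:

```r
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
fit_final <- lm(log(Ozone) ~ Solar.R + Wind + Temp, data = aq)

# Linearity, normality, homoscedasticity, influence: the 4-panel view
par(mfrow = c(2, 2))
plot(fit_final)
par(mfrow = c(1, 1))

# Independence: Durbin-Watson from the residuals
e <- resid(fit_final)
sum(diff(e)^2) / sum(e^2)

# Multicollinearity: VIF via auxiliary regressions
X <- model.matrix(fit_final)[, -1]
sapply(colnames(X), function(v) {
  r2 <- summary(lm(X[, v] ~ X[, colnames(X) != v, drop = FALSE]))$r.squared
  round(1 / (1 - r2), 2)
})

summary(fit_final)$r.squared  # ~0.66 per the text
```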
The log transform flattened the residual patterns, DW is close to 2 (no autocorrelation concern), and every VIF is under 2 (no multicollinearity). R² on the log scale is 0.66, a realistic number for air-quality modeling. The same workflow, in this order (fit → 4-panel → fix → refit → recheck), works for essentially any OLS model.
Summary

Figure 3: Which remedy to reach for when a diagnostic flags a violation.
| Assumption | Quick Test | Rule of Thumb | Fix When Violated |
|---|---|---|---|
| Linearity | Residuals vs Fitted | Flat band around 0 | Polynomial, log, or spline term |
| Independence | Durbin-Watson | 1.5 ≤ DW ≤ 2.5 | Add lags, GLS, or time-series model |
| Homoscedasticity | Scale-Location | Flat red smoother | log y, WLS, or robust SE |
| Normality | Q-Q plot | Points on the line | Transform y or bootstrap CIs |
| No multicollinearity | VIF | VIF < 5 (ideally < 2) | Drop predictor, combine, or ridge |
The key mental shift is that each assumption maps to both a diagnostic and a specific fix. When you see a violation, you already know the next move. Run plot(fit) on every model you fit, scan the four panels, compute Durbin-Watson and VIF, and decide what to do before reading coefficient p-values.
References
- Fox, J. & Weisberg, S., An R Companion to Applied Regression, 3rd ed. Sage (2019). Link
- Faraway, J., Linear Models with R, 2nd ed. CRC Press. Link
- R Core Team, plot.lm documentation. Link
- Zeileis, A. & Hothorn, T., Diagnostic Checking in Regression Relationships, R News 2(3), 2002 (lmtest package). Link
- Lüdecke, D. et al., performance package, Journal of Open Source Software 6(60), 2021. Link
- Kim, J. H., Multicollinearity and misleading statistical results. Korean Journal of Anesthesiology 72(6), 2019. Link
- UCLA IDRE, Regression with R: Regression Diagnostics. Link
- University of Virginia Library, Understanding Diagnostic Plots for Linear Regression. Link
Continue Learning
- Linear Regression in R, fit, interpret, and predict with lm() before you start worrying about assumptions.
- Outlier Treatment With R, dig into the influential points flagged by your Residuals vs Leverage plot.
- Logistic Regression With R, when normality and linearity make no sense for your response, move to a GLM.