
Diagnostic Plot Interpreter

R's plot.lm() produces four diagnostic plots that tell you whether your linear model's assumptions hold. Paste residuals, fitted values, and (optionally) leverage to get a per-plot verdict on heteroscedasticity, nonlinearity, normality, and influential points, with a clear recap of what each issue means and what to do about it.

New to regression diagnostics? Read the 4-minute primer.

What diagnostic plots check. A linear model rests on four assumptions: the relationship is linear, the residuals have constant variance, they are roughly normal, and observations are independent. R's plot(lm(...)) emits four panels designed to expose the first three (independence is harder to read off a single picture). The job of those panels is not to compute a p-value; it is to let your eyes catch shapes that formal tests sometimes miss.

The four key checks. Residuals vs Fitted looks for a smoothed trend that bends away from zero (nonlinearity). Normal Q-Q looks for points that bend off the diagonal (non-normal residuals, often heavy tails). Scale-Location plots the square root of the absolute standardised residual against the fitted values; a fanning trend means variance grows with the mean. Residuals vs Leverage highlights points whose Cook's distance is large enough to dominate the fit.
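For reference, every quantity those panels draw can be pulled straight from a fitted lm object; a minimal sketch, reusing the mtcars model from the reproducible block further down:

fit <- lm(mpg ~ wt + hp, data = mtcars)   # same model as the reproducible block below

yhat <- fitted(fit)                        # x-axis of Residuals vs Fitted and Scale-Location
e    <- residuals(fit)                     # y-axis of Residuals vs Fitted
r    <- rstandard(fit)                     # standardised residuals: Normal Q-Q and Residuals vs Leverage
sl   <- sqrt(abs(r))                       # y-axis of Scale-Location
h    <- hatvalues(fit)                     # leverage: x-axis of Residuals vs Leverage
d    <- cooks.distance(fit)                # contours on the Residuals vs Leverage panel

head(data.frame(fitted = yhat, residual = e, std_resid = r, leverage = h, cooks_d = d))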

How to read each plot. Boring is good. A flat band of points around zero with no shape, a Q-Q line that hugs the diagonal, a flat scale-location smoother, and no points stranded in the upper-right corner of the leverage plot mean the assumptions are plausible. Trumpet shapes, parabolas, S-shaped Q-Q curves, and isolated high-leverage points are the warning signs the human eye is good at catching and a single test statistic often misses.

What to do when a check fails. Heteroscedasticity calls for a log or square-root transform of y, weighted least squares, or robust (sandwich) standard errors. Nonlinearity calls for a polynomial or spline term, or an interaction you forgot. Non-normal residuals at moderate n are usually fine because of the central limit theorem; at small n consider a transform or a generalised linear model. Influential points should be inspected by hand, never auto-deleted.
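A rough sketch of those remedies on the mtcars fit; the HC3 sandwich estimator, the 1/fitted^2 weights, and the 3-df spline are illustrative choices, not recommendations:

fit <- lm(mpg ~ wt + hp, data = mtcars)

# Heteroscedasticity: transform the response, or keep the fit and use robust SEs
fit_log <- lm(log(mpg) ~ wt + hp, data = mtcars)
library(sandwich)
library(lmtest)
coeftest(fit, vcov. = vcovHC(fit, type = "HC3"))          # sandwich standard errors

# Heteroscedasticity: weighted least squares (weights here are purely illustrative)
fit_wls <- lm(mpg ~ wt + hp, data = mtcars, weights = 1 / fitted(fit)^2)

# Nonlinearity: add a polynomial or spline term
fit_poly   <- lm(mpg ~ poly(wt, 2) + hp, data = mtcars)
fit_spline <- lm(mpg ~ splines::ns(wt, df = 3) + hp, data = mtcars)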



Pick a worked example or paste two columns and click Analyse.
Reproduce in R
# Fit the model and inspect the four standard diagnostic plots
fit <- lm(mpg ~ wt + hp, data = mtcars)
par(mfrow = c(2, 2))
plot(fit)

# Formal tests behind the verdicts
library(lmtest)
bptest(fit)        # Breusch-Pagan: heteroscedasticity
resettest(fit)     # RESET: nonlinearity
shapiro.test(residuals(fit))   # normality of residuals

# Influence: which rows have Cook's d > 4/n ?
which(cooks.distance(fit) > 4 / nrow(mtcars))

        
[2x2 plot.lm() preview: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage. Scenario data drives all four panels; red points are residuals beyond 2.5 SD; the red curve is the loess-style smoother.]

How each verdict is computed
Breusch-Pagan style heteroscedasticity:
  g_i = e_i^2 / mean(e^2)
  fit g_i ~ y_hat_i        (auxiliary linear regression)
  LM = n * R^2_aux
  p = 1 - pchisq(LM, df = 1)
Heteroscedasticity. If the squared residuals trend up or down with the fitted value, the variance is not constant. The Breusch-Pagan style auxiliary regression captures that trend; n * R^2 from the auxiliary fit is asymptotically chi-square with one degree of freedom under constant variance. p < 0.05 flags a problem; 0.05 - 0.10 is borderline.
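A minimal sketch of that auxiliary regression in base R, following the formula above; lmtest::bptest is the canonical implementation and regresses on the model's own predictors rather than the fitted values:

fit  <- lm(mpg ~ wt + hp, data = mtcars)   # same fit as the reproducible block
e    <- residuals(fit)
yhat <- fitted(fit)

g   <- e^2 / mean(e^2)                     # scaled squared residuals
aux <- lm(g ~ yhat)                        # auxiliary regression on the fitted values
LM  <- length(e) * summary(aux)$r.squared  # n * R^2 of the auxiliary fit
p   <- 1 - pchisq(LM, df = 1)
c(LM = LM, p = p)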
RESET style nonlinearity:
  fit e_i ~ y_hat_i + y_hat_i^2 + y_hat_i^3
  F test on the joint significance of the two extra terms
  p from F(2, n - 3)
Nonlinearity. Ramsey's RESET test asks whether powers of the fitted value carry signal that a linear specification missed. Here the residual is regressed on y_hat, y_hat^2, y_hat^3; if the quadratic and cubic terms are jointly significant, the linear model is misspecified.
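A hand-rolled version of that F test, following the formula above; lmtest::resettest(fit) is the canonical implementation and augments the original model with powers of the fitted values instead of refitting the residuals:

fit  <- lm(mpg ~ wt + hp, data = mtcars)
e    <- residuals(fit)
yhat <- fitted(fit)

base <- lm(e ~ yhat)                           # linear term only
full <- lm(e ~ yhat + I(yhat^2) + I(yhat^3))   # add the quadratic and cubic terms
anova(base, full)                              # F test on the two extra terms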
Shapiro-Wilk style normality (Royston 1992):
  W = ( sum a_i * x_(i) )^2 / sum (x_i - mean(x))^2
  where x_(i) are the ordered residuals and a_i are weights derived from the expected order statistics of a standard normal sample.
Normality. The Shapiro-Wilk statistic compares the ordered residuals to the expected order statistics of a standard normal. W close to 1 means normality; small W with a small p-value means the tails or the skew are off. The Royston 1992 approximation supports n up to about 5,000.
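Reading W and p off the test object takes one line; a small sketch of how a verdict might use them:

fit <- lm(mpg ~ wt + hp, data = mtcars)
sw  <- shapiro.test(residuals(fit))
c(W = unname(sw$statistic), p = sw$p.value)
# One reading: W close to 1 and p >= 0.05 means normality is plausible; small W with small p flags skew or heavy tails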
Outlier count:
  r_i = e_i / sd(e)        (standardised residual)
  out = #{ i : |r_i| > 2.5 }
Outliers. A standardised residual beyond 2.5 in absolute value is unusual under normality (about 1.2% of points) but should be expected once or twice in a sample of 100. The verdict flags 4 or more as problematic; the slider lets you tighten or loosen the threshold.
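The count itself matches the definition above; a minimal sketch:

fit <- lm(mpg ~ wt + hp, data = mtcars)
e   <- residuals(fit)
r   <- e / sd(e)              # standardisation used by the verdict
sum(abs(r) > 2.5)             # how many points exceed the threshold
which(abs(r) > 2.5)           # which rows they are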
Cook's distance (when leverage is supplied):
  D_i = ( r_i^2 / k ) * ( h_ii / (1 - h_ii)^2 )
  flag D_i > 4 / n
Influence. Cook's distance combines residual size and leverage. When you paste a third column of hatvalues(fit), the tool computes D directly; otherwise it uses leverage proxied by the position of the fitted value, which is rough but useful for screening. Always confirm with cooks.distance(fit).
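A sketch of that formula next to R's exact computation; the values differ slightly because sd(e) is not the model's residual standard error, but the screening conclusion is usually the same:

fit <- lm(mpg ~ wt + hp, data = mtcars)
k   <- length(coef(fit))                         # number of estimated coefficients
r   <- residuals(fit) / sd(residuals(fit))       # standardised residuals as defined above
h   <- hatvalues(fit)                            # leverage (the optional third column)
D   <- (r^2 / k) * (h / (1 - h)^2)               # proxy Cook's distance from the formula
which(D > 4 / nrow(mtcars))                      # screening flag
which(cooks.distance(fit) > 4 / nrow(mtcars))    # exact version for confirmation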
Caveats: when this is the wrong tool
If you have one of the following, use the suggested alternative instead.
A glm() with non-Gaussian family. Use deviance residuals, not the raw response residuals: pass residuals(fit, type = "deviance") and predict(fit, type = "link"); a minimal sketch follows this list. Shapiro-Wilk on deviance residuals is informational, not a formal test.
Time-series or autocorrelated residuals. None of these checks see autocorrelation. Run lmtest::dwtest(fit) for serial correlation, or inspect acf(residuals(fit)); switch to ARIMA / GLS if the residuals are not independent.
A mixed-effects model (lme4 / nlme). Marginal residuals violate independence by construction. Use the conditional residuals from residuals(fit, type = "pearson") and inspect them by group; performance::check_model handles the bookkeeping.
Small n (< 15). All four verdicts have low power at small n, and the Shapiro-Wilk approximation also degrades. Trust your eye on the mini Q-Q panel and bring domain knowledge to bear.
You only have the printed plot, not the data. The tool reads numbers, not pixels. The lm() output interpreter and the diagnostic-plot tutorials linked below cover the visual cues directly.
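A minimal sketch of the first two caveats, using a binomial glm on mtcars$am purely as a stand-in for a non-Gaussian model:

fit <- lm(mpg ~ wt + hp, data = mtcars)

# Non-Gaussian glm: feed the tool deviance residuals and link-scale fitted values
gfit  <- glm(am ~ wt + hp, data = mtcars, family = binomial)
e_dev <- residuals(gfit, type = "deviance")
eta   <- predict(gfit, type = "link")

# Autocorrelated residuals: none of the four panels will show this
library(lmtest)
dwtest(fit)                   # Durbin-Watson test for first-order serial correlation
acf(residuals(fit))           # autocorrelation function across more lags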
Further reading

Numerical notes: Breusch-Pagan and RESET are approximate; the canonical implementations live in lmtest::bptest and lmtest::resettest. Shapiro-Wilk uses Royston's 1992 polynomial approximation; for n > 5,000 prefer Anderson-Darling. Cook's distance is exact when the third column is the true hatvalues(fit); otherwise the proxy is for screening only.
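For the large-n alternative, one option (an assumption here, not something the verdicts above use) is the nortest package:

# Anderson-Darling normality test, preferable to Shapiro-Wilk when n > 5,000
# install.packages("nortest")
fit <- lm(mpg ~ wt + hp, data = mtcars)
nortest::ad.test(residuals(fit))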