Regression Diagnostics Exercises in R: 10 Assumption-Testing Problems, Solved Step-by-Step
These 10 regression diagnostics exercises in R walk you through every OLS assumption check: linearity via residuals vs fitted, normality via Q-Q plots and Shapiro-Wilk, homoscedasticity via a Breusch-Pagan test, multicollinearity via VIF, and influence via Cook's distance and leverage. Every problem ships with a runnable scaffold, a hint, and a collapsible solution so you can self-check the moment you finish coding.
How do you test the linearity assumption in R?
The single most important diagnostic in OLS is the residuals vs fitted plot: if the red LOWESS line bends, your model missed a non-linear pattern. We will fit a multiple regression on mtcars and read the plot right now, then Problems 1 and 2 ask you to diagnose a new specification and tame a deliberately curved one. Every block here is base R, so the diagnostic runs in your browser with no package install.
The residuals vs fitted plot draws each residual against its corresponding fitted value. If the linearity assumption holds, the cloud hovers around zero with no systematic pattern and the red LOWESS line stays roughly flat. A clear curve (U-shape, S-shape, or monotonic tilt) is the signature of a missing non-linear term, usually a polynomial or a log transform on a predictor.
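In base R the whole check is two lines; `fit` is an illustrative name, and the two-predictor specification matches the one Problem 1 asks about:

```r
# Fit a two-predictor model on the built-in mtcars data
fit <- lm(mpg ~ wt + disp, data = mtcars)

# Panel 1 of plot.lm() is Residuals vs Fitted, with the LOWESS smoother drawn in
plot(fit, which = 1)
```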
Problem 1: Diagnose a two-predictor mtcars specification
Try it: Fit lm(mpg ~ wt + disp, data = mtcars), draw the Residuals vs Fitted plot, and print a short character string ("roughly linear" or "curved") describing what you see. Store it in ex1_verdict.
Click to reveal solution
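One possible solution (the verdict string follows the pattern described in the explanation below):

```r
ex1_fit <- lm(mpg ~ wt + disp, data = mtcars)

# Residuals vs Fitted: watch whether the LOWESS line stays flat or bends
plot(ex1_fit, which = 1)

# Verdict matching what the plot shows for this specification
ex1_verdict <- "curved"
ex1_verdict
```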
Explanation: On mpg ~ wt + disp the LOWESS line bends downward then flattens, hinting that one of the predictors enters non-linearly. Adding I(wt^2) or swapping disp for log(disp) straightens the plot.
Problem 2: Fix a deliberately curved model with poly()
Try it: Generate a known-curved dataset (y = 2 + 3x - 0.6x^2 + noise), fit a straight-line model, confirm the residuals vs fitted plot bends, then refit with poly(x, 2) and confirm the bend disappears.
Click to reveal solution
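One possible solution; the seed is illustrative, not part of the original exercise:

```r
set.seed(42)  # illustrative seed for reproducibility
x <- runif(100, 0, 5)
y <- 2 + 3 * x - 0.6 * x^2 + rnorm(100)

# Straight-line fit: Residuals vs Fitted shows a clear arch
lin_fit <- lm(y ~ x)
plot(lin_fit, which = 1)

# Refit with an orthogonal quadratic: the arch flattens out
quad_fit <- lm(y ~ poly(x, 2))
plot(quad_fit, which = 1)
```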
Explanation: poly(x, 2) adds an orthogonal quadratic term, which is exactly what the curved residual pattern was asking for. Once the missing curvature is in the model, the residuals lose their pattern.
How do you check residual normality in R?
The Normal Q-Q plot sorts the standardized residuals and plots them against theoretical normal quantiles. On a clean fit the points hug the diagonal reference line; heavy tails, skew, or outliers show up as systematic deviations at the ends. Pair the plot with shapiro.test(resid(fit)), which returns a p-value that formalises what your eyes saw.
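A sketch of the paired check; `model` is assumed here to be the three-predictor mtcars fit the later sections also reference:

```r
model <- lm(mpg ~ wt + hp + disp, data = mtcars)  # stand-in fit (assumption)

# Visual: Normal Q-Q plot of the standardized residuals (panel 2 of plot.lm)
plot(model, which = 2)

# Numeric: Shapiro-Wilk on the residuals
shapiro.test(resid(model))
```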
The Q-Q plot for model keeps most points near the diagonal with a mild tail at the top. Shapiro-Wilk returns p = 0.225, so at the 0.05 level we do not reject normality. The visual and numeric checks agree: the residuals are close enough to normal that the reported standard errors and p-values can be trusted.
Problem 3: Shapiro-Wilk on a 3-predictor model
Try it: Fit lm(mpg ~ wt + hp + qsec, data = mtcars), run Shapiro-Wilk on its residuals, and store the p-value in ex3_p plus a logical ex3_reject that is TRUE iff the test rejects at alpha = 0.05.
Click to reveal solution
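One possible solution:

```r
ex3_fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# $p.value extracts the scalar p from the htest object
ex3_p <- shapiro.test(resid(ex3_fit))$p.value
ex3_reject <- ex3_p < 0.05
c(p = ex3_p, reject = ex3_reject)
```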
Explanation: $p.value pulls the scalar p from the htest object. With p = 0.34 the residuals are consistent with a normal distribution, so the normality assumption holds for this specification.
Problem 4: Show how a single outlier drives a Shapiro rejection
Try it: You are given a residual vector ex4_res that contains 49 standard-normal draws plus a single large outlier. Compute the Shapiro p-value on the full vector and on the vector with the outlier dropped, then confirm the decision flips from reject to accept.
Click to reveal solution
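One possible solution; since the original ex4_res vector is not reproduced here, it is reconstructed with an illustrative seed and an outlier of 8:

```r
# Reconstruct ex4_res: 49 standard-normal draws plus one extreme value
set.seed(7)  # illustrative seed, not from the original exercise
ex4_res <- c(rnorm(49), 8)

p_full <- shapiro.test(ex4_res)$p.value            # outlier included
p_trim <- shapiro.test(head(ex4_res, 49))$p.value  # outlier dropped

p_full < 0.05  # the single extreme value forces a rejection
p_trim < 0.05  # the clean 49-draw sample should not reject
```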
Explanation: One extreme residual can sink the test on the whole sample even when the other 49 values are perfectly normal. Always inspect the Q-Q plot before acting on a Shapiro rejection: the test cannot tell you whether the problem is widespread or confined to a single point.
How do you detect heteroscedasticity in R?
The Scale-Location plot draws the square-root of the absolute standardized residuals against the fitted values. A horizontal band means variance is constant (homoscedasticity); a rising or funnel-shaped band means variance grows with the fit (heteroscedasticity). The numeric companion is the Breusch-Pagan test: regress squared residuals on the fitted values and check whether the auxiliary regression explains any variance.
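A sketch of the manual calculation; `model` is assumed to be the three-predictor mtcars fit, so the exact numbers in the next paragraph may come from a different specification:

```r
model <- lm(mpg ~ wt + hp + disp, data = mtcars)  # stand-in fit (assumption)

# Auxiliary regression: squared residuals on the fitted values
aux <- summary(lm(resid(model)^2 ~ fitted(model)))

# BP statistic = n * R^2 of the auxiliary regression, chi-squared with df = 1
bp_stat <- nobs(model) * aux$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(stat = bp_stat, p = bp_p)
```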
The BP statistic is n * R² from the auxiliary regression of squared residuals on fitted values, with 1 degree of freedom (one explanatory variable). The statistic 0.91 gives p = 0.34, well above 0.05, so for this model the variance is compatible with being constant. Any p below 0.05 is the flag to consider a log(y) transform or weighted least squares.
Problem 5: Wrap the BP calculation in a reusable function
Try it: Write bp_test(fit) that takes any fitted lm and returns a named numeric vector with stat and p. Use the n * R² formulation shown above with df = 1.
Click to reveal solution
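One possible solution:

```r
bp_test <- function(fit) {
  # Auxiliary regression of squared residuals on fitted values
  aux <- summary(lm(resid(fit)^2 ~ fitted(fit)))
  stat <- nobs(fit) * aux$r.squared              # LM statistic: n * R^2
  p <- pchisq(stat, df = 1, lower.tail = FALSE)  # one explanatory variable
  c(stat = stat, p = p)
}

bp_test(lm(mpg ~ wt + hp + disp, data = mtcars))
```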
Explanation: Wrapping the auxiliary regression, the test statistic, and the chi-squared tail in one function means you can reuse it on every model in this post (and every model in your workflow) with a single call.
Problem 6: Detect and cure heteroscedasticity on a synthetic dataset
Try it: Build a dataset where variance grows with x (y = 1 + 2*x + noise * x), show bp_test() flags it, then refit with log(y) on the shifted response and show the flag goes away.
Click to reveal solution
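One possible solution; the seed is illustrative, and bp_test() from Problem 5 is repeated so the block runs on its own:

```r
# bp_test() as defined in Problem 5
bp_test <- function(fit) {
  aux <- summary(lm(resid(fit)^2 ~ fitted(fit)))
  stat <- nobs(fit) * aux$r.squared
  c(stat = stat, p = pchisq(stat, df = 1, lower.tail = FALSE))
}

set.seed(1)  # illustrative seed, not from the original exercise
x <- runif(200, 1, 10)
y <- 1 + 2 * x + rnorm(200) * x  # noise scales with x: built-in funnel

raw_fit <- lm(y ~ x)
bp_test(raw_fit)["p"]  # tiny p: heteroscedasticity flagged

# Shift y above zero before logging, then refit
log_fit <- lm(log(y - min(y) + 1) ~ x)
bp_test(log_fit)["p"]  # the flag should clear once the funnel is compressed
```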
Explanation: The raw fit has BP = 29.8 with p < 0.0001, which lines up with the clear funnel in the Scale-Location plot. The log transform compresses the large-y residuals and brings p back above 0.05. Any time a Scale-Location plot funnels, try log(y) first, then weighted least squares.
How do you measure multicollinearity with VIF?
Variance Inflation Factors (VIF) quantify how much the standard error of a coefficient is inflated because the predictor can be predicted from the others. The formula is simple: regress predictor j on every other predictor, read the auxiliary R², then
$$\text{VIF}_j = \frac{1}{1 - R^2_j}$$
Values above 5 are a warning, values above 10 are a red flag. Below we compute the full VIF vector on a 4-predictor mtcars model using base R, no car package required.
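A sketch of the sweep, assuming the four predictors are disp, hp, wt, and cyl (the specification is inferred from the surrounding problems):

```r
full_fit <- lm(mpg ~ disp + hp + wt + cyl, data = mtcars)  # assumed spec

X <- model.matrix(full_fit)[, -1]  # design matrix without the intercept
vifs <- sapply(colnames(X), function(j) {
  # Auxiliary regression of predictor j on every other predictor column
  r2 <- summary(lm(X[, j] ~ X[, colnames(X) != j]))$r.squared
  1 / (1 - r2)
})
round(vifs, 1)
```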
Every VIF sits above 3, but disp stands out at 10.1, meaning disp is almost a linear combination of the other three. That is why the disp coefficient will flip sign and lose significance depending on which other predictors are in the model. Dropping disp (or replacing it with a derived quantity like disp/cyl) is the cleanest fix.
Problem 7: Generalize the VIF sweep into a function
Try it: Write vif_all(fit) that takes any fitted lm and returns a named numeric vector of VIFs for every predictor. The function should work regardless of how many predictors the model has.
Click to reveal solution
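One possible solution:

```r
vif_all <- function(fit) {
  # model.matrix() handles factors and interactions; drop the intercept column
  X <- model.matrix(fit)[, -1, drop = FALSE]
  stopifnot(ncol(X) >= 2)  # VIF needs at least two predictor columns
  vifs <- vapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j, drop = FALSE]))$r.squared
    1 / (1 - r2)
  }, numeric(1))
  names(vifs) <- colnames(X)
  vifs
}

vif_all(lm(mpg ~ disp + hp + wt + cyl, data = mtcars))
```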
Explanation: model.matrix() gives us the design matrix after R has handled factors and interactions, so the function works on any lm, not just simple main-effects models. Dropping the intercept column avoids a trivial regression.
Problem 8: Identify and drop the worst multicollinearity offender
Try it: Apply vif_all() to full_fit, identify the predictor with the highest VIF, refit the model without it, and confirm every remaining VIF sits below 6. Save the refit as ex8_refit.
Click to reveal solution
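One possible solution; vif_all() from Problem 7 is repeated so the block runs on its own, and the four-predictor specification for full_fit is an assumption:

```r
# vif_all() as defined in Problem 7
vif_all <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]
  vifs <- vapply(seq_len(ncol(X)), function(j)
    1 / (1 - summary(lm(X[, j] ~ X[, -j, drop = FALSE]))$r.squared),
    numeric(1))
  setNames(vifs, colnames(X))
}

full_fit <- lm(mpg ~ disp + hp + wt + cyl, data = mtcars)  # assumed spec
v <- vif_all(full_fit)
worst <- names(which.max(v))  # the biggest offender

ex8_refit <- update(full_fit, as.formula(paste(". ~ . -", worst)))
max(vif_all(ex8_refit))  # should now sit below 6
```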
Explanation: Dropping disp cuts the maximum VIF from 10.1 to 3.8 without touching the model's overall predictive quality (adjusted R² stays within a rounding error). That is the signature of a pure multicollinearity cleanup: the model got simpler with no loss of fit.
How do you identify influential observations?
Three numbers per row turn influence into a checklist: the hat value (hatvalues()) measures how far the row's predictor combination sits from the centroid; the standardized residual (rstandard()) measures how surprising the row's response is after fitting; and Cook's distance (cooks.distance()) combines both into a single "how much would the model shift if I dropped this row?" score. The conventional cutoffs are hat > 2*(p+1)/n, |rstandard| > 3, and cooks > 4/n.
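The checklist in code; `model` is assumed to be the three-predictor mtcars fit:

```r
model <- lm(mpg ~ wt + hp + disp, data = mtcars)  # stand-in fit (assumption)
n <- nobs(model)
p <- length(coef(model)) - 1  # number of predictors

infl <- data.frame(
  hat   = hatvalues(model),
  rstd  = rstandard(model),
  cooks = cooks.distance(model)
)

# Rows that trip any of the three conventional cutoffs
subset(infl, hat > 2 * (p + 1) / n | abs(rstd) > 3 | cooks > 4 / n)
```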
Chrysler Imperial tops the Cook's D ranking with 0.41, an order of magnitude above the others. Its hat value is 0.23 (high) and its standardized residual is 2.43 (large), so the row is both far from the centroid and poorly predicted by the model, a textbook influential point. Before refitting, decide whether the row is a data error, a legitimate edge case, or a signal that your model needs a new term.
Problem 9: Build a top-3 influence table
Try it: Given model, return a data frame ex9_infl with the 3 rows with the highest Cook's D, each row containing the car name, hat value, standardized residual, and Cook's D (all rounded to 3 decimal places).
Click to reveal solution
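One possible solution, again assuming `model` is the three-predictor mtcars fit:

```r
model <- lm(mpg ~ wt + hp + disp, data = mtcars)  # stand-in fit (assumption)
cd <- cooks.distance(model)
top3 <- order(-cd)[1:3]  # sort by negative Cook's D, keep the first 3 rows

ex9_infl <- data.frame(
  car   = rownames(mtcars)[top3],
  hat   = round(hatvalues(model)[top3], 3),
  rstd  = round(rstandard(model)[top3], 3),
  cooks = round(cd[top3], 3),
  row.names = NULL
)
ex9_infl
```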
Explanation: Sort by negative Cook's D and keep the first 3 rows. All three cars combine a nontrivial residual with enough leverage to matter. That is exactly the combination that defines an influential point.
Problem 10: Quantify the coefficient shift from removing the worst row
Try it: Refit model without the row with the highest Cook's D, then return a data frame ex10_diff with the coefficient name, the original estimate, the refit estimate, and the percent change. Flag any coefficient that shifts by more than 10%.
Click to reveal solution
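One possible solution, with the same stand-in `model`:

```r
model <- lm(mpg ~ wt + hp + disp, data = mtcars)  # stand-in fit (assumption)
worst <- which.max(cooks.distance(model))
refit <- update(model, data = mtcars[-worst, ])

orig_b <- coef(model)
new_b  <- coef(refit)
ex10_diff <- data.frame(
  coef       = names(orig_b),
  original   = round(orig_b, 4),
  refit      = round(new_b, 4),
  pct_change = round(100 * (new_b - orig_b) / orig_b, 1),
  row.names  = NULL
)
ex10_diff$flag <- abs(ex10_diff$pct_change) > 10  # flag shifts beyond 10%
ex10_diff
```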
Explanation: Dropping Chrysler Imperial shifts wt by 16%, hp by 28%, and flips disp from near zero to a meaningfully positive slope. That is a strong argument that the row should not be dropped silently: the model's story changes. Either keep it and report sensitivity, or collect more data from that region.
Practice Exercises
Capstone exercises combine three or more diagnostic ideas into a single pipeline. Use distinct variable names (prefix cap) so exercise state does not bleed into the earlier problems.
Exercise 1: One-shot diagnose() function
Write diagnose(fit) that returns a named list with: shapiro_p, bp_p, max_vif, n_high_cooks (count of Cook's D > 4/n), and a single verdict string that picks the first violation found (order: multicollinearity, heteroscedasticity, non-normality, influential points present, or "clean"). Run it on lm(mpg ~ wt + hp + disp, data = mtcars).
Click to reveal solution
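One possible solution; the cutoffs (VIF above 10, p below 0.05, Cook's D above 4/n) are the conventions used throughout this post:

```r
diagnose <- function(fit) {
  n <- nobs(fit)
  shapiro_p <- shapiro.test(resid(fit))$p.value
  # Manual Breusch-Pagan, as in Problem 5
  aux <- summary(lm(resid(fit)^2 ~ fitted(fit)))
  bp_p <- pchisq(n * aux$r.squared, df = 1, lower.tail = FALSE)
  # Manual VIF sweep, as in Problem 7 (NA when there are fewer than 2 columns)
  X <- model.matrix(fit)[, -1, drop = FALSE]
  max_vif <- if (ncol(X) < 2) NA_real_ else
    max(vapply(seq_len(ncol(X)), function(j)
      1 / (1 - summary(lm(X[, j] ~ X[, -j, drop = FALSE]))$r.squared),
      numeric(1)))
  n_high_cooks <- sum(cooks.distance(fit) > 4 / n)
  # Ordered chain: first violation found wins
  verdict <-
    if (!is.na(max_vif) && max_vif > 10) "multicollinear"
    else if (bp_p < 0.05) "heteroscedastic"
    else if (shapiro_p < 0.05) "non-normal"
    else if (n_high_cooks > 0) "influential points present"
    else "clean"
  list(shapiro_p = shapiro_p, bp_p = bp_p, max_vif = max_vif,
       n_high_cooks = n_high_cooks, verdict = verdict)
}

diagnose(lm(mpg ~ wt + hp + disp, data = mtcars))
```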
Explanation: The ordered if/else chain returns the most severe violation the fit triggers. model passes the first three checks but flags two influential points, so the verdict lands on the fourth rung. Swap the order of the checks if your workflow cares about a different hierarchy.
Exercise 2: Walk-forward remedy pipeline
Start from lm(mpg ~ disp + hp + wt + cyl, data = mtcars), apply a three-step remedy: (a) drop the predictor with the highest VIF if it exceeds 10, (b) refit and recompute VIFs, (c) run bp_test() and shapiro.test() on the refit. Return a data frame cap2_df comparing the two fits on max VIF, BP p-value, and Shapiro p-value.
Click to reveal solution
Explanation: Removing disp cuts the max VIF from 10.1 to 3.8 while the BP and Shapiro p-values stay comfortably above 0.05. The "remedy" added no complexity; it only removed a redundant predictor.
Exercise 3: Build a violation scoreboard across six specifications
For six mtcars specifications (mpg ~ wt, mpg ~ wt + hp, mpg ~ wt * hp, mpg ~ poly(wt, 2), mpg ~ wt + I(hp^2), mpg ~ log(disp) + wt), build cap3_df with one row per model containing: adjusted R², Shapiro p, BP p, max VIF (NA if the model has fewer than 2 predictors), and number of Cook's D values above 4/n. Print the model with the best adjusted R² whose verdict from diagnose() is NOT "multicollinear" or "heteroscedastic".
Click to reveal solution
Explanation: wt * hp has the highest adjusted R² (0.872) but fails Shapiro (p = 0.041) and BP (p = 0.001), so it is overfitted into violations. The best safe pick is wt + I(hp^2) (adj R² = 0.817, all three diagnostic p-values above 0.05, only one influential row). Scoreboards like this are how you avoid picking the flashiest model when diagnostics quietly disagree.
Complete Example
Here is the full diagnose-remedy-reverify loop on a single mtcars specification. Fit the candidate, run every assumption check, identify the worst violation, apply the targeted fix, re-check, and print a before/after table.
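A sketch of that loop, assuming the candidate is the four-predictor specification from the capstones; the helpers from Problems 5 and 7 are repeated so the block runs on its own:

```r
# Helpers from Problems 5 and 7
bp_test <- function(fit) {
  aux <- summary(lm(resid(fit)^2 ~ fitted(fit)))
  stat <- nobs(fit) * aux$r.squared
  c(stat = stat, p = pchisq(stat, df = 1, lower.tail = FALSE))
}
vif_all <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]
  setNames(vapply(seq_len(ncol(X)), function(j)
    1 / (1 - summary(lm(X[, j] ~ X[, -j, drop = FALSE]))$r.squared),
    numeric(1)), colnames(X))
}
checks <- function(fit) c(
  adj_r2    = summary(fit)$adj.r.squared,
  shapiro_p = shapiro.test(resid(fit))$p.value,
  bp_p      = unname(bp_test(fit)["p"]),
  max_vif   = max(vif_all(fit)),
  n_cooks   = sum(cooks.distance(fit) > 4 / nobs(fit))
)

# 1. Fit the candidate and run every check
candidate <- lm(mpg ~ disp + hp + wt + cyl, data = mtcars)  # assumed spec
before <- checks(candidate)

# 2. Worst violation here is multicollinearity: drop the top-VIF predictor
worst <- names(which.max(vif_all(candidate)))
refit <- update(candidate, as.formula(paste(". ~ . -", worst)))

# 3. Re-check and print the before/after table
after <- checks(refit)
round(rbind(before = before, after = after), 3)
```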
The candidate flunked multicollinearity (disp VIF above 10), so we dropped disp and refit. The refit keeps the same adjusted R² (0.807 → 0.812, a non-material gain), cuts max VIF by two-thirds, and leaves every other diagnostic comfortably above 0.05. The only remaining flag is two influential rows (Chrysler Imperial, Toyota Corolla), which you now inspect row-by-row rather than patching through modeling.
Summary
| # | Problem bucket | Assumption tested | Base-R primitive |
|---|---|---|---|
| 1-2 | Linearity | Residuals scatter around zero | plot(fit, which = 1), poly() |
| 3-4 | Normality | Residuals follow a normal distribution | shapiro.test(), plot(fit, which = 2) |
| 5-6 | Homoscedasticity | Constant residual variance | Manual Breusch-Pagan (lm(resid^2 ~ fitted)) |
| 7-8 | Multicollinearity | Predictors are not near-linear combinations of each other | Manual VIF (1 / (1 - R^2)) |
| 9-10 | Influence | No single row dominates the fit | hatvalues(), rstandard(), cooks.distance() |
| Capstones | Combined | All of the above | One-shot diagnose(), walk-forward remedy, scoreboard |
The recurring pattern: every diagnostic has a visual cue (a plot) and a numeric cue (a test or cutoff). You almost never act on only one. If both agree, you have a clean answer; if they disagree, the plot usually wins because tests can over-fire at large n and under-fire at small n.
References
- Fox, J., Applied Regression Analysis and Generalized Linear Models, 3rd ed., SAGE (2016). Link
- Faraway, J., Linear Models with R, 2nd ed., CRC Press (2015). Link
- Cook, R. D. & Weisberg, S., Residuals and Influence in Regression, Chapman & Hall (1982). Link
- Breusch, T. S. & Pagan, A. R., "A Simple Test for Heteroscedasticity and Random Coefficient Variation", Econometrica 47(5) (1979). Link
- Shapiro, S. S. & Wilk, M. B., "An analysis of variance test for normality (complete samples)", Biometrika 52 (1965). Link
- R Core Team, stats::lm reference. Link
- R Core Team, stats::influence.measures reference. Link
Continue Learning
- Regression Diagnostics in R, the visual companion covering the five plot(lm) panels referenced throughout these exercises.
- Multiple Regression in R, the modeling walkthrough for fitting and interpreting the specifications you just diagnosed.
- Multiple Regression Exercises in R , 12 companion problems covering fitting, interpretation, interaction terms, and stepwise selection.