lm() Output Interpreter
R's lm() fits linear regression, the workhorse statistical model for continuous outcomes. Paste a summary(lm(...)) block to get a per-coefficient plain-English read (direction, magnitude, significance), an R^2 interpretation, an F-test verdict, and a Compare-2-models view with AIC, BIC, and nested anova.
New to reading lm() output? Read the 4-minute primer below.
What lm() does. R's lm() fits a linear model by ordinary least squares. You give it a formula like y ~ x1 + x2 and a data frame; it finds the coefficients (slopes and intercept) that minimize the sum of squared residuals. summary() on the fit object prints a structured report with the formula, the residual quantiles, a coefficient table, the model-level R^2 and F-statistic, and the residual standard error.
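The tool speaks R, but the fitting step itself is plain linear algebra. A minimal Python sketch of what lm(y ~ x1 + x2, data) does under the hood, using made-up data (all names and numbers here are illustrative, not part of the tool):

```python
import numpy as np

# Toy stand-in for a data frame with columns y, x1, x2 (made-up data)
rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

# Design matrix for the formula y ~ x1 + x2: an intercept column plus the predictors
X = np.column_stack([np.ones(n), x1, x2])

# OLS: the coefficient vector that minimizes the sum of squared residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
```

With an intercept in the model, the residuals are exactly orthogonal to the intercept column (they sum to zero), which is one quick sanity check that the fit did what OLS promises.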
Reading the coefficient table. Each row is one term in the model. Estimate is the slope (or for a factor level, the gap from the reference). Std. Error is its sampling SE. t value is estimate divided by SE; under the null of no effect, it follows a t distribution. Pr(>|t|) is the two-sided p-value. The trailing *** / ** / * stars rank significance at 0.001, 0.01, 0.05.
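The t value and Pr(>|t|) columns can be reproduced from the printed Estimate and Std. Error alone. A hedged Python sketch with hypothetical numbers (the Estimate, SE, and df below are invented for illustration):

```python
from scipy import stats

# Hypothetical coefficient row as printed by summary(lm): Estimate and Std. Error
estimate, std_error = 1.847, 0.412
resid_df = 47  # residual degrees of freedom: n - k - 1

# t value = Estimate / Std. Error
t_value = estimate / std_error
# Pr(>|t|): two-sided tail probability under Student-t with the residual df
p_value = 2 * stats.t.sf(abs(t_value), df=resid_df)
```

Here |t| is about 4.5, so the p-value clears the 0.001 threshold and the row would earn three stars.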
What R^2, F, and residuals mean. Multiple R^2 is the fraction of variance in the outcome that the model explains. Adjusted R^2 penalizes adding predictors that do not help. The F-statistic tests whether the model as a whole beats the intercept-only baseline; its p-value is the model-level significance. Residual standard error is the typical size of a residual on the outcome's scale, reported with its degrees of freedom, n - k - 1 (n observations, k predictors).
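All four model-level numbers follow from two sums of squares plus n and k. A Python sketch with made-up values (rss and tss below are hypothetical, not from a real fit):

```python
import numpy as np

# Hypothetical model-level quantities: n rows, k predictors,
# residual and total sums of squares (made-up numbers)
n, k = 50, 2
rss = 4.2    # residual sum of squares
tss = 61.7   # total sum of squares around mean(y)

r2 = 1 - rss / tss                                   # Multiple R^2
adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))   # Adjusted R^2
rse = np.sqrt(rss / (n - k - 1))                     # residual standard error
# F-statistic: explained mean square over residual mean square
f_stat = ((tss - rss) / k) / (rss / (n - k - 1))
```

Note that adjusted R^2 is always at most the multiple R^2; the gap widens as you add predictors that do not reduce RSS.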
Picking which model to trust. When you have two nested models, an anova(fit1, fit2) F-test asks whether the extra terms are worth their parameters. For non-nested models on the same data, compare AIC or BIC: lower is better, with a delta under 2 being effectively a tie and a delta over 10 being decisive. Always sanity-check residuals (a Q-Q plot, a fitted-vs-residuals plot) before celebrating a high R^2.
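The AIC comparison logic is just a ranking plus deltas from the best model. A small Python sketch with hypothetical AIC values (the model names and numbers are invented):

```python
# Hypothetical AIC values for two non-nested fits of the same data (made-up)
aic = {"fit_a": 152.3, "fit_b": 149.8}

best = min(aic, key=aic.get)                       # lower AIC wins
deltas = {m: v - aic[best] for m, v in aic.items()}
# delta < 2: effectively a tie; much larger deltas: clear preference
tie = all(d < 2 for d in deltas.values())
```

With a delta of 2.5 the lower-AIC model is preferred, but only weakly; this is exactly the regime where checking residual plots matters most.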
Anatomy of summary(lm)
The coefficient standard errors are the square roots of the diagonal of sigma^2 (X^T X)^(-1), where sigma^2 is estimated by the residual mean square. Dividing the estimate by its SE gives the t-statistic; large |t| means the estimate is many SEs away from zero. The reference distribution is Student-t with the residual df.

anova(A, B) runs an F-test on whether the residual SS dropped by more than chance; for Gaussian linear models it is equivalent to a likelihood-ratio test. If you instead need to compare non-nested fits, use AIC / BIC.

Caveats: when this is the wrong tool
| If you have… | Use instead |
| --- | --- |
| Binary, count, or proportion outcome | The glm() output interpreter (logistic / Poisson / quasi-binomial). lm() assumes Gaussian residuals; for non-Gaussian outcomes the SEs and p-values from this tool will not transfer. |
| Clustered or repeated-measures data | A mixed-effects model (lme4 / nlme), so within-cluster correlation does not deflate your standard errors. A planned mixed-effects interpreter will cover this. |
| Suspected multicollinearity | A VIF analysis (planned VIF-checker tool, scoped). lm()'s SEs balloon under collinearity; the diagnostic callouts here flag the most extreme cases, but a real VIF check is needed. |
| Need to inspect residuals visually | The diagnostic plot tool (planned, scoped) will render the four plot(lm) diagnostics. This interpreter only sees the printed summary, so it cannot speak to leverage, normality, or heteroscedasticity directly. |
| Time-series or autocorrelated residuals | Standard lm() ignores serial correlation. Switch to ARIMA / GLS / Newey-West, or run lmtest::dwtest first to check whether independence is plausible. |
- Linear regression in R, end-to-end - lm(), summary, diagnostics, and reporting in one tutorial.
- Linear regression assumptions - linearity, normality, equal variance, independence.
- Regression diagnostics - residual plots, leverage, Cook's distance.
- Interaction effects in R - reading x1:x2 and x1*x2 rows correctly.
- Dummy variables in R - how factor levels become coefficients.
- Polynomial and spline regression - what poly() and I(x^2) rows mean.
- Model selection - AIC, BIC, anova; when to compare which way.
- Confidence interval calculator - CIs for any single regression coefficient.
Numerical notes: the parser handles "< 2e-16", scientific notation, and Signif. codes lines. AIC for model comparison is computed as the comparable kernel n*log(RSS/n) + 2*(k+1) from the RSE, df, and k; the rankings agree with R's, but the absolute value differs from stats::AIC by a constant.
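The kernel above can be reconstructed from the three printed quantities alone, since RSE = sqrt(RSS / df) and df = n - k - 1. A hedged Python sketch with made-up RSE/df/k values (the function name and numbers are hypothetical):

```python
import numpy as np

# Comparable AIC kernel n*log(RSS/n) + 2*(k+1), rebuilt from summary(lm) output
def aic_kernel(rse, df, k):
    n = df + k + 1          # df = n - k - 1, so n = df + k + 1
    rss = rse**2 * df       # RSE = sqrt(RSS / df)
    return n * np.log(rss / n) + 2 * (k + 1)

# Two made-up models fit to the same n = 50 rows
a1 = aic_kernel(rse=0.82, df=47, k=2)
a2 = aic_kernel(rse=0.79, df=46, k=3)
```

Here the second model's smaller RSE outweighs its extra parameter, so its kernel is lower; only the ordering is meaningful, since the dropped constant cancels when comparing fits on the same data.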