Poisson Regression in R: Model Count Data and Handle Overdispersion
Poisson regression models count outcomes: non-negative integers such as website visits per day, accidents per month, or warp breaks per loom. Unlike linear regression, it respects the floor at zero and the variance-equals-mean structure of count data, so predictions stay sensible and standard errors stay honest.
When should you use Poisson regression instead of linear regression?
Count outcomes sit at or above zero with no upper bound, and their variance grows with the mean. Linear regression ignores both facts, so its predictions can drift negative and its standard errors can be badly wrong. Poisson regression models the log of the expected count, which keeps predictions positive and matches the mean-variance link built into count data.
Fitting a Poisson model takes one line. Use glm() with family = poisson on the built-in warpbreaks dataset, where each row is a loom and breaks is the warp-break count.
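The fit described above looks like this; m_pois is the model discussed throughout this section.

```r
# Fit a Poisson GLM of warp breaks on wool type and tension level
m_pois <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
summary(m_pois)
```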
Every coefficient is on the log scale. The intercept of 3.69 means the expected count under wool A and low tension is $e^{3.69} \approx 40$ breaks, and each negative coefficient means fewer breaks relative to that baseline. The z-values and stars look fine, but notice the residual deviance (210.39) is much larger than the degrees of freedom (50). That mismatch is a warning sign that we will come back to in a later section.
Under the hood, Poisson regression models the expected count as
$$\log(\mu_i) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi}$$
where $\mu_i$ is the expected count for observation $i$. The probability of observing exactly $y$ events is
$$P(Y = y) = \frac{\mu^y e^{-\mu}}{y!}, \quad y = 0, 1, 2, \dots$$
and the distribution's variance equals its mean: $\text{Var}(Y) = \mu$. That single constraint is what makes Poisson regression work, and also what makes it fragile.
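A quick sanity check of the density formula and the mean-variance identity, using dpois() and a small simulation (the choice of $\mu = 4$ is arbitrary):

```r
mu <- 4
y  <- 0:5

# P(Y = y) computed from the formula mu^y * exp(-mu) / y!
manual <- mu^y * exp(-mu) / factorial(y)
stopifnot(isTRUE(all.equal(manual, dpois(y, lambda = mu))))

# Mean and variance of Poisson draws are both mu, up to sampling noise
set.seed(1)
draws <- rpois(1e5, lambda = mu)
c(mean = mean(draws), var = var(draws))
```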
Try it: Fit a Poisson model with tension as the only predictor. How does the residual deviance change compared to the wool + tension model?
Click to reveal solution
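One way to run the comparison:

```r
# Tension-only model versus the wool + tension model
m_tension <- glm(breaks ~ tension, family = poisson, data = warpbreaks)
m_pois    <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Residual deviance and AIC side by side
c(dev_tension = deviance(m_tension), dev_full = deviance(m_pois))
c(AIC_tension = AIC(m_tension),      AIC_full = AIC(m_pois))
```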
Explanation: Dropping wool costs about 16 units of deviance and 14 AIC points, so wool carries real information. Both models have similar dispersion problems (deviance far above df), which we will fix later with negative binomial.
How do you interpret Poisson coefficients as incidence rate ratios?
Reading coefficients on the log scale is awkward. Exponentiate them and you get incidence rate ratios (IRRs), which are multiplicative effects on the expected count. An IRR of 0.80 means 20 percent fewer events; an IRR of 1.50 means 50 percent more. This is the form you report in papers and dashboards, so it deserves its own code block.
We will build a small table with IRRs and 95 percent confidence intervals side by side. exp(coef(m_pois)) gives point estimates; wrapping confint(m_pois) in exp() gives the CI on the same scale.
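Building the table is two function calls:

```r
m_pois <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Exponentiate coefficients and profile CIs together to get IRRs with 95% CIs
irr <- exp(cbind(IRR = coef(m_pois), confint(m_pois)))
round(irr, 3)
```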
The IRR for tensionM is 0.725: under medium tension, you expect about 27 percent fewer breaks than under low tension, holding wool fixed. The CI is [0.644, 0.816], entirely below 1, so the effect is convincingly negative. woolB gives an 18.6 percent reduction versus wool A. Intercept IRR is the expected count at the baseline (wool A, low tension), around 40 breaks.
Try it: From the same m_pois model, read off the IRR for tensionH versus the baseline. Interpret it in plain English.
Click to reveal solution
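A solution sketch:

```r
m_pois <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

irr_H <- exp(coef(m_pois)[["tensionH"]])
irr_H                               # IRR for high vs low tension
exp(confint(m_pois)["tensionH", ])  # 95% CI on the IRR scale
```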
Explanation: High tension reduces expected breaks by about 40 percent versus low tension, and the CI [0.525, 0.675] stays below 1, so the effect is robust. The baseline is low tension (the reference level of the tension factor).
How do you add an offset for exposure time or population size?
Raw counts are not comparable when observations were exposed for different amounts of time, area, or population. An operator who worked 200 hours should not be expected to log the same ticket count as one who worked 40 hours. The fix is an offset, a variable that enters the linear predictor with a fixed coefficient of 1. The algebra is cleaner than it sounds:
$$\log(\mu_i) = X_i \beta + \log(E_i) \quad \Longleftrightarrow \quad \log\!\left(\frac{\mu_i}{E_i}\right) = X_i \beta$$
Subtracting $\log(E_i)$ moves exposure to the left side, and suddenly the model is predicting a rate (count per unit of exposure) rather than a raw count. Figure 1 captures the mental model.

Figure 1: Offsets let glm() model a rate (count per unit of exposure) instead of a raw count.
To see the pattern in action, we simulate a 60-row dataset of support operators. Each row has hours worked (exposure), experience in years, a team label, and tickets resolved. We set the true rate to depend on experience and team, then we will try to recover it with and without an offset.
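The simulation below is a plausible reconstruction, not the original code: the true intercept ($-2.0$) and experience slope (0.02 per year) come from the text, while the seed, the hours and experience ranges, and the team effects ($-0.17$ for B, $-0.09$ for C) are assumptions chosen to be consistent with the fitted numbers reported later.

```r
set.seed(42)  # assumption: the original seed is unknown
n <- 60
support <- data.frame(
  hours      = round(runif(n, 20, 220)),   # exposure: hours worked
  experience = round(runif(n, 0, 20)),     # years of experience
  team       = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

# True log-rate per hour; team effects are assumed values
log_rate <- -2.0 + 0.02 * support$experience +
  c(A = 0, B = -0.17, C = -0.09)[as.character(support$team)]

# Counts scale with exposure: expected tickets = rate * hours
support$tickets <- rpois(n, lambda = exp(log_rate) * support$hours)
head(support)
```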
Now fit the same formula two ways: once ignoring exposure (m_raw), once with offset(log(hours)) so the model predicts tickets per hour. Watch what happens to the intercept and to the coefficient on experience across the two specifications.
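The two fits, with the support data regenerated inline (the seed and team effects are assumptions, as in the simulation step):

```r
# Recreate the simulated support data (assumed seed and team effects)
set.seed(42); n <- 60
support <- data.frame(hours = round(runif(n, 20, 220)),
                      experience = round(runif(n, 0, 20)),
                      team = factor(sample(c("A", "B", "C"), n, replace = TRUE)))
support$tickets <- rpois(n, exp(-2 + 0.02 * support$experience +
  c(A = 0, B = -0.17, C = -0.09)[as.character(support$team)]) * support$hours)

# Same formula, two specifications: without and with the exposure offset
m_raw  <- glm(tickets ~ experience + team, family = poisson, data = support)
m_rate <- glm(tickets ~ experience + team + offset(log(hours)),
              family = poisson, data = support)

# The offset pulls the intercept back toward the true log-rate
cbind(raw = coef(m_raw), rate = coef(m_rate))
```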
Without the offset, the intercept mixes the true baseline rate with the average exposure, so it reads 2.66 instead of the true $-2.0$. With the offset, the intercept recovers near the truth ($-2.02$), and the coefficient on experience is closer to the simulated 0.02 per year. The offset lets glm() compare operators on a fair rate-per-hour basis, even though they worked very different numbers of hours.
Try it: Fit a Poisson model predicting tickets from team only, with log(hours) as the offset. Which team has the highest rate?
Click to reveal solution
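A solution sketch (the support data is regenerated inline under the same assumed seed and team effects as the simulation step):

```r
# Recreate the simulated support data (assumed seed and team effects)
set.seed(42); n <- 60
support <- data.frame(hours = round(runif(n, 20, 220)),
                      experience = round(runif(n, 0, 20)),
                      team = factor(sample(c("A", "B", "C"), n, replace = TRUE)))
support$tickets <- rpois(n, exp(-2 + 0.02 * support$experience +
  c(A = 0, B = -0.17, C = -0.09)[as.character(support$team)]) * support$hours)

# Team-only rate model: intercept = team A's tickets per hour,
# team coefficients = rate ratios relative to team A
m_team <- glm(tickets ~ team + offset(log(hours)),
              family = poisson, data = support)
exp(coef(m_team))
```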
Explanation: Team A's baseline rate is about 0.141 tickets per hour. Team B resolves at 79 percent of that rate; team C at 93 percent. Team A is fastest per hour of exposure, which matches the simulated team effect.
How do you detect overdispersion in a Poisson model?
Poisson's defining assumption is $\text{Var}(Y) = \mu$. When real data has more variance than that, we call it overdispersion, and Poisson's standard errors get artificially small, inflating z-values and shrinking p-values. Common causes are extra zeros, clustering, and unmeasured predictors that make the rate vary across observations.
Figure 2 shows the one test you should run on every Poisson fit. It compares the sum of squared Pearson residuals to the residual degrees of freedom. Under a well-specified Poisson, the ratio should hover around 1.

Figure 2: Diagnose overdispersion by comparing Pearson chi-squared to degrees of freedom.
The Pearson dispersion statistic is
$$\hat{\phi} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}$$
A value near 1 means Poisson is well-specified. Above 1.5 is concerning, above 2 is definitely overdispersed and you should switch models. Compute it directly from the fitted model, no extra package needed.
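Computing it takes two lines on any fitted glm object:

```r
m_pois <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Pearson dispersion: sum of squared Pearson residuals over residual df
dispersion <- sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
dispersion  # well above 1 for warpbreaks
```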
The warpbreaks Poisson model has a dispersion of 3.8, which is severe; our standard errors from summary(m_pois) are about $\sqrt{3.8} \approx 1.95$ times too small. The support-tickets model (m_rate) is close to 1.0, which is what we expect because we simulated it from a true Poisson process. This is exactly why you always check dispersion before reading p-values off a Poisson summary.
Try it: Compute the Pearson dispersion for m_rate. Is Poisson appropriate for the simulated support data?
Click to reveal solution
Explanation: The dispersion is near 1, which is the theoretical value for a correctly specified Poisson. That is consistent with how we simulated the data, drawing counts directly from rpois(). No need to escalate to a negative binomial model here.
What do you do when you find overdispersion?
You have three practical options, and they differ in how much machinery they require.
The quasi-Poisson family inflates all standard errors by $\sqrt{\hat{\phi}}$ but leaves the point estimates unchanged. It is a one-line fix: swap family = poisson for family = quasipoisson. The cost is that there is no likelihood, so AIC, BIC, and likelihood-ratio tests no longer apply.
The negative binomial is a proper probability model with an extra dispersion parameter $\theta$ (sometimes called size). Its variance is $\mu + \mu^2 / \theta$, so variance grows faster than the mean, and small $\theta$ means more dispersion. Fit with MASS::glm.nb(). It gives you a real log-likelihood, real AIC, and real LR tests.
The third option is robust (sandwich) standard errors, which you apply on top of the Poisson fit. We mention it for completeness but will not demonstrate it here; see the References section for the sandwich package.
Let us refit the warpbreaks model both ways and line up standard errors side by side so you can see what each adjustment does.
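The refits and the side-by-side standard errors:

```r
library(MASS)  # for glm.nb()

m_pois  <- glm(breaks ~ wool + tension, family = poisson,      data = warpbreaks)
m_quasi <- glm(breaks ~ wool + tension, family = quasipoisson, data = warpbreaks)
m_nb    <- glm.nb(breaks ~ wool + tension, data = warpbreaks)

# Standard errors are column 2 of each coefficient table
se_tab <- cbind(poisson = summary(m_pois)$coefficients[, 2],
                quasi   = summary(m_quasi)$coefficients[, 2],
                negbin  = summary(m_nb)$coefficients[, 2])
round(se_tab, 4)
```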
All three models agree on the point estimates (not shown here), but the Poisson standard errors are the smallest by a factor of roughly two. Quasi-Poisson inflates them by $\sqrt{3.8} \approx 1.95$ as promised. Negative binomial inflates them slightly more because it also accounts for the extra $\theta$ parameter. After the adjustment, the coefficients are still statistically significant, but the gap between Poisson SE and the honest SE is large enough that anyone using raw Poisson here would overstate certainty.
Try it: Refit the support-tickets model as a negative binomial with offset(log(hours)). What is the estimated theta?
Click to reveal solution
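A solution sketch (support is regenerated inline under the same assumed seed and team effects as the simulation step; expect a possible iteration-limit warning when theta runs off toward infinity):

```r
library(MASS)

# Recreate the simulated support data (assumed seed and team effects)
set.seed(42); n <- 60
support <- data.frame(hours = round(runif(n, 20, 220)),
                      experience = round(runif(n, 0, 20)),
                      team = factor(sample(c("A", "B", "C"), n, replace = TRUE)))
support$tickets <- rpois(n, exp(-2 + 0.02 * support$experience +
  c(A = 0, B = -0.17, C = -0.09)[as.character(support$team)]) * support$hours)

m_nb_support <- glm.nb(tickets ~ experience + team + offset(log(hours)),
                       data = support)
m_nb_support$theta  # very large theta => no dispersion beyond Poisson
```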
Explanation: Theta is effectively infinite here, which is NB's way of saying "the extra dispersion parameter is not needed." That matches the support data's Pearson dispersion near 1. If your real data returns a small theta (say 1 to 10), NB is doing real work; if theta is huge, Poisson was already fine.
How do you check the fit of a Poisson or negative binomial model?
Dispersion is one check, but you also want to know whether the model captures the structure of the data and predicts well for new inputs. Four quick checks cover most cases:
- Residual deviance / df. Should be near 1 for a well-fit Poisson (equivalent to the dispersion check above).
- Fitted versus observed. Scatter the fitted values against the actual counts; a tight diagonal means the model reproduces the pattern.
- Deviance goodness-of-fit test. Compare the residual deviance to $\chi^2$ with the model's residual df. A small p-value means the model is misspecified.
- Predict for a new row. Use predict(..., type = "response") to check that the model gives sensible rate predictions on out-of-sample inputs.
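The last two checks in code form:

```r
m_pois <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Deviance goodness-of-fit test: small p-value => misspecified model
p_gof <- pchisq(deviance(m_pois), df.residual(m_pois), lower.tail = FALSE)
p_gof

# Predict expected breaks for a new loom: wool A, medium tension
pred <- predict(m_pois, newdata = data.frame(wool = "A", tension = "M"),
                type = "response")
pred
```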
The GOF p-value is essentially zero, meaning the warpbreaks Poisson model does not fit well, confirming what the dispersion statistic told us. The predicted breaks for wool A + medium tension is about 29, which matches what we calculate by hand from the coefficients: $e^{3.69 - 0.32} \approx 29$. For honest inference, you should rerun the negative binomial model (m_nb) and get predictions from there; point estimates barely move, but intervals widen.
For richer count-model diagnostics such as rootograms, the countreg package from R-Forge is worth a look.

Try it: Compute the deviance goodness-of-fit p-value for m_rate. Does the simulated support model pass?
Click to reveal solution
Explanation: A p-value of 0.21 means the model's deviance is consistent with a well-fit Poisson at the 5 percent level. Combined with the Pearson dispersion near 1, m_rate is a safe fit for the simulated support data.
Practice Exercises
Exercise 1: Interaction on warpbreaks
Fit a Poisson model on warpbreaks with a wool * tension interaction and find the IRR for woolB:tensionM (the interaction term). Does wool B behave differently at medium tension than at low tension?
Click to reveal solution
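A solution sketch:

```r
# wool * tension expands to wool + tension + wool:tension
m_int <- glm(breaks ~ wool * tension, family = poisson, data = warpbreaks)

irr_int <- exp(coef(m_int)[["woolB:tensionM"]])
irr_int  # > 1: wool B's effect differs at medium tension
```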
Explanation: The interaction IRR for woolB:tensionM is 1.365, which means wool B under medium tension has a rate 36 percent higher than the additive model predicts. Wool B's disadvantage at low tension is partly cancelled out at medium and high tension. This is exactly what an interaction captures: effects that depend on the level of another factor.
Exercise 2: Negative binomial with offset on support
Fit a negative binomial model on support with experience + team as predictors and offset(log(hours)) for exposure. Interpret the IRR for experience in plain English.
Click to reveal solution
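A solution sketch (support is regenerated inline under the same assumed seed and team effects as the simulation step):

```r
library(MASS)

# Recreate the simulated support data (assumed seed and team effects)
set.seed(42); n <- 60
support <- data.frame(hours = round(runif(n, 20, 220)),
                      experience = round(runif(n, 0, 20)),
                      team = factor(sample(c("A", "B", "C"), n, replace = TRUE)))
support$tickets <- rpois(n, exp(-2 + 0.02 * support$experience +
  c(A = 0, B = -0.17, C = -0.09)[as.character(support$team)]) * support$hours)

m_nb2 <- glm.nb(tickets ~ experience + team + offset(log(hours)), data = support)
round(exp(coef(m_nb2)), 3)  # IRRs: per-year experience effect, team rate ratios
```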
Explanation: The IRR for experience is 1.021. Each additional year of experience is associated with about a 2.1 percent higher ticket-resolution rate per hour, holding team fixed. Team B resolves 16 percent fewer tickets per hour than team A, and team C 9 percent fewer. These match how we simulated the data.
Exercise 3: Poisson versus NB on overdispersed data
Simulate 500 counts with mean 5 and variance 15 (using MASS::rnegbin), then fit both a Poisson and a negative binomial to y ~ 1. Compare the standard error of the intercept across the two fits.
Click to reveal solution
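A solution sketch (the seed is arbitrary; theta = 2.5 follows from solving $\mu + \mu^2/\theta = 15$ at $\mu = 5$):

```r
library(MASS)

set.seed(1)  # arbitrary seed
# Var = mu + mu^2/theta = 5 + 25/2.5 = 15
y <- rnegbin(500, mu = 5, theta = 2.5)

m_p   <- glm(y ~ 1, family = poisson)
m_nb3 <- glm.nb(y ~ 1)

se_p  <- summary(m_p)$coefficients[1, 2]
se_nb <- summary(m_nb3)$coefficients[1, 2]
c(poisson = se_p, negbin = se_nb, ratio = se_nb / se_p)
```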
Explanation: NB's SE is about 1.7 times the Poisson SE, close to the $\sqrt{3}$ we expect when variance is three times the mean. If you trusted the Poisson SE you would be reporting a confidence interval that is 70 percent too narrow.
Complete Example
Let us tie everything together. Imagine you are handed the support dataset and asked, "Does each year of experience meaningfully improve tickets per hour, adjusting for team?" Here is the end-to-end workflow.
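The whole pipeline in one block (support is regenerated inline under the same assumed seed and team effects as the simulation step):

```r
library(MASS)

# Simulated support data (assumed seed and team effects)
set.seed(42); n <- 60
support <- data.frame(hours = round(runif(n, 20, 220)),
                      experience = round(runif(n, 0, 20)),
                      team = factor(sample(c("A", "B", "C"), n, replace = TRUE)))
support$tickets <- rpois(n, exp(-2 + 0.02 * support$experience +
  c(A = 0, B = -0.17, C = -0.09)[as.character(support$team)]) * support$hours)

# 1. Fit a rate model with an exposure offset
m_rate <- glm(tickets ~ experience + team + offset(log(hours)),
              family = poisson, data = support)

# 2. Check dispersion (near 1 => Poisson is fine)
disp <- sum(residuals(m_rate, type = "pearson")^2) / df.residual(m_rate)
disp

# 3. Refit as negative binomial for honest standard errors
m_nb_sup <- glm.nb(tickets ~ experience + team + offset(log(hours)),
                   data = support)

# 4. IRR for experience: multiplicative effect per year on the hourly rate
exp(coef(m_nb_sup)[["experience"]])

# 5. Predict: 10-year operator on team A working 120 hours
pred <- predict(m_nb_sup,
                newdata = data.frame(experience = 10, team = "A", hours = 120),
                type = "response")
pred
```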
Dispersion near 1 means Poisson would have been fine; NB gives essentially the same answer. Each year of experience buys about 2 percent more tickets per hour. A ten-year operator on team A working 120 hours is expected to resolve about 19 or 20 tickets. That is the story, and every step was one function call in R.
Summary
The six-step workflow below covers nearly every practical Poisson regression job. Figure 3 is a quick visual reminder of which model to reach for when dispersion breaks down.
| Step | What you do | Why it matters |
|---|---|---|
| 1 | Fit glm(y ~ ..., family = poisson) | Baseline count model |
| 2 | Exponentiate coefficients to IRRs | Human-readable effects |
| 3 | Add offset(log(E)) if exposure varies | Turn counts into rates |
| 4 | Compute Pearson dispersion | Check Poisson's main assumption |
| 5 | If dispersion > 1.5, fit glm.nb() | Honest standard errors |
| 6 | Goodness-of-fit + predict on new data | Validate the final model |

Figure 3: Choose Poisson, quasi-Poisson, or negative binomial based on dispersion.
Poisson regression is the right starting point any time your outcome is a non-negative integer. Check dispersion every time. When it blows up, switch to negative binomial or quasi-Poisson. When exposure varies, add an offset. Those three habits will keep your count models honest.
References
- Faraway, J. Extending the Linear Model with R, 2nd Edition. CRC Press. Chapter 5, Count Regression.
- McCullagh, P. & Nelder, J. Generalized Linear Models, 2nd Edition. Chapman & Hall (1989).
- Venables, W. & Ripley, B. Modern Applied Statistics with S, 4th Edition. Springer. MASS package reference.
- Zeileis, A. & Kleiber, C. Applied Econometrics with R. Springer. AER package, dispersiontest().
- Zeileis, A. Object-Oriented Computation of Sandwich Estimators. Journal of Statistical Software, 16(9) (2006). sandwich package.
- Hilbe, J. Negative Binomial Regression, 2nd Edition. Cambridge University Press (2011).
- UCLA OARC Stats, Poisson Regression, R Data Analysis Examples.
- R help pages for glm (stats) and glm.nb (MASS).
Continue Learning
- Logistic Regression With R. When the outcome is a binary yes/no rather than a count, glm() still does the job, just with family = binomial.
- Linear Regression. The starting point for regression modeling. Use it as the base case and Poisson as the count-data upgrade.
- Which Statistical Test in R. A decision guide for choosing between regression families, t-tests, and chi-squared tests based on your outcome type.