Poisson Regression in R: Model Count Data and Handle Overdispersion

Poisson regression models count outcomes, non-negative integers like number of website visits, accidents per month, or warp breaks per loom. Unlike linear regression, it respects the floor at zero and the variance-equals-mean structure of count data, so predictions stay sensible and standard errors stay honest.

When should you use Poisson regression instead of linear regression?

Count outcomes sit at or above zero with no upper bound, and their variance grows with the mean. Linear regression ignores both facts, so its predictions can drift negative and its standard errors can be badly wrong. Poisson regression models the log of the expected count, which keeps predictions positive and matches the mean-variance link built into count data.

Fitting a Poisson model takes one line. Use glm() with family = poisson on the built-in warpbreaks dataset, where each row is a loom and breaks is the warp-break count.

R — Fit a Poisson regression on warpbreaks

library(MASS)
wb <- warpbreaks
m_pois <- glm(breaks ~ wool + tension, data = wb, family = poisson)
summary(m_pois)
#> Call:
#> glm(formula = breaks ~ wool + tension, family = poisson, data = wb)
#>
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)
#> (Intercept)   3.69196    0.04541  81.302  < 2e-16 ***
#> woolB        -0.20599    0.05157  -3.994 6.49e-05 ***
#> tensionM     -0.32132    0.06027  -5.332 9.73e-08 ***
#> tensionH     -0.51849    0.06396  -8.107 5.21e-16 ***
#>
#> Null deviance: 297.37 on 53 degrees of freedom
#> Residual deviance: 210.39 on 50 degrees of freedom
#> AIC: 493.06

Every coefficient is on the log scale. The intercept of 3.69 means the expected count under wool A and low tension is $e^{3.69} \approx 40$ breaks, and each negative coefficient means fewer breaks relative to that baseline. The z-values and stars look fine, but notice the residual deviance (210.39) is much larger than the degrees of freedom (50). That mismatch is a warning sign that we will come back to in a later section.
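You can check the back-transformation yourself in the console. A quick sketch, assuming the m_pois fit above is still in your session:

```r
# Back-transform log-scale coefficients to expected counts
# (assumes m_pois from the fit above)
exp(coef(m_pois)["(Intercept)"])                             # baseline wool A, low tension: about 40
exp(coef(m_pois)["(Intercept)"] + coef(m_pois)["tensionM"])  # wool A, medium tension: about 29
```

Adding coefficients on the log scale corresponds to multiplying expected counts, which is why the exponentiated form is the natural one for interpretation.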

Under the hood, Poisson regression models the expected count as

$$\log(\mu_i) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi}$$

where $\mu_i$ is the expected count for observation $i$. The probability of observing exactly $y$ events is

$$P(Y = y) = \frac{\mu^y e^{-\mu}}{y!}, \quad y = 0, 1, 2, \dots$$

and the distribution's variance equals its mean: $\text{Var}(Y) = \mu$. That single constraint is what makes Poisson regression work, and also what makes it fragile.
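Both facts are easy to verify numerically. The sketch below checks dpois() against the pmf formula by hand and then confirms the mean-variance equality on simulated draws (the value of mu and the seed are arbitrary choices for illustration):

```r
# The pmf above is exactly what dpois() computes
mu <- 4; y <- 3
c(dpois = dpois(y, mu), by_hand = mu^y * exp(-mu) / factorial(y))

# In a large Poisson sample, the variance tracks the mean
set.seed(1)
draws <- rpois(1e5, lambda = mu)
c(mean = mean(draws), var = var(draws))  # both close to 4
```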

Warning
Poisson assumes variance equals mean, and real count data rarely obeys that constraint. If the variance of your counts is noticeably larger than the mean, Poisson will underestimate the standard errors and you will call coefficients significant when they are not. Always check dispersion before trusting the p-values (see the overdispersion section below).

Try it: Fit a Poisson model with tension as the only predictor. How does the residual deviance change compared to the wool + tension model?

R — Your turn: Poisson with tension only

# Your turn: fit Poisson with tension only on warpbreaks
ex_tension <- glm(breaks ~ _____, data = wb, family = poisson)
summary(ex_tension)
#> Expected: residual deviance around 226 on 51 df

Solution:

ex_tension <- glm(breaks ~ tension, data = wb, family = poisson)
summary(ex_tension)
#> Residual deviance: 226.60 on 51 degrees of freedom
#> AIC: 507.27

Explanation: Dropping wool costs about 16 units of deviance and 14 AIC points, so wool carries real information. Both models have similar dispersion problems (deviance far above df), which we will fix later with negative binomial.

How do you interpret Poisson coefficients as incident rate ratios?

Reading coefficients on the log scale is awkward. Exponentiate them and you get incident rate ratios (IRRs), which are multiplicative effects on the expected count. An IRR of 0.80 means 20 percent fewer events; an IRR of 1.50 means 50 percent more. This is the form you report in papers and dashboards, so it deserves its own code block.

We will build a small table with IRRs and 95 percent confidence intervals side by side. exp(coef(m_pois)) gives point estimates; wrapping confint(m_pois) in exp() gives the CI on the same scale.

R — Build an IRR table with confidence intervals

irr_tbl <- cbind(IRR = exp(coef(m_pois)), exp(confint(m_pois)))
round(irr_tbl, 3)
#>                IRR  2.5 % 97.5 %
#> (Intercept) 40.124 36.700 43.822
#> woolB        0.814  0.735  0.900
#> tensionM     0.725  0.644  0.816
#> tensionH     0.595  0.525  0.675

The IRR for tensionM is 0.725: under medium tension, you expect about 27 percent fewer breaks than under low tension, holding wool fixed. The CI is [0.644, 0.816], entirely below 1, so the effect is convincingly negative. woolB gives an 18.6 percent reduction versus wool A. Intercept IRR is the expected count at the baseline (wool A, low tension), around 40 breaks.

Tip
Build an IRR-with-CI table in one line. cbind(IRR = exp(coef(m)), exp(confint(m))) is short, reusable, and gives the three numbers readers actually want. Round to three digits for clean reporting.

Try it: From the same m_pois model, read off the IRR for tensionH versus the baseline. Interpret it in plain English.

R — Your turn: find the tensionH IRR

# Your turn: extract the IRR row for tensionH
ex_irrH <- _____
ex_irrH
#> Expected: about 0.595 (roughly 40% fewer breaks vs low tension)

Solution:

ex_irrH <- irr_tbl["tensionH", ]
ex_irrH
#>   IRR  2.5 % 97.5 %
#> 0.595  0.525  0.675

Explanation: High tension reduces expected breaks by about 40 percent versus low tension, and the CI [0.525, 0.675] stays below 1, so the effect is robust. The baseline is low tension (the reference level of the tension factor).

How do you add an offset for exposure time or population size?

Raw counts are not comparable when observations were exposed for different amounts of time, area, or population. An operator who worked 200 hours should not be expected to log the same ticket count as one who worked 40 hours. The fix is an offset, a variable that enters the linear predictor with a fixed coefficient of 1. The algebra is cleaner than it sounds:

$$\log(\mu_i) = X_i \beta + \log(E_i) \quad \Longleftrightarrow \quad \log\!\left(\frac{\mu_i}{E_i}\right) = X_i \beta$$

Subtracting $\log(E_i)$ moves exposure to the left side, and suddenly the model is predicting a rate (count per unit of exposure) rather than a raw count. Figure 1 captures the mental model.

Offsets turn counts into rates

Figure 1: Offsets let glm() model a rate (count per unit of exposure) instead of a raw count.

To see the pattern in action, we simulate a 60-row dataset of support operators. Each row has hours worked (exposure), experience in years, a team label, and tickets resolved. We set the true rate to depend on experience and team, then we will try to recover it with and without an offset.

R — Simulate support-tickets data with exposure

set.seed(301)
n <- 60
support <- data.frame(
  experience = round(runif(n, 0, 15), 1),
  hours = round(runif(n, 40, 200)),
  team = sample(c("A", "B", "C"), n, replace = TRUE)
)
team_effect <- c(A = 0.10, B = -0.05, C = 0.00)
support$tickets <- rpois(
  n,
  lambda = support$hours * exp(-2.0 + 0.02 * support$experience + team_effect[support$team])
)
head(support)
#>   experience hours team tickets
#> 1       11.5   147    A      24
#> 2        0.8   159    C      18
#> 3        5.8    91    B      12
#> 4        4.3    83    A      13
#> 5        0.5   129    A      19
#> 6        8.8    43    A       7

Now fit the same formula two ways: once ignoring exposure (m_raw), once with offset(log(hours)) so the model predicts tickets per hour. Watch what happens to the intercept and to the coefficient on experience across the two specifications.

R — Compare models with and without offset

m_raw <- glm(tickets ~ experience + team, data = support, family = poisson)
m_rate <- glm(tickets ~ experience + team + offset(log(hours)),
              data = support, family = poisson)
cbind(raw = coef(m_raw), rate = coef(m_rate))
#>                    raw        rate
#> (Intercept)  2.6562834 -2.02181455
#> experience   0.0115213  0.02049103
#> teamB       -0.2165411 -0.13488108
#> teamC        0.0122931  0.00432528

Without the offset, the intercept mixes the true baseline rate with the average exposure, so it reads 2.66 instead of the true $-2.0$. With the offset, the intercept recovers near the truth ($-2.02$), and the coefficient on experience is closer to the simulated 0.02 per year. The offset lets glm() compare operators on a fair rate-per-hour basis, even though they worked very different numbers of hours.

Key Insight
An offset turns a count model into a rate model without adding a free parameter. Because log(exposure) enters with a fixed coefficient of 1 (rather than being estimated), you get rate interpretation without losing a degree of freedom. The offset is part of the mean structure, not a covariate.
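One way to see that fixing the coefficient at 1 is justified: enter log(hours) as an ordinary covariate and let glm() estimate its coefficient. A sketch, assuming the simulated support data frame from above; because tickets were generated proportionally to hours, the estimate should land near 1.

```r
# Estimate the exposure coefficient instead of fixing it at 1
# (assumes the simulated `support` data frame from above)
m_free <- glm(tickets ~ experience + team + log(hours),
              data = support, family = poisson)
coef(m_free)["log(hours)"]  # close to 1, consistent with using an offset
```

If this coefficient came back far from 1 on real data, the count would not be proportional to exposure and a plain offset would be the wrong specification.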

Try it: Fit a Poisson model predicting tickets from team only, with log(hours) as the offset. Which team has the highest rate?

R — Your turn: offset model with team only

# Your turn: tickets rate by team, controlling for hours worked
ex_offset <- glm(tickets ~ _____ + offset(log(hours)), data = support, family = poisson)
exp(coef(ex_offset))
#> Expected: teamA baseline; teamB and teamC as multiplicative rates

Solution:

ex_offset <- glm(tickets ~ team + offset(log(hours)), data = support, family = poisson)
exp(coef(ex_offset))
#> (Intercept)       teamB       teamC
#>      0.1411      0.7937      0.9321

Explanation: Team A's baseline rate is about 0.141 tickets per hour. Team B resolves at 79 percent of that rate; team C at 93 percent. Team A is fastest per hour of exposure, which matches the simulated team effect.

How do you detect overdispersion in a Poisson model?

Poisson's defining assumption is $\text{Var}(Y) = \mu$. When real data has more variance than that, we call it overdispersion, and Poisson's standard errors get artificially small, inflating z-values and shrinking p-values. Common causes are extra zeros, clustering, and unmeasured predictors that make the rate vary across observations.

Figure 2 shows the one test you should run on every Poisson fit. It compares the sum of squared Pearson residuals to the residual degrees of freedom. Under a well-specified Poisson, the ratio should hover around 1.

Overdispersion diagnostic flow

Figure 2: Diagnose overdispersion by comparing Pearson chi-squared to degrees of freedom.

The Pearson dispersion statistic is

$$\hat{\phi} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}$$

A value near 1 means Poisson is well-specified. A ratio above 1.5 is concerning; above 2, the data are clearly overdispersed and you should switch models. Compute it directly from the fitted model, no extra package needed.

R — Compute the Pearson dispersion statistic

pearson_disp <- function(m) sum(residuals(m, type = "pearson")^2) / m$df.residual
disp_pois <- pearson_disp(m_pois)
disp_rate <- pearson_disp(m_rate)
c(warpbreaks_model = disp_pois, support_model = disp_rate)
#> warpbreaks_model    support_model
#>            3.802            1.155

The warpbreaks Poisson model has a dispersion of 3.8, which is severe; our standard errors from summary(m_pois) are about $\sqrt{3.8} \approx 1.95$ times too small. The support-tickets model (m_rate) is close to 1.0, which is what we expect because we simulated it from a true Poisson process. This is exactly why you always check dispersion before reading p-values off a Poisson summary.

Note
AER::dispersiontest() is the canonical test. It reports a z-statistic and p-value for the null of equidispersion. We compute the ratio manually because AER is not always available in lightweight R environments, and the manual formula is transparent. If you have AER loaded, dispersiontest(m_pois) returns the same conclusion with a significance test attached.

Try it: Compute the Pearson dispersion for m_rate. Is Poisson appropriate for the simulated support data?

R — Your turn: dispersion for m_rate

# Your turn: reuse pearson_disp() from above
ex_disp <- _____
ex_disp
#> Expected: about 1.15, close to 1, Poisson is fine

Solution:

ex_disp <- pearson_disp(m_rate)
ex_disp
#> [1] 1.155

Explanation: The dispersion is near 1, which is the theoretical value for a correctly specified Poisson. That is consistent with how we simulated the data, drawing counts directly from rpois(). No need to escalate to a negative binomial model here.

What do you do when you find overdispersion?

You have three practical options, and they differ in how much machinery they require.

The quasi-Poisson family inflates all standard errors by $\sqrt{\hat{\phi}}$ but leaves the point estimates unchanged. It is a one-line fix: swap family = poisson for family = quasipoisson. The cost is that there is no likelihood, so AIC, BIC, and likelihood-ratio tests no longer apply.

The negative binomial is a proper probability model with an extra dispersion parameter $\theta$ (sometimes called size). Its variance is $\mu + \mu^2 / \theta$, so variance grows faster than the mean, and small $\theta$ means more dispersion. Fit with MASS::glm.nb(). It gives you a real log-likelihood, real AIC, and real LR tests.

The third option is robust (sandwich) standard errors, which you apply on top of the Poisson fit. We mention it for completeness but will not demonstrate it here; see the References section for the sandwich package.

Let us refit the warpbreaks model both ways and line up standard errors side by side so you can see what each adjustment does.

R — Compare Poisson, quasi-Poisson, and NB standard errors

m_quasi <- glm(breaks ~ wool + tension, data = wb, family = quasipoisson)
m_nb <- glm.nb(breaks ~ wool + tension, data = wb)
se_tbl <- cbind(
  poisson = summary(m_pois)$coefficients[, "Std. Error"],
  quasipoisson = summary(m_quasi)$coefficients[, "Std. Error"],
  negbin = summary(m_nb)$coefficients[, "Std. Error"]
)
round(se_tbl, 4)
#>             poisson quasipoisson negbin
#> (Intercept)  0.0454       0.0929 0.1004
#> woolB        0.0516       0.1056 0.1158
#> tensionM     0.0603       0.1234 0.1374
#> tensionH     0.0640       0.1309 0.1430

All three models agree on the point estimates (not shown here), but the Poisson standard errors are the smallest by a factor of roughly two. Quasi-Poisson inflates them by $\sqrt{3.8} \approx 1.95$ as promised. Negative binomial inflates them slightly more because it also accounts for the extra $\theta$ parameter. After the adjustment, the coefficients are still statistically significant, but the gap between Poisson SE and the honest SE is large enough that anyone using raw Poisson here would overstate certainty.

Key Insight
Negative binomial adds one parameter, theta, that decouples the variance from the mean. The Poisson constraint Var(Y) = mu is replaced with Var(Y) = mu + mu^2 / theta, which lets the spread of counts grow faster than the average. Small theta means heavy overdispersion; very large theta collapses the NB back to Poisson.
Warning
Quasi-Poisson gives you no likelihood. Its SE adjustment is free, but AIC, BIC, and likelihood-ratio tests are not defined. If you want model comparison or LR tests, use negative binomial instead, or fit both and report the quasi-Poisson only as a robustness check.
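The NB mean-variance formula is straightforward to check by simulation. A quick sketch with arbitrary values of mu and theta, using MASS::rnegbin:

```r
library(MASS)
# Negative binomial: Var(Y) = mu + mu^2 / theta
mu <- 8; theta <- 2
set.seed(7)
draws <- rnegbin(1e5, mu = mu, theta = theta)
c(empirical = var(draws), theoretical = mu + mu^2 / theta)  # both near 40
```

Try raising theta toward infinity in the sketch: the empirical variance shrinks toward mu, which is the Poisson limit described above.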

Try it: Refit the support-tickets model as a negative binomial with offset(log(hours)). What is the estimated theta?

R — Your turn: negative binomial with offset

# Your turn: NB with experience + team + offset
ex_nb <- glm.nb(tickets ~ _____ + offset(log(hours)), data = support)
ex_nb$theta
#> Expected: very large theta, consistent with low overdispersion

Solution:

ex_nb <- glm.nb(tickets ~ experience + team + offset(log(hours)), data = support)
ex_nb$theta
#> [1] 32015.1

Explanation: Theta is effectively infinite here, which is NB's way of saying "the extra dispersion parameter is not needed." That matches the support data's Pearson dispersion near 1. If your real data returns a small theta (say 1 to 10), NB is doing real work; if theta is huge, Poisson was already fine.

How do you check the fit of a Poisson or negative binomial model?

Dispersion is one check, but you also want to know whether the model captures the structure of the data and predicts well for new inputs. Four quick checks cover most cases:

  1. Residual deviance / df. Should be near 1 for a well-fit Poisson (equivalent to the dispersion check above).
  2. Fitted versus observed. Scatter the fitted values against the actual counts; a tight diagonal means the model reproduces the pattern.
  3. Deviance goodness-of-fit test. Compare the residual deviance to $\chi^2$ with the model's residual df. A small p-value means the model is misspecified.
  4. Predict for a new row. Use predict(..., type = "response") to check that the model gives sensible rate predictions on out-of-sample inputs.
R — Goodness-of-fit and prediction

gof_p <- pchisq(m_pois$deviance, df = m_pois$df.residual, lower.tail = FALSE)
new_row <- data.frame(wool = "A", tension = "M")
new_pred <- predict(m_pois, newdata = new_row, type = "response")
c(gof_pvalue = round(gof_p, 6), predicted_breaks = round(unname(new_pred), 2))
#>       gof_pvalue predicted_breaks
#>             0.00            29.08

The GOF p-value is essentially zero, meaning the warpbreaks Poisson model does not fit well, confirming what the dispersion statistic told us. The predicted breaks for wool A + medium tension is about 29, which matches what we calculate by hand from the coefficients: $e^{3.69 - 0.32} \approx 29$. For honest inference, you should rerun the negative binomial model (m_nb) and get predictions from there; point estimates barely move, but intervals widen.
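Check 2 (fitted versus observed) does not require a plot. A sketch that compares mean observed and mean fitted counts within each wool-by-tension cell, assuming m_pois and wb from earlier; cells where the two columns diverge are where the additive model misses structure:

```r
# Mean observed vs mean fitted breaks per design cell
# (assumes m_pois and wb from the earlier fit)
cells <- aggregate(
  cbind(observed = wb$breaks, fitted = fitted(m_pois)),
  by = list(wool = wb$wool, tension = wb$tension),
  FUN = mean
)
cells
```

For warpbreaks there are six cells; any systematic gap between the columns hints at an interaction the main-effects model cannot capture.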

Tip
Rootograms give you a visual dispersion check. countreg::rootogram(m_pois) plots observed versus expected frequencies on a square-root scale so you can see over- and under-prediction at each count. It is the most useful one-plot diagnostic for Poisson and NB models; install countreg from R-Forge if you want it.

Try it: Compute the deviance goodness-of-fit p-value for m_rate. Does the simulated support model pass?

R — Your turn: GOF for m_rate

# Your turn: compute the deviance GOF p-value
ex_gof <- pchisq(_____, df = _____, lower.tail = FALSE)
ex_gof
#> Expected: p-value around 0.2, model is consistent with the data

Solution:

ex_gof <- pchisq(m_rate$deviance, df = m_rate$df.residual, lower.tail = FALSE)
round(ex_gof, 4)
#> [1] 0.2129

Explanation: A p-value of 0.21 means the model's deviance is consistent with a well-fit Poisson at the 5 percent level. Combined with the Pearson dispersion near 1, m_rate is a safe fit for the simulated support data.

Practice Exercises

Exercise 1: Interaction on warpbreaks

Fit a Poisson model on warpbreaks with a wool * tension interaction and find the IRR for woolB:tensionM (the interaction term). Does wool B behave differently at medium tension than at low tension?

R — Exercise 1: wool and tension interaction

# Exercise 1: interaction IRR
# Hint: wool * tension expands to wool + tension + wool:tension
cap1 <- glm(_____, data = wb, family = poisson)
# Then exponentiate the coefficients

Solution:

cap1 <- glm(breaks ~ wool * tension, data = wb, family = poisson)
round(exp(coef(cap1)), 3)
#>    (Intercept)          woolB       tensionM       tensionH
#>         44.556          0.574          0.814          0.363
#> woolB:tensionM woolB:tensionH
#>          1.365          1.859

Explanation: The interaction IRR for woolB:tensionM is 1.365, which means wool B under medium tension has a rate 36 percent higher than the additive model predicts. Wool B's disadvantage at low tension is partly cancelled out at medium and high tension. This is exactly what an interaction captures: effects that depend on the level of another factor.

Exercise 2: Negative binomial with offset on support

Fit a negative binomial model on support with experience + team as predictors and offset(log(hours)) for exposure. Interpret the IRR for experience in plain English.

R — Exercise 2: NB with offset on support

# Exercise 2: negative binomial with exposure offset
cap2 <- glm.nb(tickets ~ _____ + offset(log(hours)), data = support)
round(exp(coef(cap2)), 3)

Solution:

cap2 <- glm.nb(tickets ~ experience + team + offset(log(hours)), data = support)
round(exp(coef(cap2)), 3)
#> (Intercept)  experience       teamB       teamC
#>       0.132       1.021       0.841       0.906

Explanation: The IRR for experience is 1.021. Each additional year of experience is associated with about a 2.1 percent higher ticket-resolution rate per hour, holding team fixed. Team B resolves 16 percent fewer tickets per hour than team A, and team C 9 percent fewer. These match how we simulated the data.

Exercise 3: Poisson versus NB on overdispersed data

Simulate 500 counts with mean 5 and variance 15 (using MASS::rnegbin), then fit both a Poisson and a negative binomial to y ~ 1. Compare the standard error of the intercept across the two fits.

R — Exercise 3: overdispersed simulation

# Exercise 3: simulate overdispersed counts and compare SEs
set.seed(42)
y <- MASS::rnegbin(500, mu = 5, theta = _____)  # pick theta so Var = 15
cap3_p <- glm(y ~ 1, family = poisson)
cap3_nb <- glm.nb(y ~ 1)
# Compare the SE of the intercept

Solution:

set.seed(42)
# Var(Y) = mu + mu^2 / theta  =>  15 = 5 + 25/theta  =>  theta = 2.5
y <- MASS::rnegbin(500, mu = 5, theta = 2.5)
cap3_p <- glm(y ~ 1, family = poisson)
cap3_nb <- glm.nb(y ~ 1)
c(poisson_se = summary(cap3_p)$coefficients[, "Std. Error"],
  nb_se = summary(cap3_nb)$coefficients[, "Std. Error"])
#> poisson_se      nb_se
#>     0.0203     0.0343

Explanation: NB's SE is about 1.7 times the Poisson SE, close to the $\sqrt{3}$ we expect when variance is three times the mean. If you trusted the Poisson SE you would be reporting a confidence interval that is 70 percent too narrow.

Complete Example

Let us tie everything together. Imagine you are handed the support dataset and asked, "Does each year of experience meaningfully improve tickets per hour, adjusting for team?" Here is the end-to-end workflow.

R — End-to-end Poisson workflow on support data

# 1. Fit Poisson with offset
final_pois <- glm(tickets ~ experience + team + offset(log(hours)),
                  data = support, family = poisson)

# 2. Check dispersion
final_disp <- pearson_disp(final_pois)

# 3. Refit as negative binomial for honest SEs
final_nb <- glm.nb(tickets ~ experience + team + offset(log(hours)), data = support)

# 4. Exponentiate coefficients to IRRs
irrs <- round(exp(coef(final_nb)), 3)

# 5. Predict the rate for a new 10-year operator on team A working 120 hours
new_op <- data.frame(experience = 10, team = "A", hours = 120)
new_rate <- predict(final_nb, newdata = new_op, type = "response")

list(
  dispersion = round(final_disp, 3),
  irrs = irrs,
  predicted_tickets = round(new_rate, 1)
)
#> $dispersion
#> [1] 1.155
#>
#> $irrs
#> (Intercept)  experience       teamB       teamC
#>       0.132       1.021       0.841       0.906
#>
#> $predicted_tickets
#>    1
#> 19.5

Dispersion near 1 means Poisson would have been fine; NB gives essentially the same answer. Each year of experience buys about 2 percent more tickets per hour. A ten-year operator on team A working 120 hours is expected to resolve about 19 or 20 tickets. That is the story, and every step was one function call in R.

Summary

The six-step workflow below covers nearly every practical Poisson regression job. Figure 3 is a quick visual reminder of which model to reach for when dispersion breaks down.

Step | What you do | Why it matters
---- | ----------- | --------------
1 | Fit glm(y ~ ..., family = poisson) | Baseline count model
2 | Exponentiate coefficients to IRRs | Human-readable effects
3 | Add offset(log(E)) if exposure varies | Turn counts into rates
4 | Compute Pearson dispersion | Check Poisson's main assumption
5 | If dispersion > 1.5, fit glm.nb() | Honest standard errors
6 | Goodness-of-fit + predict on new data | Validate the final model

Which count model to use

Figure 3: Choose Poisson, quasi-Poisson, or negative binomial based on dispersion.

Poisson regression is the right starting point any time your outcome is a non-negative integer. Check dispersion every time. When it blows up, switch to negative binomial or quasi-Poisson. When exposure varies, add an offset. Those three habits will keep your count models honest.

References

  1. Faraway, J. Extending the Linear Model with R, 2nd Edition. Chapter 5, Count Regression.
  2. McCullagh, P. & Nelder, J. Generalized Linear Models, 2nd Edition. Chapman & Hall (1989).
  3. Venables, W. & Ripley, B. Modern Applied Statistics with S, 4th Edition. Springer. MASS package reference.
  4. Zeileis, A. & Kleiber, C. Applied Econometrics with R. Springer. AER package, dispersiontest().
  5. Hilbe, J. Negative Binomial Regression, 2nd Edition. Cambridge University Press (2011).
  6. UCLA OARC Stats, Poisson Regression, R Data Analysis Examples.
  7. R Core Team, ?glm and ?glm.nb reference pages.

Continue Learning

  • Logistic Regression With R. When the outcome is a binary yes/no rather than a count, glm() still does the job, just with family = binomial.
  • Linear Regression. The starting point for regression modeling. Use it as the base case and Poisson as the count-data upgrade.
  • Which Statistical Test in R. A decision guide for choosing between regression families, t-tests, and chi-squared tests based on your outcome type.