broom tidy() for glm in R: Logistic and Poisson Output
The broom::tidy() function turns a fitted glm object into a one-row-per-term tibble you can pipe into dplyr, ggplot2, or a report. It works on every glm family (binomial, Poisson, Gamma, quasi) and can return odds ratios, incidence rate ratios, and confidence intervals in a single call.
    tidy(logit_fit)                                         # log-odds coefficients
    tidy(logit_fit, exponentiate = TRUE)                    # odds ratios
    tidy(logit_fit, conf.int = TRUE)                        # add 95% CI columns
    tidy(logit_fit, exponentiate = TRUE, conf.int = TRUE)   # OR plus CI
    tidy(poisson_fit, exponentiate = TRUE)                  # incidence rate ratios
    tidy(logit_fit, conf.int = TRUE, conf.level = 0.99)     # custom CI level (needs conf.int = TRUE)
    tidy(logit_fit) |> arrange(p.value)                     # rank terms by p-value (arrange() is from dplyr)
Need explanation? Read on for examples and pitfalls.
What tidy() does for glm in one sentence
tidy() reshapes a glm object into a rectangular tibble. A fitted glm is an S3 object: a list whose components hold the coefficients, the family and link function, the model frame, and details of the IRLS fit such as the iteration count. broom::tidy() extracts the parts you usually need (term, estimate, std.error, statistic, p.value) and returns one row per model term, including the intercept.
For a binomial fit, the estimate column is on the log-odds scale by default. Set exponentiate = TRUE and it flips to odds ratios, while std.error, statistic, and p.value stay on the link scale, where the Wald test was computed. The same flag returns incidence rate ratios for Poisson and quasi-Poisson fits. broom applies exp() to the estimate whatever the family, so the result is only meaningful with a log or logit link; for an identity-link Gaussian fit or a default inverse-link Gamma fit, leave exponentiate = FALSE.
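A quick way to see which columns exponentiate = TRUE touches is to compare the two outputs side by side. This is a minimal sketch; the model formula (am ~ mpg + wt on mtcars) is an assumption chosen to match the predictors discussed in this article.

```r
library(broom)

# Assumed example model: transmission type (am) on mpg and weight
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

link_scale <- tidy(fit)                       # log-odds estimates
or_scale   <- tidy(fit, exponentiate = TRUE)  # odds ratios

# The estimate column is exponentiated ...
estimate_is_exp <- all.equal(or_scale$estimate, exp(link_scale$estimate))

# ... while std.error and p.value are carried over unchanged
se_unchanged <- all.equal(or_scale$std.error, link_scale$std.error)
p_unchanged  <- all.equal(or_scale$p.value, link_scale$p.value)

c(estimate_is_exp, se_unchanged, p_unchanged)
```

All three comparisons should come back TRUE, which is a useful sanity check before mixing columns from the two scales in one report.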
Syntax
tidy.glm() is the S3 method that broom dispatches to when you pass a glm fit. You almost never call it directly; just call tidy() on the fit and the right method runs.
The three arguments that matter for daily reporting are:
- exponentiate: TRUE returns odds ratios for binomial fits and incidence rate ratios for Poisson; default FALSE returns log-scale coefficients
- conf.int: TRUE adds conf.low and conf.high columns derived from a profile likelihood; default FALSE
- conf.level: confidence level for the interval; default 0.95
Everything else (for example quick in older broom releases, or extra arguments supplied through ...) is rarely needed.
Combine exponentiate = TRUE and conf.int = TRUE for a publication-ready odds ratio table. A single call returns OR, lower CI, upper CI, and p-value, which is the format most journals expect. You skip the manual exp(coef()) plus exp(confint()) two-step.
Common patterns
1. Coefficient table from a logistic regression
Each row is one model term: the intercept plus one row per predictor. The estimate for wt is the change in log-odds of am = 1 for a one-unit (one-thousand-pound) increase in weight, holding mpg constant. Negative means heavier cars are less likely to be manual. The statistic column is the Wald z-statistic and p.value is its two-sided p-value, copied straight from summary(fit)$coefficients.
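The code for this pattern is not shown above, so here is a minimal sketch of what it might look like. The formula am ~ mpg + wt is an assumption inferred from the description; the values printed on your machine are the ones to trust.

```r
library(broom)

# Assumed model: manual transmission (am = 1) on mpg and weight
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

# One row per term, estimates on the log-odds scale
coef_table <- tidy(fit)
coef_table

# The same numbers in base R, as a matrix rather than a tibble
summary(fit)$coefficients
```

The tidy version differs from the summary() matrix only in shape: the row names become a term column and the headers are normalized to estimate, std.error, statistic, and p.value.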
2. Odds ratios with confidence intervals
This is the table you put in a Methods or Results section. Each estimate is an odds ratio: 1.36 for mpg means a one-mpg increase multiplies the odds of being a manual transmission by 1.36, controlling for weight. The huge intercept OR is normal in glm output; it represents the baseline odds when all predictors are zero, which is far outside the support of the data and rarely interpretable on its own. Most reporting workflows drop the intercept row before plotting or printing, which is what the forest plot in pattern 4 does.
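A sketch of the call behind such a table, again assuming the am ~ mpg + wt model (the original code is not shown, so the exact figures quoted in the text may come from a different specification):

```r
library(broom)
library(dplyr)

# Assumed model, as in pattern 1
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

# Odds ratios with 95% profile-likelihood intervals
or_table <- tidy(fit, exponentiate = TRUE, conf.int = TRUE)
or_table

# Reporting version without the intercept row
or_terms <- or_table |>
  filter(term != "(Intercept)")
or_terms
```

Profiling can emit warnings on a dataset this small; they concern the interval computation, not the point estimates.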
3. Poisson glm with incidence rate ratios
For Poisson and quasi-Poisson fits, exponentiate = TRUE returns incidence rate ratios. An IRR of 1.005 for hp means each additional unit of horsepower multiplies the expected carburetor count by 1.005, or about a 0.5% increase, holding cylinders constant. Counts are a stretch on a 32-row toy dataset, but the mechanics generalize: any Poisson glm, including those with offsets for exposure, tidies into the same seven-column tibble once conf.int = TRUE is set (five columns without it).
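As a rough illustration, the pattern could be reproduced like this. The formula carb ~ hp + cyl is an assumption based on the variables named in the paragraph.

```r
library(broom)

# Assumed model: carburetor count on horsepower and cylinders
pois_fit <- glm(carb ~ hp + cyl, data = mtcars, family = poisson)

# Incidence rate ratios with 95% intervals
irr_table <- tidy(pois_fit, exponentiate = TRUE, conf.int = TRUE)
irr_table

# Column layout: term, estimate, std.error, statistic, p.value, conf.low, conf.high
names(irr_table)
```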
4. Plot odds ratios as a forest plot
The log-scale x-axis keeps OR = 1 visually centered between protective and risk-increasing effects, which matches how analysts read forest plots. Dropping the intercept avoids the extreme value squashing the rest of the plot into the left margin.
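One way to draw such a plot with ggplot2 is sketched below. The model and the styling choices (dashed reference line, axis label) are assumptions, not the article's original code.

```r
library(broom)
library(dplyr)
library(ggplot2)

# Assumed model, as in the earlier patterns
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

# Odds ratios with intervals, intercept removed so it does not dominate the axis
or_terms <- tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
  filter(term != "(Intercept)")

forest_plot <- ggplot(or_terms, aes(x = estimate, y = term)) +
  geom_vline(xintercept = 1, linetype = "dashed") +          # OR = 1 means no effect
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +  # point estimate plus CI
  scale_x_log10() +                                          # symmetric around OR = 1
  labs(x = "Odds ratio (log scale)", y = NULL)

forest_plot
```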
tidy.glm() covers every family glm() supports. Gaussian, binomial, Poisson, Gamma, inverse.gaussian, quasi, quasibinomial, and quasipoisson fits all dispatch to the same method. What changes is how to read exponentiate = TRUE: odds ratios for a logit link (binomial, quasibinomial) and IRRs for a log link (Poisson, quasi-Poisson). broom exponentiates the estimate regardless of family, so leave the flag off for identity-link Gaussian fits, where exp(estimate) has no useful interpretation.
tidy() vs base summary() and gtsummary
Three tools cover the same job from different angles. Pick by what you do next with the output.
| Tool | Output type | Best for |
|---|---|---|
| summary(fit) | printed text plus nested list | Quick console check |
| broom::tidy(fit) | tibble (data frame) | dplyr piping, ggplot, custom tables |
| gtsummary::tbl_regression(fit) | rendered HTML or Word table | Final reports without manual formatting |
Use tidy() whenever the next step is code: filtering significant terms, combining many glm fits with purrr::map(), drawing a forest plot, or writing to CSV. Use gtsummary for the final document; it calls broom::tidy() under the hood and adds publication formatting. The summary() printout is still the fastest way to eyeball a single fit interactively, but it is a dead end for anything programmatic.
Because tidy() returns a tibble, every dplyr verb, every ggplot geom, and every gt or flextable layout works without a custom shim. This is why broom, which ships with the tidymodels meta-package, is worth using even if you only fit a single glm.
Common pitfalls
Pitfall 1: forgetting exponentiate = TRUE. The default is the link scale (log-odds for binomial, log-rate for Poisson), which is rarely what you report. If your "odds ratio" column has negative numbers, you forgot the argument.
Pitfall 2: tidying a glm fit on the response scale by hand. Manually calling exp(tidy(fit)$estimate) works for the point estimate but is brittle: the standard error of an exponentiated coefficient is not simply exp(SE). Let exponentiate = TRUE do the bookkeeping: it returns OR for the estimate and CI but keeps SE, statistic, and p-value on the link scale where they were computed.
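If a standard error on the odds-ratio scale is genuinely required, the usual first-order delta-method approximation is SE(exp(b)) ≈ exp(b) × SE(b). The sketch below adds it as an extra column; the column name std.error.or is hypothetical, and the model is the assumed am ~ mpg + wt fit.

```r
library(broom)

# Assumed model
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

or_table <- tidy(fit, exponentiate = TRUE, conf.int = TRUE)

# std.error in or_table is still the link-scale SE
link_se <- tidy(fit)$std.error

# First-order delta-method SE on the odds-ratio scale (hypothetical column name)
or_table$std.error.or <- or_table$estimate * link_se

or_table[, c("term", "estimate", "std.error", "std.error.or")]
```

The confidence bounds from exponentiate = TRUE are still the better summary of uncertainty, since they are computed on the link scale and then transformed, which keeps them positive and asymmetric around the odds ratio.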
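tidy(fit, conf.int = TRUE) can fail to converge on sparse data. broom calls confint() on the fit, which for a glm computes profile-likelihood intervals (the method lived in MASS before R 4.4.0 and is in stats since). With separation or very few events per predictor, you may see a glm.fit: fitted probabilities numerically 0 or 1 occurred warning and NA CI bounds. Switch to Wald intervals (confint.default()) or refit with a bias-reduced method such as glm(..., method = "brglmFit") from the brglm2 package to handle separation.
Pitfall 3: passing level = 0.9 instead of conf.level = 0.9. The argument name in tidy() is conf.level; level is the spelling used by base confint(), which is where the mix-up usually comes from. Depending on the broom version, a stray level may be absorbed by ... rather than rejected, so confirm that the interval width actually changed, and check packageVersion("broom") if you see an unused argument error.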
Try it yourself
Try it: Fit a logistic regression on mtcars predicting am from hp and qsec. Use tidy() to produce an odds-ratio table with 95% confidence intervals. Save the result to ex_or_table.
Explanation: Combining exponentiate = TRUE with conf.int = TRUE returns odds ratios and their 95% interval in a single call. No manual exp() or confint() step is required, and the standard errors stay on the link scale where they belong.
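One possible solution, following the exercise statement directly (the intermediate name ex_fit is an arbitrary choice):

```r
library(broom)

# Logistic regression of transmission type on horsepower and quarter-mile time
ex_fit <- glm(am ~ hp + qsec, data = mtcars, family = binomial)

# Odds ratios with 95% confidence intervals in one call
ex_or_table <- tidy(ex_fit, exponentiate = TRUE, conf.int = TRUE)
ex_or_table
```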
Related broom functions for glm
After mastering tidy(), the next two broom verbs, plus one gtsummary helper, round out the workflow:
- glance(fit): one-row model summary with null.deviance, df.null, logLik, AIC, BIC, deviance, and nobs
- augment(fit): per-observation tibble with .fitted, .resid, .std.resid, .hat, and .cooksd
- gtsummary::tbl_regression(fit, exponentiate = TRUE): formatted regression table built on top of broom
To combine multiple glm fits from purrr::map(), use map_dfr(fits, tidy, .id = "model"). The .id column lets you facet a forest plot by model.
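A small sketch of that pattern, using two illustrative formulas (the model names weight_only and weight_mpg are made up for this example). map_dfr() is marked as superseded in recent purrr releases but remains available.

```r
library(broom)
library(purrr)

# Named list of candidate models; the names become the .id column
formulas <- list(
  weight_only = am ~ wt,
  weight_mpg  = am ~ mpg + wt
)

# Fit one logistic regression per formula
fits <- map(formulas, \(f) glm(f, data = mtcars, family = binomial))

# Stack the tidy tables, tagging each row with its model name
all_terms <- map_dfr(fits, tidy, .id = "model")
all_terms
```

From here, facet_wrap(~ model) on a forest plot of all_terms gives one panel per model.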
See the official broom documentation for glm methods for the full argument list.
FAQ
How do I get odds ratios from a glm in R?
Call tidy(glm_fit, exponentiate = TRUE) on a binomial fit. The estimate column will hold odds ratios instead of log-odds coefficients. Add conf.int = TRUE to include 95% confidence bounds in the same call. This single line replaces the older two-step pattern of exp(coef(fit)) plus exp(confint(fit)), and it keeps the term, standard error, and p-value aligned with the OR in one tibble.
Does broom tidy work with quasibinomial and quasipoisson fits?
Yes. tidy.glm() handles every family glm() accepts, including quasibinomial, quasipoisson, Gamma, and inverse.gaussian. The exponentiate = TRUE argument is read the same way as for the parent family: odds ratios for quasibinomial and IRRs for quasipoisson. broom exponentiates whatever the family, so for Gaussian fits and Gamma fits without a log link, leave it at FALSE; the exponentiated values have no ratio interpretation there.
What is the difference between tidy(), glance(), and augment() for a glm?
tidy() returns one row per coefficient (term-level). glance() returns one row summarizing the whole fit (AIC, deviance, null deviance, degrees of freedom). augment() returns one row per observation in the training data, with fitted values, residuals, hat values, and Cook's distance. The three together replace nearly every base R summary(), coef(), fitted(), and residuals() call.
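The three shapes are easy to compare on a single fit. This sketch reuses the assumed am ~ mpg + wt model:

```r
library(broom)

# Assumed model
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

terms_tbl <- tidy(fit)     # one row per coefficient
model_tbl <- glance(fit)   # one row for the whole fit
obs_tbl   <- augment(fit)  # one row per observation

# Row counts: number of terms, 1, number of observations
c(terms = nrow(terms_tbl), model = nrow(model_tbl), obs = nrow(obs_tbl))
```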
Why are my confidence intervals different from confint.default()?
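By default, tidy(fit, conf.int = TRUE) calls confint() on the glm, which produces profile-likelihood intervals (via MASS in older R, via stats since R 4.4.0). These are asymmetric and usually better calibrated than Wald (confint.default()) intervals on small samples, so the two sets of bounds rarely match exactly. tidy.glm() does not document an argument for switching the interval method, so if you need Wald intervals for speed or for consistency with the reported z-tests, compute them with confint.default(fit) and join them to the tidy() output by term.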
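A minimal sketch of attaching Wald intervals by hand, assuming the am ~ mpg + wt model; the object names wald_tbl and wald_table are arbitrary.

```r
library(broom)
library(dplyr)

# Assumed model
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

# Wald intervals on the link scale: estimate +/- z * std.error
wald <- confint.default(fit, level = 0.95)
wald_tbl <- data.frame(
  term      = rownames(wald),
  conf.low  = wald[, 1],
  conf.high = wald[, 2]
)

# Join onto the tidy coefficient table by term
wald_table <- tidy(fit) |>
  left_join(wald_tbl, by = "term")

# Exponentiate estimate and bounds together if odds ratios are needed
wald_or <- wald_table |>
  mutate(across(c(estimate, conf.low, conf.high), exp))
wald_or
```

Exponentiating the three columns together keeps the estimate and its bounds on the same scale, mirroring what exponentiate = TRUE does for the profile-likelihood intervals.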
Can tidy() handle a glm with an offset or weights?
Yes. tidy() reads the coefficient table from summary(fit), and glm() has already accounted for any offset or weights when fitting, so the standard errors and p-values in the tidy output reflect them. The offset has no coefficient of its own, so it gets no row. The estimates are interpreted exactly as in the underlying glm: an offset shifts the linear predictor, and weights rescale each observation's contribution to the deviance.
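As an illustration with an exposure offset, the sketch below uses the Insurance dataset that ships with MASS (claims per number of policyholders). The dataset and formula are chosen for this example and are not from the article.

```r
library(broom)

# Insurance claims data from MASS: Claims counts with Holders as exposure
data(Insurance, package = "MASS")

rate_fit <- glm(
  Claims ~ District + Group + Age + offset(log(Holders)),
  data = Insurance,
  family = poisson
)

# Rate ratios per policyholder; the offset itself does not appear as a term
rate_table <- tidy(rate_fit, exponentiate = TRUE)
rate_table
```

With the log(Holders) offset, each exponentiated estimate is read as a ratio of claim rates per policyholder rather than of raw claim counts.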