broom tidy() for Survival Models in R: coxph and survfit
The broom::tidy() function turns survival model objects into tidy data frames (one row per term for a fitted model, one row per time point for a survival curve) that you can pipe into dplyr, ggplot2, or a report. It works on coxph, survfit, and survreg objects, and for a Cox model it can return hazard ratios with confidence intervals in a single call.
tidy(cox_fit)                                          # coefficients on the log-hazard scale
tidy(cox_fit, exponentiate = TRUE)                     # hazard ratios
tidy(cox_fit, conf.int = TRUE)                         # add 95% CI columns
tidy(cox_fit, exponentiate = TRUE, conf.int = TRUE)    # HR + CI together
tidy(km_fit)                                           # KM curve: time, estimate, CI
tidy(km_fit) |> filter(strata == "sex=2")              # subset stratified output (sex = 2 is female in lung)
tidy(aft_fit)                                          # survreg AFT coefficients
Need explanation? Read on for examples and pitfalls.
What tidy() does for survival models in one sentence
tidy() reshapes survival objects into rectangular data frames. A fitted Cox model is an S3 object with deeply nested matrices for coefficients, variance, and call metadata. broom::tidy() extracts the parts you usually need (term, estimate, standard error, statistic, p-value) and returns a tibble with one row per term.
For a survfit object (a Kaplan-Meier curve), tidy() returns one row per time point with the survival estimate, the number at risk, the number of events and censored observations, and confidence bounds. This makes downstream plotting and reporting much easier than working with the $surv, $time, and $n.risk components by hand.
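To make the contrast concrete, here is a minimal sketch that builds a small table by hand from the survfit components and then gets the same information from tidy(). The object names km_fit, manual, and tidied are illustrative, and the unstratified fit on lung is an assumption chosen to keep the example short.

```r
library(survival)
library(broom)

# Overall Kaplan-Meier curve for the lung data (no strata)
km_fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Manual route: pull each component out of the survfit list yourself
manual <- data.frame(
  time     = km_fit$time,
  n.risk   = km_fit$n.risk,
  estimate = km_fit$surv
)

# tidy() route: the same values plus events, censoring, and CI bounds
tidied <- tidy(km_fit)

head(tidied)
```

Both tables have one row per time point; tidy() simply saves the bookkeeping and uses consistent column names.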
Syntax
tidy() is an S3 generic, so the call is the same regardless of model class. broom dispatches to the right method (tidy.coxph, tidy.survfit, tidy.survreg) based on the input object.
The most useful arguments are:
- exponentiate: TRUE returns hazard ratios (exp of the coefficient) for Cox models; the default FALSE keeps the log scale. For an AFT model, an exponentiated coefficient is a time ratio rather than a hazard ratio.
- conf.int: TRUE adds conf.low and conf.high columns; default FALSE
- conf.level: confidence level for the interval; default 0.95
These three cover almost every reporting use case.
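As a quick illustration of conf.level, the sketch below compares interval widths at the default 95% level and at 90%. The model formula (age + sex on lung) and the object names are assumptions made for the example.

```r
library(survival)
library(broom)

cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

ci_95 <- tidy(cox_fit, conf.int = TRUE)                     # default conf.level = 0.95
ci_90 <- tidy(cox_fit, conf.int = TRUE, conf.level = 0.90)  # lower level, narrower interval

# Interval widths per term on the log-hazard scale
widths <- data.frame(
  term     = ci_95$term,
  width_95 = ci_95$conf.high - ci_95$conf.low,
  width_90 = ci_90$conf.high - ci_90$conf.low
)
widths
```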
Combine exponentiate = TRUE and conf.int = TRUE for a publication-ready hazard ratio table. This single call gives you the columns most journals require: HR, lower CI, upper CI, and p-value. No manual exp(coef()) or confint() step needed.
Common patterns
1. Coefficient table from a Cox model
Each row corresponds to a predictor. The estimate column holds the log hazard ratio. A positive estimate means higher hazard (shorter survival), and a negative estimate means lower hazard.
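A minimal sketch of this pattern, assuming a Cox model of survival time on age and sex in the lung data (the formula and the names cox_fit and coef_table are illustrative choices, not taken from the original page):

```r
library(survival)
library(broom)

# Cox proportional hazards model on the lung data
cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# One row per term; estimate is the log hazard ratio
coef_table <- tidy(cox_fit)
coef_table
```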
2. Hazard ratios with confidence intervals
This is the table you put in a Methods / Results section. Each estimate is a hazard ratio: values below 1 indicate lower hazard (a protective association), and values above 1 indicate higher hazard. Females (sex = 2) have about 40% lower hazard than males in lung.
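The hazard-ratio table can be sketched as follows, again assuming an age + sex model on lung; hr_table is an illustrative name.

```r
library(survival)
library(broom)

cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# exponentiate = TRUE puts estimate and both CI bounds on the hazard-ratio scale
hr_table <- tidy(cox_fit, exponentiate = TRUE, conf.int = TRUE)
hr_table
```

The sex row is where the "about 40% lower hazard" reading comes from: its estimate is the hazard ratio for a one-unit increase in sex, that is, female versus male.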
3. Kaplan-Meier curves with tidy(survfit)
Every row is one time point on the curve. The strata column appears because we fit ~ sex. With strata, you can pipe straight into ggplot2:
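A minimal sketch of that pipeline, assuming a Kaplan-Meier fit stratified by sex on lung. The object names and axis labels are illustrative, and geom_step() is one reasonable choice for drawing a step function, not the only one.

```r
library(survival)
library(broom)
library(ggplot2)

# Kaplan-Meier curves, one per level of sex
km_fit  <- survfit(Surv(time, status) ~ sex, data = lung)
km_tidy <- tidy(km_fit)

# One step curve per stratum; strata values look like "sex=1" and "sex=2"
km_plot <- ggplot(km_tidy, aes(x = time, y = estimate, colour = strata)) +
  geom_step() +
  labs(x = "Days", y = "Survival probability", colour = "Stratum")

km_plot
```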
Without strata, tidy() omits the strata column. survfit(Surv(time, status) ~ 1, data = lung) returns a single overall curve, so the tidy output has 8 columns instead of 9.
4. AFT models with tidy(survreg)
survreg uses the accelerated failure time (AFT) parameterization, so coefficients are on the log-time scale. A positive estimate means longer survival.
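As a sketch, a Weibull AFT model on the same predictors can be tidied the same way. The choice of dist = "weibull" and the age + sex formula are assumptions for illustration.

```r
library(survival)
library(broom)

# Weibull accelerated failure time model
aft_fit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")

# Coefficients are on the log-time scale: positive means longer survival
aft_table <- tidy(aft_fit)
aft_table
```

Note the sign flip relative to the Cox model: a covariate that lowers hazard shows up here with a positive coefficient.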
tidy() vs base summary() and gtsummary
Three tools cover the same job from different angles. Pick by what you do next with the output.
| Tool | Output type | Best for |
|---|---|---|
| summary(fit) | printed text + nested list | Quick console check |
| broom::tidy(fit) | tibble (data frame) | dplyr piping, ggplot, custom tables |
| gtsummary::tbl_regression(fit) | rendered HTML/Word table | Final reports without manual formatting |
Use tidy() whenever the next step is code: filtering terms, combining models, drawing a forest plot, writing to CSV. Use gtsummary for the final document; under the hood it calls broom::tidy() for you.
Because tidy() returns a tibble, every dplyr verb, every ggplot geom, and every gt/flextable layout works without any custom shim. This is why broom, which ships with the tidymodels meta-package, is worth using even when you only fit one model.
Common pitfalls
Pitfall 1: forgetting exponentiate = TRUE. The default is the log scale, which is rarely what you want to report. If your hazard ratio column looks like 0.017 instead of 1.02, you forgot the argument.
Pitfall 2: tidy() on a Surv object instead of a fit. Surv(time, status) is a response object, not a model. Pass the output of coxph(), survfit(), or survreg(), not the bare Surv call.
There is no dedicated tidy() method for adjusted survival curves. If you call survfit(cox_fit, newdata = ...) you get a survfitcox object. tidy() will fall back to the generic survfit method and may drop the adjustment metadata. For Cox-adjusted curves, use ggsurvfit::tidy_survfit() instead.
Pitfall 3: mixing up conf.level and level. Some tutorials still show level = 0.9, the argument name used by older broom versions; the current argument name is conf.level. Check the version with packageVersion("broom") if you see an unused-argument warning.
Try it yourself
Try it: Fit a Cox model on lung with age, sex, and ph.karno as predictors. Use tidy() to produce a hazard-ratio table with 95% confidence intervals. Save the result to ex_hr_table.
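One possible solution, sketched under the exercise's own setup (ex_fit is an illustrative name; ex_hr_table is the name the exercise asks for):

```r
library(survival)
library(broom)

# Cox model with the three predictors from the exercise
ex_fit <- coxph(Surv(time, status) ~ age + sex + ph.karno, data = lung)

# Hazard ratios with 95% confidence intervals in one call
ex_hr_table <- tidy(ex_fit, exponentiate = TRUE, conf.int = TRUE)
ex_hr_table
```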
Explanation: Combining exponentiate = TRUE with conf.int = TRUE returns hazard ratios alongside their 95% interval in a single call. No manual exp() or confint() step is required.
Related broom functions for survival
After mastering tidy(), look at:
- glance(): one-row model summary with n, nevent, concordance, and AIC
- augment(): per-observation residuals and predictions (available for coxph and survreg fits)
- tidy_survfit() in the ggsurvfit package: drop-in replacement that handles Cox-adjusted curves
- gtsummary::tbl_regression(): formatted regression table built on top of broom
For a forest plot of hazard ratios, pipe tidy(cox_fit, exponentiate = TRUE, conf.int = TRUE) into ggplot2::geom_pointrange(). Use a log x-axis so HR = 1 sits in the middle.
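A minimal forest-plot sketch along those lines. The three-predictor model, the dashed reference line at HR = 1, and the labels are illustrative choices rather than a prescribed layout.

```r
library(survival)
library(broom)
library(ggplot2)

cox_fit  <- coxph(Surv(time, status) ~ age + sex + ph.karno, data = lung)
hr_table <- tidy(cox_fit, exponentiate = TRUE, conf.int = TRUE)

# Point = hazard ratio, horizontal range = confidence interval
forest_plot <- ggplot(hr_table, aes(x = estimate, y = term)) +
  geom_vline(xintercept = 1, linetype = "dashed") +   # HR = 1 means no effect
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
  scale_x_log10() +
  labs(x = "Hazard ratio (log scale)", y = NULL)

forest_plot
```

On the log axis, a hazard ratio of 0.5 sits as far from 1 as a hazard ratio of 2, which is what makes protective and harmful effects visually comparable.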
See the official broom documentation for survival methods for the complete argument list per method.
FAQ
How do I get hazard ratios from a coxph object?
Call tidy(cox_fit, exponentiate = TRUE). The estimate column then holds hazard ratios instead of log coefficients. Add conf.int = TRUE to include 95% confidence bounds in the same call. This replaces the older two-step pattern of exp(coef(cox_fit)) plus exp(confint(cox_fit)).
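To check that the one-call version agrees with the two-step pattern, a short sketch (assuming an age + sex model on lung; object names are illustrative):

```r
library(survival)
library(broom)

cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# One call
hr_tidy <- tidy(cox_fit, exponentiate = TRUE, conf.int = TRUE)

# Older two-step pattern
hr_manual <- exp(coef(cox_fit))
ci_manual <- exp(confint(cox_fit))

# Side-by-side comparison of the point estimates
data.frame(term = hr_tidy$term, tidy = hr_tidy$estimate, manual = unname(hr_manual))
```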
Does broom tidy work with survfit objects?
Yes. tidy(survfit_object) returns one row per time point with columns for time, n.risk, n.event, n.censor, estimate (survival probability), std.error, conf.high, and conf.low. If the survfit call was stratified, a strata column identifies each group.
What is the difference between tidy(), glance(), and augment()?
tidy() returns one row per model term. glance() returns one row total summarizing the whole model (n, log-likelihood, AIC, concordance). augment() returns one row per observation with predictions, residuals, or fitted values. All three are S3 generics that dispatch on model class.
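The three row granularities can be seen side by side in a short sketch. The age + sex model is an assumption, and passing data = lung to augment() is a choice made here so the original columns are returned alongside the added ones.

```r
library(survival)
library(broom)

cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

term_level  <- tidy(cox_fit)                  # one row per term
model_level <- glance(cox_fit)                # one row for the whole model
obs_level   <- augment(cox_fit, data = lung)  # one row per observation

c(terms = nrow(term_level), model = nrow(model_level), observations = nrow(obs_level))
```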
Can tidy() return median survival from a survfit object?
Not directly. tidy(survfit_fit) gives the full survival curve, but median survival is a summary statistic, not a row in the tidy output. Use summary(survfit_fit)$table or survminer::surv_median() to extract median survival time per stratum.
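A sketch of the summary() route, assuming a fit stratified by sex on lung; median_table is an illustrative name.

```r
library(survival)

km_fit <- survfit(Surv(time, status) ~ sex, data = lung)

# summary()$table has one row per stratum, including a "median" column
median_table <- summary(km_fit)$table
median_table[, c("records", "events", "median")]
```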
Why does tidy() drop my model name or call?
By design. tidy() returns coefficient-level data, not metadata. Model-level information lives in glance(). If you fit many models with purrr::map(), combine them with bind_rows(.id = "model") after tidying so the model identifier becomes a column.
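A minimal sketch of that multi-model pattern. The two formulas and the list names age_only and age_sex are hypothetical; any named list of fits works the same way.

```r
library(survival)
library(broom)
library(dplyr)
library(purrr)

# Named list of models; the names become the model identifier
models <- list(
  age_only = coxph(Surv(time, status) ~ age, data = lung),
  age_sex  = coxph(Surv(time, status) ~ age + sex, data = lung)
)

# Tidy each model, then stack the results with a "model" column
all_terms <- models |>
  map(tidy, exponentiate = TRUE, conf.int = TRUE) |>
  bind_rows(.id = "model")

all_terms
```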