parsnip proportional_hazards() in R: Cox Survival Models
The parsnip proportional_hazards() function defines a Cox proportional hazards model for time-to-event data in tidymodels. It gives you one interface to two backends: a classic Cox model through the survival engine and a penalized Cox model through the glmnet engine.
proportional_hazards()                                              # Cox model, survival engine
proportional_hazards() |> set_engine("survival")                    # classic Cox via coxph
proportional_hazards(penalty = 0.01) |> set_engine("glmnet")        # penalized Cox
fit(spec, Surv(time, status) ~ ., data = df)                        # train on a Surv() outcome
predict(fit, new_data, type = "time")                               # predicted event time
predict(fit, new_data, type = "linear_pred")                        # linear predictor (log-hazard)
predict(fit, new_data, type = "survival", eval_time = t)            # survival probability
Need explanation? Read on for examples and pitfalls.
What proportional_hazards() does
proportional_hazards() is a model specification, not a fitted model. It records your intent to build a Cox proportional hazards model in tidymodels, but no data touches it until you call fit(). The same specification can be reused across datasets or resampling folds.
The Cox model studies time-to-event data, where the outcome is the time until an event such as death, relapse, or customer churn. Its defining feature is that it models the hazard rate, the instantaneous risk of the event, as a baseline hazard multiplied by exp() of a linear predictor. The word proportional means each covariate scales the hazard by a constant factor that does not change over time.
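In symbols, a standard way to write this (the notation is assumed here: $h_0(t)$ is the baseline hazard, $x_1, \dots, x_p$ the covariates, $\beta_j$ their coefficients) is:

$$
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \cdots + \beta_p x_p)
$$

Raising one covariate $x_j$ by one unit while holding the others fixed gives

$$
\frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = \exp(\beta_j),
$$

which does not depend on $t$ because $h_0(t)$ cancels. That constant ratio is the hazard ratio reported as exp(coef), and it is exactly what "proportional" refers to.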
Unlike survival_reg(), the Cox model never assumes a shape for the baseline hazard. It is semi-parametric: it estimates covariate effects while leaving the baseline unspecified. That makes it the default choice when hazard ratios, not predicted times, are the goal. See the parsnip reference for the full argument list.
proportional_hazards() hands you hazard ratios directly as exp(coef), but an absolute predicted time needs an extra step: the unspecified baseline hazard must first be estimated from the data.
proportional_hazards() syntax and arguments
The signature is short, and most arguments only matter for one engine. The survival engine has no tuning parameters at all.
The mode argument is fixed at "censored regression", so you rarely set it. The engine argument picks the backend: survival fits a classic Cox model with survival::coxph(), while glmnet fits a penalized Cox model for variable selection. The penalty and mixture arguments tune that penalization and are ignored by the survival engine.
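A minimal sketch of the two engine setups described above. The penalty value 0.01 and mixture = 1 are illustrative choices, not recommendations; translate() is parsnip's function for showing the engine-level call a specification maps to. No data is involved yet.

```r
library(parsnip)
library(censored)  # registers the survival and glmnet engines with parsnip

# Classic Cox model: survival::coxph() underneath, nothing to tune
cox_spec <- proportional_hazards() |>
  set_engine("survival")

# Penalized Cox model: glmnet underneath.
# penalty = amount of regularization; mixture = 1 is a pure lasso,
# mixture = 0 would be a pure ridge penalty.
lasso_spec <- proportional_hazards(penalty = 0.01, mixture = 1) |>
  set_engine("glmnet")

# Show the engine-level call each specification translates to
translate(cox_spec)
translate(lasso_spec)
```

Because these are only specifications, the same cox_spec object can later be passed to fit() on any dataset or resample.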
library(censored) registers the survival and glmnet engines with parsnip. Without it, parsnip reports that the engine is unavailable even though the survival package is installed.
Fitting a proportional_hazards() model: examples
Start by loading the framework and defining a specification. The lung dataset from the survival package records survival times for advanced lung cancer patients.
The outcome must be a Surv() object that pairs the follow-up time with the event status. In lung, status is coded 1 for censored and 2 for dead, which Surv() reads automatically.
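A minimal sketch of that workflow. The predictors sex and ph.ecog are an assumption, picked because the hazard ratios discussed next refer to them; R 4.1 or later is assumed for the native |> pipe. coxph() drops the one lung row with a missing ph.ecog by default.

```r
library(parsnip)
library(censored)  # registers the Cox engines
library(survival)  # Surv() and the lung data

# Specification only: no data has been used yet
cox_spec <- proportional_hazards() |>
  set_engine("survival")

# Left-hand side pairs follow-up time with event status;
# lung codes status as 1 = censored, 2 = dead, which Surv() accepts as is.
cox_fit <- fit(cox_spec, Surv(time, status) ~ sex + ph.ecog, data = lung)

# Printing the fit shows the coxph() summary, including the exp(coef) column
cox_fit
```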
The exp(coef) column holds the hazard ratios. A value of 0.578 for sex means the higher-coded group has roughly 58 percent of the reference hazard, while 1.587 for ph.ecog means each one-point rise in the performance score lifts the hazard by about 59 percent.
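The same hazard ratios can be pulled into a tibble with tidy(). This sketch assumes the sex + ph.ecog model and loads broom explicitly, since broom supplies the tidy() method for coxph objects; exact values depend on the formula you fit.

```r
library(parsnip)
library(censored)
library(survival)
library(broom)  # tidy() method for coxph objects

cox_fit <- proportional_hazards() |>
  set_engine("survival") |>
  fit(Surv(time, status) ~ sex + ph.ecog, data = lung)

# exponentiate = TRUE turns log-hazard coefficients into hazard ratios
hr <- tidy(cox_fit, exponentiate = TRUE)
hr  # term, estimate (hazard ratio), std.error, statistic, p.value
```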
To check the proportional hazards assumption, pull out the underlying model with extract_fit_engine() and run survival::cox.zph() on it. A small p-value flags a covariate whose effect drifts over time, which violates the assumption the model rests on.
To get a survival curve, use type = "survival" and pass the times at which to evaluate it through eval_time. The result is a list column of small tibbles, one per subject.
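Both steps, sketched on the assumed sex + ph.ecog model. The evaluation times 180 and 365 days are arbitrary choices for illustration, and a recent parsnip/censored release that uses the eval_time argument is assumed.

```r
library(parsnip)
library(censored)
library(survival)

cox_fit <- proportional_hazards() |>
  set_engine("survival") |>
  fit(Surv(time, status) ~ sex + ph.ecog, data = lung)

# Proportional hazards check on the underlying coxph object:
# one test per covariate plus a GLOBAL test
ph_check <- cox.zph(extract_fit_engine(cox_fit))
ph_check

# Survival probabilities at 180 and 365 days for the first three patients
surv_pred <- predict(cox_fit, new_data = lung[1:3, ],
                     type = "survival", eval_time = c(180, 365))

# Each element of .pred is a tibble with one row per evaluation time
surv_pred$.pred[[1]]
```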
The linear predictor, the log-hazard relative to the average subject, comes from type = "linear_pred". Negative values mean lower risk than average, positive values mean higher.
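A short sketch of requesting the linear predictor, again assuming the sex + ph.ecog model. It only shows the shape of the result: one value per row of new_data.

```r
library(parsnip)
library(censored)
library(survival)

cox_fit <- proportional_hazards() |>
  set_engine("survival") |>
  fit(Surv(time, status) ~ sex + ph.ecog, data = lung)

# One linear predictor value per patient, in a .pred_linear_pred column
lp <- predict(cox_fit, new_data = lung[1:5, ], type = "linear_pred")
lp
```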
proportional_hazards() vs other censored models
The censored package registers several model types. Choose by the question you are asking, not just the shape of the data.
| Model | parsnip function | Best for |
|---|---|---|
| Cox proportional hazards | proportional_hazards() | Hazard ratios, no distribution assumption |
| Parametric survival | survival_reg() | Predicted times, smooth extrapolation |
| Penalized Cox | proportional_hazards() + glmnet | Many predictors, variable selection |
| Survival decision tree | decision_tree() | Non-linear effects, interpretable splits |
| Boosted survival model | boost_tree() | Predictive accuracy over interpretability |
Use proportional_hazards() when relative risk between groups is the goal and you prefer not to commit to a distribution. Switch to survival_reg() when you trust a distributional assumption and want predicted times or curves that extend past the observed follow-up. Reach for tree or boosting engines when raw predictive accuracy outweighs interpretability.
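To make the table concrete, here is a sketch of each row as a parsnip specification. The engines (weibull for survival_reg(), rpart for the tree, mboost for boosting) are assumptions picked from engines the censored package registers. Building a specification fits nothing; the engine packages are only needed once you call fit().

```r
library(parsnip)
library(censored)  # registers the censored-regression engines

specs <- list(
  # Semi-parametric Cox model
  cox        = proportional_hazards() |> set_engine("survival"),
  # Penalized Cox for many predictors
  penalized  = proportional_hazards(penalty = 0.01) |> set_engine("glmnet"),
  # Parametric model with an assumed Weibull distribution
  parametric = survival_reg(dist = "weibull") |> set_engine("survival"),
  # Tree and boosting models need the censored regression mode set explicitly
  tree       = decision_tree() |> set_engine("rpart") |>
                 set_mode("censored regression"),
  boosted    = boost_tree() |> set_engine("mboost") |>
                 set_mode("censored regression")
)

vapply(specs, function(s) s$engine, character(1))
```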
Common pitfalls
Forgetting to load the censored package is the most frequent error. The engine looks registered in the documentation but fails at fit time.
Passing a bare numeric outcome instead of a Surv() object silently changes the problem. The Cox model needs the censoring indicator, so the left side of the formula must be Surv(time, status), never time alone.
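A quick look at what a Surv() object carries, and therefore what a bare time column would throw away. This uses only the survival package and the lung data.

```r
library(survival)

# Surv() keeps both pieces the Cox model needs:
# the follow-up time and whether the event was observed
y <- Surv(lung$time, lung$status)
head(y)  # censored follow-up times print with a trailing "+"

# The 1/2 status coding is stored internally as 0 = censored, 1 = event
table(y[, "status"])
```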
Requesting type = "survival" without eval_time raises an error. Survival probabilities are only defined at specific times, so you must state which times you want.
Try it yourself
Try it: Fit a Cox proportional_hazards() model on the lung dataset using age and sex as predictors, then get the hazard ratios with tidy(). Save the result to ex_hr.
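One possible solution, written as a sketch; broom is loaded explicitly because it supplies the tidy() method for coxph objects.

```r
library(parsnip)
library(censored)
library(survival)
library(broom)

# Cox model on age and sex, then hazard ratios via tidy()
ex_hr <- proportional_hazards() |>
  set_engine("survival") |>
  fit(Surv(time, status) ~ age + sex, data = lung) |>
  tidy(exponentiate = TRUE)

ex_hr
```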
Explanation: The specification fits a Cox model with the survival engine, and tidy(exponentiate = TRUE) converts the log-hazard coefficients into hazard ratios. The sex ratio near 0.6 means the higher-coded group has lower risk.
Related tidymodels functions
- survival_reg() fits a parametric survival model with assumed distributions.
- set_engine() swaps the backend, such as glmnet for a penalized Cox model.
- fit() trains a specification on a Surv() outcome.
- predict() returns times, survival probabilities, or the linear predictor.
- extract_fit_engine() reaches the underlying coxph object.
FAQ
What is proportional_hazards() in R? proportional_hazards() is a parsnip function that defines a Cox proportional hazards model for censored, time-to-event data. It records the engine and any penalization, but does not touch data until you call fit(). The Cox model estimates how covariates scale the event hazard without assuming a shape for the baseline hazard, which makes it a semi-parametric model.
What is the difference between proportional_hazards() and survival_reg()? proportional_hazards() fits a Cox model that makes no distributional assumption and reports hazard ratios between groups. survival_reg() fits a parametric model that assumes a named distribution, such as Weibull, and can predict absolute event times. Use proportional_hazards() when relative risk is the goal; use survival_reg() when you want predicted times or smooth survival curves.
What does the proportional hazards assumption mean? The assumption says each covariate multiplies the hazard by a constant factor that stays the same at every point in time. If a treatment helps early but stops helping later, that effect is not constant and the assumption is violated. Check it by running survival::cox.zph() on the model extracted with extract_fit_engine().
Why do I need the censored package for proportional_hazards()? proportional_hazards() lives in core parsnip, but its engines are registered by the censored extension package. Running library(censored) connects parsnip to the survival and glmnet backends. Without that call, parsnip reports the engine is unavailable even though the underlying survival package is installed.
How do I get hazard ratios from a proportional_hazards() model? Fit the model, then call tidy() with exponentiate = TRUE. The Cox model estimates coefficients on the log-hazard scale, and exponentiating them converts each one into a hazard ratio. A ratio above 1 means higher risk, below 1 means lower risk, and the std.error and p.value columns help you judge significance.