parsnip survival_reg() in R: Parametric Survival Models
The parsnip survival_reg() function defines a parametric survival regression model for time-to-event data in tidymodels. It gives you one interface and fits the model with the survival, flexsurv, or flexsurvspline engine underneath.
    survival_reg()                                            # default Weibull, survival engine
    survival_reg(dist = "exponential")                        # exponential AFT model
    survival_reg(dist = "lognormal")                          # log-normal accelerated failure
    survival_reg() |> set_engine("flexsurv")                  # flexible parametric engine
    fit(spec, Surv(time, status) ~ ., data = df)              # train on a Surv() outcome
    predict(fit, new_data, type = "time")                     # predicted event time
    predict(fit, new_data, type = "survival", eval_time = t)  # survival prob
Need explanation? Read on for examples and pitfalls.
What survival_reg() does
survival_reg() is a model specification, not a fitted model. It records your intent to build a parametric survival model and the distribution you want, but no data touches it until you call fit(). The same specification can then be reused across many datasets or resampling folds.
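A minimal sketch of that separation, assuming the parsnip, censored, and survival packages are installed and using the lung data from the survival package. The object names spec, fit_male, and fit_female are hypothetical, chosen only for illustration.

```r
library(parsnip)
library(censored)   # registers the survival engines with parsnip
library(survival)   # provides Surv() and the lung data

# A specification only: no data is involved at this point
spec <- survival_reg(dist = "weibull") |> set_engine("survival")

# Reuse the same specification on two subsets of lung
# (sex is coded 1 = male, 2 = female in this dataset)
fit_male   <- fit(spec, Surv(time, status) ~ age, data = subset(lung, sex == 1))
fit_female <- fit(spec, Surv(time, status) ~ age, data = subset(lung, sex == 2))

class(spec)       # a model specification
class(fit_male)   # a fitted model
```

The specification object is unchanged by fit(); each call returns a new fitted object.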
Survival regression models the time until an event happens, such as death, failure, or churn. The defining feature of this data is censoring: some subjects have not had the event by the end of the study, so you only know their time exceeds some value. A parametric survival model assumes the event times follow a named distribution, like Weibull or log-normal, and estimates that distribution from both observed and censored records.
The function belongs to the tidymodels framework. Because parsnip standardizes the interface, the same survival_reg() code runs on the classic survival engine or the more flexible flexsurv engine with only one line changed.
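As a rough illustration of the one-line engine swap, the sketch below builds two specifications that differ only in set_engine(). The names spec_survival and spec_flexsurv are hypothetical. Only the specifications are created here; fitting the second one would additionally require the flexsurv package to be installed.

```r
library(parsnip)
library(censored)   # registers survival, flexsurv, and flexsurvspline

# Same model type and distribution, different backend
spec_survival <- survival_reg(dist = "weibull") |> set_engine("survival")
spec_flexsurv <- survival_reg(dist = "weibull") |> set_engine("flexsurv")

# The engine is recorded on the specification
spec_survival$engine
spec_flexsurv$engine
```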
survival_reg() syntax and arguments
The function signature is short. Most of the modeling choices live in dist and the engine.
The mode argument is fixed at "censored regression", so you rarely set it. The engine argument picks the backend package that does the math. The dist argument names the assumed event-time distribution; leaving it NULL lets the engine choose its default, which is Weibull for the survival engine.
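The three arguments can be written out explicitly. This is a small sketch with the defaults spelled out, assuming parsnip and censored are installed; spec_default and spec_lognormal are hypothetical names.

```r
library(parsnip)
library(censored)

# The call with every argument made explicit
spec_default <- survival_reg(
  mode   = "censored regression",  # the only mode this model type has
  engine = "survival",             # backend package that does the fitting
  dist   = NULL                    # NULL defers to the engine default
)

# A second specification that names the distribution explicitly
spec_lognormal <- survival_reg(dist = "lognormal")

spec_default
```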
library(censored) registers the survival, flexsurv, and flexsurvspline engines with parsnip. Without it, parsnip reports that the engine is not available even though the survival package is installed.
Fitting a survival_reg() model: examples
Start by loading the framework and defining a specification. The lung dataset from the survival package records survival times for advanced lung cancer patients.
The outcome must be a Surv() object that pairs the follow-up time with the event status. In lung, status is coded 1 for censored and 2 for dead, which Surv() interprets automatically.
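A minimal sketch of the fit described above, assuming parsnip, censored, and survival are installed. It loads parsnip directly rather than the whole tidymodels bundle to keep the example small. The predictors age and sex and the name lung_fit are illustrative choices, not prescribed by the text.

```r
library(parsnip)
library(censored)   # registers the survival engine with parsnip
library(survival)   # provides Surv() and the lung data

spec <- survival_reg(dist = "weibull") |> set_engine("survival")

# The left side of the formula is a Surv() object: follow-up time plus status
lung_fit <- fit(spec, Surv(time, status) ~ age + sex, data = lung)

lung_fit
```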
To get predictions, pick a type. Use "time" for a single predicted event time per subject.
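The sketch below refits a small Weibull model so that it runs on its own, then asks for predicted event times for the first three rows of lung. The names lung_fit and pred_time are hypothetical.

```r
library(parsnip)
library(censored)
library(survival)

lung_fit <- survival_reg(dist = "weibull") |>
  set_engine("survival") |>
  fit(Surv(time, status) ~ age + sex, data = lung)

# One predicted event time per row of new_data, in a .pred_time column
pred_time <- predict(lung_fit, new_data = head(lung, 3), type = "time")
pred_time
```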
For a survival curve, use type = "survival" and pass the times at which to evaluate it through eval_time. The result is a list column of small tibbles, one per subject.
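A sketch of the survival-curve prediction, assuming a recent parsnip and censored release that uses the eval_time argument, plus the tidyr package for flattening. The evaluation times 100, 300, and 500 (days) and the subject column are arbitrary illustrative choices.

```r
library(parsnip)
library(censored)
library(survival)

lung_fit <- survival_reg(dist = "weibull") |>
  set_engine("survival") |>
  fit(Surv(time, status) ~ age + sex, data = lung)

# Survival probabilities at three time points for the first two subjects
pred_surv <- predict(
  lung_fit,
  new_data  = head(lung, 2),
  type      = "survival",
  eval_time = c(100, 300, 500)
)

# .pred is a list column; add a subject id, then flatten to one row per time
pred_surv$subject <- seq_len(nrow(pred_surv))
pred_flat <- tidyr::unnest(pred_surv, cols = .pred)
pred_flat
```

The flattened frame has one row per subject and evaluation time, which is a convenient shape for plotting.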
Switching distributions takes one argument. An exponential model assumes a constant hazard, while a log-normal model allows the hazard to rise and then fall.
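One way to see the effect of changing dist is to fit the same formula under several distributions and compare a fit statistic. The sketch below uses AIC on the underlying survreg objects; AIC is an assumption here, chosen as one common statistic, and the names dists and aic_by_dist are hypothetical.

```r
library(parsnip)
library(censored)
library(survival)

dists <- c("weibull", "exponential", "lognormal")

# Fit the same formula once per distribution and collect the AIC of each fit
aic_by_dist <- sapply(dists, function(d) {
  spec   <- survival_reg(dist = d) |> set_engine("survival")
  fitted <- fit(spec, Surv(time, status) ~ age + sex, data = lung)
  AIC(extract_fit_engine(fitted))
})

# Lower AIC indicates a better trade-off between fit and complexity
sort(aic_by_dist)
```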
Fit several dist values on the same data and pick the one with the best fit statistic. The log-normal often beats the exponential because real hazards are rarely constant over time.
survival_reg() vs other censored models
The censored package registers several model types. Choose by the question you are asking, not just the data shape.
| Model | parsnip function | Best for |
|---|---|---|
| Parametric survival | survival_reg() | Smooth survival curves, extrapolation |
| Cox proportional hazards | proportional_hazards() | Hazard ratios, no distribution assumption |
| Survival decision tree | decision_tree() | Non-linear effects, interpretable splits |
| Boosted survival model | boost_tree() | Highest predictive accuracy |
Use survival_reg() when you trust a distributional assumption and want predicted times or curves that extend beyond the observed follow-up. Switch to proportional_hazards() when you care about hazard ratios and prefer not to commit to a distribution. Reach for tree or boosting models when prediction accuracy matters more than interpretability.
Common pitfalls
Forgetting to load the censored package is the most frequent error. The engines are listed in the parsnip documentation, but fitting fails until library(censored) has been called.
Passing a bare numeric outcome instead of a Surv() object does not work: the fit stops with an error, because without the censoring indicator the model cannot tell observed events from censored ones. The left side of the formula must be Surv(time, status), never time alone.
Requesting type = "survival" without eval_time raises an error. Survival probabilities are only defined at specific times, so you must say which times you want.
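The last two pitfalls can be reproduced safely by catching the errors. This is a sketch only: error_message() is a hypothetical helper written for this example, and the exact wording of the messages depends on the installed parsnip, censored, and survival versions.

```r
library(parsnip)
library(censored)
library(survival)

spec <- survival_reg() |> set_engine("survival")

# Hypothetical helper: returns the error message, or "no error" if none occurs
error_message <- function(expr) {
  tryCatch({ expr; "no error" }, error = function(e) conditionMessage(e))
}

# Pitfall: bare numeric outcome instead of Surv(time, status)
bare_outcome_msg <- error_message(fit(spec, time ~ age + sex, data = lung))
bare_outcome_msg

# Pitfall: survival probabilities requested without eval_time
lung_fit <- fit(spec, Surv(time, status) ~ age + sex, data = lung)
no_eval_time_msg <- error_message(
  predict(lung_fit, new_data = head(lung, 2), type = "survival")
)
no_eval_time_msg
```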
Try it yourself
Try it: Fit a Weibull survival_reg() model on the lung dataset using age and sex as predictors, then predict the event time for the first two rows. Save the predictions to ex_pred.
Solution:
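One possible solution, sketched under the assumption that parsnip, censored, and survival are installed. The names ex_spec and ex_fit are hypothetical; ex_pred is the name the exercise asks for.

```r
library(parsnip)
library(censored)
library(survival)

# Weibull distribution on the survival engine
ex_spec <- survival_reg(dist = "weibull") |> set_engine("survival")

# Train on a Surv() outcome with age and sex as predictors
ex_fit <- fit(ex_spec, Surv(time, status) ~ age + sex, data = lung)

# Predicted event time for the first two rows
ex_pred <- predict(ex_fit, new_data = lung[1:2, ], type = "time")
ex_pred
```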
Explanation: The specification sets the Weibull distribution and survival engine, then fit() trains it on a Surv() outcome. Asking for type = "time" returns one predicted event time per row.
Related tidymodels functions
- proportional_hazards() defines a Cox model for hazard ratios.
- set_engine() swaps the backend, such as flexsurv.
- fit() trains a specification on data.
- predict() returns times, survival, or hazard.
- extract_fit_engine() reaches the underlying survreg object.
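As a brief sketch of the last item in the list above, extract_fit_engine() can hand the fitted survreg object to functions from the survival package. The names lung_fit and engine_fit are hypothetical.

```r
library(parsnip)
library(censored)
library(survival)

lung_fit <- survival_reg(dist = "weibull") |>
  set_engine("survival") |>
  fit(Surv(time, status) ~ age + sex, data = lung)

# Pull out the object created by survival::survreg()
engine_fit <- extract_fit_engine(lung_fit)

class(engine_fit)   # the engine's own class
coef(engine_fit)    # coefficients on the log-time scale
```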
FAQ
What is survival_reg() in R? survival_reg() is a parsnip function that defines a parametric survival regression model for censored, time-to-event data. It specifies the assumed event-time distribution and the engine, but does not touch data until you call fit(). The model handles censored records, where the event has not yet happened, by treating their times as lower bounds rather than missing values.
What is the difference between survival_reg() and proportional_hazards()? survival_reg() fits a parametric model that assumes a named distribution, such as Weibull, and can predict absolute event times. proportional_hazards() fits a Cox model that makes no distributional assumption and instead estimates hazard ratios between groups. Use survival_reg() when you want predicted times or smooth curves; use proportional_hazards() when relative risk is the goal.
What distributions does survival_reg() support? The survival engine supports distributions including weibull, exponential, gaussian, logistic, lognormal, and loglogistic through the dist argument. The flexsurv and flexsurvspline engines add further options, including spline-based hazards. Weibull is the default for the survival engine; it can model hazards that rise or fall monotonically over time.
Why do I need the censored package for survival_reg()? survival_reg() lives in core parsnip, but its engines are registered by the censored extension package. Running library(censored) connects parsnip to the survival, flexsurv, and flexsurvspline backends. Without that call, parsnip reports the engine is unavailable even though the underlying survival package is installed.
How do I predict survival probabilities with survival_reg()? Call predict() with type = "survival" and supply an eval_time vector of the times at which you want probabilities. The result is a list column of tibbles, each holding the evaluation times and matching survival probabilities for one subject. Use tidyr::unnest() to flatten it into a tidy frame for plotting.