parsnip discrim_regularized() in R: Fit RDA Models
The parsnip discrim_regularized() function defines a regularized discriminant analysis (RDA) model in R. It blends linear and quadratic discriminant analysis through two shrinkage knobs, so one model can sit anywhere between LDA and QDA.
    discrim_regularized()                            # default RDA spec, klaR engine
    discrim_regularized(frac_common_cov = 1)         # behaves like LDA
    discrim_regularized(frac_common_cov = 0)         # behaves like QDA
    discrim_regularized(frac_identity = 0.5)         # shrink covariance to identity
    discrim_regularized(frac_common_cov = tune())    # mark a parameter for tuning
    discrim_regularized() |> set_engine("klaR")      # the only engine
    discrim_regularized() |> fit(Species ~ ., train) # fit to training data
    predict(rda_fit, test, type = "prob")            # per-class probabilities
Need explanation? Read on for examples and pitfalls.
What discrim_regularized() does
discrim_regularized() creates a regularized discriminant analysis specification. Like its siblings it belongs to the parsnip model family but ships in the companion discrim package, so you load both tidymodels and discrim. The function returns a model specification, not a fitted model. You pair it with the klaR engine, set the mode, then call fit().
RDA, introduced by Friedman in 1989, sits on a continuum between LDA and QDA. Two fractions control where it lands. frac_common_cov decides how much covariance is pooled across classes, and frac_identity decides how strongly each covariance is shrunk toward the identity matrix. Tuning those two values often beats picking plain LDA or QDA.
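The two fractions can be written as a pair of convex combinations. This is a sketch of the commonly stated simplified form, assuming λ stands for frac_common_cov and γ for frac_identity; the symbols Σ̂_k (covariance of class k), Σ̂ (pooled covariance), and p (number of predictors) are notation introduced here, and Friedman's original paper additionally weights the first step by class sample sizes.

```latex
% Step 1: blend each class covariance with the pooled covariance
\hat{\Sigma}_k(\lambda) = (1 - \lambda)\,\hat{\Sigma}_k + \lambda\,\hat{\Sigma}

% Step 2: shrink the blended covariance toward a scaled identity matrix
\hat{\Sigma}_k(\lambda, \gamma) = (1 - \gamma)\,\hat{\Sigma}_k(\lambda)
  + \gamma\,\frac{\operatorname{tr}\!\big[\hat{\Sigma}_k(\lambda)\big]}{p}\,I
```

With γ = 0, setting λ = 1 leaves only the pooled covariance (LDA) and λ = 0 leaves only the per-class covariances (QDA).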
discrim_regularized() syntax and arguments
discrim_regularized() exposes two real hyperparameters, which is what sets it apart from discrim_linear() and discrim_quad(). With their default MASS engine, those siblings have nothing to tune. RDA has two knobs that move it across the LDA-to-QDA spectrum.
| Argument | Purpose | Range |
|---|---|---|
| frac_common_cov | Fraction of covariance pooled across classes. 1 = LDA, 0 = QDA | 0 to 1 |
| frac_identity | Fraction shrinking each covariance toward the identity matrix | 0 to 1 |
| mode | Prediction type | "classification" (the only valid mode) |
| engine | Computational backend | "klaR" (the only engine) |
A complete specification chains the spec with set_engine() and set_mode().
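A minimal sketch of such a specification follows. The object name rda_spec and the fraction values 0.5 and 0.1 are illustrative choices, not values the function requires.

```r
library(tidymodels)
library(discrim)   # discrim_regularized() lives here, not in parsnip core

# Illustrative values: half-pooled covariance, light shrinkage to identity
rda_spec <- discrim_regularized(frac_common_cov = 0.5, frac_identity = 0.1) |>
  set_engine("klaR") |>
  set_mode("classification")

# Printing shows the arguments, mode, and engine; no data has been touched
rda_spec
```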
The printed object confirms both fractions, the mode, and the engine. Nothing is fitted yet, so this spec is reusable across resamples and workflows.
klaR is the only engine. Unlike discrim_quad(), which also offers sparsediscrim, discrim_regularized() always calls klaR::rda(). There is no engine choice to make, so set_engine("klaR") is optional but worth keeping for clarity.

Fit a regularized discriminant model
fit() estimates the class means and the regularized covariance matrices. Split the data first so you can measure honest accuracy on rows the model never saw. The iris dataset works well: three classes and four numeric predictors.
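The split-then-fit step might look like the sketch below. The seed, the 75/25 split proportion, and the names iris_split, iris_train, iris_test, and rda_fit are assumptions made for illustration.

```r
library(tidymodels)
library(discrim)

# Assumed split: 75% training, stratified so each species is represented
set.seed(123)
iris_split <- initial_split(iris, prop = 0.75, strata = Species)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

rda_spec <- discrim_regularized(frac_common_cov = 0.5, frac_identity = 0.1) |>
  set_engine("klaR") |>
  set_mode("classification")

# fit() hands the formula and training data to klaR::rda()
rda_fit <- rda_spec |> fit(Species ~ ., data = iris_train)
rda_fit
```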
Parsnip maps frac_common_cov to klaR's lambda and frac_identity to gamma, so the printed call shows lambda = 0.5 and gamma = 0.1. Predictions arrive as tidy tibbles.
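As a sketch, both prediction types can be requested from a fitted model. The split settings and object names below are assumptions repeated so the example runs on its own.

```r
library(tidymodels)
library(discrim)

set.seed(123)
iris_split <- initial_split(iris, prop = 0.75, strata = Species)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

rda_fit <- discrim_regularized(frac_common_cov = 0.5, frac_identity = 0.1) |>
  set_engine("klaR") |>
  set_mode("classification") |>
  fit(Species ~ ., data = iris_train)

# Hard class labels: a one-column tibble named .pred_class
class_preds <- predict(rda_fit, iris_test)

# Per-class probabilities: one .pred_<level> column per species
prob_preds <- predict(rda_fit, iris_test, type = "prob")

class_preds
prob_preds
```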
Bind the predictions back to the test set to score the model with yardstick's accuracy().
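One way to do the scoring is sketched here. The name rda_preds and the split settings are illustrative assumptions, and the accuracy value depends on the random split.

```r
library(tidymodels)
library(discrim)

set.seed(123)
iris_split <- initial_split(iris, prop = 0.75, strata = Species)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

rda_fit <- discrim_regularized(frac_common_cov = 0.5, frac_identity = 0.1) |>
  set_engine("klaR") |>
  set_mode("classification") |>
  fit(Species ~ ., data = iris_train)

# Attach predictions to the held-out rows so truth and estimate sit side by side
rda_preds <- predict(rda_fit, iris_test) |>
  bind_cols(iris_test)

# yardstick returns a one-row tibble with .metric, .estimator, .estimate
rda_acc <- accuracy(rda_preds, truth = Species, estimate = .pred_class)
rda_acc
```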
Tune frac_common_cov and frac_identity
The point of RDA is to let the data choose the LDA-to-QDA blend. Mark both arguments with tune(), build a grid, and resample with cross-validation. Once discrim is loaded, parameter objects named after each argument, frac_common_cov() and frac_identity(), are available for building the grid.
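A tuning run along those lines could be sketched as follows. The 5-level grid, the 5 folds, the seeds, and the names rda_tune_spec, rda_grid, iris_folds, and rda_res are assumptions chosen for illustration.

```r
library(tidymodels)
library(discrim)

set.seed(123)
iris_split <- initial_split(iris, prop = 0.75, strata = Species)
iris_train <- training(iris_split)

# Both fractions are placeholders until tuning picks values
rda_tune_spec <- discrim_regularized(
  frac_common_cov = tune(),
  frac_identity   = tune()
) |>
  set_engine("klaR") |>
  set_mode("classification")

# 5 x 5 regular grid over the two fractions, each spanning 0 to 1
rda_grid <- grid_regular(frac_common_cov(), frac_identity(), levels = 5)

set.seed(456)
iris_folds <- vfold_cv(iris_train, v = 5, strata = Species)

# Fit every grid point on every fold and record accuracy
rda_res <- tune_grid(
  rda_tune_spec,
  Species ~ .,
  resamples = iris_folds,
  grid      = rda_grid,
  metrics   = metric_set(accuracy)
)

show_best(rda_res, metric = "accuracy")
```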
On iris the classes are well separated, so a partial pooling of covariance scores highest. The gap between settings widens on harder data.
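Carrying the best setting forward might look like this sketch. It repeats a small tuning run (a 3-level grid, chosen only to keep the example quick) so that it runs on its own; the names best_params, final_spec, and final_fit are illustrative.

```r
library(tidymodels)
library(discrim)

set.seed(123)
iris_split <- initial_split(iris, prop = 0.75, strata = Species)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

rda_tune_spec <- discrim_regularized(frac_common_cov = tune(), frac_identity = tune()) |>
  set_engine("klaR") |>
  set_mode("classification")

set.seed(456)
rda_res <- tune_grid(
  rda_tune_spec,
  Species ~ .,
  resamples = vfold_cv(iris_train, v = 5, strata = Species),
  grid      = grid_regular(frac_common_cov(), frac_identity(), levels = 3),
  metrics   = metric_set(accuracy)
)

# Pick the winning fractions and substitute them for the tune() placeholders
best_params <- select_best(rda_res, metric = "accuracy")
final_spec  <- finalize_model(rda_tune_spec, best_params)

# Refit on the full training set, then score once on the held-out test rows
final_fit <- fit(final_spec, Species ~ ., data = iris_train)
final_acc <- predict(final_fit, iris_test) |>
  bind_cols(iris_test) |>
  accuracy(truth = Species, estimate = .pred_class)
final_acc
```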
Pass select_best(rda_res, metric = "accuracy") to finalize_model() to lock the tuned values, then refit on the full training set before predicting on test data.

discrim_regularized() vs LDA and QDA
RDA contains LDA and QDA as special cases. With frac_identity held at 0, set frac_common_cov to 1 and RDA reproduces linear discriminant analysis. Set it to 0 and RDA reproduces quadratic discriminant analysis. Every value in between is a mixture that neither sibling can express.
| Setting | Equivalent model | Boundary |
|---|---|---|
| frac_common_cov = 1 | LDA (discrim_linear()) | Linear |
| frac_common_cov = 0 | QDA (discrim_quad()) | Quadratic |
| 0 < frac_common_cov < 1 | Regularized blend | Between |
| frac_identity > 0 | Extra shrinkage to identity | Stabilized |
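To see the settings in the table side by side, the sketch below fits the two endpoints and one blend and compares test accuracy. The helper rda_accuracy() is hypothetical, written only for this illustration, and the split settings are assumptions. No particular ranking is implied, since results vary with the split.

```r
library(tidymodels)
library(discrim)

set.seed(123)
iris_split <- initial_split(iris, prop = 0.75, strata = Species)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

# Hypothetical helper: fit RDA at one setting and return its test accuracy
rda_accuracy <- function(common_cov, identity_frac = 0) {
  discrim_regularized(frac_common_cov = common_cov, frac_identity = identity_frac) |>
    set_engine("klaR") |>
    set_mode("classification") |>
    fit(Species ~ ., data = iris_train) |>
    predict(iris_test) |>
    bind_cols(iris_test) |>
    accuracy(truth = Species, estimate = .pred_class) |>
    pull(.estimate)
}

# frac_identity stays at 0 so only the LDA-to-QDA blend changes
blend_results <- tibble(
  setting         = c("LDA end", "blend", "QDA end"),
  frac_common_cov = c(1, 0.5, 0),
  accuracy        = map_dbl(c(1, 0.5, 0), rda_accuracy)
)
blend_results
```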
discrim_regularized() exposes the LDA-to-QDA slider so cross-validation can land between the two, which is useful when classes are too small for stable QDA but their covariances differ too much for LDA.

Common pitfalls
RDA fails in a few predictable ways. Each has a clear fix.
- discrim not loaded. discrim_regularized() is not in parsnip core. Without library(discrim) you get a "could not find function" error.
- Fractions outside 0 to 1. Both frac_common_cov and frac_identity are fractions. Values above 1 or below 0 produce errors or meaningless covariance estimates.
- Tuning on a single split. One train-test split makes the chosen fractions noisy. Always resample with vfold_cv() before trusting a setting.
tidymodels alone is not enough. The core tidymodels bundle does not attach discrim. Always call library(discrim) in the same script, or the model function will not be found.

Try it yourself
Try it: Tune frac_common_cov only, holding frac_identity at 0, on iris_train with 5-fold cross-validation, and report the best accuracy. Save the tuning result to ex_res.
Solution
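One possible solution is sketched below. The exercise assumes iris_train already exists, so the split is recreated here with an assumed seed and proportion; ex_spec, ex_grid, and ex_folds are illustrative names.

```r
library(tidymodels)
library(discrim)

set.seed(123)
iris_split <- initial_split(iris, prop = 0.75, strata = Species)
iris_train <- training(iris_split)

# Only frac_common_cov is tuned; frac_identity is pinned at 0
ex_spec <- discrim_regularized(frac_common_cov = tune(), frac_identity = 0) |>
  set_engine("klaR") |>
  set_mode("classification")

# Five evenly spaced values of frac_common_cov between 0 and 1
ex_grid <- grid_regular(frac_common_cov(), levels = 5)

set.seed(456)
ex_folds <- vfold_cv(iris_train, v = 5)

ex_res <- tune_grid(
  ex_spec,
  Species ~ .,
  resamples = ex_folds,
  grid      = ex_grid,
  metrics   = metric_set(accuracy)
)

# Best mean cross-validated accuracy and the fraction that produced it
show_best(ex_res, metric = "accuracy", n = 1)
```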
Explanation: Fixing frac_identity at 0 keeps the second regularizer off, so the grid only varies the LDA-to-QDA blend. grid_regular() with levels = 5 spreads frac_common_cov evenly from 0 to 1.
Related parsnip functions
These functions pair naturally with discrim_regularized().
- discrim_linear() defines an LDA model with a shared covariance.
- discrim_quad() defines a QDA model with one covariance per class.
- discrim_flexible() fits flexible discriminant analysis via basis expansion.
- set_engine() selects the computational backend.
- tune() marks an argument for grid search.
FAQ
What is the difference between discrim_regularized() and discrim_quad()?
discrim_quad() fits quadratic discriminant analysis with a separate covariance matrix per class and, with its default MASS engine, has no tuning parameters. discrim_regularized() adds two fractions, frac_common_cov and frac_identity, that pull the covariance estimates toward a pooled matrix or toward the identity. QDA is one fixed point on that scale, recovered when both frac_common_cov and frac_identity are 0. RDA can sit anywhere between LDA and QDA, which helps when a class is too small for stable QDA.
What do frac_common_cov and frac_identity control?
frac_common_cov sets how much of each class covariance is replaced by one pooled covariance shared across all classes. With frac_identity at 0, a value of 1 gives linear discriminant analysis and 0 gives quadratic. frac_identity sets how strongly each covariance is shrunk toward the identity matrix, which stabilizes estimates when predictors are correlated or rows are scarce. Both are fractions between 0 and 1, and both can be tuned.
Which engine does discrim_regularized() support?
discrim_regularized() supports a single engine, "klaR", which calls klaR::rda() under the hood. There is no alternative backend, so set_engine("klaR") is optional, though keeping it makes the specification explicit. Run discrim_regularized() |> translate() to see the exact engine call parsnip will make, including how the two fractions map to klaR's lambda and gamma.
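A short sketch of that check follows. The fraction values and the name rda_translated are illustrative, and the exact printed template may differ between package versions.

```r
library(tidymodels)
library(discrim)

# translate() resolves the spec into the call template for the chosen engine
rda_translated <- discrim_regularized(frac_common_cov = 0.5, frac_identity = 0.1) |>
  set_engine("klaR") |>
  translate()

# The printed "Model fit template" shows the klaR::rda() call with
# the parsnip argument names replaced by the engine's own names
rda_translated
```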
Does discrim_regularized() need feature scaling?
It depends on frac_identity. With frac_identity at 0, RDA relies only on the class and pooled covariance matrices, and rescaling predictors does not change the fitted decision boundary. With frac_identity above 0, each covariance is shrunk toward a multiple of the identity matrix, which treats all predictors as if they share a scale, so predictors measured in very different units are worth centering and scaling inside a recipe. Roughly normal predictors within each class also matter, since discriminant analysis assumes class-conditional normality.
How do I tune a regularized discriminant model in R?
Mark frac_common_cov and frac_identity with tune(), build a grid with grid_regular(frac_common_cov(), frac_identity()), and pass it to tune_grid() with a vfold_cv() resampling object. Inspect results with show_best() and lock the winner with select_best() plus finalize_model(). Cross-validation is essential here because the right blend depends on the data, not a fixed rule.
For the full argument reference, see the discrim package documentation.