parsnip discrim_linear() in R: Build an LDA Classifier
The parsnip discrim_linear() function defines a linear discriminant analysis (LDA) model in R, a fast classifier that draws straight-line boundaries between classes and plugs into any tidymodels workflow.
```r
discrim_linear()                                  # bare spec, classification
discrim_linear(mode = "classification")           # set mode inline
discrim_linear() |> set_engine("MASS")            # default MASS::lda engine
discrim_linear(penalty = 1) |> set_engine("mda")  # regularized LDA
discrim_linear() |> set_engine("sda")             # shrinkage discriminant
discrim_linear() |> fit(y ~ ., data = df)         # define and train
predict(fit, new_data, type = "prob")             # class probabilities
```
Need an explanation? Read on for examples and pitfalls.
What discrim_linear() does
discrim_linear() declares a classifier; it does not train one. The function returns a model specification: an engine-agnostic description of the LDA model you want. No data touches it until you call fit(). That separation keeps your modeling code portable across the whole tidymodels stack, so you can swap engines or resampling schemes without rewriting the spec.
Linear discriminant analysis assumes that the predictors within each class follow a multivariate Gaussian distribution and that all classes share one common covariance matrix. Under that assumption the decision boundary between any two classes is a straight line (or a flat hyperplane in higher dimensions). LDA estimates each class mean and the pooled covariance. It then assigns a new observation to the class with the highest discriminant score. Roughly, that is the class whose centre is closest in the covariance-adjusted (Mahalanobis) distance, after a correction for the class prior probabilities.
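To make the rule concrete, here is a from-scratch sketch on iris that uses only base R and MASS. It is an illustration, not how parsnip or MASS compute LDA internally. It estimates the class means, the class priors (sample proportions, matching the MASS::lda() default), and the pooled covariance S. It assumes the unbiased n - g denominator for S. It then scores each class with the linear discriminant delta_k(x) = x' S^-1 mu_k - 1/2 mu_k' S^-1 mu_k + log(pi_k) and compares the labels with MASS::lda().

```r
# From-scratch LDA on iris, compared against MASS::lda() (illustrative sketch)
library(MASS)

X <- as.matrix(iris[, 1:4])
y <- iris$Species
classes <- levels(y)
n <- nrow(X)
g <- length(classes)

# Class means (one row per class) and priors (sample proportions)
means <- t(sapply(classes, function(k) colMeans(X[y == k, , drop = FALSE])))
priors <- as.numeric(table(y)) / n

# Pooled within-class covariance, assuming the unbiased n - g denominator
centered <- X - means[as.integer(y), ]
Sigma_inv <- solve(crossprod(centered) / (n - g))

# Linear discriminant score for every row and class:
# delta_k(x) = x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log(pi_k)
scores <- sapply(seq_len(g), function(k) {
  mu <- means[k, ]
  drop(X %*% Sigma_inv %*% mu) -
    0.5 * drop(t(mu) %*% Sigma_inv %*% mu) +
    log(priors[k])
})

# Assign each row to the class with the largest score
manual_class <- factor(classes[max.col(scores, ties.method = "first")],
                       levels = classes)

# Labels from MASS::lda(), which applies the same shared-covariance rule
mass_class <- predict(lda(Species ~ ., data = iris))$class
agreement <- mean(manual_class == mass_class)
agreement
```

The labels should agree on essentially every row. Rescaling S only stretches distances and does not change the ranking of classes here.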
discrim_linear() syntax and arguments
Two optional arguments plus the engine control how LDA is estimated. Any argument you leave out falls back to the engine default, so a bare discrim_linear() call is valid on its own.
| Argument | What it controls | Typical value |
|---|---|---|
| mode | Model type; only "classification" is supported | "classification" |
| engine | Fitting backend, set with set_engine() | "MASS", "mda", "sda", "sparsediscrim" |
| penalty | Amount of regularization (mda engine) | Non-negative number, e.g. 0 to 1 |
| regularization_method | Type of regularized estimation (sparsediscrim engine only) | "diagonal", "min_distance", "shrink_cov", "shrink_mean" |
You build a spec by piping the constructor into set_engine() and set_mode().
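A minimal sketch of that pipeline, assuming the parsnip and discrim packages are installed:

```r
library(parsnip)
library(discrim)  # registers the engines behind discrim_linear()

# Describe the model: LDA, fitted by MASS::lda(), for classification
lda_spec <- discrim_linear() |>
  set_engine("MASS") |>
  set_mode("classification")

lda_spec  # prints the specification; no data has been used yet
```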
The printed spec shows the engine and any arguments you set. Nothing is fitted yet, so this object is cheap to create and reuse across resamples.
Fit an LDA model
Pass a formula and a data frame to fit(), then predict on new rows. LDA needs numeric predictors but no scaling, since the pooled covariance already accounts for differing variances. Here it classifies the three species in the built-in iris dataset.
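The following sketch trains on most of iris and predicts labels for held-out rows. The 30-row random holdout and the seed are assumptions made for illustration only.

```r
library(parsnip)
library(discrim)

set.seed(42)
# Hold out 30 random rows to predict on (illustrative split)
test_rows  <- sample(nrow(iris), 30)
iris_train <- iris[-test_rows, ]
iris_test  <- iris[test_rows, ]

# Define and train the LDA classifier in one pipeline
lda_fit <- discrim_linear() |>
  set_engine("MASS") |>
  fit(Species ~ ., data = iris_train)

# Hard class predictions come back as a tibble with a .pred_class column
class_preds <- predict(lda_fit, new_data = iris_test)
class_preds
```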
The fitted object wraps the trained MASS::lda model and returns predictions as a tidy tibble. Because LDA produces posterior probabilities, you can also ask for the class probabilities behind each label.
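A short sketch of requesting probabilities. It fits on all of iris and scores the first six rows, a choice made only to keep the output small.

```r
library(parsnip)
library(discrim)

lda_fit <- discrim_linear() |>
  set_engine("MASS") |>
  fit(Species ~ ., data = iris)

# One .pred_<class> column per species instead of a single label column
prob_preds <- predict(lda_fit, new_data = head(iris), type = "prob")
prob_preds
```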
The type = "prob" argument returns one .pred_<class> column per class, and the values in each row sum to 1. These probabilities feed straight into yardstick metrics such as roc_auc().
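As a sketch of that hand-off, the probability columns can go straight into yardstick's multiclass roc_auc(), which uses the Hand-Till estimator by default. This example scores the training data itself, so the number is optimistic. In practice you would score a holdout or resamples.

```r
library(parsnip)
library(discrim)
library(yardstick)

lda_fit <- discrim_linear() |>
  set_engine("MASS") |>
  fit(Species ~ ., data = iris)

# Attach the true labels to the class probabilities
scored <- predict(lda_fit, new_data = iris, type = "prob")
scored$Species <- iris$Species

# Multiclass ROC AUC from the three probability columns
auc <- roc_auc(scored, truth = Species, .pred_setosa:.pred_virginica)
auc
```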
Choosing an engine: MASS, mda, and sda
The engine decides the algorithm behind a shared interface. The default MASS engine wraps MASS::lda() and gives you textbook LDA with no tuning parameters. The other engines add regularization, which helps when you have many predictors relative to rows.
| Engine | Backend | Best for |
|---|---|---|
| MASS | MASS::lda() | Standard LDA, the default, no tuning |
| mda | mda::fda() | Penalized LDA via a ridge penalty |
| sda | sda::sda() | Shrinkage estimates for wide data |
| sparsediscrim | sparsediscrim methods | Regularized covariance, high dimensions |
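The penalty argument is the tuning parameter of the mda engine. The MASS engine has no tuning parameters, and the sda engine takes its shrinkage settings (such as lambda) as engine-specific arguments in set_engine(). Switching engines is a one-line change.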
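A sketch of the same constructor pointed at each backend. The penalty, lambda, and regularization_method values are arbitrary illustrations, not recommendations. The lambda value is passed through set_engine() to sda::sda(). Only the mda model is actually fitted, and only if the mda package is installed.

```r
library(parsnip)
library(discrim)

# Same constructor, four backends
mass_spec   <- discrim_linear() |> set_engine("MASS")
mda_spec    <- discrim_linear(penalty = 0.1) |> set_engine("mda")   # ridge penalty
sda_spec    <- discrim_linear() |> set_engine("sda", lambda = 0.2)  # engine-specific shrinkage
sparse_spec <- discrim_linear(regularization_method = "diagonal") |>
  set_engine("sparsediscrim")

# Every engine registered for discrim_linear(), with its mode
engines <- show_engines("discrim_linear")
engines

# Fit the penalized version only when its backend package is available
if (requireNamespace("mda", quietly = TRUE)) {
  mda_fit <- fit(mda_spec, Species ~ ., data = iris)
  print(predict(mda_fit, new_data = head(iris)))
}
```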
Each non-default engine needs its backend package: the mda engine needs the mda package installed, sda needs sda, and sparsediscrim needs sparsediscrim. Run show_engines("discrim_linear") to list every engine and the modes it supports.
Common pitfalls
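Most discrim_linear() errors trace back to a missing package. The engines behind discrim_linear() are registered by the discrim package, a parsnip extension, so loading parsnip alone is not enough. With older releases, where discrim also exported the constructor, you get a "could not find function" error. With newer parsnip releases the constructor exists, but parsnip reports that it cannot locate an implementation for the MASS engine.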
Adding library(discrim) registers the model and the spec builds cleanly. Two more traps to watch:
- LDA has no regression mode. Calling set_mode("regression") errors because discriminant analysis only predicts class labels, never a continuous number.
- LDA breaks down when the pooled covariance matrix is singular. MASS::lda() stops with an error when a predictor has zero variance within the classes (for example, it is constant inside every class). It warns that "variables are collinear" when predictors are perfectly collinear. Drop or combine those columns before fitting.
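The sketch below reproduces all three traps so you can recognise the messages. The extra columns species_code and Sepal.Sum are hypothetical, built only to trigger the failures. The last two cases call MASS::lda() directly rather than going through parsnip.

```r
library(MASS)
library(parsnip)
library(discrim)

# 1. Regression is not an allowed mode for discrim_linear()
mode_err <- tryCatch(
  { discrim_linear() |> set_mode("regression"); NULL },
  error = function(e) conditionMessage(e)
)
mode_err

# 2. Hypothetical predictor that is constant inside every class
bad <- iris
bad$species_code <- as.numeric(bad$Species)  # 1, 2, 3: zero within-class variance
const_err <- tryCatch(
  { lda(Species ~ ., data = bad); NULL },
  error = function(e) conditionMessage(e)
)
const_err

# 3. Hypothetical perfectly collinear predictor: MASS warns rather than stops
collinear <- iris
collinear$Sepal.Sum <- collinear$Sepal.Length + collinear$Sepal.Width
collinear_warn <- NULL
invisible(withCallingHandlers(
  lda(Species ~ ., data = collinear),
  warning = function(w) {
    collinear_warn <<- conditionMessage(w)  # keep the message, silence the warning
    invokeRestart("muffleWarning")
  }
))
collinear_warn
```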
Try it yourself
Try it: Build an LDA spec with the MASS engine, fit it to classify Species from all columns of iris, and save the fitted model to ex_lda_fit.
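One possible solution, assuming parsnip and discrim are installed:

```r
library(parsnip)
library(discrim)

# Spec, engine, and training in one pipeline
ex_lda_fit <- discrim_linear() |>
  set_engine("MASS") |>
  fit(Species ~ ., data = iris)

ex_lda_fit
```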
Explanation: The bare discrim_linear() constructor builds the spec, set_engine("MASS") picks the backend, and fit() trains the classifier on iris. The result is a parsnip model_fit wrapping the underlying lda object.
Related parsnip functions
discrim_linear() is one classifier in a family of parsnip specifications. When the linear boundary is too rigid or the problem is not classification, these neighbors share the same set_engine() and fit() workflow:
- discrim_quad() fits quadratic discriminant analysis with a per-class covariance for curved boundaries.
- discrim_flexible() fits flexible discriminant analysis for nonlinear decision regions.
- naive_Bayes() is a fast probabilistic classifier that assumes feature independence.
- multinom_reg() fits multinomial logistic regression as a linear alternative.
- set_engine() chooses the computational backend for any spec.
See the discrim package reference for the full list of supported engines.
FAQ
What package is discrim_linear() in? The engines for discrim_linear() come from the discrim package, a parsnip extension for discriminant analysis models. Older discrim releases exported the function itself. Newer parsnip releases define the constructor but still need discrim to supply the engines. Either way, loading parsnip alone is not enough, so always run library(discrim) (or library(tidymodels) plus library(discrim)) before defining the spec. The discrim package registers the MASS, mda, sda, and sparsediscrim engines.
What is the difference between discrim_linear() and discrim_quad()? Both fit discriminant analysis classifiers, but they differ in the covariance assumption. discrim_linear() assumes every class shares one covariance matrix, which produces a straight-line decision boundary. discrim_quad() gives each class its own covariance, which produces a curved boundary and adds flexibility at the cost of more parameters. Use LDA on small or noisy data and QDA when classes clearly differ in spread.
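A quick way to compare the two on your own data is to fit both on the same split and check holdout accuracy. This sketch uses iris with a stratified 75/25 split and an arbitrary seed, both assumptions for illustration. On a dataset this small, the ranking can change with the seed.

```r
library(parsnip)
library(discrim)
library(rsample)
library(yardstick)

set.seed(2024)
split <- initial_split(iris, prop = 0.75, strata = Species)
train <- training(split)
test  <- testing(split)

# Shared covariance (LDA) versus per-class covariance (QDA), both via MASS
lda_fit <- discrim_linear() |> set_engine("MASS") |> fit(Species ~ ., data = train)
qda_fit <- discrim_quad()   |> set_engine("MASS") |> fit(Species ~ ., data = train)

# Accuracy of a fitted model on the held-out rows
holdout_accuracy <- function(model_fit) {
  preds <- predict(model_fit, new_data = test)
  preds$truth <- test$Species
  accuracy(preds, truth = truth, estimate = .pred_class)$.estimate
}

results <- c(lda = holdout_accuracy(lda_fit), qda = holdout_accuracy(qda_fit))
results
```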
Does discrim_linear() support regression? No. Linear discriminant analysis is a classification-only algorithm, so the spec accepts set_mode("classification") and nothing else. Calling set_mode("regression") raises an error because regression is not a registered mode for this model. For a numeric outcome, use linear_reg() or another regression model spec instead.
How do I tune discrim_linear()? Mark the penalty argument for tuning by setting it to tune(), as in discrim_linear(penalty = tune()), and pick the mda engine, which uses penalty as its tuning parameter. The MASS engine has no tunable parameter, and sda takes its shrinkage settings as engine-specific arguments in set_engine(). Build a grid with the dials package and pass it to tune_grid() with a resampling object. Then show_best() or select_best() reports the penalty that scores best on your chosen metric.
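A sketch of that workflow on iris. It assumes the tidymodels, discrim, and mda packages are installed. The 5-fold split, the 5-level grid, and the choice of roc_auc for picking the winner are illustrative assumptions.

```r
library(tidymodels)
library(discrim)

set.seed(123)
folds <- vfold_cv(iris, v = 5, strata = Species)

# penalty = tune() leaves the ridge penalty as a placeholder for tune_grid()
lda_tune_spec <- discrim_linear(penalty = tune()) |>
  set_engine("mda")

# dials::penalty() spans 1e-10 to 1 on a log10 scale by default
penalty_grid <- grid_regular(penalty(), levels = 5)

tune_res <- tune_grid(
  lda_tune_spec,
  Species ~ .,
  resamples = folds,
  grid = penalty_grid,
  metrics = metric_set(accuracy, roc_auc)
)

# Penalty value with the best mean resampled ROC AUC
best_penalty <- select_best(tune_res, metric = "roc_auc")
best_penalty
```

A natural next step is to pass best_penalty to finalize_model() and refit on the full training set.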
What engines does discrim_linear() support? The default MASS engine fits classic LDA with MASS::lda(). The mda engine adds a ridge penalty, sda applies shrinkage estimates for wide datasets, and sparsediscrim exposes the regularization_method argument for high-dimensional covariance shrinkage. Call show_engines("discrim_linear") to see every registered engine and its supported mode.