parsnip naive_Bayes() in R: Build a Naive Bayes Classifier
The parsnip naive_Bayes() function defines a naive Bayes classification model in R, a fast probabilistic classifier that plugs into the tidymodels workflow and can be fitted with several engines.
    naive_Bayes()                                      # bare spec, classification
    naive_Bayes(mode = "classification")               # set mode inline
    naive_Bayes(Laplace = 1)                           # smoothing for zero counts
    naive_Bayes(smoothness = 1.5)                      # kernel density bandwidth
    naive_Bayes() |> set_engine("klaR")                # default klaR engine
    naive_Bayes() |> set_engine("naivebayes")          # switch the fitting engine
    naive_Bayes(Laplace = 1) |> fit(y ~ ., data = df)  # define and train
Need explanation? Read on for examples and pitfalls.
What naive_Bayes() does
naive_Bayes() declares a classifier; it does not train one. The function returns a model specification: an engine-agnostic description of the naive Bayes classifier you want. No data touches it until you call fit(). That split keeps your modeling code portable across the whole tidymodels stack.
A naive Bayes classifier applies Bayes' theorem with one simplifying assumption: every predictor is conditionally independent of the others once you know the class. For each class it multiplies the class prior by the likelihood of each feature, then picks the class with the highest score. The assumption is rarely true, yet the classifier often stays competitive in practice and trains in a single pass over the data.
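To make the scoring rule concrete, here is a minimal base-R sketch that scores one iris row by hand. It assumes a Gaussian likelihood for every numeric predictor, which is a simplification chosen for illustration and not necessarily what a fitting engine does internally.

```r
# Hand-rolled naive Bayes score for a single row (illustration only).
# Assumption: each numeric predictor is Gaussian within a class.
x <- iris[1, 1:4]                     # the row to classify
classes <- levels(iris$Species)

scores <- sapply(classes, function(cl) {
  rows  <- iris[iris$Species == cl, 1:4]
  prior <- nrow(rows) / nrow(iris)    # class prior P(class)
  # One likelihood per predictor: P(feature value | class)
  lik <- mapply(function(value, col) dnorm(value, mean(col), sd(col)), x, rows)
  prior * prod(lik)                   # prior times the product of likelihoods
})

posterior <- scores / sum(scores)     # normalize so the scores sum to 1
names(which.max(posterior))           # the class with the highest score
```

Multiplying many small likelihoods can underflow on wide data, so real implementations usually work with sums of logs instead.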
naive_Bayes() syntax and arguments
Two hyperparameters control how the classifier estimates probabilities. Both arguments are optional, and any you leave out falls back to the engine default.
| Argument | What it controls | Typical value |
|---|---|---|
| smoothness | Kernel density bandwidth for numeric predictors | 0.5 to 2 |
| Laplace | Additive smoothing for zero-frequency categories | 0 to 3 |
| mode | Only "classification" is supported | "classification" |
| engine | Fitting backend, set with set_engine() | "klaR", "naivebayes" |
You build a spec by piping the constructor into set_engine() and set_mode().
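A minimal sketch of that pipeline, assuming the parsnip and discrim packages are installed. The object name nb_spec and the argument values are illustrative choices, not recommendations.

```r
library(parsnip)
library(discrim)   # registers the naive Bayes engines

# Declare the model: hyperparameters, engine, and mode. No data is involved yet.
nb_spec <- naive_Bayes(smoothness = 1, Laplace = 1) |>
  set_engine("klaR") |>
  set_mode("classification")

nb_spec            # printing shows the arguments and the chosen engine
```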
The printed spec shows your chosen arguments and the engine. Nothing is fitted yet, so this object is cheap to create and reuse.
Fit a naive Bayes classifier
Pass a formula and a data frame to fit(), then predict on new rows. Naive Bayes handles numeric and categorical predictors and needs no scaling. Here it classifies the three species in the built-in iris dataset.
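The workflow can be sketched as follows, assuming discrim and klaR are installed. The names nb_fit and preds are illustrative, and the three rows used for prediction are simply one row from each species block of iris.

```r
library(parsnip)
library(discrim)

# Define the spec and train it in one pipeline.
nb_fit <- naive_Bayes() |>
  set_engine("klaR") |>
  set_mode("classification") |>
  fit(Species ~ ., data = iris)

# Predict class labels for a few rows; the result is a tibble
# with a single .pred_class column.
preds <- predict(nb_fit, new_data = iris[c(1, 51, 101), ])
preds
```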
The fitted object wraps the trained engine model and predicts a tidy tibble. Because naive Bayes is probabilistic, you can also ask for the class probabilities behind each label.
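A short sketch of a probability prediction, under the same assumptions as before (discrim and klaR installed; object names are illustrative):

```r
library(parsnip)
library(discrim)

nb_fit <- naive_Bayes() |>
  set_engine("klaR") |>
  set_mode("classification") |>
  fit(Species ~ ., data = iris)

# Ask for class probabilities instead of hard labels.
probs <- predict(nb_fit, new_data = iris[c(1, 51, 101), ], type = "prob")
probs
```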
The type = "prob" argument returns one .pred_<class> column per class, and the values in each row sum to 1. These probabilities feed straight into yardstick metrics like roc_auc().
Choosing an engine: klaR vs naivebayes
The engine decides the algorithm behind a shared interface. The default klaR engine wraps klaR::NaiveBayes and supports both smoothness and Laplace. The naivebayes engine wraps naivebayes::naive_bayes, runs faster, and has a lighter dependency footprint.
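Switching engines can be sketched like this, assuming discrim is loaded so that both engines are registered. The spec names are illustrative; translate() is used only to display the underlying call that parsnip would build for each engine.

```r
library(parsnip)
library(discrim)

# Same model, two backends: only the set_engine() line changes.
nb_klar <- naive_Bayes(Laplace = 1) |>
  set_engine("klaR") |>
  set_mode("classification")

nb_naivebayes <- naive_Bayes(Laplace = 1) |>
  set_engine("naivebayes") |>
  set_mode("classification")

# Show how parsnip maps the shared arguments onto each engine's own call.
translate(nb_klar)
translate(nb_naivebayes)
```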
The klaR engine needs the klaR package installed, and naivebayes needs the naivebayes package. Run show_engines("naive_Bayes") to list every engine and the modes it supports.
Common pitfalls
Most naive_Bayes() errors trace back to a missing package. In current parsnip releases the constructor ships with parsnip, but its engines are registered by the discrim extension package (older releases exported the function itself from discrim), so loading parsnip alone is not enough.
Adding library(discrim) registers the engines and the spec fits cleanly. Two more traps to watch:
- Naive Bayes has no regression mode. Calling set_mode("regression") errors because the algorithm only predicts class labels, never a continuous number.
- A categorical level never seen with a class gets a likelihood of zero, which wipes out the whole product. Set Laplace to a small positive number to add pseudo-counts and avoid that collapse.
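The regression trap is easy to reproduce. The sketch below captures the error with tryCatch() so the script keeps running; the exact wording of the message depends on your parsnip version, so it is printed instead of being assumed here.

```r
library(parsnip)
library(discrim)

# Requesting an unsupported mode fails when the spec is built, before any data is seen.
mode_error <- tryCatch(
  naive_Bayes() |> set_mode("regression"),
  error = function(e) conditionMessage(e)
)

mode_error   # the captured error message as a character string
```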
Try it yourself
Try it: Build a naive Bayes spec with Laplace = 0.5, fit it to classify Species from all columns of iris, and save the fitted model to ex_nb_fit.
Click to reveal solution
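One possible solution, assuming discrim and klaR are installed (the original solution code is not shown here, so treat this as a sketch that matches the explanation below):

```r
library(parsnip)
library(discrim)

# Spec with Laplace smoothing, klaR backend, fitted on all iris predictors.
ex_nb_fit <- naive_Bayes(Laplace = 0.5) |>
  set_engine("klaR") |>
  set_mode("classification") |>
  fit(Species ~ ., data = iris)

ex_nb_fit
```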
Explanation: The spec sets Laplace smoothing and the mode, set_engine("klaR") picks the backend, and fit() trains the classifier on iris. The result is a parsnip model_fit wrapping the underlying NaiveBayes object.
Related parsnip functions
naive_Bayes() is one classifier in a family of parsnip specifications. When the independence assumption is too strong, these neighbors share the same set_engine() and fit() workflow:
- discrim_linear() fits linear discriminant analysis for a linear class boundary.
- discrim_quad() fits quadratic discriminant analysis with a per-class covariance.
- rand_forest() averages many trees when predictor interactions matter.
- nearest_neighbor() classifies by distance-based voting.
- set_engine() chooses the computational backend for any spec.
See the tidymodels parsnip reference for the full list of supported engines.
FAQ
What package is naive_Bayes() in? In current parsnip releases the naive_Bayes() constructor is exported by parsnip itself, while the klaR and naivebayes engines are registered by discrim, a parsnip extension for discriminant and naive Bayes classifiers (older releases exported the function from discrim as well). With only parsnip loaded, the spec cannot be fitted and parsnip prompts you to load discrim. Always run library(discrim) (or library(tidymodels) plus library(discrim)) before defining the spec. An h2o engine also exists; it is provided through the agua extension package.
Does naive_Bayes() support regression? No. Naive Bayes is a classification-only algorithm, so the spec accepts set_mode("classification") and nothing else. Calling set_mode("regression") raises an error stating that regression is not a known mode. For a numeric outcome, use linear_reg() or another regression model spec instead.
What does the Laplace argument do? Laplace adds a small constant to every feature-class count before estimating probabilities. Without it, a categorical level that never appears with a class gets a probability of zero, and that zero wipes out the entire product for that class. Setting Laplace = 1 (add-one smoothing) is a common safe default that keeps every class in contention.
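The arithmetic behind that answer can be sketched in base R. The counts below are hypothetical, and the formula is the textbook additive-smoothing rule; individual engines may differ in implementation details.

```r
# Hypothetical counts of one categorical predictor's levels within a single class.
counts  <- c(red = 0, green = 3, blue = 7)
laplace <- 1

# Unsmoothed estimate: the unseen level gets probability zero.
raw <- counts / sum(counts)

# Additive smoothing: add the constant to every count, then renormalize.
smoothed <- (counts + laplace) / (sum(counts) + laplace * length(counts))

raw        # red is exactly 0
smoothed   # red becomes 1/13, so the class stays in contention
```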
Should I use the klaR or naivebayes engine? Use the default klaR engine for general work; it is well tested and supports both smoothness and Laplace. Choose the naivebayes engine when you want faster fitting and a lighter dependency, especially on larger datasets. Both share the same parsnip interface, so switching engines is a one-line change to set_engine().
How do I tune the smoothness parameter? Mark the argument for tuning by setting it to tune(), as in naive_Bayes(smoothness = tune()). Then build a grid with the dials package and pass it to tune_grid() along with a resampling object. The tuning step searches the candidate values and reports the bandwidth that scores best on your chosen metric.
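A minimal tuning sketch, assuming tidymodels, discrim, and klaR are installed. The fold count, grid range, seed, and object names are illustrative choices, not recommendations.

```r
library(tidymodels)
library(discrim)

set.seed(123)
folds <- vfold_cv(iris, v = 5, strata = Species)   # resampling object

# Mark smoothness for tuning instead of fixing its value.
nb_tune_spec <- naive_Bayes(smoothness = tune()) |>
  set_engine("klaR") |>
  set_mode("classification")

# Four evenly spaced candidate bandwidths between 0.5 and 2.
nb_grid <- grid_regular(dials::smoothness(range = c(0.5, 2)), levels = 4)

nb_res <- tune_grid(
  nb_tune_spec,
  Species ~ .,
  resamples = folds,
  grid      = nb_grid,
  metrics   = metric_set(accuracy)
)

best_smoothness <- select_best(nb_res, metric = "accuracy")
best_smoothness
```

From here, finalize_model(nb_tune_spec, best_smoothness) would plug the winning value back into the spec before a final fit.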