parsnip svm_linear() in R: Linear SVM Specification
The parsnip svm_linear() function defines a linear support vector machine, a maximum-margin classifier or regressor, for tidymodels. It gives you one interface that fits with the LiblineaR or kernlab engine underneath.
svm_linear()                                  # default spec, LiblineaR engine
svm_linear() |> set_mode("classification")    # classify a factor outcome
svm_linear() |> set_mode("regression")        # predict a numeric outcome
svm_linear(cost = 2)                          # set the margin-violation penalty
svm_linear(margin = 0.1)                      # set epsilon for regression
svm_linear() |> set_engine("kernlab")         # swap the backend engine
fit(spec, Species ~ ., data = iris)           # train on a dataset
Need explanation? Read on for examples and pitfalls.
What svm_linear() does
svm_linear() is a model specification, not a fitted model. It records your choice of a linear support vector machine and its hyperparameters, but no data touches it until you call fit(). This separation lets you reuse one specification across many datasets or resampling folds.
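The separation between specification and fit can be sketched as follows. This is a minimal illustration, assuming the parsnip and LiblineaR packages are installed; the object names (`spec`, `fit_a`, `fit_b`) and the odd/even row split are hypothetical and chosen only for the demonstration.

```r
library(parsnip)

# A specification: records the model choice, touches no data
spec <- svm_linear(cost = 1) |>
  set_mode("classification")

class(spec)   # includes "model_spec"; this is not a fitted model

# Reuse the same specification on two different subsets of iris
set.seed(123)
odd_rows  <- iris[seq(1, nrow(iris), by = 2), ]
even_rows <- iris[seq(2, nrow(iris), by = 2), ]

fit_a <- fit(spec, Species ~ ., data = odd_rows)
fit_b <- fit(spec, Species ~ ., data = even_rows)

class(fit_a)  # includes "model_fit"
```

The specification itself is unchanged after each call to fit(), which is why one object can serve every fold of a resampling scheme.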
A linear SVM finds the straight boundary, a hyperplane, that separates classes with the widest possible margin. The cost argument trades margin width against misclassified points: a high cost fits the training data tightly, a low cost keeps a wider, smoother margin. For regression, the model fits a flat band of width margin and penalizes only points outside it.
The function belongs to the tidymodels framework. Because parsnip standardizes the interface, the same svm_linear() code runs on the fast LiblineaR engine or the kernlab engine with only one line changed.
Calling fit() turns the specification into a trained model object. Keeping those two steps apart is what makes tidymodels workflows reproducible across resamples.
The LiblineaR engine needs the LiblineaR package installed, and set_engine("kernlab") needs the kernlab package. Install the engine package before you fit, or R reports that the engine is not available.
svm_linear() syntax and arguments
svm_linear() takes two hyperparameters and two setup verbs. The arguments control how strict the margin is, while set_engine() and set_mode() finish the specification.
The cost argument sets the penalty for points that fall inside the margin or on the wrong side of the boundary, where a larger cost means a tighter, lower-bias fit. The margin argument sets the epsilon insensitivity band used only in regression, where residuals smaller than margin cost nothing.
The mode defaults to "unknown" and must be set before you fit. A linear SVM can predict a class or a number, so call set_mode("classification") or set_mode("regression") first. You can also pass the engine through set_engine() instead of the engine argument, which is the more common tidymodels style.
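The two styles of writing a specification might look like this. It is a small sketch, assuming only that parsnip is installed; `spec_args` and `spec_verbs` are illustrative names, and the `$mode` and `$engine` fields are inspected only to show what the specification records.

```r
library(parsnip)

# Style 1: pass everything as arguments
spec_args <- svm_linear(
  mode   = "classification",
  engine = "LiblineaR",
  cost   = 1
)

# Style 2: finish the specification with setup verbs
spec_verbs <- svm_linear(cost = 1) |>
  set_engine("LiblineaR") |>
  set_mode("classification")

svm_linear()$mode   # "unknown" until a mode is set
spec_verbs$mode     # "classification"
spec_verbs$engine   # "LiblineaR"
```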
Fit a linear SVM: four examples
Every example below uses a built-in R dataset. The iris data drives the classification examples and mtcars drives the regression example, so the code runs anywhere with no downloads.
Example 1: Classify with the default LiblineaR engine
Build the specification, then fit it to data. The LiblineaR engine trains a linear SVM quickly and is the parsnip default.
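A sketch of this example, assuming parsnip and LiblineaR are installed. The names `svm_spec`, `svm_fit`, `train_pred`, and `train_acc` are illustrative, and the accuracy you see may vary slightly with package versions.

```r
library(parsnip)

set.seed(123)

# Specification: linear SVM for classification, default LiblineaR engine
svm_spec <- svm_linear() |>
  set_mode("classification")

# Fit on all four iris measurements
svm_fit <- fit(svm_spec, Species ~ ., data = iris)

# Compare predicted labels with the true species on the training data
train_pred <- predict(svm_fit, new_data = iris)
train_acc  <- mean(train_pred$.pred_class == iris$Species)
train_acc
```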
The model assigns a class to every iris flower, and comparing those labels to the true species gives a training accuracy near 97%. A straight boundary separates the three species well because the petal measurements are close to linearly separable.
Example 2: Predict species for new rows
predict() returns a tidy tibble with one row per input row. Each prediction is the class on whichever side of the hyperplane the row falls.
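For instance, predicting three rows could be sketched as below. The fit is rebuilt here so the snippet runs on its own; `new_rows` is a hypothetical stand-in for unseen data, taken from iris purely for illustration.

```r
library(parsnip)

set.seed(123)
svm_fit <- svm_linear() |>
  set_mode("classification") |>
  fit(Species ~ ., data = iris)

# One row from each species block of iris, standing in for new data
new_rows <- iris[c(1, 51, 101), ]

new_pred <- predict(svm_fit, new_data = new_rows)
new_pred   # a tibble with one .pred_class value per input row
```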
The .pred_class column holds the predicted species as a factor. The LiblineaR engine returns hard class labels only; for per-class probabilities you switch to the kernlab engine, shown in Example 4.
Example 3: Fit a regression linear SVM on mtcars
Switch the mode to "regression" and the same function predicts a number. The margin argument now controls the width of the insensitivity band.
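A regression fit along these lines might look like the following sketch, assuming parsnip and LiblineaR are installed. The object names are illustrative, and margin = 0.1 is used only to match the band width discussed in this example.

```r
library(parsnip)

set.seed(123)

# Regression specification with an insensitivity band of 0.1
reg_spec <- svm_linear(margin = 0.1) |>
  set_mode("regression")

# Predict mpg from every other column of mtcars
reg_fit <- fit(reg_spec, mpg ~ ., data = mtcars)

reg_pred <- predict(reg_fit, new_data = head(mtcars, 5))
reg_pred   # a tibble with a numeric .pred column
```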
The regression SVM returns a numeric .pred column. Residuals inside the margin band of 0.1 add nothing to the loss, so the fit ignores tiny errors and concentrates on the larger ones.
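The idea of an insensitivity band can be illustrated in base R. The helper `eps_loss()` below is hypothetical and implements the plain (linear) epsilon-insensitive loss; the exact loss an engine optimizes may differ (for example, a squared variant), so treat this as a conceptual sketch rather than the engine's implementation.

```r
# Hypothetical helper: linear epsilon-insensitive loss
# Residuals within +/- margin cost nothing; larger ones cost their excess
eps_loss <- function(residual, margin = 0.1) {
  pmax(abs(residual) - margin, 0)
}

residuals <- c(0.05, -0.08, 0.5, -1.2)
losses <- eps_loss(residuals, margin = 0.1)
losses   # 0.0 0.0 0.4 1.1
```

The first two residuals sit inside the band and contribute zero, while the last two are charged only for the part that exceeds the margin.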
Example 4: Get class probabilities with the kernlab engine
Swap to kernlab when you need predicted probabilities. The LiblineaR engine cannot produce them, but kernlab can.
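A sketch of the engine swap, assuming the kernlab package is installed. The names `prob_fit` and `probs` are illustrative; a seed is set because kernlab's probability model involves random resampling, so exact values can vary between runs.

```r
library(parsnip)

set.seed(123)

# Same specification function, different engine
prob_fit <- svm_linear() |>
  set_engine("kernlab") |>
  set_mode("classification") |>
  fit(Species ~ ., data = iris)

# type = "prob" asks for one probability column per class
probs <- predict(prob_fit, new_data = iris[c(1, 51, 101), ], type = "prob")
probs
rowSums(probs)   # each row should sum to (approximately) one
```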
The probability columns are named .pred_<class> and each row sums to one. kernlab estimates these with Platt scaling, which fits a logistic curve to the SVM decision values.
Scale your predictors before fitting: add step_normalize(all_numeric_predictors()) in a recipe so each predictor contributes fairly.
Compare svm_linear() engines
svm_linear() runs on two engines that share the same code. You swap engines with one set_engine() call, and parsnip translates cost and margin to each backend.
| Engine | Package | Strengths | Use when |
|---|---|---|---|
| LiblineaR | LiblineaR | Very fast on wide or sparse data | Large datasets; the default choice |
| kernlab | kernlab | Supports class probabilities, scales predictors | You need .pred_<class> probability columns |
The decision rule is short. Use LiblineaR for speed on large or sparse data, and switch to kernlab when you need class probabilities or want to match other kernlab SVM models in the same project.
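To see how one specification maps onto each backend, parsnip's translate() prints the engine call it would build. The sketch below assumes only that parsnip is installed; `lib_spec` and `kern_spec` are illustrative names, and the printed templates depend on your parsnip version, so no output is shown.

```r
library(parsnip)

# One specification, written once
lib_spec <- svm_linear(cost = 2, margin = 0.1) |>
  set_mode("regression")

# The only change needed to switch backends
kern_spec <- lib_spec |>
  set_engine("kernlab")

# Show how cost and margin are passed to each engine
translate(lib_spec)
translate(kern_spec)
```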
Common pitfalls
Three mistakes catch most newcomers to svm_linear(). Each one below shows the problem and the fix.
The most common is forgetting to set the mode. A linear SVM can classify or predict a number, so parsnip cannot guess which one you want and fit() fails until you call set_mode().
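The problem and its fix can be sketched as follows, assuming parsnip and LiblineaR are installed. The names `no_mode`, `msg`, and `ok_fit` are illustrative, and the exact error wording depends on your parsnip version, so it is captured rather than quoted.

```r
library(parsnip)

no_mode <- svm_linear()   # mode is still "unknown"

# Problem: fitting without a mode raises an error
msg <- tryCatch(
  {
    fit(no_mode, Species ~ ., data = iris)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Fix: declare the mode before fitting
ok_fit <- no_mode |>
  set_mode("classification") |>
  fit(Species ~ ., data = iris)
```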
The second pitfall is asking the LiblineaR engine for probabilities. predict(fit, type = "prob") errors unless the model was fit with the kernlab engine. The third is leaving predictors unscaled, which lets a large-scale variable dominate the cost penalty and skews the boundary.
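One way to handle the scaling pitfall is to bundle a normalizing recipe with the model in a workflow. This is a sketch assuming the tidymodels meta-package and LiblineaR are installed; `rec`, `scaled_spec`, and `wf_fit` are illustrative names.

```r
library(tidymodels)

set.seed(123)

# Center and scale every numeric predictor before the SVM sees it
rec <- recipe(Species ~ ., data = iris) |>
  step_normalize(all_numeric_predictors())

scaled_spec <- svm_linear(cost = 1) |>
  set_mode("classification")

# The workflow applies the recipe at fit time and again at predict time
wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(scaled_spec) |>
  fit(data = iris)

scaled_pred <- predict(wf_fit, new_data = head(iris, 3))
scaled_pred
```

Because the recipe lives inside the workflow, new data is scaled with the training means and standard deviations automatically.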
Use set_engine("kernlab") if your workflow needs .pred_<class> probability columns.
Try it yourself
Try it: Fit a regression linear SVM on mtcars with cost = 2, then predict mpg for the first row. Save the prediction to ex_pred.
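One possible solution is sketched below, assuming parsnip and LiblineaR are installed. Only `ex_pred` is named by the exercise; `ex_spec` and `ex_fit` are illustrative, and the predicted value depends on the engine version, so it is not shown.

```r
library(parsnip)

set.seed(123)

# Regression specification with a stronger margin-violation penalty
ex_spec <- svm_linear(cost = 2) |>
  set_mode("regression")

ex_fit <- fit(ex_spec, mpg ~ ., data = mtcars)

# Predict mpg for the first row of mtcars (the Mazda RX4)
ex_pred <- predict(ex_fit, new_data = mtcars[1, ])
ex_pred
```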
Explanation: Setting the mode to "regression" makes svm_linear() predict the numeric mpg column, and cost = 2 tightens the fit. Row 1 of mtcars is the Mazda RX4, whose true mpg is 21, so the linear SVM lands close.
Related parsnip functions
svm_linear() works alongside the rest of the parsnip model family. These functions cover the neighboring tasks in a tidymodels project.
- svm_poly() defines a support vector machine with a polynomial kernel.
- svm_rbf() defines a support vector machine with a radial basis kernel.
- logistic_reg() defines a linear classifier that returns probabilities directly.
- set_engine() chooses the computational backend for any specification.
- fit() trains a specification on data and returns a model object.
FAQ
What package is svm_linear() in?
svm_linear() ships in core parsnip, so library(tidymodels) or library(parsnip) makes it available. The function only describes the model, though, and the actual fitting happens in an engine package. The default LiblineaR engine needs the LiblineaR package, and set_engine("kernlab") needs the kernlab package installed separately.
What is the difference between svm_linear() and svm_rbf()?
svm_linear() fits a straight decision boundary, a hyperplane, between classes. svm_rbf() uses a radial basis kernel that bends the boundary into flexible curves. Choose svm_linear() when the classes are close to linearly separable or the data has many predictors, and svm_rbf() when the boundary is clearly non-linear. The linear model trains faster and is easier to interpret.
What engine does svm_linear() use by default?
The default engine is LiblineaR, an R interface to the LIBLINEAR C/C++ library, which is built for fast linear classification and regression on large or sparse datasets. You can confirm or change it with set_engine(), and show_engines("svm_linear") lists every registered option. Switch to kernlab when you need predicted class probabilities, which LiblineaR does not provide.
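A quick way to check the engines from the console, assuming parsnip is installed; `engines` is an illustrative name.

```r
library(parsnip)

# List every engine and mode registered for svm_linear()
engines <- show_engines("svm_linear")
engines

# The engine recorded in a default specification
svm_linear()$engine
```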
Does svm_linear() give class probabilities?
Only with the kernlab engine. The default LiblineaR engine returns hard class labels, so predict(fit, type = "prob") errors. Refit with set_engine("kernlab") and kernlab estimates probabilities with Platt scaling. If you need calibrated probabilities and a linear boundary together, kernlab is the engine to use.
How do I tune the cost parameter in svm_linear()?
Set the argument to tune(), as in svm_linear(cost = tune()), then pass the specification to tune_grid() with a resampling object such as vfold_cv(). The framework scores a grid of cost values with cross-validation. Use select_best() to pick the winner, then finalize_workflow() to lock the value before the final fit.
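The tuning steps above can be sketched as follows, assuming the tidymodels meta-package and LiblineaR are installed. The five folds, the grid of five candidate values, and all object names are illustrative choices. Because the LiblineaR engine does not produce class probabilities, the sketch scores candidates with accuracy only instead of relying on the default metric set.

```r
library(tidymodels)

set.seed(123)
folds <- vfold_cv(iris, v = 5)

# Mark cost as a parameter to be tuned
tune_spec <- svm_linear(cost = tune()) |>
  set_mode("classification")

wf <- workflow() |>
  add_formula(Species ~ .) |>
  add_model(tune_spec)

# Score five candidate cost values with cross-validated accuracy
res <- tune_grid(
  wf,
  resamples = folds,
  grid      = 5,
  metrics   = metric_set(accuracy)
)

# Pick the best cost, lock it in, and fit on the full data
best      <- select_best(res, metric = "accuracy")
final_wf  <- finalize_workflow(wf, best)
final_fit <- fit(final_wf, data = iris)
```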
For the full argument reference, see the parsnip svm_linear() documentation.