caret knn3() in R: k-Nearest Neighbors Classification
The knn3() function in caret fits a k-nearest neighbors classifier directly, without the resampling overhead of train(). It accepts a formula or an x-y interface, returns a model object that predicts both classes and class probabilities, and is the classification cousin of knnreg().
fit <- knn3(Species ~ ., data = iris, k = 5) # formula interface
knn3(x = iris[, 1:4], y = iris$Species, k = 5) # x, y interface
knn3(Species ~ ., data = iris, k = 1) # 1-NN (max variance)
knn3(Species ~ ., data = iris, k = 7, na.action = na.omit) # drop NA rows
predict(fit, newdata = iris[1:5, ], type = "class") # predicted class
predict(fit, newdata = iris[1:5, ], type = "prob") # class probabilities
caret::knn3Train(train, test, cl, k = 5) # low-level vector path
Need explanation? Read on for examples and pitfalls.
What knn3() does in one sentence
knn3() is caret's formula-friendly k-nearest neighbors classifier. You hand it a formula and a data frame, pick a k, and it stores the training rows; calling predict() later finds the k closest training points to each new observation (by Euclidean distance on the predictors) and assigns the majority class.
There is no model to "fit" in the parametric sense. knn3() simply packages the training data, the response, and k into an object that predict() knows how to query. The actual classification work happens at prediction time, which is why k-NN is called a lazy learner. The entire training set travels with the model; prediction cost scales linearly with the number of training rows.
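To make the "lazy" prediction step concrete, here is a minimal base R sketch of the distance-and-vote logic described above. The helper `knn_vote()` and the measurements in `new_flower` are hypothetical names and values made up for illustration; this is not caret's implementation, and it ignores distance ties, which caret may handle differently.

```r
# Hypothetical helper (not part of caret): vote shares among the k nearest rows.
knn_vote <- function(train_x, train_y, new_x, k = 5) {
  # Euclidean distance from the new point to every training row
  dists <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  # Indices of the k closest training rows
  nearest <- order(dists)[seq_len(k)]
  # Share of those neighbors that fall in each class
  table(train_y[nearest]) / k
}

train_x <- as.matrix(iris[, 1:4])
new_flower <- c(6.0, 3.0, 4.5, 1.5)  # made-up measurements for illustration

votes <- knn_vote(train_x, iris$Species, new_flower, k = 5)
votes                    # vote share per class
names(which.max(votes))  # class with the most votes
```

Every call scans all training rows, which is where the linear prediction cost comes from.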
knn3() syntax and arguments
A call needs just three pieces: predictors, an outcome, and a value of k (which defaults to 5). caret offers two equivalent entry points.
The formula form mirrors lm():
knn3(formula, data, subset, na.action, k = 5, ...)
The matrix form skips the formula expansion:
knn3(x, y, k = 5, ...)
- formula: a model formula like Species ~ . with a factor on the left.
- data: a data frame holding the columns named in the formula.
- x: a numeric matrix of predictors, one column per feature.
- y: a factor vector of class labels, one entry per row of x.
- k: the number of nearest neighbors used at prediction time. Default 5.
- na.action: na.action = na.omit drops rows with missing values before fitting.
Passing x and y directly skips the formula expansion, which matters when you have hundreds of predictors or millions of rows.
knn3() examples by use case
1. Fit a basic classifier on iris
The shortest call uses every column to predict Species. The fitted object prints its training size, k, and the class levels.
The returned object is of class knn3 and carries the training rows. Saving it to disk and loading it later is enough to score new data; there are no learned coefficients to ship.
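A minimal sketch of this first use case, assuming caret is installed. The temporary file path is only for illustration; any location works.

```r
library(caret)

# Fit on every predictor column of the built-in iris data
fit <- knn3(Species ~ ., data = iris, k = 5)
fit         # prints a short summary of the stored model
class(fit)

# Round-trip through disk: the training rows travel inside the object
model_path <- tempfile(fileext = ".rds")
saveRDS(fit, model_path)
fit_reloaded <- readRDS(model_path)

# The reloaded object can score new rows without refitting
reloaded_pred <- predict(fit_reloaded, newdata = iris[c(1, 51, 101), ], type = "class")
reloaded_pred
```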
2. Predict classes and probabilities
predict.knn3() has two prediction types. "class" returns a factor; "prob" (the default) returns a numeric matrix with one column per class.
The probability is the fraction of the k neighbors that belong to each class. With k = 5, possible values are 0, 0.2, 0.4, 0.6, 0.8, 1.0. Tied votes are broken at random, so set a seed when you need reproducible class predictions. Use these probabilities to set custom decision thresholds or compute area under the ROC curve via pROC::roc().
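The two prediction types can be compared side by side with a short sketch. The three rows picked here (one per species) and the 0.6 threshold are arbitrary choices for illustration.

```r
library(caret)

fit <- knn3(Species ~ ., data = iris, k = 5)
new_rows <- iris[c(1, 51, 101), ]  # one row from each species

pred_class <- predict(fit, newdata = new_rows, type = "class")
pred_prob  <- predict(fit, newdata = new_rows, type = "prob")

pred_class          # factor of predicted labels
pred_prob           # one column of vote shares per class
rowSums(pred_prob)  # vote shares across classes add up per row

# Custom threshold: flag a row as virginica only when at least 60% of votes agree
pred_prob[, "virginica"] >= 0.6
```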
3. Use the x, y interface for speed
When the predictors are already a numeric matrix and the response is a factor, skip the formula entirely.
The fit is identical to the formula version on the same columns; only the construction cost changes. For a dataset wider than a few hundred predictors, the matrix path is noticeably faster. It also makes it easy to drop in pre-standardized predictors: scale x once with scale() and pass the result.
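A sketch of the matrix path, including the pre-standardized variant. Reusing the training means and standard deviations for new rows is an assumption about good practice added here, not something the text above prescribes.

```r
library(caret)

x <- as.matrix(iris[, 1:4])  # numeric predictors
y <- iris$Species            # factor outcome

fit_xy      <- knn3(x = x, y = y, k = 5)
fit_formula <- knn3(Species ~ ., data = iris, k = 5)

# Compare the vote shares produced by the two construction paths
prob_xy      <- predict(fit_xy, newdata = x, type = "prob")
prob_formula <- predict(fit_formula, newdata = iris, type = "prob")
all.equal(as.vector(prob_xy), as.vector(prob_formula))

# Pre-standardized predictors: scale once, then fit on the scaled matrix
x_scaled   <- scale(x)
fit_scaled <- knn3(x = x_scaled, y = y, k = 5)

# New rows must be scaled with the training means and SDs, not their own
new_x <- scale(x[1:3, , drop = FALSE],
               center = attr(x_scaled, "scaled:center"),
               scale  = attr(x_scaled, "scaled:scale"))
scaled_pred <- predict(fit_scaled, newdata = new_x, type = "class")
scaled_pred
```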
4. Hold out a test set and check accuracy
k-NN scores on training data look unrealistically good because each row is its own nearest neighbor. Split the data first.
The 0.95 accuracy on held-out rows is an honest estimate. Re-running with k = 1 inflates training accuracy to 1.0 but typically drops test accuracy because the model memorizes noise. Larger k values smooth the boundary at the cost of letting rare classes get outvoted.
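A minimal hold-out sketch, assuming a 70/30 stratified split and an arbitrary seed. The accuracy you see depends on the split, so it may not match the figure quoted above.

```r
library(caret)

set.seed(42)  # arbitrary seed so the split is reproducible
train_idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)[, 1]
train_df  <- iris[train_idx, ]
test_df   <- iris[-train_idx, ]

fit <- knn3(Species ~ ., data = train_df, k = 5)

# Score only on rows the model has never seen
test_pred <- predict(fit, newdata = test_df, type = "class")
test_acc  <- mean(test_pred == test_df$Species)
test_acc

# Per-class breakdown of hits and misses
confusionMatrix(test_pred, test_df$Species)$table
```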
5. Compare k values without resampling
A quick sweep over candidate k values, scored once on a held-out set, is enough to pick a reasonable neighborhood size for exploration.
The sweet spot here is k = 7 to k = 11. A single split is a noisy estimator; for a defensible choice, switch to repeated cross-validation through train() once the rough range is known.
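One way to run such a sweep is sketched below. The candidate grid, the seed, and the 70/30 split are assumptions for illustration, and the best k on your split may differ from the range mentioned above.

```r
library(caret)

set.seed(42)  # arbitrary seed so the split is reproducible
train_idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)[, 1]
train_df  <- iris[train_idx, ]
test_df   <- iris[-train_idx, ]

k_grid <- c(1, 3, 5, 7, 9, 11, 15)

# Fit one model per k and score each once on the held-out rows
acc <- sapply(k_grid, function(k) {
  fit <- knn3(Species ~ ., data = train_df, k = k)
  mean(predict(fit, newdata = test_df, type = "class") == test_df$Species)
})

sweep_result <- data.frame(k = k_grid, accuracy = acc)
sweep_result
```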
Euclidean distance treats one unit of Petal.Width (a centimeter difference) the same as one unit of Sepal.Length. When features live on different scales, the larger-range variable silently dominates the distance. Run scale() or wrap knn3 inside a train() call with preProcess = c("center", "scale") to neutralize that bias.
knn3() vs knn() and caret train(method = "knn")
knn3() is the model object; knn() is a one-shot call; train(method = "knn") is the resampled wrapper. All three compute the same neighbors and majority vote, but expose the result differently.
| Function | Returns | Resampling built in | Probability output |
|---|---|---|---|
| knn3() (caret) | a knn3 model object that supports predict() | No, use train() for that | Yes, predict(fit, type = "prob") |
| class::knn() | a factor of predicted classes only | No | No, only votes |
| train(method = "knn") | a train object with cross-validated metrics and a bestTune row | Yes, via trControl | Yes, via predict(fit, type = "prob") |
Pick knn3() when you want a saved model to predict() on later. Pick class::knn() for one-line scripts. Pick train(method = "knn") when you want caret to pick k via cross-validation. See the caret reference for the full option list.
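The three entry points can be put side by side in one short sketch. It assumes the class package is installed (it ships with standard R distributions); the split, the seed, and the small tuning grid are arbitrary choices for illustration.

```r
library(caret)

set.seed(1)  # arbitrary seed for the split, tie-breaking, and CV folds
train_idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)[, 1]
train_x <- iris[train_idx, 1:4]
test_x  <- iris[-train_idx, 1:4]
train_y <- iris$Species[train_idx]

# 1. class::knn(): one shot, predictions come straight back
pred_oneshot <- class::knn(train = train_x, test = test_x, cl = train_y, k = 5)

# 2. knn3(): a model object you can keep and query later
fit_model  <- knn3(x = as.matrix(train_x), y = train_y, k = 5)
pred_model <- predict(fit_model, newdata = as.matrix(test_x), type = "class")

# 3. train(): resampling picks k from a grid
fit_cv <- train(x = train_x, y = train_y, method = "knn",
                tuneGrid = data.frame(k = c(3, 5, 7)),
                trControl = trainControl(method = "cv", number = 5))
fit_cv$bestTune

# How often the one-shot and model-object predictions agree
mean(pred_oneshot == pred_model)
```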
Common pitfalls
Pitfall 1: passing a numeric outcome. knn3() is classification-only; for numeric outcomes use knnreg(). Check class(df$y) first.
Pitfall 2: forgetting to scale predictors. Variables on larger scales dominate the Euclidean distance. Standardize with scale() first, or use preProcess = c("center", "scale") inside train().
Pitfall 3: choosing k by training accuracy. k = 1 always achieves perfect training accuracy because each row is its closest neighbor. Score on a held-out partition, never on training rows.
Pitfall 4: imbalanced classes. Majority voting biases toward whichever class fills the neighborhood. Rebalance with caret::upSample(), or use distance-weighted kknn::kknn().
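For Pitfall 4, a rebalancing step with caret::upSample() might look like the sketch below. The imbalanced subset is artificial, built from iris purely for illustration. In practice, upsample only the training partition so that duplicated rows never leak into the test set.

```r
library(caret)

set.seed(7)  # upSample() draws rows at random
# Artificial imbalance: keep only 10 versicolor rows
imbalanced <- iris[c(1:50, 51:60, 101:150), ]
table(imbalanced$Species)

# Resample minority classes with replacement up to the majority size
balanced <- upSample(x = imbalanced[, 1:4],
                     y = imbalanced$Species,
                     yname = "Species")
table(balanced$Species)

fit_balanced <- knn3(Species ~ ., data = balanced, k = 5)
```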
predict() on knn3 does not accept a vector. Pass a data frame or matrix with the same column names and structure as the training data. A single new observation must be wrapped as a one-row data frame, not a numeric vector.
Try it yourself
Try it: Fit a knn3 classifier on iris with k = 7, predict the class of the first row of the training data, and check the predicted probabilities. Save the predictions to ex_class and ex_prob.
Solution
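One possible solution, sketched under the assumption that the model is fit on the full iris data. The name `ex_fit` is not part of the exercise and is introduced here only to hold the model.

```r
library(caret)

ex_fit <- knn3(Species ~ ., data = iris, k = 7)

# iris[1, ] keeps the one-row data frame shape that predict() expects
ex_class <- predict(ex_fit, newdata = iris[1, ], type = "class")
ex_prob  <- predict(ex_fit, newdata = iris[1, ], type = "prob")

ex_class
ex_prob
```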
Explanation: type = "class" returns the majority vote; type = "prob" returns the vote fractions. For the first iris row (a clear setosa), all 7 neighbors are setosa.
Related caret functions
These complete a typical k-NN workflow:
- knnreg(): regression counterpart for numeric outcomes
- train() with method = "knn": resampled, cross-validated k-NN
- createDataPartition(): stratified train/test split before fitting
- confusionMatrix(): per-class metrics for predictions
- preProcess(): center, scale, or impute predictors before distances
FAQ
What is the difference between knn3() and knn() in R?
class::knn() is a one-shot function from the class package (a recommended package bundled with R, not part of base R itself) that takes training data, test data, and labels, and returns predictions in a single call. knn3() returns a model object you can save and reuse; call predict() on it later with new data. knn3() also exposes type = "prob" for class probabilities, which class::knn() does not. Pick knn3() when you want to score multiple test sets without rebuilding the training structure.
How do I choose the best k for knn3?
Cross-validate over a grid of candidate values via train(Species ~ ., data = iris, method = "knn", tuneGrid = data.frame(k = c(3, 5, 7, 9, 11)), trControl = trainControl(method = "cv", number = 10)). The bestTune element of the returned object stores the winning k. Start near the square root of the training-set size and search about five values around it.
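Written out as a runnable sketch, the cross-validated search looks like this. The seed and the centering/scaling step are assumptions added here; the winning k depends on the random folds.

```r
library(caret)

set.seed(123)  # arbitrary seed so the CV folds are reproducible
cv_fit <- train(Species ~ ., data = iris,
                method = "knn",
                preProcess = c("center", "scale"),
                tuneGrid = data.frame(k = c(3, 5, 7, 9, 11)),
                trControl = trainControl(method = "cv", number = 10))

cv_fit$results[, c("k", "Accuracy")]  # resampled accuracy per candidate
cv_fit$bestTune                       # the k that train() selected
```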
Does knn3() return probabilities?
Yes. Call predict(fit, newdata = ..., type = "prob") to get a numeric matrix with one column per class. Each row sums to 1. The probability is the fraction of the k neighbors in that class, so with k = 5 the possible values are 0, 0.2, 0.4, 0.6, 0.8, and 1.0.
Can knn3() handle missing values?
Not directly. By default it errors on NA rows. Either pre-impute with caret::preProcess(..., method = "knnImpute") or drop incomplete rows with na.action = na.omit. Imputing is safer when missingness is not at random; dropping works for a quick exploratory fit on otherwise clean data.
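A small sketch of the drop-incomplete-rows route. The missing values are injected artificially into a copy of iris, and the affected rows and column are arbitrary choices for illustration.

```r
library(caret)

iris_na <- iris
iris_na[c(3, 60, 120), "Sepal.Length"] <- NA  # inject missing values for illustration

n_incomplete <- sum(!complete.cases(iris_na))
n_incomplete  # rows that na.omit will discard

# Drop incomplete rows while building the model
fit_omit <- knn3(Species ~ ., data = iris_na, k = 5, na.action = na.omit)

# Score only rows without missing predictors
complete_rows <- iris_na[complete.cases(iris_na), ]
omit_pred <- predict(fit_omit, newdata = complete_rows[1:5, ], type = "class")
omit_pred
```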
Is knn3() suitable for large datasets?
Not really. k-NN computes distances from every test row to every training row at prediction time, so cost scales with n_train * n_test * p. For training sets above 100,000 rows, use fast or approximate nearest-neighbor packages such as FNN or RANN, or switch to a parametric model. knn3() is best for teaching, prototyping, and moderate-sized tabular problems.