Clustering Exercises in R: 20 Practice Problems
Twenty practice problems on clustering in R: k-means, hierarchical clustering, DBSCAN, silhouette analysis, the elbow method, and visualization. Solutions are hidden until revealed.
By Selva Prabhakaran · Published May 11, 2026 · Last updated May 11, 2026
library(dplyr)
library(ggplot2)
library(cluster)
library(dbscan)
library(mclust)
Exercise 1: k-means basic
Difficulty: Beginner.
Show solution
set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3)
table(km$cluster, iris$Species)
Exercise 2: Plot k-means clusters
Difficulty: Intermediate.
Show solution
set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3)
iris$cluster <- factor(km$cluster)
ggplot(iris, aes(Sepal.Length, Petal.Length, color = cluster)) + geom_point()
Exercise 3: nstart parameter
Difficulty: Intermediate.
Show solution
set.seed(1)
# nstart = 25 tries 25 random starts and keeps the solution with the
# lowest total within-cluster sum of squares
kmeans(iris[, 1:4], centers = 3, nstart = 25)
Exercise 4: Scale before k-means
Difficulty: Intermediate.
Show solution
set.seed(1)
# scale() standardizes each column so no single variable dominates the distances
kmeans(scale(iris[, 1:4]), centers = 3)
Exercise 5: Elbow method
Difficulty: Advanced.
Show solution
set.seed(1)
# total within-cluster SS for k = 1..10; look for the "elbow" in the plot
wss <- sapply(1:10, function(k) {
  kmeans(scale(iris[, 1:4]), centers = k, nstart = 10)$tot.withinss
})
plot(1:10, wss, type = "b")
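Reading the elbow off a plot is subjective. As a hedged sketch, the same idea can be automated by picking the first k where the relative drop in total within-cluster SS becomes small; the 10% cutoff below is an arbitrary assumption, not a rule.

```r
# Sketch: first k whose next step improves total within-cluster SS by < 10%
set.seed(1)
X <- scale(iris[, 1:4])
wss <- sapply(1:10, function(k) kmeans(X, centers = k, nstart = 10)$tot.withinss)
rel_drop <- -diff(wss) / wss[-length(wss)]  # relative improvement at each step
k_elbow <- which(rel_drop < 0.10)[1]        # first k where the gain levels off
k_elbow
```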
Exercise 6: Silhouette
Difficulty: Advanced.
Show solution
set.seed(1)
km <- kmeans(scale(iris[, 1:4]), centers = 3)
sil <- silhouette(km$cluster, dist(scale(iris[, 1:4])))
mean(sil[, 3])  # average silhouette width; closer to 1 means better-separated clusters
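The overall average can hide one weak cluster. A small sketch (assuming the same seed and k = 3 as the solution above) that averages silhouette widths per cluster:

```r
# Per-cluster average silhouette widths, to spot weak or overlapping clusters
set.seed(1)
X <- scale(iris[, 1:4])
km <- kmeans(X, centers = 3)
sil <- cluster::silhouette(km$cluster, dist(X))
tapply(sil[, "sil_width"], sil[, "cluster"], mean)
```

A cluster whose mean width is much lower than the others is a candidate for merging or re-fitting with a different k.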
Exercise 7: Hierarchical clustering
Difficulty: Intermediate.
Show solution
d <- dist(scale(iris[, 1:4]))
hc <- hclust(d, method = "complete")
cutree(hc, k = 3) |> table()
Exercise 8: Plot dendrogram
Difficulty: Intermediate.
Show solution
hc <- hclust(dist(scale(iris[, 1:4])))
plot(hc)
Exercise 9: Different linkage methods
Difficulty: Advanced.
Show solution
d <- dist(scale(iris[, 1:4]))
list(complete = hclust(d, method = "complete"),
     ward     = hclust(d, method = "ward.D2"),
     single   = hclust(d, method = "single"))
Exercise 10: DBSCAN
Difficulty: Advanced.
Show solution
set.seed(1)
dbscan::dbscan(scale(iris[, 1:4]), eps = 0.5, minPts = 5)
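The eps = 0.5 above is a guess. A common way to pick eps, sketched here, is the k-nearest-neighbour distance plot (with k = minPts - 1): sort each point's distance to its k-th neighbour and look for the "knee".

```r
# Choosing eps via the kNN distance plot (k = minPts - 1 = 4 here)
X <- scale(iris[, 1:4])
dbscan::kNNdistplot(X, k = 4)
abline(h = 0.5, lty = 2)  # the candidate eps used in the solution above
```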
Exercise 11: Compare k-means vs hierarchical
Difficulty: Advanced.
Show solution
set.seed(1)
km <- kmeans(scale(iris[, 1:4]), 3)
hc <- cutree(hclust(dist(scale(iris[, 1:4]))), 3)
table(km$cluster, hc)
Exercise 12: Cluster centroids
Difficulty: Beginner.
Show solution
set.seed(1)
km <- kmeans(iris[, 1:4], 3)
km$centers
Exercise 13: Within-cluster sum of squares
Difficulty: Beginner.
Show solution
set.seed(1)
kmeans(iris[, 1:4], 3)$tot.withinss
Exercise 14: PAM (k-medoids)
Difficulty: Advanced.
Show solution
set.seed(1)
pam_fit <- cluster::pam(scale(iris[, 1:4]), k = 3)
pam_fit$clusinfo
Exercise 15: Gap statistic
Difficulty: Advanced.
Show solution
set.seed(1)
gap <- cluster::clusGap(scale(iris[, 1:4]), FUN = kmeans, K.max = 8, B = 50)
plot(gap)
Exercise 16: Visualize hierarchical clusters via cuts
Difficulty: Intermediate.
Show solution
hc <- hclust(dist(scale(iris[, 1:4])))
iris$cluster <- factor(cutree(hc, k = 3))
ggplot(iris, aes(Sepal.Length, Petal.Length, color = cluster)) + geom_point()
Exercise 17: Predict new point to nearest centroid
Difficulty: Advanced.
Show solution
set.seed(1)
X <- scale(iris[, 1:4])
km <- kmeans(X, 3)
# Scale the new point with the training data's center and scale --
# calling scale() on a single row would divide by an undefined sd (NaN)
new_point <- scale(iris[1, 1:4],
                   center = attr(X, "scaled:center"),
                   scale  = attr(X, "scaled:scale"))
which.min(sqrt(rowSums(sweep(km$centers, 2, as.numeric(new_point))^2)))
Exercise 18: External validation with ARI
Difficulty: Advanced.
Show solution
set.seed(1)
km <- kmeans(scale(iris[, 1:4]), 3)
mclust::adjustedRandIndex(km$cluster, as.integer(iris$Species))
Exercise 19: Initialise k-means with k-means++
Difficulty: Advanced.
Show solution
set.seed(1)
# requires the LICORS package, which is not loaded in the setup above
LICORS::kmeanspp(scale(iris[, 1:4]), k = 3)$cluster |> table(iris$Species)
Exercise 20: Mini-batch k-means demo (concept)
Difficulty: Advanced.
Show solution
# Mini-batch k-means updates centroids from random subsets ("batches") of rows,
# trading a little accuracy for a large speedup on very big datasets.
# m <- ClusterR::MiniBatchKmeans(scale(iris[,1:4]), clusters = 3, batch_size = 30)
# Typically much faster than kmeans() when the data has millions of rows.
What to do next
PCA Exercises: dimensionality reduction before clustering.
Machine Learning Exercises: broader ML practice drills.