Cross Validation Exercises in R: 20 Practice Problems

Twenty practice problems on cross-validation (CV): k-fold, LOOCV, repeated, stratified, and time-series CV, using caret and rsample. Each problem is followed by its solution.

Run this once before any exercise:
library(caret)
library(rsample)
library(dplyr)

  

Exercise 1: Manual 5-fold CV indices

Difficulty: Beginner.

Solution:
set.seed(1)
# Shuffle the fold labels 1..5 so each row gets a random fold
folds <- sample(rep(1:5, length.out = nrow(mtcars)))
table(folds)

  

Exercise 2: Manual k-fold loop

Difficulty: Intermediate.

Solution:
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(mtcars)))
# Fit on four folds, score RMSE on the held-out fold, then average
sapply(1:5, function(i) {
  tr <- mtcars[folds != i, ]
  te <- mtcars[folds == i, ]
  fit <- lm(mpg ~ wt, data = tr)
  sqrt(mean((te$mpg - predict(fit, te))^2))
}) |> mean()

  

Exercise 3: caret CV control

Difficulty: Intermediate.

Solution:
ctrl <- trainControl(method = "cv", number = 5)
train(mpg ~ ., data = mtcars, method = "lm", trControl = ctrl)

  

Exercise 4: caret repeated CV

Difficulty: Intermediate.

Solution:
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
train(mpg ~ ., data = mtcars, method = "lm", trControl = ctrl)

  

Exercise 5: LOOCV

Difficulty: Intermediate.

Solution:
ctrl <- trainControl(method = "LOOCV")
train(mpg ~ wt, data = mtcars, method = "lm", trControl = ctrl)
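For an ordinary least-squares model, leave-one-out residuals can be computed from a single fit, because each held-out residual equals the ordinary residual divided by one minus that row's leverage. The sketch below uses base R only (no caret) and checks the shortcut against a brute-force refit; the variable names are illustrative.

```r
# Base R only; mtcars ships with R.
fit <- lm(mpg ~ wt, data = mtcars)

# Shortcut: leave-one-out residual = ordinary residual / (1 - leverage)
loo_resid <- residuals(fit) / (1 - hatvalues(fit))
rmse_shortcut <- sqrt(mean(loo_resid^2))

# Brute force: refit the model once per held-out row
loo_pred <- sapply(seq_len(nrow(mtcars)), function(i) {
  fit_i <- lm(mpg ~ wt, data = mtcars[-i, ])
  predict(fit_i, newdata = mtcars[i, , drop = FALSE])
})
rmse_brute <- sqrt(mean((mtcars$mpg - loo_pred)^2))

# The two estimates agree up to floating-point error
all.equal(rmse_shortcut, rmse_brute)
```

The shortcut applies to linear models fitted by least squares; for other model types the refit loop is the general approach.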

  

Exercise 6: rsample vfold_cv

Difficulty: Intermediate.

Solution:
set.seed(1)
vfold_cv(mtcars, v = 5)

  

Exercise 7: rsample bootstrap

Difficulty: Intermediate.

Solution:
set.seed(1)
bootstraps(mtcars, times = 25)
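To see what a bootstrap resample is used for, here is a minimal base R sketch of out-of-bag evaluation: each model is fitted on rows drawn with replacement and scored on the rows that were never drawn. The model `mpg ~ wt` and the variable names are assumptions chosen for illustration, and this is not how rsample stores its splits internally.

```r
set.seed(1)
n <- nrow(mtcars)

oob_rmse <- replicate(25, {
  idx <- sample.int(n, n, replace = TRUE)  # bootstrap sample (analysis set)
  oob <- setdiff(seq_len(n), idx)          # rows never drawn (assessment set)
  fit <- lm(mpg ~ wt, data = mtcars[idx, ])
  sqrt(mean((mtcars$mpg[oob] - predict(fit, mtcars[oob, ]))^2))
})

# Average out-of-bag RMSE across the 25 resamples
mean(oob_rmse)
```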

  

Exercise 8: Stratified CV

Difficulty: Advanced.

Solution:
set.seed(1)
vfold_cv(iris, v = 5, strata = Species)
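Stratification means that fold labels are assigned within each class, so every fold keeps roughly the same class proportions as the full data. A minimal base R sketch of that idea follows (the names `k` and `fold` are illustrative, and this is a hand-rolled assignment, not rsample's implementation):

```r
set.seed(1)
k <- 5

# Within each species, shuffle the fold labels 1..k so classes spread evenly
fold <- ave(seq_len(nrow(iris)), iris$Species,
            FUN = function(i) sample(rep(seq_len(k), length.out = length(i))))

# Each fold holds 10 rows of each species (50 rows per species / 5 folds)
table(fold, iris$Species)
```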

  

Exercise 9: Time-series CV (caret)

Difficulty: Advanced.

Solution:
ctrl <- trainControl(method = "timeslice", initialWindow = 20,
                     horizon = 5, fixedWindow = TRUE)
ctrl

  

Exercise 10: rolling_origin (rsample)

Difficulty: Advanced.

Solution:
set.seed(1)
rolling_origin(mtcars, initial = 25, assess = 5, cumulative = TRUE)
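The expanding-window scheme can be written out by hand to make the row indices explicit: the training window always starts at row 1 and grows by one row per split, and the next `assess` rows form the test set. The sketch below uses base R and is meant to illustrate the idea under those assumptions, not to reproduce rsample's internals.

```r
n <- nrow(mtcars)   # 32 rows, treated here as if they were time-ordered
initial <- 25
assess <- 5

# Last row of each training window; stop while `assess` rows still remain
ends <- seq(initial, n - assess)

splits <- lapply(ends, function(e) {
  list(analysis   = seq_len(e),           # expanding window (cumulative)
       assessment = (e + 1):(e + assess)) # the next `assess` rows
})

length(splits)  # 3 splits: training windows end at rows 25, 26, 27
```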

  

Exercise 11: Compare two models with caret

Difficulty: Advanced.

Solution:
ctrl <- trainControl(method = "cv", number = 5)
# Reset the seed before each fit so both models see the same folds
set.seed(1)
m1 <- train(mpg ~ wt, data = mtcars, method = "lm", trControl = ctrl)
set.seed(1)
m2 <- train(mpg ~ wt + hp, data = mtcars, method = "lm", trControl = ctrl)
resamples(list(m1 = m1, m2 = m2)) |> summary()

  

Exercise 12: CV with tidymodels workflow

Difficulty: Advanced.

Solution:
library(tidymodels)
set.seed(1)
folds <- vfold_cv(mtcars, v = 5)
spec <- linear_reg() |> set_engine("lm") |> set_mode("regression")
fit_resamples(spec, mpg ~ wt, folds)

  

Exercise 13: Compute RMSE manually

Difficulty: Beginner.

Solution:
sqrt(mean((c(1, 2, 3) - c(1.1, 2.2, 2.7))^2))

  

Exercise 14: Compute MAE

Difficulty: Beginner.

Solution:
mean(abs(c(1, 2, 3) - c(1.1, 2.2, 2.7)))
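Both metrics are easier to reuse when wrapped in small functions. The helper names `rmse` and `mae` below are hypothetical (they are not from caret or rsample); the inputs are the same three values used in Exercises 13 and 14.

```r
# Hypothetical helpers: root mean squared error and mean absolute error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

actual    <- c(1, 2, 3)
predicted <- c(1.1, 2.2, 2.7)

rmse(actual, predicted)  # sqrt(0.14 / 3), about 0.216
mae(actual, predicted)   # (0.1 + 0.2 + 0.3) / 3 = 0.2
```

RMSE is at least as large as MAE on the same errors, because squaring gives larger errors more weight.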

  

Exercise 15: Out-of-fold predictions

Difficulty: Advanced.

Solution:
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(mtcars)))
oof <- numeric(nrow(mtcars))
# Each row is predicted by the model that did not see it during fitting
for (i in 1:5) {
  tr <- mtcars[folds != i, ]
  te <- mtcars[folds == i, ]
  oof[folds == i] <- predict(lm(mpg ~ wt, data = tr), te)
}
sqrt(mean((mtcars$mpg - oof)^2))

  

Exercise 16: CV with feature engineering

Difficulty: Advanced.

Solution:
ctrl <- trainControl(method = "cv", number = 5)
train(mpg ~ wt + I(wt^2), data = mtcars, method = "lm", trControl = ctrl)

  

Exercise 17: Tuning grid with CV

Difficulty: Advanced.

Solution:
# method = "rf" requires the randomForest package to be installed
set.seed(1)
ctrl <- trainControl(method = "cv", number = 5)
train(mpg ~ ., data = mtcars, method = "rf",
      tuneGrid = expand.grid(mtry = c(2, 4, 6)), trControl = ctrl)

  

Exercise 18: Bootstrap validation

Difficulty: Advanced.

Solution:
ctrl <- trainControl(method = "boot", number = 50)
train(mpg ~ ., data = mtcars, method = "lm", trControl = ctrl)

  

Exercise 19: Reproducible CV with seeds

Difficulty: Advanced.

Solution:
set.seed(1)
# caret expects number + 1 elements: one seed vector per resample,
# plus a single seed for the final model fit
seeds <- vector(mode = "list", length = 6)
for (i in 1:5) seeds[[i]] <- sample.int(1e4, 1)
seeds[[6]] <- 1
ctrl <- trainControl(method = "cv", number = 5, seeds = seeds)

  

Exercise 20: Nested CV concept

Difficulty: Advanced. Outer CV for evaluation, inner CV for tuning.

Solution:
# Conceptual outline
# outer: vfold_cv(data, v = 5)
# for each outer fold: tune model on inner CV, evaluate on outer test
# Returns honest performance estimate when tuning is part of the model
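The outline can be turned into a small runnable sketch in base R. Everything specific here is an assumption made for illustration: the tuning parameter is the polynomial degree of `wt`, the inner loop uses 4 folds, and `cv_rmse` and `rmse` are hypothetical helper names. The key point is that the degree is chosen using only the outer-training rows.

```r
set.seed(1)
degrees <- 1:3   # candidate tuning values (illustrative)
rmse <- function(a, p) sqrt(mean((a - p)^2))

# Mean k-fold CV RMSE of mpg ~ poly(wt, d) on the data frame `dat`
cv_rmse <- function(dat, d, k = 4) {
  f <- sample(rep(seq_len(k), length.out = nrow(dat)))
  mean(sapply(seq_len(k), function(j) {
    fit <- lm(mpg ~ poly(wt, d), data = dat[f != j, ])
    rmse(dat$mpg[f == j], predict(fit, dat[f == j, ]))
  }))
}

outer <- sample(rep(1:5, length.out = nrow(mtcars)))
outer_rmse <- sapply(1:5, function(i) {
  tr <- mtcars[outer != i, ]
  te <- mtcars[outer == i, ]
  # Inner CV: pick the degree using the outer-training rows only
  best <- degrees[which.min(sapply(degrees, function(d) cv_rmse(tr, d)))]
  # Refit with the chosen degree, then score on the untouched outer fold
  fit <- lm(mpg ~ poly(wt, best), data = tr)
  rmse(te$mpg, predict(fit, te))
})

mean(outer_rmse)  # performance estimate that accounts for the tuning step
```

Because the outer test fold never influences the choice of degree, the averaged outer RMSE is not biased by the tuning step.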

  

What to do next

  • Machine-Learning-Exercises (shipped): CV inside ML pipelines.
  • tidymodels-Exercises (coming): CV in the modern tidymodels workflow.