tidymodels Exercises in R: 25 Practice Problems

Twenty-five practice problems on the tidymodels stack: rsample, recipes, parsnip, workflows, tune, and yardstick. Each exercise comes with a solution.

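Run this once before any exercise:
library(tidymodels)  # attaches rsample, recipes, parsnip, workflows, tune, yardstick, dplyr
library(dplyr)       # already attached by tidymodels; harmless to repeat
library(yardstick)   # already attached by tidymodels; harmless to repeat
# The "ranger" and "xgboost" engines used later need their packages installed before fitting.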

  

Exercise 1: initial_split

Difficulty: Beginner.

Solution:
set.seed(1)
split <- initial_split(mtcars, prop = 0.7)
list(train = nrow(training(split)), test = nrow(testing(split)))

  

Exercise 2: vfold_cv

Difficulty: Beginner.

Solution:
set.seed(1)
vfold_cv(mtcars, v = 5)

  

Exercise 3: recipe

Difficulty: Intermediate.

Solution:
recipe(mpg ~ ., data = mtcars)

  

Exercise 4: step_normalize

Difficulty: Intermediate.

Solution:
recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())

  

Exercise 5: prep + bake

Difficulty: Intermediate.

Solution:
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())
prep(rec) |>
  bake(new_data = mtcars) |>
  head()
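The solution above estimates and applies the recipe on the same data. A common variation, sketched here under the assumption of a 70/30 split like the one in Exercise 1, estimates the normalization on the training rows only and then applies it to held-out rows. The object names and the choice of `disp` and `hp` as columns to inspect are illustrative, not part of the original exercise.

```r
library(tidymodels)

set.seed(1)
split <- initial_split(mtcars, prop = 0.7)

# Define the recipe on the training rows only
rec <- recipe(mpg ~ ., data = training(split)) |>
  step_normalize(all_numeric_predictors())

# prep() estimates the means and standard deviations from the training rows
prepped <- prep(rec)

# new_data = NULL returns the processed training set kept inside the prepped recipe
train_baked <- bake(prepped, new_data = NULL)

# The same training statistics are applied to the held-out rows
test_baked <- bake(prepped, new_data = testing(split))

# On the training rows the normalized predictors have mean ~0 and sd ~1
train_means <- colMeans(train_baked[, c("disp", "hp")])
train_sds <- sapply(train_baked[, c("disp", "hp")], sd)
```

The held-out rows are scaled with the training statistics, so their means and standard deviations will generally not be exactly 0 and 1.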

  

Exercise 6: linear_reg model

Difficulty: Beginner.

Solution:
linear_reg() |>
  set_engine("lm")

  

Exercise 7: rand_forest model

Difficulty: Intermediate.

Solution:
rand_forest(trees = 100) |>
  set_mode("regression") |>
  set_engine("ranger")

  

Exercise 8: boost_tree model

Difficulty: Intermediate.

Solution:
boost_tree(trees = 100) |>
  set_mode("regression") |>
  set_engine("xgboost")

  

Exercise 9: workflow

Difficulty: Intermediate.

Solution:
wf <- workflow() |>
  add_recipe(recipe(mpg ~ ., data = mtcars)) |>
  add_model(linear_reg() |> set_engine("lm"))
wf

  

Exercise 10: fit workflow

Difficulty: Intermediate.

Solution:
wf <- workflow() |>
  add_recipe(recipe(mpg ~ ., data = mtcars)) |>
  add_model(linear_reg() |> set_engine("lm"))
fit(wf, mtcars)

  

Exercise 11: Predict from workflow

Difficulty: Intermediate.

Solution:
wf <- workflow() |>
  add_recipe(recipe(mpg ~ ., data = mtcars)) |>
  add_model(linear_reg() |> set_engine("lm"))
# Named wf_fit rather than fitted to avoid masking stats::fitted()
wf_fit <- fit(wf, mtcars)
predict(wf_fit, new_data = mtcars[1:3, ])

  

Exercise 12: fit_resamples

Difficulty: Advanced.

Solution:
set.seed(1)
folds <- vfold_cv(mtcars, v = 5)
wf <- workflow() |>
  add_recipe(recipe(mpg ~ ., data = mtcars)) |>
  add_model(linear_reg() |> set_engine("lm"))
fit_resamples(wf, folds, metrics = metric_set(rmse, rsq))

  

Exercise 13: collect_metrics

Difficulty: Advanced.

Solution:
set.seed(1)
folds <- vfold_cv(mtcars, v = 5)
wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(linear_reg() |> set_engine("lm"))
fit_resamples(wf, folds) |>
  collect_metrics()

  

Exercise 14: Tune hyperparameter

Difficulty: Advanced.

Solution:
set.seed(1)
folds <- vfold_cv(mtcars, v = 5)
rf <- rand_forest(mtry = tune(), trees = 100) |>
  set_mode("regression") |>
  set_engine("ranger")
wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(rf)
grid <- expand.grid(mtry = c(2, 4, 6))
tune_grid(wf, folds, grid = grid) |>
  collect_metrics()

  

Exercise 15: yardstick metrics

Difficulty: Intermediate.

Solution:
truth <- c(1, 2, 3, 4)
pred <- c(1.1, 1.9, 3.2, 3.8)
data.frame(truth, pred) |>
  yardstick::rmse(truth, pred)
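As a sanity check on what the metric computes, the sketch below recomputes RMSE by hand as the square root of the mean squared error and compares it with yardstick's data-frame and vector interfaces. The variable names `manual_rmse`, `ys_rmse`, and `vec_rmse` are illustrative.

```r
library(yardstick)

truth <- c(1, 2, 3, 4)
pred <- c(1.1, 1.9, 3.2, 3.8)

# RMSE by definition: square root of the mean squared error
manual_rmse <- sqrt(mean((truth - pred)^2))

# Data-frame interface: returns a tibble with .metric, .estimator, .estimate
ys_rmse <- rmse(data.frame(truth, pred), truth, pred)$.estimate

# Vector interface: returns a single number
vec_rmse <- rmse_vec(truth, pred)

all.equal(manual_rmse, ys_rmse)
all.equal(manual_rmse, vec_rmse)
```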

  

Exercise 16: Classification: logistic

Difficulty: Intermediate.

Solution:
binary <- iris |>
  dplyr::filter(Species != "setosa") |>
  dplyr::mutate(Species = droplevels(Species))
mod <- logistic_reg() |>
  set_engine("glm")
fit(
  workflow() |>
    add_formula(Species ~ Sepal.Length) |>
    add_model(mod),
  binary
)

  

Exercise 17: step_dummy

Difficulty: Intermediate.

Solution:
recipe(mpg ~ ., data = mtcars |> dplyr::mutate(cyl = factor(cyl))) |>
  step_dummy(all_nominal_predictors())

  

Exercise 18: step_corr (remove correlated)

Difficulty: Advanced.

Solution:
recipe(mpg ~ ., data = mtcars) |>
  step_corr(all_numeric_predictors(), threshold = 0.9)

  

Exercise 19: step_pca

Difficulty: Advanced.

Solution:
recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>
  step_pca(all_numeric_predictors(), num_comp = 3)

  

Exercise 20: last_fit

Difficulty: Advanced.

Solution:
set.seed(1)
split <- initial_split(mtcars, prop = 0.7)
wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(linear_reg() |> set_engine("lm"))
last_fit(wf, split) |>
  collect_metrics()

  

Exercise 21: select_best after tuning

Difficulty: Advanced.

Solution:
# After tune_grid() has returned a result `res` for workflow `wf`:
# best <- select_best(res, metric = "rmse")  # pass `metric` by name; recent tune releases expect this
# finalize_workflow(wf, best)
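The outline above can be made runnable by reusing the tuning setup from Exercise 14. This is a minimal sketch, assuming the ranger package is installed; restricting the metric set to `rmse` and refitting on all of `mtcars` at the end are illustrative choices, not part of the original solution.

```r
library(tidymodels)  # fitting below assumes the ranger package is installed

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

# Random forest with mtry left as a tuning placeholder
rf <- rand_forest(mtry = tune(), trees = 100) |>
  set_mode("regression") |>
  set_engine("ranger")

wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(rf)

# Evaluate three candidate mtry values across the folds
res <- tune_grid(
  wf,
  resamples = folds,
  grid = expand.grid(mtry = c(2, 4, 6)),
  metrics = metric_set(rmse)
)

# Pick the candidate with the lowest mean RMSE
best <- select_best(res, metric = "rmse")

# Replace tune() in the workflow with the selected value, then fit
final_wf <- finalize_workflow(wf, best)
final_fit <- fit(final_wf, mtcars)
```

`best` is a one-row tibble holding the chosen `mtry` and its `.config` label; `finalize_workflow()` only fills in the parameter, so the final `fit()` call is still needed to train the model.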

  

Exercise 22: workflow_set for many models

Difficulty: Advanced.

Solution:
ws <- workflow_set(
  preproc = list(rec = recipe(mpg ~ ., data = mtcars)),
  models = list(
    lm = linear_reg() |> set_engine("lm"),
    rf = rand_forest() |> set_mode("regression") |> set_engine("ranger")
  )
)
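The solution above only builds the workflow set. One way to actually compare the models, sketched here under the assumption that the ranger package is installed, is to resample every workflow with `workflow_map()` and rank the results. The 5-fold resamples, the seed, and the names `ws_res` and `ranked` are illustrative.

```r
library(tidymodels)  # fitting below assumes the ranger package is installed

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

ws <- workflow_set(
  preproc = list(rec = recipe(mpg ~ ., data = mtcars)),
  models = list(
    lm = linear_reg() |> set_engine("lm"),
    rf = rand_forest() |> set_mode("regression") |> set_engine("ranger")
  )
)

# Run fit_resamples() on every workflow in the set, using the same folds
ws_res <- workflow_map(ws, fn = "fit_resamples", resamples = folds, seed = 1)

# Rank the workflows by cross-validated RMSE (lower is better)
ranked <- rank_results(ws_res, rank_metric = "rmse")
ranked
```

Workflow IDs are built from the list names, so this set contains `rec_lm` and `rec_rf`.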

  

Exercise 23: parsnip translate

Difficulty: Advanced.

Solution:
linear_reg() |>
  set_engine("lm") |>
  translate()

  

Exercise 24: tidy a fit

Difficulty: Intermediate.

Solution:
wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(linear_reg() |> set_engine("lm"))
fit(wf, mtcars) |>
  extract_fit_parsnip() |>
  tidy()

  

Exercise 25: Save model object

Difficulty: Intermediate.

Solution:
wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(linear_reg() |> set_engine("lm"))
saveRDS(fit(wf, mtcars), "wf.rds")
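Saving is only useful if the object can be restored and still predicts. The sketch below round-trips the fitted workflow through `saveRDS()` and `readRDS()` and checks that predictions match. It writes to a temporary file rather than `wf.rds`, and the names `fitted_wf`, `path`, and `restored` are illustrative.

```r
library(tidymodels)

wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(linear_reg() |> set_engine("lm"))

fitted_wf <- fit(wf, mtcars)

# Write to a temporary file so the working directory stays clean
path <- tempfile(fileext = ".rds")
saveRDS(fitted_wf, path)

# Restore the fitted workflow and predict with it
restored <- readRDS(path)

preds_before <- predict(fitted_wf, new_data = mtcars)$.pred
preds_after <- predict(restored, new_data = mtcars)$.pred

all.equal(preds_before, preds_after)
```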

  

What to do next

  • Machine-Learning-Exercises: broader ML practice.
  • Cross-Validation-Exercises: a deeper dive into cross-validation.