Random Forest Exercises in R: 20 Practice Problems

Twenty practice problems on random forests in R, covering classification, regression, tuning, variable importance, and the ranger package. Each exercise includes a worked solution.

Run this once before any exercise:

library(randomForest)
library(ranger)
library(caret)
library(dplyr)

  

Exercise 1: Classification RF on iris

Difficulty: Beginner. Fit a classification random forest predicting Species from all other iris columns.

Solution

set.seed(1)
randomForest(Species ~ ., data = iris)

  

Exercise 2: Regression RF on mtcars

Difficulty: Beginner. Fit a regression random forest predicting mpg from all other mtcars columns.

Solution

set.seed(1)
randomForest(mpg ~ ., data = mtcars)

  

Exercise 3: Specify number of trees

Difficulty: Beginner. Fit a random forest on iris with 100 trees instead of the default 500.

Solution

set.seed(1)
randomForest(Species ~ ., data = iris, ntree = 100)

  

Exercise 4: Specify mtry

Difficulty: Intermediate. Fit a random forest on iris that tries 2 variables at each split.

Solution

set.seed(1)
randomForest(Species ~ ., data = iris, mtry = 2)

  

Exercise 5: OOB error

Difficulty: Intermediate. Extract the final out-of-bag (OOB) error rate from a fitted forest.

Solution

set.seed(1)
fit <- randomForest(Species ~ ., data = iris)
# err.rate has one row per tree; the "OOB" column (column 1) holds the overall error
fit$err.rate[nrow(fit$err.rate), "OOB"]
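A useful companion to this exercise: since err.rate stores one row per tree, you can watch the OOB error stabilize as trees are added. A minimal sketch, assuming the setup chunk above has been run:

```r
# Plot OOB (and per-class) error curves versus the number of trees,
# then compare the OOB error early in training against the final value.
library(randomForest)
set.seed(1)
fit <- randomForest(Species ~ ., data = iris)
plot(fit)                         # error curves vs. number of trees
fit$err.rate[c(10, 500), "OOB"]   # OOB error after 10 trees vs. all 500
```

If the curve is flat well before the last tree, a smaller ntree would give the same accuracy at lower cost.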

  

Exercise 6: Variable importance

Difficulty: Intermediate. Fit a forest with permutation importance enabled and print the importance table.

Solution

set.seed(1)
fit <- randomForest(Species ~ ., data = iris, importance = TRUE)
# qualify the namespace: ranger also exports importance() and masks this one
randomForest::importance(fit)

  

Exercise 7: Variable importance plot

Difficulty: Intermediate. Plot variable importance for a forest fitted on iris.

Solution

set.seed(1)
fit <- randomForest(Species ~ ., data = iris)
varImpPlot(fit)

  

Exercise 8: Predict probabilities

Difficulty: Intermediate. Predict class probabilities instead of class labels.

Solution

set.seed(1)
fit <- randomForest(Species ~ ., data = iris)
head(predict(fit, iris, type = "prob"))

  

Exercise 9: Confusion matrix

Difficulty: Intermediate. Print the OOB confusion matrix of a fitted forest.

Solution

set.seed(1)
fit <- randomForest(Species ~ ., data = iris)
fit$confusion

  

Exercise 10: ranger basic

Difficulty: Intermediate. Fit the same iris model with ranger, a faster random forest implementation.

Solution

set.seed(1)
ranger(Species ~ ., data = iris)
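The speed claim can be checked directly. A rough sketch on a resampled, larger iris; the 20,000-row size is an arbitrary choice for illustration, and absolute timings depend on your machine:

```r
# Time both implementations on identical data; ranger (multithreaded C++)
# is typically much faster than randomForest on larger datasets.
library(randomForest)
library(ranger)
set.seed(1)
big <- iris[sample(nrow(iris), 20000, replace = TRUE), ]
system.time(randomForest(Species ~ ., data = big, ntree = 50))
system.time(ranger(Species ~ ., data = big, num.trees = 50))
```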

  

Exercise 11: ranger importance

Difficulty: Intermediate. Fit a ranger forest with permutation importance and print it.

Solution

set.seed(1)
fit <- ranger(Species ~ ., data = iris, importance = "permutation")
ranger::importance(fit)

  

Exercise 12: Tune mtry with caret

Difficulty: Advanced. Use caret to tune mtry over 1 to 4 with 5-fold cross-validation.

Solution

set.seed(1)
train(
  Species ~ ., data = iris,
  method = "rf",
  tuneGrid = expand.grid(mtry = c(1, 2, 3, 4)),
  trControl = trainControl(method = "cv", number = 5)
)
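A possible follow-up, not part of the original exercise: caret's "ranger" method tunes mtry, splitrule, and min.node.size together. The grid values below are illustrative:

```r
# Tune a ranger forest through caret; the "ranger" method's tuneGrid
# must supply all three of mtry, splitrule, and min.node.size.
library(caret)
library(ranger)
set.seed(1)
train(
  Species ~ ., data = iris,
  method = "ranger",
  tuneGrid = expand.grid(
    mtry = c(1, 2, 3),
    splitrule = "gini",
    min.node.size = c(1, 5)
  ),
  trControl = trainControl(method = "cv", number = 5)
)
```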

  

Exercise 13: Cross-validated RF RMSE

Difficulty: Advanced. Estimate RMSE of a regression forest on mtcars with 5-fold cross-validation.

Solution

set.seed(1)
train(
  mpg ~ ., data = mtcars,
  method = "rf",
  trControl = trainControl(method = "cv", number = 5)
)

  

Exercise 14: Train-test split + RMSE

Difficulty: Intermediate. Split mtcars into train and test sets, fit on the train set, and compute test RMSE.

Solution

set.seed(1)
idx <- sample(seq_len(nrow(mtcars)), 22)
tr <- mtcars[idx, ]
te <- mtcars[-idx, ]
fit <- randomForest(mpg ~ ., data = tr)
sqrt(mean((te$mpg - predict(fit, te))^2))

  

Exercise 15: Class weights for imbalance

Difficulty: Advanced. Upweight one class with classwt to handle class imbalance.

Solution

set.seed(1)
# classwt follows factor level order: setosa, versicolor, virginica
randomForest(Species ~ ., data = iris, classwt = c(1, 1, 2))

  

Exercise 16: Stratified sampling

Difficulty: Advanced. Draw a stratified bootstrap sample of 20 observations per class for each tree.

Solution

set.seed(1)
# sample 20 observations per class for each tree, stratified by Species
randomForest(Species ~ ., data = iris, sampsize = c(20, 20, 20), strata = iris$Species)

  

Exercise 17: Partial dependence

Difficulty: Advanced. Plot the partial dependence of predicted mpg on wt.

Solution

set.seed(1)
fit <- randomForest(mpg ~ ., data = mtcars)
partialPlot(fit, mtcars, "wt")

  

Exercise 18: Predict on new data

Difficulty: Beginner. Predict on new observations with a fitted forest.

Solution

set.seed(1)
fit <- randomForest(mpg ~ ., data = mtcars)
predict(fit, mtcars[1:3, ])

  

Exercise 19: nodesize parameter

Difficulty: Intermediate. Fit a forest with a minimum terminal node size of 5.

Solution

set.seed(1)
randomForest(Species ~ ., data = iris, nodesize = 5)

  

Exercise 20: Compare to logistic regression

Difficulty: Advanced. Compare random forest accuracy to logistic regression on a binary iris problem.

Solution

set.seed(1)
binary <- iris |> dplyr::mutate(y = as.integer(Species == "virginica"))
fit_glm <- glm(y ~ Sepal.Length + Petal.Length, data = binary, family = binomial)
fit_rf <- randomForest(factor(y) ~ Sepal.Length + Petal.Length, data = binary)
list(
  # note: glm_acc is in-sample accuracy, while predict(fit_rf) without
  # newdata returns OOB predictions, so rf_acc is the more honest estimate
  glm_acc = mean(round(predict(fit_glm, type = "response")) == binary$y),
  rf_acc = mean(predict(fit_rf) == factor(binary$y))
)

  

What to do next

  • XGBoost-Exercises (coming soon): a gradient boosting alternative.
  • Machine-Learning-Exercises (shipped): broader ML drills.