Sampling Methods Exercises in R: 20 Practice Problems

Twenty practice problems on sampling and resampling in R: simple random, stratified, weighted, cluster, and systematic sampling, plus bootstrap, jackknife, and permutation methods. Each exercise comes with a solution.

Run this once before any exercise:
library(dplyr)
library(caret)

  

Exercise 1: SRS without replacement

Difficulty: Beginner. Draw 10 values from 1:100 without replacement.

Solution:
set.seed(1)
sample(1:100, 10, replace = FALSE)

  

Exercise 2: SRS with replacement

Difficulty: Beginner. Draw 10 values from 1:100 with replacement, so repeats are possible.

Solution:
set.seed(1)
sample(1:100, 10, replace = TRUE)

  

Exercise 3: Sample rows of a data frame

Difficulty: Beginner. Draw 5 random rows from mtcars.

Solution:
set.seed(1)
mtcars |> slice_sample(n = 5)

  

Exercise 4: Sample proportion

Difficulty: Beginner. Draw a random 20% of the rows of mtcars.

Solution:
set.seed(1)
mtcars |> slice_sample(prop = 0.2)

  

Exercise 5: Stratified sample per group

Difficulty: Intermediate. Draw 5 rows from each Species in iris.

Solution:
set.seed(1)
# The by argument requires dplyr 1.1.0 or later
iris |> slice_sample(n = 5, by = Species)

  

Exercise 6: Stratified prop sample

Difficulty: Intermediate. Draw 20% of the rows within each Species in iris.

Solution:
set.seed(1)
iris |> slice_sample(prop = 0.2, by = Species)

  

Exercise 7: Weighted sample

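Difficulty: Advanced. Draw 10 rows from mtcars, weighting each row by hp so that higher-horsepower cars are more likely to be selected.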

Solution:
set.seed(1)
mtcars |> slice_sample(n = 10, weight_by = hp)

  

Exercise 8: createDataPartition (caret)

Difficulty: Intermediate. Create a 70% training partition of iris that preserves the Species proportions.

Solution:
set.seed(1)
idx <- caret::createDataPartition(iris$Species, p = 0.7, list = FALSE)
length(idx)  # number of rows in the training partition

  

Exercise 9: Bootstrap CI for the mean

Difficulty: Intermediate. Compute a 95% percentile bootstrap interval for the mean of mtcars$mpg.

Solution:
set.seed(1)
m <- replicate(2000, mean(sample(mtcars$mpg, replace = TRUE)))
quantile(m, c(0.025, 0.975))

  

Exercise 10: Bootstrap CI for the median

Difficulty: Intermediate. Compute a 95% percentile bootstrap interval for the median of mtcars$mpg.

Solution:
set.seed(1)
m <- replicate(2000, median(sample(mtcars$mpg, replace = TRUE)))
quantile(m, c(0.025, 0.975))

  

Exercise 11: Bootstrap with boot package

Difficulty: Advanced. Bootstrap the mean of mtcars$mpg with boot() and report a BCa interval.

Solution:
library(boot)
set.seed(1)
b <- boot(mtcars$mpg, function(d, i) mean(d[i]), R = 1000)
boot.ci(b, type = "bca")

  

Exercise 12: Jackknife

Difficulty: Advanced. Compute the leave-one-out (jackknife) estimates of the mean of mtcars$mpg.

Solution:
n <- nrow(mtcars)
jack <- sapply(1:n, function(i) mean(mtcars$mpg[-i]))
mean(jack)  # mean of the leave-one-out estimates; for the mean this equals mean(mtcars$mpg)
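The leave-one-out estimates are usually collected to estimate bias and standard error rather than just averaged. A minimal sketch using the standard jackknife formulas follows; the names theta_hat, theta_loo, jack_bias, and jack_se are illustrative and not part of the original solution.

```r
# Jackknife bias and standard error for the mean of mtcars$mpg
x <- mtcars$mpg
n <- length(x)

theta_hat <- mean(x)                                       # full-sample estimate
theta_loo <- sapply(seq_len(n), function(i) mean(x[-i]))   # leave-one-out estimates
theta_bar <- mean(theta_loo)

jack_bias <- (n - 1) * (theta_bar - theta_hat)             # zero (up to rounding) for the mean
jack_se   <- sqrt((n - 1) / n * sum((theta_loo - theta_bar)^2))

c(estimate = theta_hat, bias = jack_bias, se = jack_se)
```

For the sample mean, the jackknife standard error coincides with the usual sd(x) / sqrt(n), which makes it a handy sanity check before applying the same code to other statistics.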

  

Exercise 13: Permutation test for two means

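Difficulty: Advanced. Test whether mean mpg differs between automatic and manual cars (am) by permuting the group labels.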

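Solution:
set.seed(1)
# Observed difference in group means (am = 1 minus am = 0)
obs <- diff(tapply(mtcars$mpg, mtcars$am, mean))
perms <- replicate(2000, {
  shuffled <- sample(mtcars$am)  # permute the group labels
  diff(tapply(mtcars$mpg, shuffled, mean))
})
mean(abs(perms) >= abs(obs))  # two-sided permutation p-value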

  

Exercise 14: Resampling for SE

Difficulty: Intermediate. Estimate the standard error of the mean of mtcars$mpg by bootstrap resampling.

Solution:
set.seed(1)
m <- replicate(1000, mean(sample(mtcars$mpg, replace = TRUE)))
sd(m)  # bootstrap standard error of the mean

  

Exercise 15: Cluster sampling demo

Difficulty: Advanced. Treat each cyl value as a cluster, pick 2 clusters at random, and keep every row in them.

Solution:
set.seed(1)
clusters <- unique(mtcars$cyl)
chosen <- sample(clusters, 2)
mtcars |> filter(cyl %in% chosen)

  

Exercise 16: K-fold split indices

Difficulty: Intermediate. Assign each row of mtcars to one of 5 folds of near-equal size.

Solution:
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(mtcars)))
table(folds)
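The fold labels only become useful once each fold is held out in turn. A minimal sketch of that loop follows; the model mpg ~ wt and the RMSE metric are assumptions chosen for illustration, not part of the original exercise.

```r
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(mtcars)))

# Fit on four folds, evaluate on the held-out fold, repeat for each fold
rmse <- sapply(1:5, function(k) {
  train <- mtcars[folds != k, ]
  test  <- mtcars[folds == k, ]
  fit   <- lm(mpg ~ wt, data = train)          # illustrative model
  pred  <- predict(fit, newdata = test)
  sqrt(mean((test$mpg - pred)^2))              # held-out RMSE for fold k
})

mean(rmse)  # cross-validated RMSE
```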

  

Exercise 17: Train-test 70/30

Difficulty: Beginner. Split mtcars into 70% training and 30% test rows.

Solution:
set.seed(1)
n_train <- floor(0.7 * nrow(mtcars))  # 0.7 * 32 = 22.4, so round down explicitly
idx <- sample(seq_len(nrow(mtcars)), n_train)
list(train = nrow(mtcars[idx, ]), test = nrow(mtcars[-idx, ]))

  

Exercise 18: Repeated bootstrap

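Difficulty: Advanced. Repeat a 500-replicate bootstrap of the mean of mtcars$mpg under 5 different seeds and compare the results.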

Solution:
set.seed(1)
results <- sapply(1:5, function(seed) {
  set.seed(seed)
  mean(replicate(500, mean(sample(mtcars$mpg, replace = TRUE))))
})
results

  

Exercise 19: Systematic sample

Difficulty: Advanced. Take a systematic sample of about 5 rows from mtcars by selecting every k-th row.

Solution:
set.seed(1)
step <- floor(nrow(mtcars) / 5)
start <- sample(step, 1)  # random start within the first interval
mtcars[seq(start, by = step, length.out = 5), ]  # exactly 5 rows

  

Exercise 20: Reservoir sampling concept

Difficulty: Advanced. Keep a fixed-size random sample of 5 items from a stream of 100.

Solution:
set.seed(1)
# Simple equivalent: random sample of fixed size from stream
reservoir <- sample(1:100, 5)
reservoir
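The shortcut above needs the whole stream in memory. For comparison, here is a minimal sketch of the classic single-pass reservoir scheme (often called Algorithm R), which reads one item at a time. The function name reservoir_sample is hypothetical, and the vector stands in for a real stream.

```r
# Algorithm R: uniform sample of size k from a stream read one item at a time
reservoir_sample <- function(stream, k) {
  stopifnot(length(stream) >= k)
  reservoir <- stream[seq_len(k)]               # fill with the first k items
  if (length(stream) > k) {
    for (i in (k + 1):length(stream)) {
      j <- sample.int(i, 1)                     # uniform position in 1..i
      if (j <= k) reservoir[j] <- stream[i]     # keep item i with probability k / i
    }
  }
  reservoir
}

set.seed(1)
res <- reservoir_sample(1:100, 5)
res
```

Each item ends up in the reservoir with probability k / n, the same inclusion probability as sample(1:100, 5), but the loop never needs to know the stream length in advance.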

  

What to do next

  • Cross-Validation-Exercises (coming): cross-validation builds on the fold and train-test splits practiced here.
  • Hypothesis-Testing-Exercises (shipped): covers permutation and bootstrap tests.