R for Biostatistics Exercises: 20 Practice Problems

Twenty practice problems for biostatistics in R: clinical trials, hazard ratios, paired tests, dose-response, mixed-effects models. Hidden solutions.

Run this once before any exercise
library(dplyr)
library(survival)
library(broom)
library(lme4)
library(pwr)
# Exercise 14 calls pROC:: functions directly, so pROC must also be installed.

  

Exercise 1: 2x2 table odds ratio CI

Difficulty: Intermediate.

Show solution
m <- matrix(c(40, 60, 20, 80), 2, 2)
fisher.test(m)$conf.int
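As a cross-check on the solution above, the sample odds ratio and a Wald interval can be computed by hand from the same table. This is a minimal sketch in base R; the names `or_hat`, `se_log_or`, and `wald_ci` are illustrative, not part of the original exercise. Note that `fisher.test()` reports a conditional maximum likelihood estimate with an exact interval, so its numbers will not match the Wald values exactly.

```r
# 2x2 table from Exercise 1; matrix() fills column by column
m <- matrix(c(40, 60, 20, 80), 2, 2)

# Sample (cross-product) odds ratio: (a * d) / (b * c)
or_hat <- (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])

# Standard error of log(OR): square root of the summed reciprocal cell counts
se_log_or <- sqrt(sum(1 / m))

# 95% Wald interval, built on the log scale and back-transformed
wald_ci <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log_or)

or_hat
wald_ci
```

The Wald interval relies on a large-sample approximation, which is why the exact interval from `fisher.test()` is often preferred when cell counts are small.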

  

Exercise 2: Paired comparison

Difficulty: Intermediate. Pre vs post.

Show solution
pre <- c(120, 130, 125, 140, 135)
post <- c(115, 125, 120, 132, 128)
t.test(pre, post, paired = TRUE)
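A paired t-test is a one-sample t-test on the within-subject differences. The sketch below recomputes the statistic by hand in base R, assuming the same `pre` and `post` vectors; the names `d`, `t_stat`, and `p_value` are illustrative.

```r
pre <- c(120, 130, 125, 140, 135)
post <- c(115, 125, 120, 132, 128)

# Within-subject differences
d <- pre - post
n <- length(d)

# t statistic: mean difference divided by its standard error
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

t_stat
p_value
```

The result should agree with `t.test(pre, post, paired = TRUE)`, and equivalently with `t.test(pre - post)`.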

  

Exercise 3: Wilcoxon signed-rank

Difficulty: Intermediate.

Show solution
pre <- c(120, 130, 125, 140, 135)
post <- c(115, 125, 120, 132, 128)
# The differences contain ties, so R warns and uses a normal approximation
# instead of an exact p-value.
wilcox.test(pre, post, paired = TRUE)

  

Exercise 4: One-way ANOVA for dose groups

Difficulty: Intermediate.

Show solution
df <- tibble(
  dose = factor(rep(c("low", "mid", "high"), each = 8)),
  outcome = c(rnorm(8, 10), rnorm(8, 12), rnorm(8, 14))
)
summary(aov(outcome ~ dose, data = df))

  

Exercise 5: Tukey post hoc

Difficulty: Advanced.

Show solution
df <- tibble(
  dose = factor(rep(c("low", "mid", "high"), each = 8)),
  outcome = c(rnorm(8, 10), rnorm(8, 12), rnorm(8, 14))
)
TukeyHSD(aov(outcome ~ dose, data = df))

  

Exercise 6: Mixed-effects model (lme4)

Difficulty: Advanced.

Show solution
df <- tibble(
  id = rep(1:10, each = 3),
  time = rep(1:3, 10),
  y = rnorm(30) + rep(rnorm(10), each = 3)
)
lme4::lmer(y ~ time + (1 | id), data = df)

  

Exercise 7: Cox PH

Difficulty: Advanced.

Show solution
coxph(Surv(time, status) ~ age + sex, data = lung)

  

Exercise 8: Plot survival curves

Difficulty: Advanced.

Show solution
fit <- survfit(Surv(time, status) ~ sex, data = lung)
plot(fit, col = c("blue", "red"))

  

Exercise 9: Stratified Cox

Difficulty: Advanced.

Show solution
coxph(Surv(time, status) ~ age + strata(sex), data = lung)

  

Exercise 10: Schoenfeld residuals (PH assumption)

Difficulty: Advanced.

Show solution
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
cox.zph(fit)

  

Exercise 11: Logistic regression

Difficulty: Intermediate.

Show solution
df <- tibble(
  age = sample(40:80, 200, replace = TRUE),
  disease = rbinom(200, 1, 0.3)
)
glm(disease ~ age, data = df, family = binomial)

  

Exercise 12: Odds ratio CI from glm

Difficulty: Intermediate.

Show solution
df <- tibble(
  age = sample(40:80, 200, replace = TRUE),
  disease = rbinom(200, 1, 0.3)
)
fit <- glm(disease ~ age, data = df, family = binomial)
exp(confint(fit))

  

Exercise 13: Confounder adjustment

Difficulty: Advanced.

Show solution
df <- tibble(
  exposure = rbinom(200, 1, 0.5),
  age = sample(40:80, 200, replace = TRUE),
  outcome = rbinom(200, 1, 0.3)
)
glm(outcome ~ exposure + age, data = df, family = binomial)

  

Exercise 14: ROC curve

Difficulty: Intermediate.

Show solution
df <- tibble(score = rnorm(200), outcome = rbinom(200, 1, 0.4))
# pROC is not loaded in the setup block; the :: calls require it to be installed.
pROC::roc(df$outcome, df$score) |> pROC::auc()

  

Exercise 15: Sample size for two proportions

Difficulty: Advanced.

Show solution
pwr::pwr.2p.test(h = pwr::ES.h(0.5, 0.4), power = 0.8, sig.level = 0.05)$n

  

Exercise 16: Power for t-test

Difficulty: Intermediate.

Show solution
pwr::pwr.t.test(d = 0.5, n = 30, sig.level = 0.05)$power

  

Exercise 17: Adjust p-values (BH)

Difficulty: Intermediate.

Show solution
p <- c(0.01, 0.04, 0.03, 0.20, 0.001)
p.adjust(p, method = "BH")
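To see what the Benjamini-Hochberg adjustment does, the sketch below reproduces it step by step in base R for the same p-values. The variable names (`o`, `scaled`, `bh`) are illustrative; the logic is the standard step-up procedure: scale each p-value by m / rank, then enforce monotonicity from the largest p-value downward.

```r
p <- c(0.01, 0.04, 0.03, 0.20, 0.001)
m <- length(p)

# Sort from largest to smallest; the largest p-value has rank m
o <- order(p, decreasing = TRUE)

# Scale each sorted p-value by m / rank (ranks run m, m - 1, ..., 1)
scaled <- p[o] * m / (m:1)

# Running minimum keeps adjusted values monotone; cap at 1
bh_sorted <- pmin(1, cummin(scaled))

# Restore the original ordering of the p-values
bh <- bh_sorted[order(o)]
bh
```

The vector `bh` should match `p.adjust(p, method = "BH")`.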

  

Exercise 18: Bootstrap median CI

Difficulty: Advanced.

Show solution
set.seed(1)
x <- rnorm(50, 100, 15)
b <- replicate(2000, median(sample(x, replace = TRUE)))
quantile(b, c(0.025, 0.975))

  

Exercise 19: Number needed to treat

Difficulty: Advanced.

Show solution
risk_control <- 0.30
risk_treat <- 0.20
1 / (risk_control - risk_treat)

  

Exercise 20: Standardize dose

Difficulty: Beginner.

Show solution
dose <- c(5, 10, 20, 40)
scale(dose)[, 1]
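With its default arguments, `scale()` subtracts the mean and divides by the sample standard deviation. A minimal by-hand sketch in base R (the name `z` is illustrative):

```r
dose <- c(5, 10, 20, 40)

# Center on the mean, then divide by the sample standard deviation (n - 1 denominator)
z <- (dose - mean(dose)) / sd(dose)
z

# The standardized values have mean 0 and standard deviation 1
mean(z)
sd(z)
```

This should agree with `scale(dose)[, 1]`; the `[, 1]` only drops the one-column matrix and its attributes that `scale()` returns.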

  

What to do next

  • R for Healthcare Exercises: clinical analysis.
  • Hypothesis Testing Exercises: broader inference.