R for Biostatistics Exercises: 20 Practice Problems
Twenty practice problems for biostatistics in R: clinical trials, hazard ratios, paired tests, dose-response, mixed-effects models. Hidden solutions.
By Selva Prabhakaran · Published May 11, 2026 · Last updated May 11, 2026
library(dplyr)
library(survival)
library(broom)
library(lme4)
library(pwr)
Exercise 1: 2x2 table odds ratio CI
Difficulty: Intermediate.
Show solution
m <- matrix(c(40, 60, 20, 80), 2, 2)
fisher.test(m)$conf.int
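`fisher.test()` reports a conditional maximum-likelihood odds ratio with an exact interval, so its numbers will not match the simple cross-product ratio exactly. As a rough cross-check, here is a minimal sketch of the sample odds ratio with a Wald interval on the log scale. The names `or_hat`, `se_log`, and `wald_ci` are illustrative and not part of the original solution.

```r
m <- matrix(c(40, 60, 20, 80), 2, 2)   # same 2x2 table as in the solution

# Cross-product (sample) odds ratio: (a * d) / (b * c)
or_hat <- (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])

# Standard error of log(OR): square root of the summed reciprocal cell counts
se_log <- sqrt(sum(1 / m))

# 95% Wald interval, computed on the log scale and back-transformed
wald_ci <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log)

or_hat
wald_ci
```

The Wald interval is an approximation that relies on reasonably large cell counts; for sparse tables the exact interval from `fisher.test()` is usually the safer choice.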
Exercise 2: Paired comparison
Difficulty: Intermediate. Pre vs post.
Show solution
pre <- c(120, 130, 125, 140, 135)
post <- c(115, 125, 120, 132, 128)
t.test(pre, post, paired = TRUE)
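A paired t-test is a one-sample t-test on the within-subject differences. The sketch below illustrates that equivalence using the same data; `d` and `t_manual` are names introduced here for illustration.

```r
pre  <- c(120, 130, 125, 140, 135)
post <- c(115, 125, 120, 132, 128)

d <- pre - post                       # within-subject differences

paired_fit <- t.test(pre, post, paired = TRUE)
diff_fit   <- t.test(d, mu = 0)       # one-sample test on the differences

# t statistic by hand: mean difference divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

c(paired     = unname(paired_fit$statistic),
  one_sample = unname(diff_fit$statistic),
  manual     = t_manual)
```

All three values should agree, which is a quick way to confirm that the pairing is being used as intended.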
Exercise 3: Wilcoxon signed-rank
Difficulty: Intermediate.
Show solution
pre <- c(120, 130, 125, 140, 135)
post <- c(115, 125, 120, 132, 128)
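# The differences contain ties, so an exact p-value is unavailable;
# request the normal approximation explicitly to avoid the warning.
wilcox.test(pre, post, paired = TRUE, exact = FALSE)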
Exercise 4: One-way ANOVA for dose groups
Difficulty: Intermediate.
Show solution
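set.seed(42)  # arbitrary seed so the simulated outcomes are reproducible
df <- tibble(dose = factor(rep(c("low", "mid", "high"), each = 8),
                           levels = c("low", "mid", "high")),  # keep dose order
             outcome = c(rnorm(8, 10), rnorm(8, 12), rnorm(8, 14)))
summary(aov(outcome ~ dose, data = df))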
Exercise 5: Tukey post hoc
Difficulty: Advanced.
Show solution
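set.seed(42)  # arbitrary seed so the simulated outcomes are reproducible
df <- tibble(dose = factor(rep(c("low", "mid", "high"), each = 8),
                           levels = c("low", "mid", "high")),  # keep dose order
             outcome = c(rnorm(8, 10), rnorm(8, 12), rnorm(8, 14)))
TukeyHSD(aov(outcome ~ dose, data = df))  # all pairwise dose comparisons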
Exercise 6: Mixed-effects model (lme4)
Difficulty: Advanced.
Show solution
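set.seed(42)  # arbitrary seed so the simulated data are reproducible
df <- tibble(id = rep(1:10, each = 3),                    # 10 subjects, 3 visits each
             time = rep(1:3, 10),
             y = rnorm(30) + rep(rnorm(10), each = 3))    # noise + subject-level shift
# Random intercept per subject, fixed effect of time
lme4::lmer(y ~ time + (1 | id), data = df)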
Exercise 7: Cox PH
Difficulty: Advanced.
Show solution
coxph(Surv(time, status) ~ age + sex, data = lung)
Exercise 8: Plot survival curves
Difficulty: Advanced.
Show solution
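fit <- survfit(Surv(time, status) ~ sex, data = lung)
plot(fit, col = c("blue", "red"), xlab = "Days", ylab = "Survival probability")
# In lung, sex is coded 1 = male, 2 = female
legend("topright", legend = c("male", "female"), col = c("blue", "red"), lty = 1)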
Exercise 9: Stratified Cox
Difficulty: Advanced.
Show solution
coxph(Surv(time, status) ~ age + strata(sex), data = lung)
Exercise 10: Schoenfeld residuals (PH assumption)
Difficulty: Advanced.
Show solution
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
cox.zph(fit)
Exercise 11: Logistic regression
Difficulty: Intermediate.
Show solution
set.seed(42)  # arbitrary seed so the simulated data are reproducible
df <- tibble(age = sample(40:80, 200, replace = TRUE),
             disease = rbinom(200, 1, 0.3))
glm(disease ~ age, data = df, family = binomial)
Exercise 12: Odds ratio CI from glm
Difficulty: Intermediate.
Show solution
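set.seed(42)  # arbitrary seed so the simulated data are reproducible
df <- tibble(age = sample(40:80, 200, replace = TRUE),
             disease = rbinom(200, 1, 0.3))
fit <- glm(disease ~ age, data = df, family = binomial)
exp(confint(fit))  # exponentiate log-odds limits to get odds-ratio limits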
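For a fitted `glm`, `confint()` returns profile-likelihood limits. A Wald interval (estimate plus or minus a normal quantile times the standard error) is a common, slightly cruder alternative. The sketch below shows it in base R on simulated data like the exercise's; the seed and the names `wald_manual` and `wald_builtin` are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible
df <- data.frame(age = sample(40:80, 200, replace = TRUE),
                 disease = rbinom(200, 1, 0.3))
fit <- glm(disease ~ age, data = df, family = binomial)

# Coefficients and standard errors on the log-odds scale
est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]

# Wald limits by hand, then exponentiated to the odds-ratio scale
wald_manual <- exp(cbind(lower = est - qnorm(0.975) * se,
                         upper = est + qnorm(0.975) * se))

# confint.default() computes the same Wald limits
wald_builtin <- exp(confint.default(fit))

wald_manual
wald_builtin
```

With a well-behaved fit the Wald and profile intervals are usually close; they can diverge with small samples or estimates near a boundary.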
Exercise 13: Confounder adjustment
Difficulty: Advanced.
Show solution
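set.seed(42)  # arbitrary seed so the simulated data are reproducible
df <- tibble(exposure = rbinom(200, 1, 0.5),
             age = sample(40:80, 200, replace = TRUE),
             outcome = rbinom(200, 1, 0.3))
# Including age in the model adjusts the exposure effect for it
glm(outcome ~ exposure + age, data = df, family = binomial)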
Exercise 14: ROC curve
Difficulty: Intermediate.
Show solution
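set.seed(42)  # arbitrary seed so the simulated data are reproducible
df <- tibble(score = rnorm(200), outcome = rbinom(200, 1, 0.4))
# pROC is not loaded above, so it must be installed and is called with pROC::
pROC::roc(df$outcome, df$score, quiet = TRUE) |> pROC::auc()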
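The AUC can be read as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half. The base R sketch below computes it two ways on simulated data like the exercise's; the seed and the names `auc_pairs` and `auc_rank` are illustrative assumptions.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible
score   <- rnorm(200)
outcome <- rbinom(200, 1, 0.4)

pos <- score[outcome == 1]   # scores of cases
neg <- score[outcome == 0]   # scores of controls

# Proportion of (case, control) pairs where the case scores higher; ties count half
auc_pairs <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))

# Same quantity from the Mann-Whitney U statistic
u <- wilcox.test(pos, neg)$statistic
auc_rank <- unname(u) / (length(pos) * length(neg))

c(pairs = auc_pairs, rank = auc_rank)
```

By default `pROC::roc()` picks the comparison direction automatically, so its AUC may equal one minus this value when controls tend to score higher than cases.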
Exercise 15: Sample size for two proportions
Difficulty: Advanced.
Show solution
pwr::pwr.2p.test(h = pwr::ES.h(0.5, 0.4), power = 0.8, sig.level = 0.05)$n
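As a cross-check that does not need the pwr package, base R's `power.prop.test()` solves the same two-proportion problem directly on the proportion scale instead of the arcsine effect size `h`. A minimal sketch, with `res` and `n_per_group` as illustrative names:

```r
# Sample size per group to detect 0.5 vs 0.4 with 80% power, two-sided alpha = 0.05
res <- power.prop.test(p1 = 0.5, p2 = 0.4, power = 0.8, sig.level = 0.05)

# Round up: a fractional participant cannot be recruited
n_per_group <- ceiling(res$n)
n_per_group
```

The two approaches use different approximations, so the answers should be close but need not be identical.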
Exercise 16: Power for t-test
Difficulty: Intermediate.
Show solution
pwr::pwr.t.test(d = 0.5, n = 30, sig.level = 0.05)$power
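The same calculation can be turned around to ask how many subjects per group are needed for a target power. The sketch below uses base R's `power.t.test()`, assuming a two-sample design with a standardized effect of 0.5 (so `sd = 1`); `res`, `n_per_group`, and `pow_30` are illustrative names.

```r
# Solve for n per group at 80% power, two-sided alpha = 0.05
res <- power.t.test(delta = 0.5, sd = 1, power = 0.8, sig.level = 0.05)
n_per_group <- ceiling(res$n)   # round up to whole subjects

# Power at the exercise's n = 30 per group, for comparison
pow_30 <- power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05)$power

n_per_group
pow_30
```

Comparing the two numbers shows how far 30 per group falls short of the conventional 80% target for a medium effect.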
Exercise 17: Adjust p-values (BH)
Difficulty: Intermediate.
Show solution
p <- c(0.01, 0.04, 0.03, 0.20, 0.001)
p.adjust(p, method = "BH")
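To see what the Benjamini-Hochberg adjustment does, it can be reproduced by hand: sort the p-values, scale each by n divided by its rank, then enforce monotonicity from the largest p-value downward. A minimal sketch, where `bh_manual` is an illustrative name:

```r
p <- c(0.01, 0.04, 0.03, 0.20, 0.001)
n <- length(p)

o  <- order(p, decreasing = TRUE)   # indices from largest to smallest p
ro <- order(o)                      # permutation that restores the original order

# Scale by n / rank (ranks run n, n-1, ..., 1 in this ordering),
# take the running minimum, cap at 1, and put back in the original order
bh_manual <- pmin(1, cummin(n / (n:1) * p[o]))[ro]

bh_manual
p.adjust(p, method = "BH")
```

The running minimum is what keeps an adjusted p-value from exceeding the adjusted value of any larger raw p-value.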
Exercise 18: Bootstrap median CI
Difficulty: Advanced.
Show solution
set.seed(1)
x <- rnorm(50, 100, 15)
b <- replicate(2000, median(sample(x, replace = TRUE)))
quantile(b, c(0.025, 0.975))
Exercise 19: Number needed to treat
Difficulty: Advanced.
Show solution
risk_control <- 0.30; risk_treat <- 0.20
1 / (risk_control - risk_treat)
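NNT is conventionally reported as a whole number, rounded up. One caveat when doing that in code: floating-point subtraction can leave the raw value a hair above an exact integer, so taking the ceiling directly may overshoot by one. The sketch below rounds first; `arr`, `rrr`, `nnt_raw`, and `nnt` are illustrative names, and the 8-decimal rounding is an arbitrary tolerance.

```r
risk_control <- 0.30
risk_treat   <- 0.20

arr <- risk_control - risk_treat   # absolute risk reduction
rrr <- arr / risk_control          # relative risk reduction

nnt_raw <- 1 / arr
# Round away floating-point noise before rounding up to whole patients
nnt <- ceiling(round(nnt_raw, 8))

c(ARR = arr, RRR = rrr, NNT = nnt)
```

Reporting the absolute and relative risk reductions alongside the NNT makes it easier to judge how the same trial result looks on each scale.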
Exercise 20: Standardize dose
Difficulty: Beginner.
Show solution
dose <- c(5, 10, 20, 40)
scale(dose)[,1]
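With its defaults, `scale()` subtracts the mean and divides by the sample standard deviation. The sketch below reproduces that by hand so the transformation is explicit; `z_manual` and `z_scale` are illustrative names.

```r
dose <- c(5, 10, 20, 40)

# Center on the mean, then divide by the sample standard deviation
z_manual <- (dose - mean(dose)) / sd(dose)

# scale() returns a one-column matrix; [, 1] extracts it as a plain vector
z_scale <- scale(dose)[, 1]

z_manual
z_scale
```

After standardizing, the values have mean 0 and standard deviation 1, which puts model coefficients on a per-standard-deviation scale.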
What to do next
- R-for-Healthcare-Exercises: clinical analysis.
- Hypothesis-Testing-Exercises: broader inference.