R for Healthcare Exercises: 20 Practice Problems

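Twenty practice problems for healthcare data analysis in R: prevalence and incidence, diagnostic test accuracy, relative risk and odds ratios, survival analysis, logistic regression, clinical measures such as BMI, and longitudinal data. A worked solution follows each exercise.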

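Run this once before any exercise:
library(dplyr)     # data manipulation; also re-exports tibble()
library(survival)  # Surv(), survfit(), survdiff(), coxph(); provides the lung dataset
library(broom)     # tidy model summaries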

  

Exercise 1: Prevalence

Difficulty: Beginner.

cases <- 250; population <- 10000
cases / population   # prevalence = existing cases / total population: 0.025 (2.5%)
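A prevalence estimated from a sample carries sampling error. One option, sketched here under the assumption that the 10,000 people are a random sample from a larger population, is an exact (Clopper-Pearson) 95% confidence interval from base R's `binom.test()`:

```r
# Exact binomial (Clopper-Pearson) 95% CI for a prevalence,
# assuming the 10,000 people are a random sample from a larger population
cases <- 250
population <- 10000

prev <- cases / population
ci <- binom.test(cases, population)$conf.int

round(c(prevalence = prev, lower = ci[1], upper = ci[2]), 4)
```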

  

Exercise 2: Incidence rate

Difficulty: Intermediate.

new_cases <- 30; person_years <- 5000
new_cases / person_years   # 0.006 cases per person-year (6 per 1,000 person-years)
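Incidence rates are usually reported per 1,000 or per 100,000 person-years. A minimal sketch, assuming the case count follows a Poisson distribution, rescales the rate and attaches an exact 95% interval from base R's `poisson.test()`, whose `T` argument is the person-time at risk:

```r
# Incidence rate per 1,000 person-years with an exact Poisson 95% CI
# (assumes the number of new cases is Poisson-distributed)
new_cases <- 30
person_years <- 5000

rate <- new_cases / person_years
pt <- poisson.test(new_cases, T = person_years)   # CI is for the rate x / T

per_1000 <- 1000 * c(rate = rate, lower = pt$conf.int[1], upper = pt$conf.int[2])
round(per_1000, 2)
```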

  

Exercise 3: Sensitivity and specificity

Difficulty: Intermediate.

TP <- 80; FN <- 20; TN <- 850; FP <- 50
list(sens = TP / (TP + FN),   # sensitivity: 0.80
     spec = TN / (TN + FP))   # specificity: about 0.944

  

Exercise 4: Positive predictive value

Difficulty: Intermediate.

TP <- 80; FP <- 50
TP / (TP + FP)   # PPV: share of positive tests that are true positives, about 0.615
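PPV and NPV depend on how common the disease is among the people tested, whereas sensitivity and specificity do not. The sketch below bundles all four measures into one function, using the counts from Exercises 3 and 4; `diag_metrics()` is a hypothetical helper for illustration, not part of any package:

```r
# Hypothetical helper: the four standard accuracy measures from 2x2 counts
diag_metrics <- function(TP, FP, FN, TN) {
  c(sensitivity = TP / (TP + FN),   # true positives among the diseased
    specificity = TN / (TN + FP),   # true negatives among the non-diseased
    ppv         = TP / (TP + FP),   # diseased among those testing positive
    npv         = TN / (TN + FN))   # non-diseased among those testing negative
}

metrics <- diag_metrics(TP = 80, FP = 50, FN = 20, TN = 850)
round(metrics, 3)
```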

  

Exercise 5: Relative risk

Difficulty: Intermediate.

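# 2x2 table: rows = exposed/unexposed, columns = disease yes/no
m <- matrix(c(40, 60, 20, 80), nrow = 2, byrow = TRUE,
            dimnames = list(exposure = c("exposed", "unexposed"),
                            disease = c("yes", "no")))
risk_exp <- m[1, 1] / sum(m[1, ])   # 40 / 100
risk_un <- m[2, 1] / sum(m[2, ])    # 20 / 100
risk_exp / risk_un                  # relative risk: 2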

  

Exercise 6: Odds ratio

Difficulty: Intermediate.

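m <- matrix(c(40, 60, 20, 80), nrow = 2, byrow = TRUE)   # rows = exposed/unexposed, columns = disease yes/no
(m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])                # odds ratio ad/bc: about 2.67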
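Relative risks and odds ratios are usually reported with confidence intervals. A common large-sample approach works on the log scale, using the Katz standard error for log(RR) and the Woolf standard error for log(OR). This sketch assumes the layout rows = exposed/unexposed and columns = disease yes/no, with 40 of 100 exposed and 20 of 100 unexposed people diseased:

```r
# Large-sample 95% CIs on the log scale (Katz for RR, Woolf for OR)
# Assumed layout: rows = exposed/unexposed, columns = disease yes/no
tab <- matrix(c(40, 60,
                20, 80), nrow = 2, byrow = TRUE)
n11 <- tab[1, 1]; n12 <- tab[1, 2]   # exposed: diseased, not diseased
n21 <- tab[2, 1]; n22 <- tab[2, 2]   # unexposed: diseased, not diseased
z <- qnorm(0.975)

rr <- (n11 / (n11 + n12)) / (n21 / (n21 + n22))
se_log_rr <- sqrt(1 / n11 - 1 / (n11 + n12) + 1 / n21 - 1 / (n21 + n22))
rr_ci <- exp(log(rr) + c(-1, 1) * z * se_log_rr)

or <- (n11 * n22) / (n12 * n21)
se_log_or <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
or_ci <- exp(log(or) + c(-1, 1) * z * se_log_or)

round(rbind(RR = c(rr, rr_ci), OR = c(or, or_ci)), 3)
```

Both intervals are asymmetric around the point estimate because they are built symmetrically on the log scale and then exponentiated.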

  

Exercise 7: Kaplan-Meier survival

Difficulty: Advanced.

fit <- survfit(Surv(time, status) ~ 1, data = lung)   # status: 1 = censored, 2 = dead
plot(fit, xlab = "Days", ylab = "Survival probability")

  

Exercise 8: KM by group

Difficulty: Advanced.

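fit <- survfit(Surv(time, status) ~ sex, data = lung)   # sex: 1 = male, 2 = female
plot(fit, col = 1:2, xlab = "Days", ylab = "Survival probability")
legend("topright", legend = c("Male", "Female"), col = 1:2, lty = 1)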

  

Exercise 9: Log-rank test

Difficulty: Advanced.

survdiff(Surv(time, status) ~ sex, data = lung)
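The printed output shows the chi-squared statistic and its p-value. To use the p-value in later code, one approach, sketched here, is to recompute it from the stored `chisq` component with (number of groups - 1) degrees of freedom, taking the group count from the `n` component:

```r
library(survival)

# Log-rank test comparing survival between sexes in the lung data
lr <- survdiff(Surv(time, status) ~ sex, data = lung)

# Degrees of freedom = number of groups - 1 (lr$n holds the group sizes)
k_df <- length(lr$n) - 1
p_value <- pchisq(lr$chisq, df = k_df, lower.tail = FALSE)
p_value
```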

  

Exercise 10: Cox PH model

Difficulty: Advanced.

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(fit)
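The Cox model assumes that each hazard ratio stays constant over follow-up. One standard check, sketched here, is `cox.zph()` from the survival package, which tests each covariate, and the model as a whole, using scaled Schoenfeld residuals; a small p-value suggests the assumption may not hold for that term:

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Proportional-hazards test: one row per covariate plus a GLOBAL row
ph <- cox.zph(fit)
ph

# Residual plots over time, one per covariate; a roughly flat trend supports PH
plot(ph)
```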

  

Exercise 11: Hazard ratio from Cox

Difficulty: Advanced.

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
exp(coef(fit))   # hazard ratios: per year of age, and female vs male (sex coded 1/2)
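Hazard ratios are normally reported with 95% confidence intervals. A minimal sketch exponentiates both the coefficients and the Wald limits from `confint()`; `summary(fit)` reports the same intervals alongside `exp(coef)`, and `broom::tidy(fit, exponentiate = TRUE, conf.int = TRUE)` returns them as a data frame:

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Hazard ratios with 95% Wald CIs: exponentiate estimates and limits together
hr_table <- cbind(HR = exp(coef(fit)), exp(confint(fit)))
round(hr_table, 3)
```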

  

Exercise 12: Median survival time

Difficulty: Advanced.

fit <- survfit(Surv(time, status) ~ 1, data = lung)
summary(fit)$table["median"]   # median survival time, in days

  

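Exercise 13: 1-year survival rate

Difficulty: Advanced.

fit <- survfit(Surv(time, status) ~ 1, data = lung)
# Follow-up in lung ends before 5 years (check max(lung$time)), so a 5-year
# estimate (times = 365*5) is not available from these data; 1 year is
summary(fit, times = 365)$surv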

  

Exercise 14: Logistic regression for disease

Difficulty: Intermediate.

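set.seed(42)   # simulated data: fix the seed so results are reproducible
df <- tibble(age = sample(20:80, 200, replace = TRUE),
             smoke = sample(0:1, 200, replace = TRUE),
             disease = rbinom(200, 1, 0.2))
fit <- glm(disease ~ age + smoke, data = df, family = binomial)
summary(fit)
tidy(fit, exponentiate = TRUE, conf.int = TRUE)   # odds ratios with 95% CIs (broom)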
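Once fitted, the model can score new patients. This sketch refits on the same kind of simulated data (the seed is arbitrary) and predicts for a hypothetical 60-year-old smoker; `type = "response"` returns a probability rather than log-odds. Because `disease` is simulated independently of age and smoking, the odds ratios should be close to 1 and the prediction close to the 20% base rate:

```r
library(dplyr)

set.seed(1)   # arbitrary seed; results vary with it
df <- tibble(age = sample(20:80, 200, replace = TRUE),
             smoke = sample(0:1, 200, replace = TRUE),
             disease = rbinom(200, 1, 0.2))
fit <- glm(disease ~ age + smoke, data = df, family = binomial)

exp(coef(fit))   # odds ratios: per year of age, and smokers vs non-smokers

# Hypothetical new patient: predicted probability of disease
new_patient <- tibble(age = 60, smoke = 1)
p_hat <- predict(fit, newdata = new_patient, type = "response")
p_hat
```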

  

Exercise 15: BMI calculation

Difficulty: Beginner.

weight_kg <- 70; height_m <- 1.75
weight_kg / height_m^2   # BMI in kg/m^2: about 22.9

  

Exercise 16: BMI category

Difficulty: Intermediate.

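bmi <- c(17, 22, 28, 32)
# right = FALSE gives intervals [18.5, 25), [25, 30), [30, Inf), so BMI 25 and 30
# fall in the higher category, matching the WHO adult cut-offs
cut(bmi, breaks = c(-Inf, 18.5, 25, 30, Inf), right = FALSE,
    labels = c("underweight", "normal", "overweight", "obese"))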
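In practice BMI is usually computed and categorized inside a patient data frame. A minimal sketch with four hypothetical patients combines the calculation and the categorization with dplyr's `mutate()`; `right = FALSE` keeps values of exactly 25 and 30 in the higher category, as in the WHO adult cut-offs:

```r
library(dplyr)

# Hypothetical patients for illustration
patients <- tibble(id = 1:4,
                   weight_kg = c(50, 70, 85, 100),
                   height_m = c(1.70, 1.75, 1.75, 1.75))

patients <- patients |>
  mutate(bmi = weight_kg / height_m^2,
         category = cut(bmi, breaks = c(-Inf, 18.5, 25, 30, Inf), right = FALSE,
                        labels = c("underweight", "normal", "overweight", "obese")))
patients
```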

  

Exercise 17: Repeated measures (longitudinal mean)

Difficulty: Advanced.

df <- tibble(id = rep(1:5, 3), time = rep(1:3, each = 5),
             bp = c(120, 130, 125, 140, 135,    # visit 1
                    118, 128, 122, 138, 132,    # visit 2
                    116, 126, 120, 136, 130))   # visit 3
df |> group_by(id) |> summarise(mean_bp = mean(bp))   # mean BP per patient across visits
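A per-patient mean hides the trend over time. A common longitudinal summary is change from baseline; this sketch sorts each patient's visits by `time` and subtracts the first reading from the last using dplyr's `first()` and `last()`:

```r
library(dplyr)

df <- tibble(id = rep(1:5, 3), time = rep(1:3, each = 5),
             bp = c(120, 130, 125, 140, 135,
                    118, 128, 122, 138, 132,
                    116, 126, 120, 136, 130))

# Change from baseline: last visit minus first visit, per patient
bp_change <- df |>
  arrange(id, time) |>
  group_by(id) |>
  summarise(baseline_bp = first(bp),
            final_bp = last(bp),
            change = final_bp - baseline_bp)
bp_change
```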

  

Exercise 18: Standardize lab values to z-scores

Difficulty: Intermediate.

lab <- c(120, 135, 110, 140, 125)
(lab - mean(lab)) / sd(lab)   # z-scores relative to this sample's mean and SD
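Base R's `scale()` performs the same centring and division by the sample standard deviation. It returns a one-column matrix, so this sketch converts the result back to a plain vector before comparing:

```r
lab <- c(120, 135, 110, 140, 125)

z_manual <- (lab - mean(lab)) / sd(lab)
z_scale <- as.numeric(scale(lab))   # scale() centres and divides by sd() by default

all.equal(z_manual, z_scale)
```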

  

Exercise 19: Detect outlier vital signs

Difficulty: Intermediate.

hr <- c(70, 72, 75, 68, 130, 71, 30)
hr[hr < 50 | hr > 120]   # heart rates (bpm) outside illustrative bounds: 130 and 30
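Fixed bounds encode clinical judgement. A data-driven alternative, shown here as a sketch, is Tukey's rule: flag values more than 1.5 interquartile ranges below the first quartile or above the third. With these seven readings it flags the same two values, but on other data the two rules can disagree:

```r
hr <- c(70, 72, 75, 68, 130, 71, 30)

# Tukey's fences: 1.5 * IQR beyond the quartiles
q <- quantile(hr, c(0.25, 0.75))
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

flagged <- hr[hr < lower | hr > upper]
flagged
```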

  

Exercise 20: Compute days between visits

Difficulty: Intermediate.

visits <- as.Date(c("2024-01-15", "2024-02-20", "2024-04-05"))
diff(visits)   # time differences in days: 36 and 45
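`diff()` on dates returns a `difftime` object. For arithmetic or storage it is often easier to convert to plain numbers, and days since the first visit is a common derived variable in longitudinal records:

```r
visits <- as.Date(c("2024-01-15", "2024-02-20", "2024-04-05"))

gaps <- as.numeric(diff(visits), units = "days")   # plain numbers instead of difftime
since_first <- as.numeric(visits - visits[1])      # days since the first visit

gaps
since_first
```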

  

What to do next

  • R for Biostatistics Exercises (coming soon): statistical methods in more depth.
  • Linear Regression Exercises: regression for risk modeling.