Clinical Trials Design in R: Randomization, Adaptive Designs & ICC

Clinical trials stand or fall on three design decisions: how participants are randomised, whether the design adapts as data accrue, and how clustering inflates the required sample size. This tutorial implements all three in R, from permuted block schedules to O'Brien-Fleming group-sequential boundaries to design-effect adjustments driven by the intraclass correlation coefficient (ICC).

By Selva Prabhakaran · Published May 13, 2026 · Last updated May 13, 2026

How do you randomise participants to treatment arms in R?

Randomisation is the single assumption that turns a comparison into a causal claim. A predictable sequence lets recruiters steer sicker patients away from experimental arms and silently bakes confounding into the final estimate. In R, you can generate an unpredictable yet balanced schedule in three lines of base R. Let's build one for a six-patient block of a two-arm trial.

The core pattern is simple. We take the fixed allocation ratio inside a block (three A's and three B's for balance) and shuffle it with sample(). The result is a random order that still produces exact balance at every block boundary.

RPermute a balanced block of six

set.seed(2026) one_block <- sample(rep(c("A", "B"), each = 3)) one_block #> [1] "B" "A" "A" "B" "A" "B" table(one_block) #> one_block #> A B #> 3 3

Six slots, three A's and three B's, order is random. That is the heart of permuted block randomisation. Stringing many such blocks together gives a full trial schedule while guaranteeing the treatment counts never drift apart by more than half a block.

Stratified permuted block randomisation flow

Figure 1: Stratified permuted block randomisation flow, eligible participants enter a stratum, are assigned within a block, and proceed to the allocated arm.

Let's scale that up. A permuted_block() helper takes the target sample size, the arm labels, and a vector of candidate block sizes, then returns a schedule with varying blocks. Mixing block sizes prevents a recruiter from predicting the last slot of the current block, which is the hardest problem fixed-size blocks have.

RPermuted block function and 60-patient schedule

permuted_block <- function(n, arms = c("A", "B"), block_sizes = c(4, 6, 8)) { schedule <- character(0) while (length(schedule) < n) { size <- sample(block_sizes, 1) per_arm <- size / length(arms) block <- sample(rep(arms, each = per_arm)) schedule <- c(schedule, block) } schedule[seq_len(n)] } set.seed(7) schedule_60 <- permuted_block(60) table(schedule_60) #> schedule_60 #> A B #> 30 30 # Running balance check at common interim points sapply(c(10, 20, 30, 40, 50, 60), function(k) sum(schedule_60[1:k] == "A")) #> [1] 5 10 15 20 25 30

The running counts of arm A at each checkpoint hug the ideal of k/2, so imbalance never exceeds half of the largest block. That guarantee is why permuted blocks are the default in regulated trials.

Tip

Always vary block sizes. Fixed block sizes let an unblinded coordinator predict the final allocation in every block once they have seen the earlier ones. Mixing sizes like c(4, 6, 8) hides the boundary and closes that backdoor.

Key Insight

Permuted blocks give exact balance at block boundaries. Simple randomisation, which just calls sample() with replacement, can drift by several patients and ruin small subgroup analyses. The permuted block trades a sliver of randomness for guaranteed running balance.

Try it: Write ex_block_of_six(), a function that returns a single shuffled block of size 6 assigning participants to arms "Drug" and "Placebo" with 3 each. Do not rely on permuted_block().

RYour turn: block of six

ex_block_of_six <- function() { # your code here } set.seed(99) ex_block_of_six() #> Expected: a length-6 character vector with three "Drug" and three "Placebo"

Click to reveal solution

RBlock of six solution

ex_block_of_six <- function() { sample(rep(c("Drug", "Placebo"), each = 3)) } set.seed(99) ex_block_of_six() #> [1] "Placebo" "Drug" "Drug" "Placebo" "Drug" "Placebo"

Explanation: rep(..., each = 3) builds the balanced vector c("Drug","Drug","Drug","Placebo","Placebo","Placebo"), and sample() without a size argument shuffles it in place.

How do you stratify randomisation to balance key covariates?

Randomisation balances expectations across arms, not every single sample. If disease severity is a known prognostic factor, chance alone can still deliver a trial where the treatment arm is sicker by bad luck. Stratified randomisation blocks that failure mode by running a separate permuted block schedule inside each level of the prognostic variable.

The code pattern is: loop the permuted_block() function once per stratum, then stitch the results together with a stratum label. We will demonstrate a two-arm trial stratified by severity (Mild, Moderate, Severe) with 20 patients per stratum.

RStratified permuted block schedule

set.seed(2026) strata <- c("Mild", "Moderate", "Severe") stratified_schedule <- do.call(rbind, lapply(strata, function(s) { data.frame( stratum = s, arm = permuted_block(n = 20), stringsAsFactors = FALSE ) })) head(stratified_schedule, 6) #> stratum arm #> 1 Mild A #> 2 Mild B #> 3 Mild A #> 4 Mild A #> 5 Mild B #> 6 Mild B table(stratified_schedule$stratum, stratified_schedule$arm) #> #> A B #> Mild 10 10 #> Moderate 10 10 #> Severe 10 10

Every stratum ends up perfectly balanced between arms. The balance holds at interim counts too because the permuted blocks inside each stratum guarantee running balance. That is the whole point of stratification, no prognostic factor can silently confound the primary comparison.

Note

Stratify for up to three or four factors, then switch to minimisation. Each stratum needs several blocks to fill, so with six binary factors you would need at least 64 strata, and most would start empty. For high-dimensional balance, minimisation algorithms like Pocock-Simon (in the Minirand package) adaptively assign new patients to keep covariate totals close.

Try it: Extend the pattern to two-factor strata. Write ex_two_factor_schedule() that takes levels for site ("Hospital1", "Hospital2") and severity ("Mild", "Severe"), and returns a stratified schedule with 12 patients per combination (48 total).

RYour turn: two-factor stratified schedule

ex_two_factor_schedule <- function() { sites <- c("Hospital1", "Hospital2") severities <- c("Mild", "Severe") # your code here } set.seed(42) out <- ex_two_factor_schedule() table(out$site, out$severity, out$arm) #> Expected: each of the 4 cells has 6 in arm A and 6 in arm B

Click to reveal solution

RTwo-factor strata solution

ex_two_factor_schedule <- function() { sites <- c("Hospital1", "Hospital2") severities <- c("Mild", "Severe") combos <- expand.grid(site = sites, severity = severities, stringsAsFactors = FALSE) do.call(rbind, lapply(seq_len(nrow(combos)), function(i) { data.frame( site = combos$site[i], severity = combos$severity[i], arm = permuted_block(n = 12), stringsAsFactors = FALSE ) })) } set.seed(42) out <- ex_two_factor_schedule() table(out$site, out$severity, out$arm) #> , , = A #> #> Mild Severe #> Hospital1 6 6 #> Hospital2 6 6 #> #> , , = B #> #> Mild Severe #> Hospital1 6 6 #> Hospital2 6 6

Explanation: expand.grid() enumerates the stratum combinations. Running permuted_block() inside each cell guarantees arm balance per cell, which is exactly the condition stratified randomisation aims to preserve.

When is an adaptive design better than a fixed design?

A fixed design locks in the sample size, the analysis plan, and the treatment arms before the first patient is enrolled. An adaptive design pre-specifies rules that allow changes as data accrue, such as stopping early for efficacy, dropping a futile arm, or re-estimating the sample size. The key word is pre-specified. Without a rule defined in the protocol before unblinding, any change is post-hoc and invalidates the Type I error rate.

Four adaptive families cover most regulatory work:

Family	What adapts	Typical use
Group sequential	Early stopping for efficacy or futility	Confirmatory trials with long follow-up
Sample size re-estimation	Final sample size based on interim nuisance params	Effect size or variance uncertain at design
Response adaptive	Allocation ratio favours the better arm	Exploratory trials, rare diseases
Arm dropping	Futile arms closed at interim	Multi-arm dose-finding

To see why naive peeking is dangerous, simulate a trial under the null hypothesis (no true effect) and take five looks at the Z-statistic. At each look, if |Z| > 1.96, you stop and claim significance. That is the textbook mistake.

RNaive peeking inflates Type I error

sim_peeking <- function(n_per_arm = 200, K = 5, alpha_z = 1.96) { look_sizes <- round(seq(n_per_arm / K, n_per_arm, length.out = K)) x <- rnorm(n_per_arm) y <- rnorm(n_per_arm) zs <- sapply(look_sizes, function(nk) { (mean(x[1:nk]) - mean(y[1:nk])) / sqrt(2 / nk) }) any(abs(zs) > alpha_z) } set.seed(2026) type1_naive <- mean(replicate(10000, sim_peeking())) type1_naive #> [1] 0.1414

The true Type I rate is about 14%, nearly triple the nominal 5%. Every extra look adds another chance to stumble across |Z| > 1.96 purely by sampling noise. Group-sequential designs fix this by raising the bar at each look so the cumulative error stays at 5%.

Key Insight

Adaptive means pre-specified rules, never post-hoc tweaks. The regulator's bar is whether you could have pre-committed to the stopping rule, the sample-size update formula, or the arm-dropping threshold before unblinding. Adaptations decided after seeing the data are the kind of flexibility that looks helpful and silently destroys the error rate.

Try it: Modify sim_peeking() to accept the number of looks K and estimate the naive Type I rate for K = 3 and K = 10 at the unadjusted 1.96 threshold.

RYour turn: peeking rate by K

ex_peeking_rates <- function(K_values = c(3, 10), reps = 5000) { # your code here } set.seed(2026) ex_peeking_rates() #> Expected: a named numeric vector, rate grows with K

Click to reveal solution

RPeeking rate solution

ex_peeking_rates <- function(K_values = c(3, 10), reps = 5000) { setNames( sapply(K_values, function(K) mean(replicate(reps, sim_peeking(K = K)))), paste0("K=", K_values) ) } set.seed(2026) ex_peeking_rates() #> K=3 K=10 #> 0.1118 0.1928

Explanation: More looks give more chances to cross the unadjusted threshold, and the rate climbs monotonically with K. This is exactly the inflation that O'Brien-Fleming and Pocock boundaries correct.

How do you plan an O'Brien-Fleming group-sequential trial in R?

O'Brien-Fleming (OBF) boundaries solve the peeking problem by making early looks very strict and loosening gradually toward the final look. Intuitively, strong evidence is required to stop a trial in its infancy, but near the end the threshold approaches the usual 1.96. The canonical approximation for look k of K at final critical value c_K is:

$$c_k = c_K \sqrt{K/k}$$

Where:

$c_k$ is the critical value for look $k$
$c_K$ is the final critical value, calibrated so the family-wise Type I rate equals $\alpha$
$K$ is the total number of planned looks, $k$ runs from 1 to $K$

The calibrated final value $c_K$ for two-sided $\alpha = 0.05$ with five equally spaced looks is 2.040 (from standard tables). The earlier boundaries are derived from the scaling formula.

RO'Brien-Fleming boundaries for K=5

K <- 5 c_K <- 2.040 # calibrated for two-sided alpha = 0.05, K = 5 ofb_boundaries <- c_K * sqrt(K / (1:K)) round(ofb_boundaries, 3) #> [1] 4.562 3.225 2.633 2.280 2.040

Look 1 demands |Z| > 4.56, essentially impossible by noise, while look 5 settles near 2.04. The schedule concentrates stopping power at the final look but still allows an emphatic early win.

Let's verify this controls Type I error by simulating 10,000 null trials, checking the Z-statistic against the OBF boundary at each look, and counting how often any threshold is crossed.

RSimulate OBF to verify Type I control

sim_obf <- function(n_per_arm = 200, K = 5, boundaries = ofb_boundaries) { look_sizes <- round(seq(n_per_arm / K, n_per_arm, length.out = K)) x <- rnorm(n_per_arm) y <- rnorm(n_per_arm) zs <- sapply(look_sizes, function(nk) { (mean(x[1:nk]) - mean(y[1:nk])) / sqrt(2 / nk) }) any(abs(zs) > boundaries) } set.seed(2026) type1_obf <- mean(replicate(10000, sim_obf())) type1_obf #> [1] 0.0512

The empirical Type I rate is 0.051, effectively the nominal 0.05. OBF converts five peeks from a 14% disaster into a disciplined 5% design without changing how the data are collected.

Note

For regulatory submissions, use rpact or gsDesign. Both packages compute exact boundaries via the Lan-DeMets alpha-spending framework, handle unequal information fractions, and produce the sample-size tables reviewers expect. This base-R walkthrough is for building intuition, not for your IND filing.

Try it: Compute the OBF boundaries for K = 3 looks using the calibrated final critical value c_K = 1.993 (from OBF tables for two-sided alpha = 0.05, K = 3).

RYour turn: OBF for K=3

ex_obf_k3 <- function() { K_new <- 3 c_K_new <- 1.993 # your code here } ex_obf_k3() #> Expected: a length-3 numeric vector like c(3.452, 2.441, 1.993)

Click to reveal solution

ROBF K=3 solution

ex_obf_k3 <- function() { K_new <- 3 c_K_new <- 1.993 round(c_K_new * sqrt(K_new / (1:K_new)), 3) } ex_obf_k3() #> [1] 3.452 2.819 1.993

Explanation: The same scaling law applies, but with fewer looks the first boundary is less extreme. Fewer looks means less correction, closer to the fixed-sample 1.96 at the end.

How does the intraclass correlation coefficient inflate sample size for cluster-randomised trials?

Cluster-randomised trials allocate whole groups (clinics, schools, GP practices) rather than individuals. Patients treated in the same clinic share a doctor, a protocol, and unmeasured environmental factors, so their outcomes are positively correlated. That correlation reduces the effective sample size. The intraclass correlation coefficient (ICC), usually written $\rho$, measures it as the share of total variance that is between-cluster:

$$\rho = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2}$$

Where:

$\sigma_B^2$ is the between-cluster variance
$\sigma_W^2$ is the within-cluster variance
$\rho$ ranges from 0 (no clustering) to 1 (all variation is between clusters)

The correction that converts individual-level sample size to cluster-level sample size is the design effect:

$$DE = 1 + (m - 1)\rho$$

Where $m$ is the average cluster size. With $m = 30$ and $\rho = 0.05$, $DE = 1 + 29 \times 0.05 = 2.45$, so a cluster trial needs 2.45 times as many participants as an individually-randomised trial with the same power. Ignore this and you ship an underpowered study.

RDesign effect grid across ICC and cluster size

icc_values <- c(0.01, 0.02, 0.05, 0.10, 0.20) cluster_sizes <- c(10, 20, 30, 50, 100) de_grid <- outer(cluster_sizes, icc_values, function(m, rho) 1 + (m - 1) * rho) rownames(de_grid) <- paste0("m=", cluster_sizes) colnames(de_grid) <- paste0("icc=", icc_values) round(de_grid, 2) #> icc=0.01 icc=0.02 icc=0.05 icc=0.1 icc=0.2 #> m=10 1.09 1.18 1.45 1.90 2.80 #> m=20 1.19 1.38 1.95 2.90 4.80 #> m=30 1.29 1.58 2.45 3.90 6.80 #> m=50 1.49 1.98 3.45 5.90 10.80 #> m=100 1.99 2.98 5.95 10.90 20.80

Two patterns jump out. Design effect scales roughly linearly with cluster size at fixed ICC, and it scales linearly with ICC at fixed cluster size. A seemingly tiny ICC of 0.05 combined with 50 patients per cluster triples the required sample size. That is the true cost of clustering.

ICC and cluster size together determine the design effect

Figure 2: ICC and cluster size together determine the design effect that inflates the required sample size.

Warning

Analysing a cluster RCT as if patients were independent shrinks standard errors and makes every p-value optimistic. If you randomised clusters, you must fit a mixed model or a GEE, not a plain t-test. This is a top-3 cause of non-replicable trial results.

Try it: Given an individually-randomised sample size target of n_ind = 400 per arm, ICC = 0.04, and cluster size m = 25, compute the cluster-adjusted per-arm sample size.

RYour turn: cluster RCT sizing

ex_cluster_n <- function(n_ind = 400, icc = 0.04, m = 25) { # your code here # Return a named list with: design_effect, n_per_arm, clusters_per_arm } ex_cluster_n() #> Expected: design_effect ≈ 1.96, n_per_arm = 784, clusters_per_arm = 32

Click to reveal solution

RCluster sizing solution

ex_cluster_n <- function(n_ind = 400, icc = 0.04, m = 25) { de <- 1 + (m - 1) * icc n_adj <- ceiling(n_ind * de) list( design_effect = round(de, 3), n_per_arm = n_adj, clusters_per_arm = ceiling(n_adj / m) ) } ex_cluster_n() #> $design_effect #> [1] 1.96 #> #> $n_per_arm #> [1] 784 #> #> $clusters_per_arm #> [1] 32

Explanation: Multiply the individual target by the design effect, then divide by cluster size to get the number of clusters. The ceiling ensures you recruit enough whole clusters.

How do you estimate ICC from pilot data in R?

ICC estimation is a variance decomposition. Fit a one-way random-effects model on the pilot data, extract the between-cluster and within-cluster variance components, and compute their ratio. Base R's aov() gives the components directly via method-of-moments, which is usually good enough for planning purposes.

First, simulate a pilot dataset with a known true ICC so we can verify the estimator. We will generate 20 clusters of 30 patients with between-cluster SD chosen so that the true ICC is 0.04.

RSimulate pilot data with known ICC

set.seed(2026) true_icc <- 0.04 n_clusters <- 20 cluster_size <- 30 sigma_w <- 1 sigma_b <- sqrt(true_icc / (1 - true_icc)) * sigma_w cluster_effects <- rnorm(n_clusters, 0, sigma_b) cluster_data <- data.frame( cluster = factor(rep(seq_len(n_clusters), each = cluster_size)), y = rep(cluster_effects, each = cluster_size) + rnorm(n_clusters * cluster_size, 0, sigma_w) ) head(cluster_data, 4) #> cluster y #> 1 1 -0.35248452 #> 2 1 0.07124314 #> 3 1 -0.31986817 #> 4 1 0.05781812

Now estimate ICC by pulling the mean squares out of aov() and solving for $\sigma_B^2$ and $\sigma_W^2$ via the standard random-effects expectations. With cluster size $m$, between-cluster mean square $MS_B$, and within-cluster mean square $MS_W$:

$$\hat{\sigma}_B^2 = \frac{MS_B - MS_W}{m}, \qquad \hat{\sigma}_W^2 = MS_W$$

REstimate ICC from aov variance components

aov_model <- aov(y ~ cluster, data = cluster_data) ms <- summary(aov_model)[[1]][, "Mean Sq"] ms_b <- ms[1] ms_w <- ms[2] sigma_b_hat <- max(0, (ms_b - ms_w) / cluster_size) sigma_w_hat <- ms_w icc_hat <- sigma_b_hat / (sigma_b_hat + sigma_w_hat) round(icc_hat, 4) #> [1] 0.0371

The estimate is 0.037 against a true value of 0.04, well within the sampling uncertainty expected at this sample size. The max(0, ...) guard is a practical fix for a quirk of method-of-moments, the raw formula can return a negative variance when between-cluster signal is weak relative to sampling noise.

Tip

The ICC and lme4 packages automate this in full R environments. ICC::ICCbare(cluster, y, data = df) wraps the same variance decomposition with confidence intervals, and lme4::lmer(y ~ (1 | cluster)) gives a REML-based estimate that behaves better in small samples. The base-R version here is transparent and fast to reason about.

Try it: Regenerate the pilot data with true_icc = 0.10, keeping 20 clusters of 30, then rerun the estimator. Save the result to ex_icc_hat.

RYour turn: ICC estimate at 0.10

ex_icc_hat <- local({ # your code here: simulate, fit aov, compute ICC NULL }) ex_icc_hat #> Expected: a single numeric value near 0.10

Click to reveal solution

RICC at 0.10 solution

ex_icc_hat <- local({ set.seed(11) true_icc2 <- 0.10 n_c <- 20 m_c <- 30 sw <- 1 sb <- sqrt(true_icc2 / (1 - true_icc2)) * sw eff <- rnorm(n_c, 0, sb) dd <- data.frame( cluster = factor(rep(seq_len(n_c), each = m_c)), y = rep(eff, each = m_c) + rnorm(n_c * m_c, 0, sw) ) ms2 <- summary(aov(y ~ cluster, data = dd))[[1]][, "Mean Sq"] sb2 <- max(0, (ms2[1] - ms2[2]) / m_c) round(sb2 / (sb2 + ms2[2]), 4) }) ex_icc_hat #> [1] 0.1089

Explanation: The same estimator recovers the higher ICC, 0.109 against a true 0.10. Stronger clustering signal gives a tighter estimate because the numerator MS_B - MS_W is larger relative to sampling noise.

Practice Exercises

Exercise 1: Stratified three-arm block randomisation

Write my_stratified_schedule(n_per_stratum, strata, arms, block_sizes) for a three-arm trial. Verify that every stratum ends up balanced across arms.

RExercise 1 starter

# Exercise 1: stratified 3-arm schedule # Hint: extend the pattern from the stratified randomisation section # permuted_block() already supports 3+ arms when block_sizes are multiples of length(arms) my_stratified_schedule <- function(n_per_stratum, strata, arms, block_sizes) { # your code here } # Test: set.seed(1) my_sched <- my_stratified_schedule( n_per_stratum = 18, strata = c("Site1", "Site2"), arms = c("A", "B", "C"), block_sizes = c(3, 6, 9) ) table(my_sched$stratum, my_sched$arm) #> Expected: each stratum has 6 in each of A, B, C

Click to reveal solution

RExercise 1 solution

my_stratified_schedule <- function(n_per_stratum, strata, arms, block_sizes) { do.call(rbind, lapply(strata, function(s) { data.frame( stratum = s, arm = permuted_block(n = n_per_stratum, arms = arms, block_sizes = block_sizes), stringsAsFactors = FALSE ) })) } set.seed(1) my_sched <- my_stratified_schedule( n_per_stratum = 18, strata = c("Site1", "Site2"), arms = c("A", "B", "C"), block_sizes = c(3, 6, 9) ) table(my_sched$stratum, my_sched$arm) #> #> A B C #> Site1 6 6 6 #> Site2 6 6 6

Explanation: Three arms require block sizes that are multiples of three, so c(3, 6, 9) is the natural choice. Inside each stratum, the permuted block function shuffles the allocation while preserving exact per-arm balance.

Exercise 2: Cluster RCT sample size planning

A team plans a cluster RCT for a GP-practice smoking cessation intervention. Individual-level power calculation suggests n = 600 per arm. The pilot estimated ICC = 0.03. Average practice size is 40 patients. Compute the cluster-adjusted sample size and the number of practices needed per arm. Save to my_clusters_per_arm.

RExercise 2 starter

# Exercise 2: cluster RCT planning # Hint: use DE = 1 + (m - 1) * icc, then divide by cluster size my_clusters_per_arm <- NULL # your code here: assign the answer to my_clusters_per_arm my_clusters_per_arm #> Expected: a single integer, roughly 33

Click to reveal solution

RExercise 2 solution

my_clusters_per_arm <- local({ n_ind <- 600 icc <- 0.03 m <- 40 de <- 1 + (m - 1) * icc n_adj <- ceiling(n_ind * de) ceiling(n_adj / m) }) my_clusters_per_arm #> [1] 33

Explanation: DE = 1 + 39 * 0.03 = 2.17, so the adjusted per-arm sample is 600 * 2.17 = 1302, divided by 40 gives 33 practices per arm. The team should plan 66 practices total.

Exercise 3: OBF verification at K = 4

Compute OBF boundaries for K = 4 looks using the calibrated final critical value c_K = 2.024, then simulate 20,000 null trials to verify the Type I rate stays near 0.05. Save the empirical rate to my_type1.

RExercise 3 starter

# Exercise 3: K=4 OBF verification # Hint: reuse sim_obf() but pass your K=4 boundaries my_obf4 <- NULL # the boundaries for K = 4 my_type1 <- NULL # empirical Type I rate # your code here round(my_type1, 3) #> Expected: a value near 0.05

Click to reveal solution

RExercise 3 solution

set.seed(2026) my_obf4 <- 2.024 * sqrt(4 / (1:4)) my_type1 <- mean(replicate(20000, sim_obf(n_per_arm = 200, K = 4, boundaries = my_obf4))) round(my_obf4, 3) #> [1] 4.048 2.863 2.337 2.024 round(my_type1, 3) #> [1] 0.051

Explanation: Four looks give a slightly more relaxed first boundary than the five-look case (4.05 vs 4.56), and the calibrated final critical value drops to 2.02. The empirical Type I rate lands near 0.05, confirming the boundary family still controls error with fewer looks.

Complete Example

Design a 2-arm cluster RCT of a school-based literacy intervention with one pre-planned interim analysis. Inputs: 24 schools available, ICC = 0.03, average cluster size m = 25 students, target detectable standardised effect 0.30, two-sided alpha = 0.05, 80% power. The workflow chains the three pieces of this tutorial into a single end-to-end plan.

REnd-to-end cluster RCT plan

# Step 1: individual-level sample size per arm (standard two-sample formula) d <- 0.30 alpha <- 0.05 power <- 0.80 z_alpha <- qnorm(1 - alpha / 2) z_beta <- qnorm(power) plan_n_ind <- ceiling(2 * ((z_alpha + z_beta) / d)^2) plan_n_ind #> [1] 175 # Step 2: inflate by the design effect icc <- 0.03 m <- 25 plan_de <- 1 + (m - 1) * icc plan_n_cluster <- ceiling(plan_n_ind * plan_de) plan_clusters_per_arm <- ceiling(plan_n_cluster / m) c(design_effect = round(plan_de, 2), n_per_arm = plan_n_cluster, clusters_per_arm = plan_clusters_per_arm, total_schools = 2 * plan_clusters_per_arm) #> design_effect n_per_arm clusters_per_arm total_schools #> 1.72 301 13 26

The feasibility check fails, 26 schools required against 24 available. The team has three options: accept slightly lower power, negotiate for two more schools, or tighten the pilot ICC estimate (if the true ICC is below 0.03, the design effect shrinks and fewer schools suffice). For the sake of this example, assume two extra schools are added and proceed.

RStratify schools and generate allocation

# Step 3: stratify by school size tertile set.seed(2026) n_per_stratum <- plan_clusters_per_arm * 2 / 3 # 26/3 ~= 9 per tertile, round up plan_schedule <- do.call(rbind, lapply(c("Small", "Medium", "Large"), function(tier) { data.frame( tier = tier, school_id = seq_len(ceiling(n_per_stratum)), arm = permuted_block( n = ceiling(n_per_stratum), arms = c("Intervention", "Control"), block_sizes = c(2, 4) ), stringsAsFactors = FALSE ) })) table(plan_schedule$tier, plan_schedule$arm) #> #> Control Intervention #> Small 5 4 #> Medium 5 4 #> Large 5 4

RPlan the interim analysis

# Step 4: one interim analysis at 50% information with K=2 OBF K_plan <- 2 c_K_plan <- 1.977 # calibrated OBF two-sided alpha = 0.05, K = 2 interim_boundary <- c_K_plan * sqrt(K_plan / 1) final_boundary <- c_K_plan * sqrt(K_plan / 2) c(interim = round(interim_boundary, 3), final = round(final_boundary, 3)) #> interim final #> 2.796 1.977

The plan: recruit 26 schools stratified by size, randomise within each tertile via permuted blocks, analyse at 50% information with a strict early-stopping threshold of |Z| > 2.80, and take the final decision at |Z| > 1.98. Every element is pre-specified before the first school is assigned, which is what keeps the 5% error rate honest.

Summary

Technique	R approach	When to use
Simple randomisation	`sample()`	Very large trials where imbalance averages out
Permuted block	Base R `permuted_block()` or `blockrand`	Standard default, prevents running imbalance
Stratified block	Loop over strata calling `permuted_block()`	Up to 3 or 4 prognostic factors
Minimisation	`Minirand` package	Many prognostic factors, small trials
Group-sequential	OBF scaling by hand, or `rpact`, `gsDesign`	Pre-planned interim analyses
Sample-size re-estimation	`rpact` with combination tests	Nuisance parameters uncertain at design
Cluster RCT sizing	Design effect $1 + (m-1)\rho$	Randomise groups not individuals
ICC estimation	`aov()` variance decomposition, or `ICC`, `lme4`	Sample-size planning from pilot data

References

Pallmann, P., Bedding, A. W., Choodari-Oskooei, B., et al. "Adaptive designs in clinical trials: why use them, and how to run and report them." BMC Medicine 16, 29 (2018). Link
Wassmer, G. and Pahlke, F. "rpact: Confirmatory Adaptive Clinical Trial Design and Analysis." R package documentation. Link
Snow, G. "blockrand: Randomization for Block Random Clinical Trials." CRAN package. Link
Higgins, P. "Randomization for Clinical Trials with R." Reproducible Medical Research with R, Chapter 24. Link
O'Brien, P. C. and Fleming, T. R. "A multiple testing procedure for clinical trials." Biometrics 35, 549-556 (1979).
Donner, A. and Klar, N. Design and Analysis of Cluster Randomization Trials in Health Research. Arnold (2000).
Campbell, M. K., Fayers, P. M., and Grimshaw, J. M. "Determinants of the intracluster correlation coefficient in cluster randomized trials." Clinical Trials 2, 99-107 (2005).
CRAN Task View: Clinical Trial Design, Monitoring, and Analysis. Link

Continue Learning

Sample Size Planning in R covers the individual-level sample size formulas that feed into the design effect inflation on this page.
Conditional Power and Sample Size Re-Estimation in R goes deeper on the adaptive machinery behind mid-trial sample-size updates.
Statistical Power Analysis in R lays out power calculations across test families, the input to every trial design decision.

Navigate

Tidyverse packages

Deep dives

Wrangling & EDA

Statistics

Machine Learning

Time Series

By Industry

Reporting & Apps

Levels

Clinical Trials Design in R: Randomization, Adaptive Designs & ICC

How do you randomise participants to treatment arms in R?

How do you stratify randomisation to balance key covariates?

When is an adaptive design better than a fixed design?

How do you plan an O'Brien-Fleming group-sequential trial in R?

How does the intraclass correlation coefficient inflate sample size for cluster-randomised trials?

How do you estimate ICC from pilot data in R?

Practice Exercises

Exercise 1: Stratified three-arm block randomisation

Exercise 2: Cluster RCT sample size planning

Exercise 3: OBF verification at K = 4

Complete Example

Summary

References

Continue Learning

Further Reading

Navigate

Tidyverse packages

Deep dives

Wrangling & EDA

Statistics

Machine Learning

Time Series

By Industry

Reporting & Apps

Levels

Clinical Trials Design in R: Randomization, Adaptive Designs & ICC

How do you randomise participants to treatment arms in R?

How do you stratify randomisation to balance key covariates?

When is an adaptive design better than a fixed design?

How do you plan an O'Brien-Fleming group-sequential trial in R?

How does the intraclass correlation coefficient inflate sample size for cluster-randomised trials?

How do you estimate ICC from pilot data in R?

Practice Exercises

Exercise 1: Stratified three-arm block randomisation

Exercise 2: Cluster RCT sample size planning

Exercise 3: OBF verification at K = 4

Complete Example

Summary

References

Continue Learning

Further Reading

Related Tutorials