Pre-Analysis Plans in R: Commit Before You Analyze
A pre-analysis plan is a public commitment to your research design, hypotheses, and analysis approach — written and registered before you look at the data. It's the single most effective tool against p-hacking, HARKing, and the garden of forking paths.
Imagine a poker player who gets to see all the cards before placing a bet. That's what an unregistered analysis looks like: you see the data, then "predict" what you'll find. Pre-analysis plans level the playing field by forcing you to commit your bet before the cards are dealt.
What Goes in a Pre-Analysis Plan?
| Section | Contents | Why It Matters |
|---|---|---|
| Hypotheses | Specific, testable predictions | Prevents HARKing |
| Design | Sample size, randomization, data collection | Prevents optional stopping |
| Variables | Definitions, measurement, coding | Prevents specification searching |
| Primary analysis | Exact statistical test(s) | Prevents multiple testing |
| Secondary analyses | Exploratory analyses (labeled as such) | Maintains transparency |
| Exclusion criteria | Rules for dropping observations | Prevents selective exclusion |
| Multiple testing | Correction method | Controls false positive rate |
| Power analysis | Sample size justification | Ensures adequate power |
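The correction method can be pinned down before any data exist. A minimal sketch using base R's p.adjust(), with hypothetical p-values standing in for the results of several pre-registered tests:

```r
# Sketch: applying a pre-specified correction (the p-values here are hypothetical)
p_values <- c(0.012, 0.034, 0.041, 0.20)  # one p-value per pre-registered test

# Holm's method controls the family-wise error rate and is uniformly
# more powerful than the plain Bonferroni correction
p_holm <- p.adjust(p_values, method = "holm")

cat("Raw p-values:  ", sprintf("%.3f", p_values), "\n")
cat("Holm-adjusted: ", sprintf("%.3f", p_holm), "\n")
cat("Significant at alpha = 0.05:", sum(p_holm < 0.05), "of", length(p_values), "\n")
```

The point is that the method's name belongs in the plan: left unspecified, the choice among Holm, Bonferroni, and Benjamini-Hochberg is itself a researcher degree of freedom.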
The Pre-Registration Workflow
# Step 1: Design your study and write the plan BEFORE collecting/analyzing data
cat("=== Pre-Registration Workflow ===\n\n")
cat("BEFORE data collection:\n")
cat(" 1. State hypotheses\n")
cat(" 2. Define variables and measurement\n")
cat(" 3. Specify sample size (with power analysis)\n")
cat(" 4. Write exact analysis code (test on simulated data)\n")
cat(" 5. Register on OSF, AsPredicted, or ClinicalTrials.gov\n")
cat(" 6. Get a timestamped, immutable record\n\n")
cat("AFTER data collection:\n")
cat(" 7. Run the pre-specified analysis (no deviations)\n")
cat(" 8. Report ALL pre-registered results (including nulls)\n")
cat(" 9. Label any additional analyses as 'exploratory'\n")
cat(" 10. Link the publication to the registration\n")
Step 1: Power Analysis
Before writing the plan, determine how many participants you need.
# Power analysis for a two-sample t-test
power_analysis <- function(effect_size, alpha = 0.05, power = 0.80) {
  # Normal-approximation formula: n = (z_alpha + z_beta)^2 * 2 / d^2
  # (power.t.test() uses the t distribution and returns a slightly larger n)
  z_alpha <- qnorm(1 - alpha / 2)
  z_beta <- qnorm(power)
  ceiling((z_alpha + z_beta)^2 * 2 / effect_size^2)
}
cat("=== Power Analysis ===\n")
cat("Two-sample t-test, alpha = 0.05, power = 0.80\n\n")
effect_sizes <- c(0.2, 0.5, 0.8)
labels <- c("Small", "Medium", "Large")
for (i in seq_along(effect_sizes)) {
  n <- power_analysis(effect_sizes[i])
  cat(sprintf(" %s effect (d=%.1f): n = %d per group (%d total)\n",
              labels[i], effect_sizes[i], n, 2 * n))
}
cat("\nIn practice, use: power.t.test(delta, sd, power = 0.8)\n")
# Built-in power function
result <- power.t.test(delta = 5, sd = 10, power = 0.80)
cat(sprintf("\nExample: Detect 5-point difference (SD=10):\n"))
cat(sprintf(" n = %d per group\n", ceiling(result$n)))
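The analytic answer can be double-checked by simulation, an approach that also generalizes to designs with no closed-form power formula. A sketch assuming the medium effect (d = 0.5) and the n = 63 per group the formula above yields:

```r
# Sketch: simulation-based power check (assumes d = 0.5, n = 63 per group)
set.seed(123)      # fixed seed so the check is reproducible
n_sims <- 2000
n_per_group <- 63  # from the normal-approximation formula above

rejections <- replicate(n_sims, {
  control   <- rnorm(n_per_group, mean = 0,   sd = 1)
  treatment <- rnorm(n_per_group, mean = 0.5, sd = 1)  # true effect d = 0.5
  t.test(treatment, control)$p.value < 0.05  # TRUE when the test rejects
})

cat(sprintf("Simulated power: %.3f (analytic target: 0.80)\n", mean(rejections)))
```

A simulated power close to 0.80 confirms the sample size; the same loop, with the data-generating code swapped out, checks power for any planned analysis.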
Step 2: Write the Analysis Code on Simulated Data
The key insight: write and test your entire analysis pipeline before seeing the real data. Registries such as AsPredicted structure the commitments as a short questionnaire:
cat("=== AsPredicted Template ===\n\n")
plan <- list(
  "1. Data collection" = "Has data collection begun? NO",
  "2. Hypothesis" = "Treatment group will show greater improvement in test scores than control group (directional, one-tailed).",
  "3. Dependent variable" = "Change score = post_score - pre_score",
  "4. Conditions" = "2 groups: control (business-as-usual), treatment (intervention)",
  "5. Analyses" = "Primary: Welch's t-test on change scores. Secondary: ANCOVA with pre_score as covariate.",
  "6. Outliers" = "Exclude participants with change scores > 3 SD from group mean. Criterion specified before data collection.",
  "7. Sample size" = "n = 50 per group (100 total). Power = 0.80 for d = 0.5 at alpha = 0.05.",
  "8. Other" = "Multiple comparison correction: N/A (single primary test). Missing data: complete case analysis (report dropout rates)."
)
for (q in names(plan)) {
  cat(sprintf("%s\n %s\n\n", q, plan[[q]]))
}
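The plan above can then be dry-run end to end on simulated data, exactly as step 4 of the workflow recommends. A sketch, in which the means, SDs, and simulated effect are placeholder assumptions rather than real values:

```r
# Sketch: dry-running the pre-registered analysis on simulated data
# (means, SDs, and the simulated effect below are placeholder assumptions)
set.seed(42)
n <- 50  # per group, as pre-registered

sim <- data.frame(
  group     = rep(c("control", "treatment"), each = n),
  pre_score = rnorm(2 * n, mean = 70, sd = 10)
)
# Simulate a d = 0.5 treatment effect on the change score (5 points, SD = 10)
sim$post_score <- sim$pre_score +
  rnorm(2 * n, mean = ifelse(sim$group == "treatment", 5, 0), sd = 10)
sim$change <- sim$post_score - sim$pre_score

# Pre-registered exclusion rule: drop change scores > 3 SD from the group mean
keep <- as.logical(ave(sim$change, sim$group,
                       FUN = function(x) abs(x - mean(x)) <= 3 * sd(x)))
sim <- sim[keep, ]

# Primary analysis: Welch's t-test on change scores, one-tailed as registered
# (alternative = "less" because t.test orders the groups control - treatment)
result <- t.test(change ~ group, data = sim, alternative = "less")
cat(sprintf("Welch's t-test: t = %.2f, one-tailed p = %.4f\n",
            result$statistic, result$p.value))
```

Running this before registration catches coding bugs while changes are still free; the same script, re-pointed at the real data file, becomes the confirmatory analysis.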
What if I discover something unexpected in the data? Report it! But label it clearly as an "exploratory finding" that was not pre-registered. This is perfectly acceptable and expected. The problem is presenting exploratory findings as if they were predicted all along (HARKing).
Can I update my pre-analysis plan? Yes, but only before you look at the data. If circumstances change (e.g., lower-than-expected recruitment), you can file an amendment. After seeing the data, any changes must be reported as deviations from the plan.
Is pre-registration required for publication? Not universally, but increasingly expected. Many journals offer "Registered Reports" where the study is accepted based on the design alone (before results are known). This eliminates publication bias because the paper is accepted regardless of whether results are significant.