Power Analysis Exercises in R: 8 Sample Size Calculation Problems, Solved Step-by-Step
These 8 power analysis exercises in R walk you through sample size calculations for t-tests, ANOVA, correlations, proportions, and multiple regression using the pwr package, with every problem solved step-by-step and each solution runnable in the browser.
Which pwr function matches your test?
Every power calculation in the pwr package follows one idea: pick the function that matches your test, plug in three of the four knobs (sample size, effect size, significance level, power), and leave the fourth as NULL. R solves for whichever you left blank. Here is the same function used twice, once to find sample size, once to find power, so you see both directions before jumping into the 8 exercises.
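In code, the two directions look like this (pwr.t.test() defaults to a two-sided, two-sample test, so only the knobs that change are spelled out):

```r
library(pwr)

# Direction 1: solve for sample size (n is the knob left blank)
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80)
# -> n = 63.77 per group

# Direction 2: solve for power (power is the knob left blank)
pwr.t.test(n = 30, d = 0.5, sig.level = 0.05)
# -> power = 0.478
```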
The first call says a two-sample t-test needs about 64 participants per group to detect a medium effect (Cohen's d = 0.5) with 80% power at α = 0.05. The second call flips the question: if you only have 30 per group, your power drops to 48%, which means more than half the time you would miss the effect even when it is truly there. One function, one object, two completely different design decisions.
Here is the one-line decision rule for picking a function:
| Your test | pwr function | Effect size you supply |
|---|---|---|
| One-, two-, or paired-sample t-test | pwr.t.test() | Cohen's d |
| One-way ANOVA | pwr.anova.test() | Cohen's f |
| Two-proportion z-test | pwr.2p.test() | Cohen's h (via ES.h()) |
| Pearson correlation | pwr.r.test() | r (the correlation itself) |
| Chi-square test | pwr.chisq.test() | Cohen's w |
| Linear / multiple regression | pwr.f2.test() | Cohen's f² |
If you do not pass n, R solves for sample size. If you do not pass power, R solves for power. If you do not pass the effect size argument (d, f, h, r, w, f2), R solves for the minimum detectable effect. Three knowns, one unknown, every time.

Try it: Compute the sample size for a one-sample t-test that detects a small-to-medium effect of d = 0.4 with 80% power at α = 0.05. Save the raw n (unrounded) to ex_n.
Click to reveal solution
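One way to write it (the ex_n name follows the exercise prompt):

```r
library(pwr)

# One-sample t-test: d = 0.4, 80% power, alpha = 0.05
# type = "one.sample" switches off the default two-sample calculation
ex_n <- pwr.t.test(d = 0.4, sig.level = 0.05, power = 0.80,
                   type = "one.sample")$n
ex_n  # raw (unrounded) n, roughly 51
```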
Explanation: type = "one.sample" is the switch that turns a two-sample calculation into a one-sample one. A one-sample t-test needs fewer participants than a two-sample test for the same effect, because you are only estimating one mean against a known reference, not two means against each other.
How do you read pwr output in R?
Every pwr.* function returns a "power.htest" object, a list with named components you can index directly. The print method shows a nicely formatted block, but the real value is in pulling fields out by name for report tables, plots, or downstream decisions like inflating for dropout.
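For example, saving the object as ss_res and indexing its fields directly:

```r
library(pwr)

# Save the returned "power.htest" object instead of just printing it
ss_res <- pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80)

ss_res$n           # 63.77 -- the raw, fractional solution
ceiling(ss_res$n)  # 64 -- what you actually recruit per group
ss_res$power       # 0.8, echoed back along with the other inputs
names(ss_res)      # all the named components you can index
```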
The raw $n of 63.77 is the mathematical solution; the real-world sample size is always ceiling($n), because you cannot enrol a fractional participant. For a two-sample design, the 64 is per group, so the total study size is 128. Mixing these up is one of the most common power analysis mistakes, and Exercise 4 will reinforce it with a concrete cost in study budget.
Cohen also gave convenient small/medium/large presets for each effect-size family. You will use these constantly when a principal investigator says "assume a medium effect":
| Family | Small | Medium | Large | Function |
|---|---|---|---|---|
| d (mean difference / SD) | 0.20 | 0.50 | 0.80 | pwr.t.test |
| f (ANOVA between / within SD) | 0.10 | 0.25 | 0.40 | pwr.anova.test |
| h (arcsine of proportions) | 0.20 | 0.50 | 0.80 | pwr.2p.test |
| r (correlation) | 0.10 | 0.30 | 0.50 | pwr.r.test |
| f² (R² / (1 − R²)) | 0.02 | 0.15 | 0.35 | pwr.f2.test |
The math is the same in every family: small is hard to find, large is impossible to miss. The pwr helper cohen.ES(test = "t", size = "medium") returns the same presets programmatically if you want them inside a function.
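A quick look at the helper in action (the test codes are pwr's own: "t", "anov", "p", "r", "chisq", "f2"):

```r
library(pwr)

# Pull Cohen's conventional presets instead of hard-coding them
cohen.ES(test = "t", size = "medium")$effect.size     # 0.5  (d)
cohen.ES(test = "anov", size = "medium")$effect.size  # 0.25 (f)
cohen.ES(test = "r", size = "large")$effect.size      # 0.5  (r)
```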
ceiling(), never round(). round(63.77) gives 64, but round(63.20) gives 63, which would leave you underpowered. Always round up to guarantee the target power, not down. This is a one-line habit that prevents a recurring class of design bugs.

Try it: Take the saved ss_res from above and compute the total sample size for the two-sample design, rounded up. Save to ex_ceil.
Click to reveal solution
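One solution (ss_res is recreated here so the snippet stands on its own):

```r
library(pwr)

# The saved design object from the section above
ss_res <- pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80)

# Round up per arm first, then multiply by the two arms
ex_ceil <- ceiling(ss_res$n) * 2  # 64 per arm * 2 = 128 total
ex_ceil
```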
Explanation: Two-sample designs have two equal arms, so total = per-arm × 2. If the arms were unequal, you would instead call pwr.t2n.test() with n1 and n2 separately.
Practice Exercises
Eight capstone problems, ordered from simpler single-function solves to harder inverse problems. Each exercise uses a distinct ex1_ to ex8_ prefix so your solution variables do not clobber earlier state.
Exercise 1: Two-sample t-test sample size
You are designing an A/B test comparing a new onboarding flow against the old one. You want to detect a medium effect (d = 0.5) with 80% power at α = 0.05, two-sided. How many users do you need per arm?
Click to reveal solution
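One way to solve it (the ex1_ prefix follows the article's convention):

```r
library(pwr)

# Two-sample t-test, medium effect, 80% power (pwr's defaults cover
# type = "two.sample" and alternative = "two.sided")
ex1_res <- pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80)
ex1_n <- ceiling(ex1_res$n)  # 64 per arm
ex1_n * 2                    # 128 total
```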
Explanation: 63.77 per group rounds up to 64 per arm, 128 total. Cohen calls d = 0.5 a medium effect, the kind you can spot with the naked eye on a boxplot. Smaller effects cost dearly: required n scales as 1/d², so dropping to d = 0.2 multiplies the sample by (0.5/0.2)² = 6.25, to roughly 394 per group.
Exercise 2: Paired t-test sample size
You are running a pre-post study on the same group of employees (measuring focus score before and after a training). The expected within-subject effect is small-to-medium (d = 0.3). You want 90% power at α = 0.05, two-sided. How many pairs do you need?
Click to reveal solution
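One way to solve it, switching on the paired type:

```r
library(pwr)

# Paired design: d is the within-subject effect on the difference scores
ex2_res <- pwr.t.test(d = 0.3, sig.level = 0.05, power = 0.90,
                      type = "paired")
ceiling(ex2_res$n)  # pairs needed (people measured twice, not 2x people)
```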
Explanation: You need 120 paired measurements (i.e., 120 employees measured twice, not 240 people). Paired designs soak up between-subject variability, so the same d is effectively easier to detect than in a two-sample design. Two knobs changed from Exercise 1: the effect shrank from 0.5 to 0.3 (which alone triples n), and power rose from 0.80 to 0.90 (which adds about 34%). Both bumps push n up sharply.
Exercise 3: One-way ANOVA with 4 groups
You are comparing mean revenue across 4 marketing channels. You expect a medium between-group effect (f = 0.25), want 80% power at α = 0.05. How many observations per group do you need?
Click to reveal solution
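One way to solve it; k is the number of groups and n is returned per group:

```r
library(pwr)

ex3_res <- pwr.anova.test(k = 4, f = 0.25, sig.level = 0.05, power = 0.80)
ceiling(ex3_res$n)      # 45 per group
ceiling(ex3_res$n) * 4  # 180 total
```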
Explanation: 45 observations per group × 4 groups = 180 total. ANOVA power depends on the between-group spread relative to the within-group spread, captured by Cohen's f. The same f = 0.25 with only 2 groups (equivalent to a t-test with d = 0.5) would need ~64 per group but only 128 total; four groups need fewer per group yet more overall, because the F statistic's null distribution has to account for comparing four means at once.
Exercise 4: Two-proportion test (email click-through rates)
Your baseline email has a 10% click-through rate. You want to detect an improvement to 15% (absolute 5 percentage-point lift) with 80% power at α = 0.05, two-sided. How many emails do you need per variant?
Click to reveal solution
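One way to solve it; abs() is used so the argument order of ES.h() cannot flip the sign of h:

```r
library(pwr)

# ES.h() converts the two proportions to Cohen's h; the order only
# affects the sign, so take the magnitude for the power calculation
ex4_h <- abs(ES.h(0.10, 0.15))  # 0.152
ex4_res <- pwr.2p.test(h = ex4_h, sig.level = 0.05, power = 0.80)
ceiling(ex4_res$n)      # 341 per variant
ceiling(ex4_res$n) * 2  # 682 total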
Explanation: ES.h() applies the arcsine transformation, h = 2·asin(√p1) − 2·asin(√p2), giving h = −0.152 here (the sign just tracks which proportion is larger; power depends only on the magnitude). You need 341 emails per arm, 682 total. Notice how small an absolute 5-percentage-point lift looks in effect-size units: h = 0.152 sits below Cohen's "small" threshold of 0.20, so detecting it takes more data than the raw difference suggests.
ES.h() is not the same as p1 − p2. Cohen's h stabilises variance across the 0–1 range of proportions, so a 5-point lift from 0.10 to 0.15 gives a different h than a 5-point lift from 0.45 to 0.50. Always feed raw proportions to ES.h() rather than computing a difference yourself.

Exercise 5: Correlation sample size
You want to run a validity study to detect a modest correlation (r = 0.30) between two survey scales with 80% power at α = 0.05, two-sided. How many participants do you need?
Click to reveal solution
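One way to solve it; note that for pwr.r.test the effect size is the correlation itself:

```r
library(pwr)

ex5_res <- pwr.r.test(r = 0.30, sig.level = 0.05, power = 0.80)
ceiling(ex5_res$n)  # 85 participants
```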
Explanation: You need 85 participants. Required n scales roughly as 1/atanh(r)², so detecting r = 0.1 would need about 783 participants, while r = 0.5 only needs about 29. Fisher's z-transform (atanh) is the quiet workhorse behind this calculation: it turns the bounded [−1, 1] correlation into an unbounded, normal-ish quantity where power arithmetic works cleanly.
Exercise 6: Achieved power for a fixed n (inverse problem)
Your principal investigator says "we already have n = 30 per group, and we expect a medium effect (d = 0.5). What is our power for a two-sample t-test at α = 0.05?" Solve for power (the inverse direction).
Click to reveal solution
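One way to solve it, this time leaving power as the blank knob:

```r
library(pwr)

# n and d are fixed; pwr solves for the missing power
ex6_res <- pwr.t.test(n = 30, d = 0.5, sig.level = 0.05)
ex6_res$power  # about 0.478
```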
Explanation: Power is about 48%, well below the conventional 80% target. In plain English, more than half the time this study would fail to detect a real medium effect. This is the classic "post-hoc power" setup, and the honest recommendation here is: either collect more data (Exercise 1 said ~64 per arm for 80%), accept a smaller detectable effect (Exercise 7), or redesign with a more sensitive measure.
Exercise 7: Minimum detectable effect (inverse problem)
Constraint flips again: you have exactly n = 50 per group, you want 80% power at α = 0.05, two-sided. What is the smallest Cohen's d your study can reliably detect?
Click to reveal solution
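One way to solve it, leaving d as the blank knob:

```r
library(pwr)

# n and power are fixed; pwr solves for the minimum detectable d
ex7_res <- pwr.t.test(n = 50, sig.level = 0.05, power = 0.80)
ex7_res$d  # about 0.566
```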
Explanation: Your minimum detectable effect is d ≈ 0.566, just above medium. Any effect smaller than that, and your study will miss it more than 20% of the time. This framing is often the most useful version of power analysis in practice: instead of asking "how many do I need?" (which assumes you know the effect), you ask "given what I can collect, what effects can I actually see?" and then judge whether that floor is scientifically interesting.
Exercise 8: Multiple linear regression (pwr.f2.test)
You are fitting a multiple regression with 3 predictors and want to detect a medium overall effect (f² = 0.15) with 80% power at α = 0.05. Solve for v (the denominator degrees of freedom) and convert to total sample size.
Click to reveal solution
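One way to solve it; u is the numerator df and v the blank knob:

```r
library(pwr)

ex8_res <- pwr.f2.test(u = 3, f2 = 0.15, sig.level = 0.05, power = 0.80)
ex8_v <- ceiling(ex8_res$v)  # residual df, rounds up to 73
ex8_n <- ex8_v + 3 + 1       # add back u predictors + 1 intercept
ex8_n                        # 77 participants
```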
Explanation: u is the numerator df (number of tested predictors), v is the denominator df (residual df). Total sample size is n = v + u + 1 because the model consumes u + 1 parameters (the intercept plus u slopes). So 73 residual df + 3 predictors + 1 intercept = 77 participants. Cohen's f² = R² / (1 − R²); f² = 0.15 corresponds to an R² of about 13%.
v + u + 1, not just v. pwr.f2.test() returns v, the residual degrees of freedom, because that is the quantity the F statistic uses. Converting to a recruitment target always requires adding back u + 1 for the model parameters.

Complete Example: Design an A/B Test in 10 Lines
You are the analyst on an e-commerce pricing experiment. Baseline conversion is 12%. Product wants to know the smallest variant-conversion lift you can reliably detect with a two-proportion z-test at 80% power, α = 0.05, assuming 20% user dropout before the conversion window closes. Here is the whole calculation in one block.
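A sketch of that block, assuming the smallest lift Product cares about is 2 percentage points (0.12 → 0.14); the variable names are illustrative:

```r
library(pwr)

p_base  <- 0.12  # baseline conversion
p_lift  <- 0.14  # assumed minimum detectable variant conversion (2pp lift)
dropout <- 0.20  # expected loss before the conversion window closes

h <- abs(ES.h(p_lift, p_base))             # effect size on the arcsine scale
res <- pwr.2p.test(h = h, sig.level = 0.05, power = 0.80)

per_arm <- ceiling(res$n / (1 - dropout))  # inflate for dropout
total_n <- per_arm * 2
c(per_arm = per_arm, total = total_n)
```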
You need to route about 5,504 users into the A/B test to reliably spot a 2-percentage-point lift after accounting for dropout. The key numbers to report back to Product are: the minimum lift assumed (2pp), the per-arm inflated n (2,752), and the total (5,504). If they push back ("can we do it with 3,000 users?"), run the inverse version from Exercise 6: hold n fixed and report the resulting power, so the tradeoff is explicit.
Summary
Here is the full scoreboard for the 8 exercises plus the capstone, side-by-side:
| # | Test | pwr function | Effect size | Solve for | Answer |
|---|---|---|---|---|---|
| 1 | Two-sample t | pwr.t.test | d = 0.5 | n | 64 per arm (128 total) |
| 2 | Paired t | pwr.t.test (paired) | d = 0.3 | n | 120 pairs |
| 3 | One-way ANOVA, k=4 | pwr.anova.test | f = 0.25 | n | 45 per group (180 total) |
| 4 | Two-proportion | pwr.2p.test | h from ES.h(0.10, 0.15) | n | 341 per arm (682 total) |
| 5 | Correlation | pwr.r.test | r = 0.30 | n | 85 |
| 6 | Two-sample t | pwr.t.test | d = 0.5, n = 30 | power | 0.478 |
| 7 | Two-sample t | pwr.t.test | n = 50, power = 0.80 | d | 0.566 |
| 8 | Multiple regression | pwr.f2.test | u = 3, f² = 0.15 | v → total n | 77 total |
| E | A/B test design | pwr.2p.test + inflation | h from 0.12 vs 0.14 | n with dropout | 5,504 total |
Four rules carry through every exercise:
- Three knowns, one unknown. Leave the unknown knob as NULL and pwr solves for it.
- ceiling(), never round(). Per-arm n always rounds up, because 63.77 participants means you need 64.
- n is per group. Multiply by the number of arms for t-tests and ANOVA; add u + 1 for regression.
- Inflate for dropout. Divide per-arm n by the expected retention rate (e.g., n / 0.80 for a 20% dropout study) before reporting.
References
- Champely, S. (2020). pwr: Basic Functions for Power Analysis. CRAN package.
- pwr package vignette, "A simple example".
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Routledge.
- Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
- Higgins, P. D. R. Reproducible Medical Research with R, Chapter 23 – Sample Size Calculations with pwr.
- UCLA OARC, Power Analysis for Paired Sample t-test.
- UCLA OARC, One-way ANOVA Power Analysis.
Continue Learning
- Statistical Power Analysis in R – the theory companion for these exercises, covering the math of power curves and Cohen's effect-size families in depth.
- t-Test Exercises in R – drill the t-test itself on 12 graded problems once you have sized your study.
- Confidence Interval Exercises in R – the sibling framework to power, the other half of the design-and-report story.