t-Test Exercises in R: 12 One, Two & Paired Sample Problems, Solved Step-by-Step
These 12 t-test exercises in R walk you through one-sample, two-sample (Welch and Student), and paired tests with full runnable solutions, covering assumption checks, effect sizes, and one-tailed variants so you can pick the right test and defend the result.
Which t-test matches your question setup?
Before grinding through twelve problems, you need one skill: looking at a data layout and knowing which of three tests to fire. The decision hinges on two questions. Do you have one group or two? If two, are the groups independent subjects, or the same subjects measured twice? Here is that decision rule applied to three tiny datasets in a single block so you can see the three calls side by side.
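A minimal sketch of those three calls on toy data (the variable names and values here are illustrative, not from a real study):

```r
# One sample vs a reference value: does the mean of x equal 10?
x <- c(9.8, 10.4, 10.1, 9.6, 10.3, 9.9)
t.test(x, mu = 10)

# Two independent groups: do groups A and B differ?
d <- data.frame(
  y     = c(5.1, 4.8, 5.6, 6.2, 6.0, 6.5),
  group = factor(c("A", "A", "A", "B", "B", "B"))
)
t.test(y ~ group, data = d)

# Paired: the same subjects measured twice, so test the differences
before <- c(120, 115, 130, 125, 118)
after  <- c(114, 112, 125, 119, 116)
t.test(after, before, paired = TRUE)
```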
Each call answers a different question. The one-sample call asks whether an overall mean matches a reference. The two-sample call asks whether two independent groups differ. The paired call asks whether the within-subject change differs from zero. Same function, three very different data layouts.
| Your setup | R call | When to use |
|---|---|---|
| One group vs a reference value | t.test(x, mu = value) | Sample mean compared to a claim (label, target, historical mean) |
| Two independent groups | t.test(y ~ group, data = d) | Different subjects in each group |
| Two measurements per subject | t.test(y ~ group, data = d, paired = TRUE) | Before/after, matched pairs, crossover designs |
A paired t-test is really a one-sample test on the differences: compute d <- after - before, then run t.test(d, mu = 0). You'll get the exact same t-statistic and p-value as t.test(after, before, paired = TRUE). The "paired" flag is really just bookkeeping for the subtraction.

Try it: A nutrition study weighs 20 patients before and after 12 weeks on an anorexia treatment, giving each patient a pre and post weight. Which of the three t-test setups fits this design? Store your answer in ex_answer as "one", "two", or "paired".
Solution
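One possible solution:

```r
# Each of the 20 patients is measured twice (pre and post), so the two
# measurements are linked within subject: a paired design.
ex_answer <- "paired"
ex_answer
```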
Explanation: Each subject contributes two measurements, so the pre and post values are linked. A paired t-test uses that link; an independent two-sample test would throw it away and lose power.
How do you read and report a t-test result in R?
A call to t.test() returns a list that looks like a printed block, but every number in that block is accessible by name. Five fields do most of the work: statistic (t value), parameter (degrees of freedom), p.value, conf.int (95% CI by default), and estimate (the sample mean, or the two group means). Pulling them out by name gives you one-line access for reports and pipelines.
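A sketch of pulling those five fields by name from a one-sample call:

```r
res <- t.test(iris$Sepal.Length, mu = 5.85)

res$statistic   # t value
res$parameter   # degrees of freedom
res$p.value     # two-sided p-value
res$conf.int    # 95% confidence interval (the default conf.level)
res$estimate    # the sample mean
```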
Read in one breath: the sample mean (5.843) sits just below the hypothesized 5.85, the 95% CI covers 5.85, and p is 0.922. Not significant at 0.05. In APA style you would write this as t(149) = -0.10, p = .922, 95% CI [5.71, 5.98].
Pipe the t.test() result into broom::tidy() (broom is WebR-safe) to get a one-row data frame with estimate, statistic, p.value, conf.low, conf.high, and method. It plugs straight into knitr::kable() or a ggplot label without manual extraction.

Two studies can both report p = 0.03 and tell opposite stories if one has a tight CI around a tiny effect and the other has a wide CI around a huge one. Always report the estimate, the confidence interval, and an effect size alongside the p-value; otherwise the reader cannot judge the finding.

Try it: Run a one-sample t-test on airquality$Temp against mu = 77 and print only the p-value to three decimals. (Missing values are handled by t.test() automatically, so no na.rm step is needed.)
Solution
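One possible solution:

```r
# t.test() strips NAs internally, so the raw column can go straight in
my_temp_res <- t.test(airquality$Temp, mu = 77)
round(my_temp_res$p.value, 3)
```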
Explanation: t.test() strips NAs internally, so you can pass a column with missing values directly. Chaining $p.value on the result pulls just the number, which is the form you want for pipelines and reports.
Practice Exercises
Twelve problems, split into one-sample (Exercises 1-4), two-sample (5-8), and paired (9-12). Each has a starter block, a hint, and a reveal. Solutions use my_* variable names so your tutorial variables above stay intact.
One-sample t-tests
Exercise 1: Two-sided test against a hypothesized mean
Using iris$Sepal.Length, test the hypothesis that the population mean equals 5.85. Store the full t.test() result in my_res1, then report the p-value rounded to three decimals and whether you reject H0 at alpha = 0.05.
Solution
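One possible solution:

```r
my_res1 <- t.test(iris$Sepal.Length, mu = 5.85)
round(my_res1$p.value, 3)
my_res1$p.value < 0.05   # FALSE: fail to reject H0 at alpha = 0.05
```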
Explanation: p = 0.922 > 0.05, so you fail to reject H0. There is not enough evidence that the iris mean differs from 5.85.
Exercise 2: One-tailed test (less)
Use mtcars$mpg to test whether the population mean is less than 25 mpg. Set alternative = "less". Store the result in my_res2 and interpret.
Solution
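One possible solution:

```r
my_res2 <- t.test(mtcars$mpg, mu = 25, alternative = "less")
my_res2$p.value   # far below 0.05: mean mpg is significantly less than 25
```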
Explanation: p ≈ 4.5e-05 is far below 0.05, so you reject H0 in favour of the alternative: the mean mpg is significantly less than 25. The one-tailed framing lets the full alpha budget sit in the lower tail, raising power when the direction is pre-specified.
Exercise 3: Extract a 99% confidence interval
From airquality$Wind, extract the 99% confidence interval for the mean using conf.level = 0.99. Store the CI in my_ci and compare its width to the default 95% CI.
Solution
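One possible solution:

```r
my_res3 <- t.test(airquality$Wind, conf.level = 0.99)
my_ci   <- my_res3$conf.int
my_ci

# compare widths: the 99% interval is wider than the default 95% one
diff(my_ci)
diff(t.test(airquality$Wind)$conf.int)
```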
Explanation: The 99% CI is wider (1.59 vs 1.21) because a higher confidence level demands a larger margin of error. A tighter CI requires either a larger sample or accepting a lower confidence level.
Exercise 4: Small-sample simulation
Use set.seed(21) and generate 8 observations from rnorm(8, mean = 102, sd = 5). Test H0: mu = 100. Store the sample in my_sample and the result in my_res4. Explain what the p-value tells you despite a true effect being present.
Solution
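One possible solution:

```r
set.seed(21)
my_sample <- rnorm(8, mean = 102, sd = 5)
my_res4   <- t.test(my_sample, mu = 100)
my_res4$p.value   # with n = 8 and sd = 5, the true 2-unit effect is easy to miss
```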
Explanation: Even though the true mean (102) differs from H0 (100), the p-value (0.319) is not significant. Eight observations with sd = 5 give very low statistical power; a real but small effect goes undetected. This is a Type II error: failing to reject a false H0.
With a sample this small, expect p > 0.05 even when H0 is false: that is low power, not absence of effect.

Two-sample t-tests
Exercise 5: Welch two-sample with formula notation
Compare mpg between automatic (am = 0) and manual (am = 1) cars in mtcars using the formula interface. This is Welch's test by default, no equal-variance assumption. Store in my_res5.
Solution
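One possible solution:

```r
# Welch's test is the default: no var.equal argument needed
my_res5 <- t.test(mpg ~ am, data = mtcars)
my_res5$p.value     # ~0.0014
my_res5$parameter   # fractional df from the Welch adjustment
```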
Explanation: p = 0.0014 indicates a clear mpg difference between transmission types. Notice the fractional df (18.33): Welch's adjustment shrinks the degrees of freedom to account for unequal group variances.
The formula interface y ~ group requires exactly two levels. If your grouping variable has 3+ levels, subset first (as in Exercise 6) or switch to one-way ANOVA with aov().

Exercise 6: Student two-sample with var.equal = TRUE
Using iris, compare Petal.Length between "setosa" and "versicolor". Assume equal variances by setting var.equal = TRUE (Student's classical form). Store in my_res6.
Solution
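One possible solution (droplevels() removes the unused third species level so the formula interface sees exactly two groups):

```r
my_sub6 <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
my_res6 <- t.test(Petal.Length ~ Species, data = my_sub6, var.equal = TRUE)
my_res6$parameter   # 98 = 50 + 50 - 2, the classical pooled df
```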
Explanation: Degrees of freedom are integer (98 = n1 + n2 - 2), confirming this is the classical Student form. The enormous t-statistic reflects the huge gap between setosa and versicolor petal lengths, a textbook case of clear group separation.
Exercise 7: One-tailed two-sample test
Using iris, test whether Sepal.Width of "virginica" is greater than that of "versicolor". Use alternative = "greater". Store in my_res7.
Solution
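One possible solution. The formula interface tests mean(first level) - mean(second level), so virginica must be the first level for "greater" to point the right way:

```r
my_sub7 <- droplevels(subset(iris, Species %in% c("versicolor", "virginica")))
# make virginica the reference (first) level
my_sub7$Species <- relevel(my_sub7$Species, ref = "virginica")
my_res7 <- t.test(Sepal.Width ~ Species, data = my_sub7,
                  alternative = "greater")
my_res7$p.value
```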
Explanation: The relevelling ensures virginica is the reference level, so alternative = "greater" tests virginica mean > versicolor mean. p = 0.0009 is strong evidence for the directional claim. Always check how your factor is ordered before a one-tailed two-sample test.
Exercise 8: Cohen's d for a two-sample comparison
Using chickwts, compute Cohen's d for the weight difference between "casein" and "horsebean" feeds using the pooled-standard-deviation formula. Store the result in my_d8.
$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}}, \quad s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$
Solution
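One possible solution, translating the formula above directly into code:

```r
my_x1 <- chickwts$weight[chickwts$feed == "casein"]
my_x2 <- chickwts$weight[chickwts$feed == "horsebean"]
my_n1 <- length(my_x1)
my_n2 <- length(my_x2)

# pooled standard deviation, then Cohen's d
my_sp <- sqrt(((my_n1 - 1) * var(my_x1) + (my_n2 - 1) * var(my_x2)) /
              (my_n1 + my_n2 - 2))
my_d8 <- (mean(my_x1) - mean(my_x2)) / my_sp
my_d8
```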
Explanation: d ≈ 3.0 is a huge effect (well above the 0.8 "large" threshold). Chicks on casein weigh about three pooled standard deviations more than chicks on horsebean, a gap far bigger than the p-value alone would communicate.
Paired t-tests
Exercise 9: Classic paired test
Use the built-in sleep dataset, which measures extra hours slept for 10 subjects under two drugs. Run a paired t-test comparing group = 1 and group = 2. Store the result in my_res9.
Solution
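One possible solution. The two-vector form is used here because newer R versions no longer accept paired = TRUE through the formula interface, so this spelling is the portable one:

```r
my_g1 <- sleep$extra[sleep$group == 1]
my_g2 <- sleep$extra[sleep$group == 2]
my_res9 <- t.test(my_g2, my_g1, paired = TRUE)
my_res9$p.value     # ~0.0028
my_res9$parameter   # df = 9, one less than the 10 pairs
```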
Explanation: p = 0.0028, df = 9 (one less than the 10 pairs). The two drugs produce significantly different extra-sleep amounts. Because the same 10 subjects appear in both groups, each pair contributes one difference, hence the n - 1 = 9 degrees of freedom.
Exercise 10: Paired vs independent on the same data
Run two tests on sleep: a paired test (my_paired) and an independent two-sample test (my_indep). Compare the p-values and explain why they differ.
Solution
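One possible solution:

```r
my_g1 <- sleep$extra[sleep$group == 1]
my_g2 <- sleep$extra[sleep$group == 2]

my_paired <- t.test(my_g2, my_g1, paired = TRUE)
my_indep  <- t.test(my_g2, my_g1)   # same numbers, pairing ignored

c(paired = my_paired$p.value, independent = my_indep$p.value)
```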
Explanation: The paired test delivers p = 0.0028 while the independent test gives p = 0.079. Same data, wildly different verdicts: the pairing removes the large between-subject variability (some subjects sleep more under either drug), so the paired test's standard error, computed from the ten within-subject differences, is far smaller.
Exercise 11: One-tailed paired test
Using sleep, test whether drug 2 produces more extra sleep than drug 1. Use alternative = "greater" with a careful factor ordering so the direction is correct. Store in my_res11.
Solution
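One possible solution. Putting drug 2 first means "greater" tests mean(drug 2) > mean(drug 1):

```r
my_g1 <- sleep$extra[sleep$group == 1]
my_g2 <- sleep$extra[sleep$group == 2]
my_res11 <- t.test(my_g2, my_g1, paired = TRUE, alternative = "greater")
my_res11$p.value   # half of the two-sided paired p-value
```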
Explanation: p = 0.0014 = 0.0028 / 2, exactly half the two-sided paired p-value. That halving only makes sense if the data points the predicted way, which it does here (drug 2's mean > drug 1's mean). Never use a one-tailed test as a way to rescue a borderline two-sided p-value; commit to the direction before running the test.
Exercise 12: Assumption check before a paired test
For a paired t-test, normality must hold on the differences, not on the two raw groups. Run shapiro.test() on the differences for sleep, then run the paired t-test. Store the differences in my_diffs and the test in my_res12.
Solution
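One possible solution, using the one-sample-on-differences form so the assumption check and the test run on the same vector:

```r
my_diffs <- sleep$extra[sleep$group == 2] - sleep$extra[sleep$group == 1]
shapiro.test(my_diffs)          # normality check on the differences
my_res12 <- t.test(my_diffs, mu = 0)
my_res12$p.value
```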
Explanation: Shapiro-Wilk on the differences gives W ≈ 0.83 with p ≈ 0.03, driven by the 4.6-hour difference for subject 9, so normality of the differences is actually in doubt at the 0.05 level. Strictly, that argues for switching to wilcox.test(g2, g1, paired = TRUE); with only 10 pairs, a sensible write-up reports the paired t-test (p = 0.0028, matching Exercise 9) alongside the Wilcoxon result as a robustness check.
Shapiro-Wilk can reject normality on g1 and g2 separately while the differences are perfectly normal (and vice versa). The paired t-test assumption is on the differences, full stop.

Complete Example: Medication Effect on Systolic Blood Pressure
One end-to-end workflow that combines test selection, assumption check, t-test, effect size, and an APA-style report. This is the pattern to reuse on your own paired data.
Scenario. Fifteen patients have their systolic blood pressure measured at baseline and again six weeks after starting a new medication. You want to know whether the medication lowers BP on average. Because each subject contributes two measurements, this is a paired design, and you expect a negative direction (post < pre).
First, simulate realistic paired data with a known drop.
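A sketch of that simulation (the seed, baseline mean, and drop size here are assumptions; the exact numbers quoted in the text depend on the seed originally used):

```r
set.seed(42)  # assumed seed: your exact numbers will differ from the text's
my_n      <- 15
my_before <- rnorm(my_n, mean = 148, sd = 9)            # baseline systolic BP
my_after  <- my_before - rnorm(my_n, mean = 8, sd = 3)  # built-in mean drop ~8 mmHg

my_bp <- data.frame(subject = 1:my_n, before = my_before, after = my_after)
head(my_bp, 5)
```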
The first five rows confirm each subject's after reading is below their before reading. Now run the full assumption-check + test + effect-size sequence.
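The sequence itself, sketched under the same assumed simulation (the data are regenerated at the top so this block is self-contained):

```r
set.seed(42)  # assumed seed, matching the simulation step
my_before <- rnorm(15, mean = 148, sd = 9)
my_after  <- my_before - rnorm(15, mean = 8, sd = 3)
my_diff   <- my_after - my_before

# 1. assumption check on the differences, not the raw columns
shapiro.test(my_diff)

# 2. one-sided paired test: does the medication lower BP?
my_bp_test <- t.test(my_after, my_before,
                     paired = TRUE, alternative = "less")
my_bp_test

# 3. effect size: Cohen's d for paired data
my_bp_d <- mean(my_diff) / sd(my_diff)
my_bp_d
```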
Everything lines up. Shapiro's p = 0.94 confirms normal differences. The paired t-test rejects H0 overwhelmingly with t(14) = -9.23, p < 0.001, and the one-sided 95% CI rules out any drop smaller than 6.88 mmHg. Cohen's d is -2.38, far beyond "large." The APA-style report writes itself:
A paired-samples t-test showed a significant drop in systolic blood pressure after six weeks of medication (M_before = 147.6, M_after = 139.4), t(14) = -9.23, p < .001, one-sided 95% CI [-∞, -6.88], Cohen's d = -2.38.
This template (simulate or load → check assumption → test → effect size → one-sentence APA report) is the exact sequence to reuse when writing up any of the twelve exercises above against your own data.
Summary
| Test | R call | Assumption check | Effect size |
|---|---|---|---|
| One-sample | t.test(x, mu = m0) | Shapiro on x | (mean(x) - m0) / sd(x) |
| Welch two-sample | t.test(y ~ g, data = d) | Shapiro per group | Cohen's d with pooled SD |
| Student two-sample | t.test(y ~ g, data = d, var.equal = TRUE) | Shapiro per group + var.test() | Cohen's d with pooled SD |
| Paired | t.test(y ~ g, data = d, paired = TRUE) | Shapiro on the differences | mean(diff) / sd(diff) |
Key habits you should now have locked in:
- Pick the test from the data layout, not from what looks convenient.
- Always extract the named fields ($p.value, $conf.int, $estimate) rather than eyeballing the printed block.
- Run the assumption check on the correct quantity (raw groups for two-sample, differences for paired).
- Report p-value, 95% CI, and an effect size together, never the p-value alone.
- Commit to one-sided direction before seeing the data; do not use it as a p-hacking rescue.
References
- R Core Team, t.test() reference documentation. Link
- Student, "The Probable Error of a Mean." Biometrika 6(1), 1908. The original t-test paper. Link
- Welch, B. L., "The generalization of Student's problem when several different population variances are involved." Biometrika 34, 1947. Link
- Navarro, D., Learning Statistics with R, Chapter 13: Comparing two means. Link
- Diez, Barr, Çetinkaya-Rundel, OpenIntro Statistics, 4th ed., Chapter 7. Link
- Cohen, J., Statistical Power Analysis for the Behavioral Sciences, 2nd ed., Routledge, 1988. Effect-size thresholds for d.
- Wickham, H. & Grolemund, G., R for Data Science. Link
Continue Learning
- t-Tests in R, the canonical reference that covers the decision rule, assumption checks, and effect sizes in depth; use it when any exercise above leaves you wanting more theory.
- Hypothesis Testing in R, situates the t-test inside the broader hypothesis-testing framework, alongside chi-square, proportion tests, and non-parametric alternatives.
- Effect Size in R, deep-dive on Cohen's d, Hedges' g, and Glass's delta, with the pooled-SD formulas that Exercise 8 and the Complete Example use.