Sign Test in R: The Simplest One-Sample Nonparametric Test
The sign test is the simplest one-sample nonparametric test in R. It asks whether a population's median equals a hypothesised value using only the signs of the differences: no normality assumption, no symmetry, no rank arithmetic. In R, you run it with a single call to binom.test().
This guide covers the test on survey data, on paired before/after measurements, the head-to-head comparison with Wilcoxon, and the handling of ties.
When should you use the sign test?
The sign test answers one question: is the median of your data different from a value you specify? It counts how many observations land above your hypothesised median and treats each one like a coin flip. Under the null you would expect roughly half above and half below; any extreme split is evidence against it. Because the test looks only at signs, it survives outliers, skew, and ordinal data that would break the t-test.
A SaaS team runs a 20-user satisfaction survey on a 1 to 10 scale, and wants to know whether the median rating differs from 7.
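A sketch of how this looks in R. The ratings vector below is illustrative, constructed to match the split discussed next (15 ratings above 7, 2 below, 3 exactly at 7):

```r
# Illustrative 20-user satisfaction ratings on a 1-10 scale:
# 15 above the hypothesised median of 7, 2 below, 3 exactly at 7
ratings <- c(8, 9, 8, 10, 9, 8, 9, 10, 8, 9, 8, 10, 9, 8, 9, 5, 6, 7, 7, 7)

above <- sum(ratings > 7)   # successes: ratings above the hypothesised median
n     <- sum(ratings != 7)  # non-tie observations (ties at 7 are dropped)

binom.test(above, n, p = 0.5)
```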
Out of 17 non-tie ratings, 15 sat above 7 and only 2 sat below. Under a true median of 7, that 15-to-2 split is the same as flipping a fair coin and getting at least 15 heads in 17 tosses; the exact two-sided p-value is about 0.0024. We reject the null and conclude the median rating is genuinely above 7. Notice we never used a mean, a standard deviation, or any distributional assumption.
Try it: Given the vector ex1_vals of 12 reaction-time measurements below, test whether the median differs from 50 ms.
Click to reveal solution
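A possible solution. Since ex1_vals is not reproduced here, the values below are illustrative, chosen so that 8 of the 12 measurements exceed 50 ms:

```r
# Illustrative reaction times (ms); 8 of 12 exceed 50, none equal 50
ex1_vals <- c(52, 55, 48, 61, 47, 53, 58, 49, 51, 56, 45, 54)

binom.test(sum(ex1_vals > 50), sum(ex1_vals != 50), p = 0.5)
```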
Explanation: 8 of 12 values exceed 50, an 8-to-4 split that is well within what fair coin flipping produces, so we cannot reject the null.
How does the sign test actually work?
The mechanics are pure coin-flipping. Take each observation and compare it to your hypothesised median. Mark a + if it is above, a - if it is below, drop ties. Under the null, each observation is equally likely to be above or below the true median, so the count of + signs follows a Binomial distribution with n equal to the non-tie count and p = 0.5. The two-sided p-value is the probability of seeing a split at least as extreme as yours under that binomial.
Written as a formula, the upper-tail probability is:
$$P(K \geq k \mid n,\ p = 0.5) = \sum_{i=k}^{n} \binom{n}{i} (0.5)^n$$
Where:
- $k$ = the observed count of "+" signs (above the hypothesised median)
- $n$ = the count of non-tie observations
- $\binom{n}{i}$ = the binomial coefficient ("n choose i")
The two-sided p-value doubles the smaller tail. To prove binom.test() is doing exactly this, let us reproduce its p-value by hand using pbinom(), R's binomial CDF.
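Using the 15-above, 2-below split from the satisfaction survey (17 non-ties), the by-hand calculation looks like this:

```r
k <- 15   # observed "+" signs (ratings above 7)
n <- 17   # non-tie observations

# Upper tail: P(X >= k) under Binomial(n, 0.5).
# pbinom(k - 1, ..., lower.tail = FALSE) gives P(X > k - 1) = P(X >= k)
upper_tail <- pbinom(k - 1, size = n, prob = 0.5, lower.tail = FALSE)

# Two-sided p-value: double the smaller tail (here the upper tail)
manual_p <- 2 * upper_tail

# binom.test() should report the same number
exact_p <- binom.test(k, n, p = 0.5)$p.value

c(manual = manual_p, exact = exact_p)
```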
The two values agree to the seventh decimal. That is because the binomial null distribution is exact: no normal approximation, no large-sample correction. Whether you have 5 observations or 5,000, the p-value binom.test() returns is the true probability under the null.
Try it: A pharmacology lab runs 9 trials and 7 of them produce a response above the hypothesised median. Reproduce the two-sided p-value manually with pbinom(), then check it against binom.test().
Click to reveal solution
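A minimal solution; the counts (7 of 9 above) are given in the prompt, so no data vector is needed:

```r
k <- 7   # responses above the hypothesised median
n <- 9   # trials

# By hand: double the upper tail P(X >= 7) under Binomial(9, 0.5)
manual_p <- 2 * pbinom(k - 1, size = n, prob = 0.5, lower.tail = FALSE)

# Check against binom.test()
exact_p <- binom.test(k, n, p = 0.5)$p.value

c(manual = manual_p, exact = exact_p)  # both about 0.1797
```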
Explanation: pbinom(k - 1, ..., lower.tail = FALSE) gives P(X >= k). Doubling that for a two-sided test reproduces what binom.test() does internally.
How do you run a paired sign test in R?
Paired data is just one-sample data in disguise. If each subject contributes a before and an after value, take their difference and ask whether the median of those differences is zero. The sign test still uses binom.test(); the only extra step is computing the differences first.
R's built-in sleep dataset is the textbook paired example: 10 patients each tried two soporific drugs, and we have the extra hours of sleep produced by each. We will reshape it to wide format, compute drug2 - drug1, and run the sign test against H0: median difference = 0.
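One way to do this with the built-in dataset (reshape() to wide format, then binom.test() on the signs of the differences):

```r
data(sleep)  # built-in: extra hours of sleep for 10 patients on each of 2 drugs

# Reshape long -> wide: one row per patient, columns extra.1 and extra.2
wide <- reshape(sleep, idvar = "ID", timevar = "group", direction = "wide")
d <- wide$extra.2 - wide$extra.1  # drug 2 minus drug 1

table(sign(d))  # 9 positive differences, 1 zero (a tie, dropped)

# Sign test on the differences: H0 median difference = 0
binom.test(sum(d > 0), sum(d != 0), p = 0.5)
```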
Nine of nine non-tie differences favoured drug 2, an unbroken winning streak. The exact p-value is $2 \times (1/2)^9 \approx 0.0039$, well below 0.05. We conclude drug 2 produced more extra sleep than drug 1, without ever assuming anything about the shape of the response distribution.
Whether the design is one-sample or paired, the call is the same: binom.test() against p = 0.5. The same machinery handles both designs.
Try it: Test whether the median of mtcars$mpg is different from 20 mpg using a sign test.
Click to reveal solution
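A solution using the built-in mtcars dataset:

```r
# Test H0: median mpg = 20 on the built-in mtcars data
mu    <- 20
above <- sum(mtcars$mpg > mu)   # cars above 20 mpg
n     <- sum(mtcars$mpg != mu)  # non-ties (no car has exactly 20.0 mpg)

binom.test(above, n, p = 0.5)
```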
Explanation: 14 of 32 cars exceed 20 mpg, a 14-to-18 split close enough to 50/50 that the sign test cannot reject the hypothesised median.
Sign test vs Wilcoxon: which should you pick?
The Wilcoxon signed-rank test is the natural next step up. It also tests a one-sample median, but it uses the ranks of the absolute differences, not just their signs. That extra information gives Wilcoxon more statistical power when its symmetry assumption holds. The sign test, in contrast, does not assume symmetry; it only assumes the median exists.
Let us run both on the satisfaction scores from earlier and compare.
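A sketch using an illustrative ratings vector constructed to match the survey's split (15 above 7, 2 below, 3 ties at 7):

```r
# Illustrative satisfaction ratings: 15 above 7, 2 below, 3 ties at 7
ratings <- c(8, 9, 8, 10, 9, 8, 9, 10, 8, 9, 8, 10, 9, 8, 9, 5, 6, 7, 7, 7)

# Sign test: counts only, ties at 7 dropped
sign_p <- binom.test(sum(ratings > 7), sum(ratings != 7), p = 0.5)$p.value

# Wilcoxon signed-rank: uses ranks of |rating - 7|; the heavy ties force
# a normal approximation, hence suppressWarnings()
wilcox_p <- suppressWarnings(wilcox.test(ratings, mu = 7)$p.value)

c(sign = sign_p, wilcoxon = wilcox_p)
```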
Both tests reject the null on this data. The two p-values differ because the underlying null hypotheses are subtly different: the sign test asks about the median, while Wilcoxon asks about the pseudo-median (median of pairwise averages), which only equals the true median when the distribution is symmetric. When symmetry holds and the data are continuous, Wilcoxon usually wins on power because it uses magnitudes too. When symmetry is doubtful, or the data are heavily tied as in Likert scores, the sign test answers a cleaner question.
The decision rule is short:
| Test | Distributional assumption | Power | When to prefer |
|---|---|---|---|
| Sign test | None about shape | Lowest of the three | Skewed data, ordinal scales, tiny samples |
| Wilcoxon signed-rank | Symmetry around median | Higher when symmetric | Continuous data with no obvious skew |
| One-sample t-test | Approximate normality | Highest when normal | Continuous, roughly normal, moderate n |
Try it: Run both the sign test and Wilcoxon on the small heart-rate sample below against H0: median = 70. Save the two p-values.
Click to reveal solution
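A possible solution. The heart-rate sample is not reproduced here, so the values below are illustrative, with all 14 non-tie readings above 70:

```r
# Illustrative resting heart rates (bpm); all 14 readings exceed 70
hr <- c(72, 75, 78, 81, 84, 87, 90, 71, 74, 77, 80, 83, 86, 89)

sign_p   <- binom.test(sum(hr > 70), sum(hr != 70), p = 0.5)$p.value
wilcox_p <- wilcox.test(hr, mu = 70)$p.value

c(sign = sign_p, wilcoxon = wilcox_p)
```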
Explanation: 14 of 14 non-tie heart rates lie above 70, an unbroken streak that pushes the sign test p-value to roughly 0.0001. Wilcoxon agrees; both tests strongly reject H0.
How do you handle ties (zero differences)?
Ties are observations that exactly equal the hypothesised median. The standard convention, baked into every R implementation of the sign test, is to drop them and reduce n accordingly. The rationale is intuitive: a tie provides no evidence in either direction, so it should not contribute to the count.
The danger is forgetting to drop them. If you pass the full sample size to binom.test() while the success count only includes non-ties, you understate the proportion and inflate your p-value, sometimes dramatically.
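A demonstration with a made-up vector of 10 observations against a hypothesised median of 10: 4 sit above, 1 below, and 5 tie the median exactly:

```r
x  <- c(12, 13, 11, 14, 9, 10, 10, 10, 10, 10)
mu <- 10

# Right: n counts only the informative (non-tie) observations -> 4 of 5
right_p <- binom.test(sum(x > mu), sum(x != mu), p = 0.5)$p.value

# Wrong: passing length(x) keeps the 5 ties in the denominator -> 4 of 10
wrong_p <- binom.test(sum(x > mu), length(x), p = 0.5)$p.value

c(right = right_p, wrong = wrong_p)
```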
Both runs use the same 4 successes, but the right call asks "given 5 informative observations, is 4-to-1 unusual?" while the wrong call asks "given 10 trials, is a 4-to-6 split unusual?" The two p-values disagree by a factor of two, and on real data the difference can flip your conclusion.
Never pass length(x) as n when ties exist. Always count non-tie observations. A common bug is binom.test(sum(x > mu), length(x), p = 0.5), which silently inflates p-values whenever any value equals mu. Use sum(x != mu) instead.
Try it: Given ex5_vals below (with several ties at 4), compute the correct number of non-ties and run the sign test against H0: median = 4.
Click to reveal solution
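A possible solution. Since ex5_vals is not reproduced here, the vector below is illustrative: 12 values, 5 of which equal the hypothesised median of 4:

```r
# Illustrative data: 5 ties at the hypothesised median of 4
ex5_vals <- c(4, 4, 4, 4, 4, 5, 6, 7, 5, 6, 8, 3)

n_nonties <- sum(ex5_vals != 4)  # 7 informative observations
above     <- sum(ex5_vals > 4)   # 6 of them lie above 4

binom.test(above, n_nonties, p = 0.5)
```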
Explanation: Five of the 12 values equal 4 and are dropped, leaving 7 informative observations. Of those, 6 lie above 4, a 6-to-1 split that is suggestive but not significant.
Practice Exercises
Exercise 1: One-sided sign test on training scores
Twelve employees take a course. We have their pre and post scores. Test whether the median improvement is greater than zero using a one-sided sign test. Save the p-value to cap1_p. Hint: pass alternative = "greater" to binom.test(), and remember to drop any employee whose pre and post scores are equal.
Click to reveal solution
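A possible solution with illustrative pre/post scores, constructed so that 10 employees improve, 1 declines, and 1 is unchanged:

```r
# Illustrative pre/post training scores for 12 employees
pre  <- c(60, 55, 70, 65, 58, 62, 68, 50, 72, 66, 59, 63)
post <- pre + c(5, 3, 4, 2, 6, 1, 7, 3, 2, 4, -2, 0)

d <- post - pre  # 10 positive, 1 negative, 1 zero (tie, dropped)

# One-sided sign test: H1 is median improvement > 0
cap1_p <- binom.test(sum(d > 0), sum(d != 0), p = 0.5,
                     alternative = "greater")$p.value
cap1_p  # about 0.006
```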
Explanation: 10 of 11 employees improved, only 1 declined, one tied. The one-sided p-value of about 0.006 is strong evidence the median improvement is positive.
Exercise 2: Reviewer ratings, sign vs Wilcoxon
A streaming service collects 25 reviewer ratings on a 1-10 scale and wants to test whether the median differs from 6. Run both a sign test and a Wilcoxon test, save the two p-values, then explain in one sentence why they differ. Hint: this rating distribution has substantial ties at 6.
Click to reveal solution
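A possible solution with illustrative ratings, constructed so that 7 of the 25 tie the hypothesised median of 6 and 17 of the remaining 18 sit above it:

```r
# Illustrative 1-10 ratings: 7 ties at 6, 17 above, 1 below
ratings <- c(rep(6, 7), rep(7, 5), rep(8, 5), rep(9, 4), rep(10, 3), 4)

sign_p   <- binom.test(sum(ratings > 6), sum(ratings != 6), p = 0.5)$p.value

# Ties force the normal approximation in wilcox.test(), hence suppressWarnings()
wilcox_p <- suppressWarnings(wilcox.test(ratings, mu = 6)$p.value)

c(sign = sign_p, wilcoxon = wilcox_p)
```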
Explanation: 17 of 18 non-tie ratings exceed 6, a 17-to-1 split that drives both p-values well below 0.001. They differ because Wilcoxon uses the magnitudes of the differences while the sign test uses only signs. With 7 ratings clustered at exactly 6 (the hypothesised median), Wilcoxon's tie-correction softens its rank advantage, and the sign test edges ahead on this dataset.
Complete Example
Let us walk an end-to-end sign test on a fictional QA team's bug-fix completion times for 18 tickets, hypothesising that the median time is 33 minutes. We will count signs, run the sign test, run Wilcoxon for comparison, and write the conclusion.
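A sketch with made-up completion times (minutes), constructed so that 10 of 17 non-tie tickets exceed 33 minutes and one ticket lands at exactly 33:

```r
# Illustrative bug-fix completion times (minutes) for 18 tickets
times <- c(37, 39, 40, 42, 44, 45, 47, 48, 49, 50,  # above 33
           32, 31, 30, 28, 25, 23, 20,              # below 33
           33)                                      # exactly 33 (tie)
mu <- 33

table(sign(times - mu))  # 10 above, 7 below, 1 tie

# Sign test against H0: median = 33
sign_p <- binom.test(sum(times > mu), sum(times != mu), p = 0.5)$p.value

# Wilcoxon for comparison; the tie at 33 forces the normal approximation
wilcox_p <- suppressWarnings(wilcox.test(times, mu = mu)$p.value)

c(sign = sign_p, wilcoxon = wilcox_p)
```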
The 10-to-7 split is close to a coin flip, and the sign test returns p ≈ 0.63. Wilcoxon picks up more from the magnitudes and lands at p ≈ 0.10, but neither test crosses the 0.05 threshold. Statistical conclusion: the data are consistent with a median bug-fix time of 33 minutes; we have no evidence to reject that hypothesis. If this were a regression check ("is the team slower than last quarter's median of 33?"), we would conclude no slowdown.
Summary
The sign test in five lines:
- It tests whether a population's median equals a hypothesised value using only the signs of the differences.
- In R, run it with binom.test(successes, n_non_ties, p = 0.5).
- For paired data, take the differences first, then run the same one-sample test against p = 0.5.
- Prefer the sign test when symmetry is suspect, when the data are ordinal, or when the sample is tiny.
- Always drop ties before counting; never pass length(x) as the trial count.
References
- R Core Team. binom.test: Exact Binomial Test. R documentation, stats package.
- Hollander, M., Wolfe, D. A., and Chicken, E. Nonparametric Statistical Methods, 3rd Edition. Wiley (2014). Chapter 3 covers the one-sample sign test in depth.
- Mangiafico, S. S. R Handbook: Sign Test and Trinomial Test for One-sample Data. rcompanion.org.
- Conover, W. J. Practical Nonparametric Statistics, 3rd Edition. Wiley (1999). Section 3.4 on sign tests and confidence intervals for the median.
- Wikipedia contributors. Sign test. Wikipedia.
Continue Learning
- Wilcoxon Signed-Rank Test in R, the rank-based generalisation that gains power when symmetry holds.
- Wilcoxon-Mann-Whitney and Kruskal-Wallis in R, extends nonparametric testing to two and many independent samples.
- One-Sample t-Test in R, the parametric counterpart for normally distributed data.