Pearson Correlation Test in R: cor.test() Guide
A Pearson correlation test in R measures the LINEAR association between two numeric variables. Use cor.test(x, y) to get the correlation coefficient, p-value, and confidence interval.
```r
cor.test(x, y)                            # default Pearson, two-sided
cor.test(x, y, method = "spearman")       # rank-based (non-linear ok)
cor.test(x, y, method = "kendall")        # rank-based (small N)
cor.test(x, y, alternative = "greater")   # one-sided
cor.test(x, y, conf.level = 0.99)         # custom CI
cor(x, y, use = "complete.obs")           # just the coefficient (no test)
cor.test(x, y)$p.value                    # extract p-value
```
Need explanation? Read on for examples and pitfalls.
What Pearson correlation does in one sentence
Pearson's r measures the strength and direction of LINEAR association between two numeric variables, ranging from -1 (perfect negative) to +1 (perfect positive), with 0 meaning no linear relationship. The test gives a p-value for the null hypothesis "true correlation is 0".
Use it for two continuous variables that look reasonably linear and roughly normal. For non-linear or rank-based associations, use Spearman or Kendall instead.
Syntax
cor.test(x, y) returns Pearson's r, t-statistic, p-value, and 95% CI.
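For instance, with the built-in mtcars data (fuel efficiency against car weight):

```r
# Pearson test on two numeric columns of the built-in mtcars dataset
result <- cor.test(mtcars$mpg, mtcars$wt)
result$estimate   # cor: -0.868
result$conf.int   # 95% CI: (-0.934, -0.744)
result$p.value    # ~1.3e-10
```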
For mtcars$mpg vs mtcars$wt, the correlation is -0.87 (strong negative), p is essentially 0, and the 95% CI is (-0.93, -0.74).
Five common patterns
1. Default Pearson correlation
Pearson's r answers: "as one variable increases, does the other increase (positive r) or decrease (negative r)?"
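A minimal sketch with simulated data (variable names and the simulated relationship are illustrative, not from any real dataset):

```r
set.seed(42)
x <- rnorm(50)
y <- 2 * x + rnorm(50)   # positive linear relationship plus noise
cor.test(x, y)           # default: Pearson, two-sided, 95% CI
```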
2. Spearman (rank-based, robust to non-normality)
Spearman correlates the RANKS rather than raw values. Robust to outliers and non-linear monotonic relationships.
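A sketch with skewed, monotonic-but-non-linear simulated data:

```r
set.seed(1)
x <- rexp(40)                        # skewed data
y <- x^3 + rnorm(40, sd = 0.1)       # monotonic but clearly non-linear
cor.test(x, y, method = "spearman")  # rho is high for a monotonic trend
```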
3. Kendall (small samples, many ties)
Kendall's tau is more conservative than Spearman, especially with small samples.
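A sketch with a small ordinal sample containing ties (the data are simulated for illustration):

```r
set.seed(2)
x <- sample(1:5, 15, replace = TRUE)     # small ordinal sample with many ties
y <- x + sample(0:1, 15, replace = TRUE)
cor.test(x, y, method = "kendall")       # tau; with ties it warns and uses a
                                         # normal approximation instead of an
                                         # exact p-value
```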
4. One-sided test
Use alternative = "greater" (one-sided positive) or "less" (one-sided negative) ONLY if you specified the direction BEFORE seeing the data.
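A sketch of the one-sided form (simulated data; the direction must be fixed in advance):

```r
set.seed(3)
x <- rnorm(30)
y <- x + rnorm(30)
# Hypothesized BEFORE looking at the data: the correlation is positive
cor.test(x, y, alternative = "greater")  # one-sided p-value and CI
```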
5. Many-pair correlation matrix
For a matrix WITH p-values, use Hmisc::rcorr(as.matrix(data)) or psych::corr.test(data).
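As a base-R sketch of what those packages do in one call, here is a pairwise cor.test() loop over a few mtcars columns, collecting r and p into matrices (the column choice is illustrative):

```r
# Base-R sketch: pairwise cor.test() over data-frame columns;
# Hmisc::rcorr(as.matrix(data)) or psych::corr.test(data) do this in one call
vars  <- mtcars[, c("mpg", "wt", "hp")]
k     <- ncol(vars)
r_mat <- matrix(NA, k, k, dimnames = list(names(vars), names(vars)))
p_mat <- r_mat
for (i in 1:k) {
  for (j in 1:k) {
    if (i == j) next                  # skip the trivial diagonal
    ct          <- cor.test(vars[[i]], vars[[j]])
    r_mat[i, j] <- ct$estimate
    p_mat[i, j] <- ct$p.value
  }
}
round(r_mat, 2)
```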
Pearson vs Spearman vs Kendall
Three correlation methods with different assumptions and use cases. Pick based on data shape and sample size.
| Method | Tests | Robust to | Best for |
|---|---|---|---|
| Pearson | Linear association | Normal-ish data | Two roughly normal continuous variables |
| Spearman | Monotonic association | Non-normality, mild outliers | Non-linear monotonic, ranks, ordinal data |
| Kendall | Concordance | Small N, many ties | Small samples, ordinal with ties |
When to use which:
- Use Pearson for typical "linear" scenarios with normal-ish data.
- Use Spearman when data are skewed or non-linear monotonic.
- Use Kendall for small samples or many tied values.
Common pitfalls
Pitfall 1: assuming correlation means linearity. Pearson only measures LINEAR association. Two variables can be perfectly related (e.g., y = x^2 over a range symmetric around zero) yet have Pearson r = 0. Always plot first.
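The y = x^2 counterexample is two lines of R:

```r
x <- seq(-3, 3, by = 0.1)  # symmetric around zero
y <- x^2                   # perfect (but non-linear) relationship
cor(x, y)                  # ~0: Pearson misses it entirely
plot(x, y)                 # the plot makes the parabola obvious
```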
Pitfall 2: outliers can dominate. A single extreme point can drive correlation up or down. Use Spearman if you suspect outliers, or examine influence with cooks.distance() after fitting lm().
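A quick simulated demonstration of a single point driving Pearson's r:

```r
set.seed(4)
x <- rnorm(20); y <- rnorm(20)    # no real relationship
cor(x, y)                         # near 0
x2 <- c(x, 10); y2 <- c(y, 10)    # add one extreme point
cor(x2, y2)                       # jumps sharply upward
cor(x2, y2, method = "spearman")  # ranks blunt the outlier's influence
```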
Pitfall 3: small samples produce unreliable estimates. With n < 20, the correlation coefficient has wide confidence intervals. A "strong" correlation in a tiny sample may not replicate.
Pitfall 4: running many correlation tests inflates false positives. When testing many pairs, adjust the p-values, e.g. p.adjust(p_values, method = "BH") for false-discovery-rate control.
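A simulated illustration of how wide the confidence interval gets at n = 10:

```r
set.seed(5)
x  <- rnorm(10)
y  <- x + rnorm(10)
ct <- cor.test(x, y)
ct$estimate   # the point estimate alone can look impressive
ct$conf.int   # but at n = 10 the 95% CI spans a wide range of values
```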
Try it yourself
Try it: Test the correlation between iris$Sepal.Length and iris$Petal.Length using Pearson. Save to ex_test and report the coefficient and p-value.
Solution
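One possible solution:

```r
# Pearson test on two columns of the built-in iris dataset
ex_test <- cor.test(iris$Sepal.Length, iris$Petal.Length)
ex_test$estimate   # cor: 0.872
ex_test$p.value    # < 2.2e-16
```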
Explanation: Pearson r = 0.87 (strong positive linear correlation). Sepal length and petal length grow together. The tiny p-value confirms the correlation is reliably different from zero.
Related tests
After mastering Pearson correlation, look at:
- cor.test(method = "spearman"): rank-based, non-linear monotonic
- cor.test(method = "kendall"): rank-based with ties
- lm(y ~ x): regression for the same data with slope and intercept
- Hmisc::rcorr(): correlation matrix with p-values
- psych::corr.test(): correlation matrix with adjustment options
- ppcor::pcor.test(): partial correlation controlling for other variables
For visualization, ggplot2::ggplot() + geom_point() + geom_smooth() shows the relationship along with a fitted trend line.
FAQ
How do I do a Pearson correlation test in R?
cor.test(x, y) runs the default Pearson test. The result includes the correlation coefficient, t-statistic, p-value, and 95% confidence interval. Save the result and use $estimate for r and $p.value for the p-value.
What is the difference between Pearson and Spearman correlation?
Pearson measures linear association on the raw values. Spearman measures monotonic association on the ranks. Use Pearson for normal-ish linear data; Spearman for non-normal, ordinal, or non-linear-monotonic data.
How do I extract the correlation coefficient from cor.test in R?
result <- cor.test(x, y); result$estimate. The estimate is named "cor" (or "rho" for Spearman, "tau" for Kendall). For just the number: unname(result$estimate) or result$estimate[[1]].
What does a high correlation mean?
A high correlation (close to +1 or -1) means the variables vary together (or oppositely). It does NOT mean one causes the other; a third variable could explain both, or the direction may be reversed. Correlation is a description of co-variation, not causation.
How do I test correlation with NA values in R?
cor() accepts a use argument: use = "complete.obs" removes any row with NA in either variable; use = "pairwise.complete.obs" keeps all pairwise-complete combinations (relevant for correlation matrices). cor.test() has no use argument; it drops incomplete pairs automatically. Either way, missing values affect both the final r and the effective sample size.
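A small sketch of the NA behavior (toy vectors for illustration):

```r
x <- c(1, 2, NA, 4, 5)
y <- c(2, 4, 6, NA, 9)
cor(x, y)                        # NA: the default use = "everything" propagates NAs
cor(x, y, use = "complete.obs")  # drops rows 3 and 4, correlates the remaining pairs
cor.test(x, y)                   # drops the incomplete pairs automatically
```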