Correlation Matrix Plot in R: corrplot, ggcorrplot, and ggplot2
A correlation matrix plot shows pairwise Pearson (or Spearman) correlations between all numeric variables in a dataset — typically as a color grid where warm colors mean strong positive correlation and cool colors mean negative correlation.
Introduction
When you have a dataset with 5-20 numeric variables, running cor() returns a matrix of numbers that's hard to parse at a glance. A correlation matrix plot turns that matrix into a color grid where patterns jump out immediately: clusters of highly correlated variables, variables that are negatively related, and variables that are essentially uncorrelated.
There are three common approaches in R:
- ggplot2 + geom_tile(): full manual control, no packages beyond ggplot2
- ggcorrplot: wraps ggplot2 with sensible correlation-plot defaults (reordering, significance masking, upper/lower triangle)
- corrplot: base-R graphics, extremely feature-rich for publication
This post covers all three, starting with the ggplot2 approach to understand the mechanics, then showing how ggcorrplot streamlines the workflow.
How do you compute and reshape a correlation matrix for plotting?
Start with cor() to get the correlation matrix, then reshape it to long format for ggplot2.
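A minimal sketch of this step. The choice of seven mtcars columns and the column name cor are assumptions made for illustration:

```r
# Assumption: seven numeric mtcars columns stand in for the post's data
df <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec")]

# Pearson correlation matrix (7 x 7)
cor_mat <- cor(df)

# Reshape to long format: one row per (Var1, Var2) pair
cor_long <- as.data.frame(as.table(cor_mat))
names(cor_long) <- c("Var1", "Var2", "cor")

head(cor_long)
```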
as.table(cor_mat) converts the matrix to a table, and as.data.frame() flattens it to long format. Every pair of variables gets its own row, including the diagonal (self-correlation = 1) and both upper and lower triangle.
Try it: After running this, type nrow(cor_long) — it should equal n_vars² = 7² = 49 rows (all pairs including self-pairs and duplicates from both triangles).
How do you build a basic correlation heatmap with ggplot2?
Once you have long-format data, geom_tile() creates the color grid and scale_fill_gradient2() applies the diverging color scale.
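A sketch of the basic heatmap, assuming the same seven mtcars columns and a long-format column named cor (both illustrative choices):

```r
library(ggplot2)

# Assumed example data: seven numeric mtcars columns
df <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec")]
cor_long <- as.data.frame(as.table(cor(df)))
names(cor_long) <- c("Var1", "Var2", "cor")

p <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = cor)) +
  geom_tile(color = "white") +                 # one tile per variable pair
  scale_fill_gradient2(
    low = "#4393c3", mid = "white", high = "#d6604d",
    midpoint = 0, limits = c(-1, 1), name = "r"
  ) +
  coord_fixed() +                              # keep tiles square
  theme_minimal() +
  labs(x = NULL, y = NULL)

p
```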
scale_fill_gradient2() with midpoint = 0 and limits = c(-1, 1) anchors white to zero: strong positive correlations go red, strong negative go blue, and pairs with correlations near zero appear white.
Try it: Change low = "#4393c3" and high = "#d6604d" to low = "#2166ac" and high = "#b2182b" for deeper, more saturated colors. Then try a ColorBrewer diverging palette with scale_fill_distiller(palette = "RdYlBu", limits = c(-1, 1), direction = -1).
How do you use ggcorrplot for a smarter correlation plot?
ggcorrplot automates the tricky parts: hierarchical reordering of variables (grouping correlated variables together), masking the redundant triangle, and p-value significance filtering.
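A sketch of a ggcorrplot call using these options, again assuming seven mtcars columns as example data:

```r
library(ggcorrplot)

# Assumed example data: seven numeric mtcars columns
df <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec")]
cor_mat <- cor(df)

p <- ggcorrplot(
  cor_mat,
  method   = "square",   # solid tiles; "circle" is the alternative
  type     = "upper",    # drop the redundant triangle
  hc.order = TRUE,       # reorder variables by hierarchical clustering
  colors   = c("#4393c3", "white", "#d6604d")
)

p
```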
hc.order = TRUE clusters variables so highly correlated ones sit near each other — making patterns (like the cyl, disp, hp, wt cluster) visually obvious. type = "upper" shows only the upper triangle, eliminating the redundant mirror image.
Try it: Change method = "square" to method = "circle" — circles sized by correlation magnitude instead of solid colored squares. Which communicates the strength of weak correlations more clearly?
How do you show only the upper or lower triangle?
Showing both triangles is redundant (the matrix is symmetric). Use type = "upper" in ggcorrplot, or manually filter in the ggplot2 approach.
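For the manual ggplot2 route, one way to keep a single triangle is to compare the integer codes of the two factor columns. This is a sketch under the assumption that cor_long was built with as.data.frame(as.table(...)), so both factors share the same level order:

```r
# Assumed example data: seven numeric mtcars columns
df <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec")]
cor_long <- as.data.frame(as.table(cor(df)))
names(cor_long) <- c("Var1", "Var2", "cor")

# Var1 and Var2 are factors; compare their integer codes, not the factors
keep <- as.integer(cor_long$Var1) < as.integer(cor_long$Var2)
cor_upper <- cor_long[keep, ]

nrow(cor_upper)  # 7 * 6 / 2 = 21 unique pairs, diagonal excluded
```

Using <= instead of < would keep the diagonal as well.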
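Try it: Add p.mat = cor_pmat(df) and sig.level = 0.05 inside ggcorrplot(), where df is the original data frame (cor_pmat() needs the raw data, not the correlation matrix). With the default insig = "pch", correlations that are not statistically significant (p > 0.05) are marked with an X, so readers can see which correlations lack statistical support.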
How do you add correlation value labels to tiles?
Labels inside tiles let readers see exact values without needing to reference a color scale. The key is switching text color for dark tiles so labels remain readable.
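A sketch of the labelling technique. The helper column abs_cor, the 0.5 threshold, and the example columns are illustrative assumptions:

```r
library(ggplot2)

# Assumed example data: seven numeric mtcars columns
df <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec")]
cor_long <- as.data.frame(as.table(cor(df)))
names(cor_long) <- c("Var1", "Var2", "cor")
cor_long$abs_cor <- abs(cor_long$cor)   # magnitude drives the text color

p <- ggplot(cor_long, aes(Var1, Var2, fill = cor)) +
  geom_tile(color = "white") +
  geom_text(
    aes(label = sprintf("%.2f", cor), color = abs_cor > 0.5),
    size = 3
  ) +
  # TRUE = dark tile, so use white text; FALSE = pale tile, so use grey text
  scale_color_manual(
    values = c("TRUE" = "white", "FALSE" = "grey30"),
    guide = "none"
  ) +
  scale_fill_gradient2(
    low = "#4393c3", mid = "white", high = "#d6604d",
    midpoint = 0, limits = c(-1, 1), name = "r"
  ) +
  coord_fixed() +
  theme_minimal() +
  labs(x = NULL, y = NULL)

p
```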
color = abs_cor > 0.5 switches between white text (for dark tiles with strong correlations) and grey text (for pale tiles near zero). This is the same technique used in the Heatmap-in-R post.
Try it: Change the threshold from 0.5 to 0.3 — more tiles get white text. Find the threshold that gives the best contrast for your color palette.
Complete Example: Publication-Ready Correlation Plot
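One way to combine the pieces from this post into a single figure. This is a sketch rather than a definitive recipe; the data subset, colors, title text, and output file name are all assumptions:

```r
library(ggplot2)
library(ggcorrplot)

# Assumed example data: seven numeric mtcars columns
df <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec")]

cor_mat <- cor(df)       # correlations
p_mat   <- cor_pmat(df)  # p-values, computed from the raw data

p <- ggcorrplot(
  cor_mat,
  type          = "upper",
  hc.order      = TRUE,
  lab           = TRUE,      # print correlation values in tiles
  lab_size      = 3,
  p.mat         = p_mat,
  sig.level     = 0.05,
  insig         = "blank",   # hide non-significant tiles
  colors        = c("#2166ac", "white", "#b2182b"),
  outline.color = "white"
) +
  labs(
    title    = "Correlations among mtcars variables",
    subtitle = "Pearson r; blank tiles have p > 0.05"
  )

p

# Hypothetical output path; uncomment to write a high-resolution file
# ggsave("correlation_plot.png", p, width = 6, height = 5, dpi = 300)
```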
Common Mistakes and How to Fix Them
Mistake 1: Not using a diverging color scale
❌ A sequential scale (e.g., scale_fill_viridis_c()) has no clear midpoint at zero, making it hard to tell positive from negative correlations.
✅ Always use a diverging scale anchored at 0:
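A sketch of such a scale, applied to a tiny made-up data frame so the snippet runs on its own (the values in demo are invented purely for illustration):

```r
library(ggplot2)

# Diverging scale: blue below zero, white at zero, red above zero
diverging_fill <- scale_fill_gradient2(
  low = "#4393c3", mid = "white", high = "#d6604d",
  midpoint = 0, limits = c(-1, 1), name = "r"
)

# Made-up values, only to show the scale in use
demo <- data.frame(x = c("a", "b", "c"), y = "r", cor = c(-0.8, 0, 0.8))

p <- ggplot(demo, aes(x, y, fill = cor)) +
  geom_tile() +
  diverging_fill

p
```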
Mistake 2: Including non-numeric columns in cor()
cor() fails if any column is non-numeric. Always subset to numeric columns first.
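A short sketch using iris, which has one factor column (Species) alongside four numeric ones:

```r
# Flag numeric columns, then subset before calling cor()
num_cols <- vapply(iris, is.numeric, logical(1))
cor_mat  <- cor(iris[, num_cols])

dim(cor_mat)  # 4 x 4: Species has been dropped
```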
Mistake 3: Not setting limits = c(-1, 1) in the color scale
Without explicit limits, the scale is fitted to the observed range of your data, not to -1 and 1. If the strongest correlation in a plot is only 0.4, it is drawn at full saturation and looks as strong as 0.9 would in another plot. Fixed limits keep colors comparable across plots.
Mistake 4: Showing both triangles
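The correlation matrix is symmetric (r(A,B) = r(B,A)). Showing both triangles displays every value twice and wastes space. Use type = "upper" in ggcorrplot, or filter cor_long to as.integer(Var1) < as.integer(Var2). Var1 and Var2 are factors, so comparing them directly with < returns NA.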
Mistake 5: Ignoring the diagonal
The diagonal is always 1.0 (self-correlation) and adds no information. Remove it: cor_long <- cor_long[cor_long$Var1 != cor_long$Var2, ].
Practice Exercises
Exercise 1: iris correlation heatmap
Using the iris dataset (numeric columns only: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), compute the correlation matrix and create a full ggplot2 heatmap with correlation labels. Use a blue-white-red diverging palette.
Show solution
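One possible solution, sketched with plain "blue" and "red" as the palette endpoints (any blue-white-red diverging palette would satisfy the exercise):

```r
library(ggplot2)

iris_num <- iris[, c("Sepal.Length", "Sepal.Width",
                     "Petal.Length", "Petal.Width")]

# Correlation matrix reshaped to long format
iris_long <- as.data.frame(as.table(cor(iris_num)))
names(iris_long) <- c("Var1", "Var2", "cor")

p <- ggplot(iris_long, aes(Var1, Var2, fill = cor)) +
  geom_tile(color = "white") +
  geom_text(aes(label = sprintf("%.2f", cor)), size = 3.5) +
  scale_fill_gradient2(
    low = "blue", mid = "white", high = "red",
    midpoint = 0, limits = c(-1, 1), name = "r"
  ) +
  coord_fixed() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = NULL, y = NULL)

p
```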
Exercise 2: ggcorrplot with significance masking
Using mtcars (all numeric columns), create a ggcorrplot showing only the upper triangle, clustered by hierarchical ordering. Mask non-significant correlations (p > 0.05) with blank tiles.
Show solution
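One possible solution. Every column of mtcars is numeric, so it can be passed to cor() and cor_pmat() directly:

```r
library(ggcorrplot)

cor_mat <- cor(mtcars)       # 11 x 11 correlation matrix
p_mat   <- cor_pmat(mtcars)  # matching p-value matrix from the raw data

p <- ggcorrplot(
  cor_mat,
  type      = "upper",
  hc.order  = TRUE,
  p.mat     = p_mat,
  sig.level = 0.05,
  insig     = "blank"        # blank out tiles with p > 0.05
)

p
```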
Summary
| Approach | Package | Best for |
|---|---|---|
| geom_tile() | ggplot2 | Full manual control, custom layouts |
| ggcorrplot() | ggcorrplot | Quick, clustered, significance-aware |
| corrplot() | corrplot | Base-R, many visual styles (circle, ellipse, pie) |
| Key function | Purpose |
|---|---|
| cor(df) | Compute Pearson correlation matrix |
| cor(df, method = "spearman") | Spearman (rank-based) correlations |
| as.data.frame(as.table(cor_mat)) | Reshape matrix to long format |
| scale_fill_gradient2(midpoint = 0) | Diverging color scale anchored at zero |
| cor_pmat(df) | Compute p-value matrix (from ggcorrplot) |
| ggcorrplot(..., hc.order = TRUE) | Reorder by hierarchical clustering |
FAQ
What is the difference between Pearson and Spearman correlation in these plots? Pearson measures linear association; Spearman measures monotonic (rank-based) association and is robust to outliers. Use cor(df, method = "spearman") to switch. For ordinal data or skewed distributions, Spearman is usually preferred.
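A quick sketch comparing the two on three mtcars columns (chosen only as an example):

```r
# Assumed example data: three numeric mtcars columns
df <- mtcars[, c("mpg", "hp", "wt")]

pearson  <- cor(df)                       # linear association
spearman <- cor(df, method = "spearman")  # rank-based association

# Where the two disagree, the relationship is likely non-linear
# or influenced by extreme values
round(spearman - pearson, 2)
```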
How do I reorder variables manually instead of by clustering? Before plotting, reorder Var1 and Var2 factors: cor_long$Var1 <- factor(cor_long$Var1, levels = c("var_a", "var_b", ...)). The plot will respect the factor level order.
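A sketch of manual reordering. The variable subset and the order in my_order are arbitrary choices for illustration:

```r
# Assumed example data: four numeric mtcars columns
df <- mtcars[, c("mpg", "cyl", "disp", "hp")]
cor_long <- as.data.frame(as.table(cor(df)))
names(cor_long) <- c("Var1", "Var2", "cor")

# Hypothetical custom order for the axes
my_order <- c("hp", "disp", "cyl", "mpg")

# Apply the same level order to both axes so the diagonal stays aligned
cor_long$Var1 <- factor(cor_long$Var1, levels = my_order)
cor_long$Var2 <- factor(cor_long$Var2, levels = my_order)

levels(cor_long$Var1)
```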
Why does my ggcorrplot show "X" marks on some tiles? You've passed p.mat, and the default insig = "pch" draws an X over non-significant correlations (p > sig.level). Switch to insig = "blank" to leave those tiles empty, or drop the p.mat argument to display all correlations without marks.
Can I add a scatter plot matrix alongside the correlation heatmap? Yes — the GGally::ggpairs() function creates a scatterplot matrix with correlations in the upper triangle, distributions on the diagonal, and scatter plots in the lower triangle. It combines visual exploration with correlation values.
How do I handle missing data in cor()? cor() returns NA for any pair that has NAs. Use use = "complete.obs" (listwise deletion) or use = "pairwise.complete.obs" (pairwise deletion) to handle missing values.
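A small sketch of the three behaviors, using missing values injected artificially into mtcars for demonstration:

```r
# Assumed example data with two artificially injected NAs
df <- mtcars[, c("mpg", "hp", "wt")]
df$hp[c(1, 5)] <- NA

cor_default  <- cor(df)                                # NA wherever hp is involved
cor_listwise <- cor(df, use = "complete.obs")          # drops rows 1 and 5 for all pairs
cor_pairwise <- cor(df, use = "pairwise.complete.obs") # drops rows per pair only

cor_default
cor_pairwise
```

With pairwise deletion, the mpg-wt correlation still uses all 32 rows, while pairs involving hp use 30.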
References
- ggcorrplot documentation: sthda.com/english/wiki/ggcorrplot
- corrplot CRAN vignette: cran.r-project.org/web/packages/corrplot
- Wickham H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer.
- Wilke C. (2019). Fundamentals of Data Visualization — Chapter 12: Visualizing associations
What's Next?
- Heatmap in R — the general case: any matrix as a color grid with geom_tile()
- ggplot2 Scatter Plots — explore bivariate relationships between individual variable pairs
- R Statistical Tests — back up what the correlation plot shows with formal hypothesis tests