Violin Plot in R: Draw, Customize, and Combine with Boxplots
A violin plot shows the full distribution of a variable using a mirrored density estimate — making it far more informative than a boxplot for data with multiple modes, heavy tails, or unusual shapes. In ggplot2, geom_violin() creates them in one line.
Introduction
A boxplot summarises a distribution with five numbers — minimum, Q1, median, Q3, maximum. That's useful for quick comparisons, but it completely hides distribution shape. Two groups with identical boxplots can have wildly different distributions: one unimodal and symmetric, the other bimodal and skewed.
A violin plot solves this by showing the full density estimate of the data on both sides of a central axis — the "width" of the violin at any point represents how many data points fall near that value. Where the violin is widest, data clusters most densely. Where it narrows, data is sparse.
The ideal approach combines both: a violin for shape + a mini boxplot for the five-number summary + optional jitter points for the raw data. Together they give readers more information than any single plot type alone.
In this tutorial you will learn:
- How to draw a basic violin plot with geom_violin()
- How to fill, color, and style violins
- How to embed a boxplot inside the violin for dual encoding
- How to add raw data points on top
- How the adjust and scale parameters control the bandwidth and sizing
How Does geom_violin() Show the Distribution Shape?
geom_violin() uses kernel density estimation (KDE) — the same algorithm as geom_density() — but mirrors the density curve symmetrically around the group's x-position. Each side of the violin is the density curve reflected.
Let's start with the basics:
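A minimal sketch of the basic call. It assumes the mpg dataset that ships with ggplot2, with drive type (drv) on the x-axis and highway mileage (hwy) on the y-axis; those variable choices are inferred from the description that follows, and the axis labels are illustrative.

```r
library(ggplot2)

# One violin per drive type: 4 = four-wheel, f = front-wheel, r = rear-wheel
p_basic <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_violin() +
  labs(x = "Drive type", y = "Highway MPG")

p_basic
```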
The widest part of the front-wheel-drive violin — around 25-30 mpg — shows where most front-wheel-drive cars cluster. The narrower sections indicate fewer cars at those values.
Now add color and fill to distinguish groups:
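One way this could look, again assuming the mpg data with drv and hwy. The "Set2" palette matches the one named in the exercise below; the alpha value and labels are illustrative choices.

```r
library(ggplot2)

p_fill <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv)) +
  # fill comes from the aesthetic mapping; colour and alpha are fixed values
  geom_violin(color = "white", alpha = 0.8) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Drive type", y = "Highway MPG", fill = "Drive")

p_fill
```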
KEY INSIGHT: Use fill for the violin interior and color for the outline. Setting color = "white" draws the outline in white, which hides the default dark outline on a light background and gives a cleaner look. Use alpha to control transparency, which is useful when violins overlap or you add points on top.
Try it: Change palette = "Set2" to palette = "Dark2". How does the visual impact change?
How Do You Embed a Boxplot Inside a Violin Plot?
The classic pattern is violin + mini boxplot. The violin shows shape; the boxplot inside shows median, IQR, and outliers. The trick is sizing the inner boxplot small enough that it doesn't dominate the violin.
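A sketch of the two-layer pattern, assuming the same mpg variables as before. The boxplot arguments are the ones discussed in the next paragraph; the alpha value and palette are illustrative.

```r
library(ggplot2)

p_box <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv)) +
  geom_violin(color = "white", alpha = 0.7) +
  # Narrow white boxplot drawn on top of the violin, outlier dots suppressed
  geom_boxplot(width = 0.12, fill = "white", outlier.shape = NA) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Drive type", y = "Highway MPG")

p_box
```

Layer order matters here: the boxplot is added after the violin so that it is drawn on top.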
width = 0.12 keeps the boxplot narrow enough to read inside the violin. outlier.shape = NA suppresses the outlier dots — since the violin already shows the full data distribution, outlier dots add clutter without adding information.
TIP: Add a median point explicitly for extra clarity: stat_summary(fun = median, geom = "point", size = 2, color = "black"). This adds a solid dot at the median position, making it easy to compare medians across groups even when the violin widths differ.
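As a sketch, the tip can be applied to a plain violin like this. It assumes the mpg variables used earlier and a ggplot2 version recent enough to accept the fun argument (3.3.0 or later).

```r
library(ggplot2)

p_median <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv)) +
  geom_violin(color = "white", alpha = 0.7) +
  # One black dot per group, placed at that group's median hwy
  stat_summary(fun = median, geom = "point", size = 2, color = "black") +
  scale_fill_brewer(palette = "Set2")

p_median
```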
Try it: Remove outlier.shape = NA from the boxplot. Do the outlier points add useful information, or do they clutter the violin?
How Do You Add Raw Data Points to a Violin Plot?
For smaller datasets (fewer than ~200 observations per group), showing the individual data points on top of the violin reveals exactly where each observation falls. Use geom_jitter() with a small width to prevent overlap:
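A possible three-layer version, assuming the mpg variables from the earlier examples. The jitter settings follow the values listed in the summary table; point size and transparency are illustrative.

```r
library(ggplot2)

p_three <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv)) +
  geom_violin(color = "white", alpha = 0.6) +
  geom_boxplot(width = 0.12, fill = "white", outlier.shape = NA) +
  # height = 0 keeps every point at its true y value; only x is jittered
  geom_jitter(width = 0.08, height = 0, alpha = 0.5, size = 1) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Drive type", y = "Highway MPG")

p_three
```

Setting height = 0 is a deliberate choice: vertical jitter would move points away from their measured values.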
The three layers now communicate:
- Violin — shape and density of the distribution
- Boxplot — median, IQR, and whisker range
- Jitter — every individual data point
WARNING: For large datasets (>500 points per group), jitter becomes a solid mass that obscures the violin shape. Use it only with moderately sized groups. With large data, the violin + boxplot combination alone is sufficient — the density estimate already tells the full distribution story.
Try it: Set width = 0.25 in geom_jitter(). Do the points still sit inside the violin, or do they spill outside?
How Do the Bandwidth and Scale Parameters Work?
Two parameters control the violin's shape: adjust (bandwidth) and scale (how violins are sized relative to each other).
**adjust — bandwidth multiplier:**
The bandwidth controls how smooth or detailed the density estimate is. adjust = 1 uses the default bandwidth (chosen automatically). adjust < 1 gives a rougher, more detailed estimate that follows local peaks. adjust > 1 gives a smoother, more generalized estimate.
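To see the effect, the same data can be drawn with a small and a large multiplier. This is a sketch on the mpg data; the values 0.5 and 2 are arbitrary illustrations, not recommendations.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = drv, y = hwy))

# Half the default bandwidth: bumpier outline that follows local peaks
p_rough <- base + geom_violin(adjust = 0.5) + labs(title = "adjust = 0.5")

# Twice the default bandwidth: smoother, more generalized outline
p_smooth <- base + geom_violin(adjust = 2) + labs(title = "adjust = 2")

p_rough
p_smooth
```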
**scale — how violin widths compare:**
| scale = | Violin size represents |
|---|---|
| "area" (default) | Equal total area for all violins |
| "count" | Area (and therefore width) scaled in proportion to the number of observations in each group |
| "width" | All violins scaled to the same maximum width |
KEY INSIGHT: scale = "count" is often the most honest choice: a group with 10 observations shouldn't look as prominent as a group with 100. With scale = "area" (the default), all violins look equally important regardless of sample size.
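A sketch comparing two of the options on the mpg data, where the rear-wheel-drive group is much smaller than the other two. The object names are illustrative.

```r
library(ggplot2)

table(mpg$drv)  # group sizes differ, which is what scale = "count" reflects

base <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv)) +
  scale_fill_brewer(palette = "Set2")

# Violin size follows the number of observations in each group
p_count <- base + geom_violin(scale = "count", color = "white")

# Every violin is stretched to the same maximum width
p_width <- base + geom_violin(scale = "width", color = "white")

p_count
p_width
```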
Try it: Change scale = "count" to scale = "width". Do the three violins now have the same maximum width?
Common Mistakes and How to Fix Them
Mistake 1: Using a violin plot with too few data points
❌ With fewer than ~20-30 observations per group, the kernel density estimate is unreliable — the violin shows smooth curves that misrepresent sparse data:
✅ Use a dotplot (geom_dotplot()) or just jitter (geom_jitter()) for small samples. The violin's smooth estimate is only trustworthy with 30+ observations per group.
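A sketch of the dotplot alternative. The data frame below is simulated purely for illustration (12 made-up values per group), and the binwidth is a hand-picked assumption that would need tuning for real data.

```r
library(ggplot2)

# Hypothetical small sample: 12 simulated observations in each of two groups
set.seed(1)
small <- data.frame(
  group = rep(c("A", "B"), each = 12),
  value = c(rnorm(12, mean = 10), rnorm(12, mean = 12))
)

p_small <- ggplot(small, aes(x = group, y = value)) +
  # Bin along y and stack dots symmetrically around each group's x position
  geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 0.25)

p_small
```

Every dot is one observation, so nothing is smoothed or implied beyond the data.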
Mistake 2: Jitter width spilling outside the violin
❌ geom_jitter(width = 0.4) spreads points wider than the violin, making points appear to "float" outside the density region.
✅ Keep jitter width small (0.05-0.1). The points should cluster inside the violin's widest section.
Mistake 3: Outlier dots shown from both geom_boxplot and geom_jitter
❌ When combining boxplot + jitter inside a violin, the boxplot outlier points double-plot with the jitter points.
✅ Always set outlier.shape = NA in geom_boxplot() when adding jitter on top.
Mistake 4: Using the default scale = "area" when group sizes differ greatly
❌ If group A has 10 observations and group B has 200, both violins appear the same size — misrepresenting how much data supports each estimate.
✅ Use scale = "count" to make violin width reflect sample size.
Mistake 5: Using a violin when a ridgeline plot would be better
❌ Comparing 8+ groups side-by-side with violins creates a very wide, crowded chart.
✅ For many groups, use a ridgeline plot (ggridges::geom_density_ridges()), which stacks distributions vertically and stays readable with five or more groups.
Practice Exercises
Exercise 1: Violin with diamonds data
Using the diamonds dataset, create a violin plot of price by cut. Add an embedded boxplot (width = 0.1). Use scale = "count" so the violin widths reflect how many diamonds are in each cut category. Add appropriate labels and a colorblind-safe palette.
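One possible solution, offered as a sketch rather than the only correct answer. It assumes the viridis scale built into ggplot2 as the colorblind-safe palette; the title and labels are illustrative.

```r
library(ggplot2)

p_ex1 <- ggplot(diamonds, aes(x = cut, y = price, fill = cut)) +
  # scale = "count" sizes each violin by how many diamonds have that cut
  geom_violin(scale = "count", color = "white") +
  geom_boxplot(width = 0.1, fill = "white", outlier.shape = NA) +
  scale_fill_viridis_d() +
  labs(title = "Diamond price by cut", x = "Cut", y = "Price (USD)") +
  theme(legend.position = "none")

p_ex1
```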
Exercise 2: Three-layer violin for iris
Using the iris dataset (150 rows, 3 species with 50 each), create a violin + boxplot + jitter combination for Sepal.Length by Species. Since the sample is small (50 per group), set adjust = 1.5 for a smoother bandwidth. Remove the legend since the x-axis already labels the groups.
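One possible solution, again as a sketch. The palette, transparency, and jitter width are illustrative choices; only adjust = 1.5 and the removed legend come from the exercise text.

```r
library(ggplot2)

p_ex2 <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  # adjust = 1.5 widens the bandwidth for a smoother outline
  geom_violin(adjust = 1.5, color = "white", alpha = 0.7) +
  geom_boxplot(width = 0.12, fill = "white", outlier.shape = NA) +
  geom_jitter(width = 0.08, height = 0, alpha = 0.5, size = 1) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Species", y = "Sepal length (cm)") +
  theme(legend.position = "none")

p_ex2
```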
Complete Example
This example compares the distribution of city MPG across vehicle classes with all three layers, a cleaned theme, and labeled axes:
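A sketch of what such a plot could look like. The reorder() call is the one described in the next paragraph; the theme, title, labels, and styling values are assumptions made for illustration.

```r
library(ggplot2)

p_full <- ggplot(
  mpg,
  aes(x = reorder(class, cty, FUN = median), y = cty, fill = class)
) +
  geom_violin(color = "white", alpha = 0.7) +
  geom_boxplot(width = 0.12, fill = "white", outlier.shape = NA) +
  geom_jitter(width = 0.08, height = 0, alpha = 0.4, size = 0.9) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "City fuel economy by vehicle class",
    x = "Vehicle class (sorted by median city MPG)",
    y = "City MPG"
  ) +
  theme_minimal() +
  # The x-axis already names each class, so the legend is redundant
  theme(legend.position = "none")

p_full
```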
reorder(class, cty, FUN = median) sorts the x-axis by median city MPG — ensuring the most efficient classes appear on the right and the chart tells a clear story from left (least efficient) to right (most efficient).
Summary
| Task | Code |
|---|---|
| Basic violin | geom_violin() |
| Fill by group | aes(fill = var) + scale_fill_brewer() |
| Embed boxplot | + geom_boxplot(width = 0.12, fill = "white", outlier.shape = NA) |
| Add raw points | + geom_jitter(width = 0.08, height = 0, alpha = 0.5) |
| Adjust smoothness | geom_violin(adjust = 1.5) (smoother) or adjust = 0.5 (rougher) |
| Scale by sample size | geom_violin(scale = "count") |
| Sort x by median | aes(x = reorder(var, y, FUN = median)) |
| Add median dot | stat_summary(fun = median, geom = "point") |
Key rules:
- Use violins only with 30+ observations per group — fewer points make the density estimate unreliable
- Combine violin + boxplot for both shape and summary
- Set scale = "count" when group sizes differ meaningfully
- Use adjust to control smoothness: lower = more detail, higher = smoother
FAQ
When should I use a violin plot instead of a boxplot?
Use a violin when the distribution shape matters — for example, to detect bimodal distributions (two peaks), skewness, or heavy tails. A boxplot will never reveal that a group has two distinct sub-populations; a violin shows this immediately as two bulges. When you only need the five-number summary for a quick comparison, a boxplot is simpler and cleaner.
Why does my violin look like a very thin spike?
You likely have very few observations in one group. With 10 or fewer points, the kernel density estimate produces a very narrow violin that misrepresents the data. Check your group sizes and consider using geom_jitter() or geom_dotplot() for small samples instead.
How do I draw a horizontal violin plot?
Add coord_flip() to rotate the chart 90°: ggplot(...) + geom_violin() + coord_flip(). Horizontal violins work well when group labels are long.
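A sketch of two routes to a horizontal violin. The first is the coord_flip() approach described above. The second swaps the aesthetics directly and assumes ggplot2 3.3.0 or later, where geom_violin() detects the orientation from the mapping.

```r
library(ggplot2)

# Route 1: draw vertically, then rotate the whole chart
p_flip <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin() +
  coord_flip()

# Route 2 (assumes ggplot2 >= 3.3.0): put the grouping variable on y
p_swap <- ggplot(mpg, aes(x = hwy, y = class)) +
  geom_violin()

p_flip
p_swap
```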
What is the difference between adjust and bw in geom_violin()?
adjust is a multiplier on the automatically chosen bandwidth. bw sets the bandwidth to an explicit value in the same units as the data. For most use cases, adjust is more practical — adjust = 0.5 always means "twice as rough as the default," regardless of the data's units or scale.
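The relationship can be checked with base R's density(), which the ggplot2 documentation points to for details on adjust. This sketch uses mpg$hwy as example data; the bandwidth of 2 is an arbitrary illustration.

```r
library(ggplot2)

x <- mpg$hwy

default_bw <- bw.nrd0(x)                 # rule-of-thumb bandwidth, in mpg units
half_bw    <- density(x, adjust = 0.5)$bw  # multiplier applied to the default
fixed_bw   <- density(x, bw = 2)$bw        # explicit bandwidth, in mpg units

c(default = default_bw, half = half_bw, fixed = fixed_bw)

# The same two arguments passed to geom_violin()
p_adjust <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_violin(adjust = 0.5)
p_bw     <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_violin(bw = 2)
```

Because bw is expressed in data units, a value that suits mileage data would be far too small for something like diamond prices, whereas adjust carries over unchanged.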
Can I show half-violins to save space?
Yes, using the gghalves package: gghalves::geom_half_violin() draws only one side of the violin, letting you pair it with a half-jitter plot on the other side in a "raincloud" layout. This is more compact and equally informative.
References
- Hintze, J. L. & Nelson, R. D. (1998). Violin Plots: A Box Plot-Density Trace Synergism. The American Statistician, 52(2), 181–184.
- ggplot2 reference: geom_violin(). https://ggplot2.tidyverse.org/reference/geom_violin.html
- Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer. https://ggplot2-book.org/
- Wilke, C. O. (2019). Fundamentals of Data Visualization, Chapter 9: Visualizing Many Distributions at Once. https://clauswilke.com/dataviz/
- R Graph Gallery — Violin Charts. https://r-graph-gallery.com/violin.html
- gghalves package documentation. https://erocoar.github.io/gghalves/
What's Next?
- ggplot2 Distribution Charts: the full guide to histograms, density plots, boxplots, and violin plots with guidance on when each type works best.
- Ridgeline Plot in R: stack distributions vertically with ggridges::geom_density_ridges() for clean comparison of 5+ groups.
- ggplot2 Box Plots: deep dive into geom_boxplot() with notched variants, variable-width boxplots, and grouping strategies.