ggplot2 geom_violin() in R: Violin Plots With Examples
The geom_violin() function in ggplot2 shows the kernel density of a distribution mirrored on both sides of a vertical axis, producing a violin shape per group. It reveals shape (multimodality, skew) that box plots hide.
ggplot(df, aes(x = group, y = value)) + geom_violin()                      # basic
ggplot(df, aes(x = group, y = value, fill = group)) + geom_violin()        # filled
ggplot(df, aes(x, y)) + geom_violin(scale = "count")                       # area = sample size
ggplot(df, aes(x, y)) + geom_violin() + geom_boxplot(width = 0.1)          # combo
ggplot(df, aes(x, y)) + geom_violin(trim = FALSE)                          # full tails
ggplot(df, aes(x, y)) + geom_violin(adjust = 0.5)                          # tighter density
ggplot(df, aes(x, y)) + geom_violin() + geom_jitter(width = 0.05, alpha = 0.4) # raw + violin
Need explanation? Read on for examples and pitfalls.
What geom_violin() does in one sentence
geom_violin() draws a mirrored kernel density plot for each group. Each "violin" shows the distribution of values: wider at y values where observations are dense, narrower where they are sparse. The shape is symmetric around its center axis.
Unlike geom_boxplot() (which shows a five-number summary), violin plots reveal SHAPE: bimodality, heavy tails, skew. Use violin when shape matters; use boxplot when you have many groups and only need quick comparison.
Syntax
geom_violin() requires aes(x, y) for grouped violins. Each unique x produces one violin built from the y values in that group.
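One way to check the one-violin-per-x rule is to inspect the data ggplot2 computes for the layer. The sketch below assumes the built-in mpg dataset (class on x, hwy on y) and uses layer_data(), which returns the built data for a layer; density and violinwidth are among the columns stat_ydensity() adds.

```r
library(ggplot2)

# One violin per vehicle class, built from that class's hwy values
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin()

# layer_data() returns the computed data for layer 1:
# many rows per violin (one per density evaluation point), each tagged with a group
d <- layer_data(p)
length(unique(d$group))   # number of violins
length(unique(mpg$class)) # number of classes; the two match
```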
The full signature:
geom_violin(mapping = NULL, data = NULL, stat = "ydensity", position = "dodge",
..., draw_quantiles = NULL, trim = TRUE, scale = "area",
na.rm = FALSE, orientation = NA, show.legend = NA, inherit.aes = TRUE)
Density settings such as adjust, bw, and kernel do not appear in this signature; they are passed through ... to stat_ydensity().
The scale argument controls how violin widths relate to each other. The default, "area", gives every violin the same total area (before the tails are trimmed). "count" scales each violin's area in proportion to its sample size, so groups with more data get larger violins. "width" gives every violin the same maximum width. Choose scale = "count" when group sizes differ a lot.
Six common patterns
1. Basic violin per group
One violin per class. Width at each y level shows how many observations cluster there.
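A minimal sketch of this pattern, assuming the built-in mpg dataset (the exercise further down uses it) with highway mileage plotted per vehicle class:

```r
library(ggplot2)

# Highway mileage (hwy) by vehicle class: one violin per class
p_basic <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin()
p_basic
```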
2. Filled violins
Mapping fill = class colors each violin. The legend only repeats the x-axis labels, so hide it with guides(fill = "none").
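A sketch of the filled version, again assuming mpg:

```r
library(ggplot2)

p_fill <- ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  geom_violin() +
  guides(fill = "none")  # x-axis labels already name each class
p_fill
```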
3. Violin with embedded boxplot
The thin boxplot inside each violin shows the median and IQR; the violin shows the shape. Setting outlier.shape = NA on the boxplot keeps outliers from being shown twice, once as points and once in the violin tails.
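A sketch of the combination, assuming mpg; the boxplot is added second so it is drawn on top of the violin:

```r
library(ggplot2)

p_combo <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin() +
  # narrow box on top; outlier points hidden because the violin tails already show them
  geom_boxplot(width = 0.1, outlier.shape = NA)
p_combo
```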
4. Scale by sample size
scale = "count" makes the area of each violin proportional to the number of observations in that group. Smaller groups appear visibly thinner. Useful when sample sizes vary a lot.
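A sketch of scale = "count", again assuming mpg. The second part builds the same plot under all three scale settings so they can be compared; the names scale_opts and scale_plots are just illustrative.

```r
library(ggplot2)

# Violin area proportional to the number of cars in each class
p_count <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(scale = "count")
p_count

# The same data under each scale setting, for side-by-side comparison
scale_opts <- c("area", "count", "width")
scale_plots <- lapply(scale_opts, function(s) {
  ggplot(mpg, aes(x = class, y = hwy)) +
    geom_violin(scale = s) +
    ggtitle(paste0('scale = "', s, '"'))
})
scale_plots[[3]]  # "width": every violin reaches the same maximum width
```

With mpg, the 2seater class (only a handful of cars) shrinks noticeably under "count", while under "area" it looks as substantial as the large classes.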
5. Show full tails (trim = FALSE)
By default trim = TRUE cuts violins at the extreme observed values. trim = FALSE extends them through the full kernel density range, showing what the smoother predicts beyond the data.
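A sketch comparing trimmed and untrimmed violins on mpg. Comparing the y range of the computed layer data with the observed hwy range shows the tails extending past the data when trim = FALSE:

```r
library(ggplot2)

p_trim <- ggplot(mpg, aes(x = class, y = hwy)) + geom_violin()  # trim = TRUE (default)
p_full <- ggplot(mpg, aes(x = class, y = hwy)) + geom_violin(trim = FALSE)
p_full

# Observed data range vs. the range the untrimmed densities cover
range(mpg$hwy)
range(layer_data(p_full)$y)
```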
6. Violin + jittered raw points
Combination of shape (violin) and individual values (jitter). Best when you want to verify the shape against actual points.
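A sketch with mpg. The jitter width matches the quick-reference line at the top; set.seed() makes the random offsets reproducible; height = 0 is an added choice (not in the original) that keeps each point at its true hwy value, since geom_jitter() otherwise jitters vertically as well.

```r
library(ggplot2)

set.seed(42)  # geom_jitter() adds random noise; fix the seed for a reproducible plot
p_jit <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin() +
  geom_jitter(width = 0.05, height = 0, alpha = 0.4)  # horizontal jitter only
p_jit
```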
With small groups, show the actual points with geom_jitter() or geom_dotplot() instead.
geom_violin() vs geom_boxplot()
Both compare distributions across groups. Violin shows SHAPE; boxplot shows SUMMARY.
| Feature | geom_violin | geom_boxplot |
|---|---|---|
| Shows distribution shape | Yes | No (just summary) |
| Reveals multimodality | Yes | No |
| Compact for many groups | Wider per group | More compact |
| Effective with small N | No (needs 20+) | Yes |
| Identifies outliers | Implicit (tail width) | Explicit (outlier points) |
| Shows median, IQR | Add draw_quantiles = c(0.25, 0.5, 0.75) | Built in |
| Best for | 4 to 8 groups, shape matters | Many groups, ranking medians |
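A sketch of the draw_quantiles option from the table, assuming mpg. draw_quantiles is the argument shown in the signature above; recent ggplot2 releases may flag it as deprecated, so check the documentation for your installed version.

```r
library(ggplot2)

# Horizontal lines at the 25th, 50th and 75th percentiles inside each violin
p_q <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75))
p_q
```

These lines are quantiles of the density estimate, not of the raw data, so they can differ slightly from the hinges of a geom_boxplot() on the same groups.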
When to use which:
- Use violin when group SHAPES matter (multimodality, skew).
- Use boxplot when SUMMARY suffices and group count is large.
- To get both, overlay them: geom_violin() + geom_boxplot(width = 0.1).
Common pitfalls
Pitfall 1: a violin with small N is misleading. With fewer than 10 observations per group, the kernel density is mostly noise. Use geom_dotplot() or geom_jitter() instead for small samples.
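A sketch of checking group sizes before choosing a geom, assuming mpg. The cutoff min_n = 20 follows the "20+" guideline in this article, and the rule of drawing violins only for large groups while plotting every car as a point is a hypothetical approach, not a ggplot2 feature.

```r
library(ggplot2)

# Count observations per class before choosing a geom
n_per_class <- table(mpg$class)
n_per_class

# Hypothetical rule: violins only for classes with at least min_n cars,
# jittered points for every car
min_n <- 20
big <- mpg[mpg$class %in% names(n_per_class)[n_per_class >= min_n], ]

set.seed(1)
p_small <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(data = big) +                             # large classes only
  geom_jitter(width = 0.1, height = 0, alpha = 0.5)    # all classes
p_small
```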
Pitfall 2: forgetting that the violin is symmetric. Both halves of the violin show the same density. The width does NOT mean "spread to the right of the line"; it means "density of values at this y level".
With scale = "area", violin sizes can mislead. All violins are normalized to the SAME area, so a group with 5 observations and a group with 500 look equally large. Set scale = "count" to encode sample size in violin area.
Pitfall 3: kernel bandwidth choice changes appearance. The default bandwidth (bw = "nrd0", Silverman's rule of thumb, multiplied by adjust = 1) can smooth over real bumps in some data. Try adjust = 0.5 for a tighter density (more detail) or adjust = 2 for a looser one (more smoothing).
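A sketch that builds the same violin plot at three bandwidth multipliers so the effect of adjust can be compared, assuming mpg; adjust is passed through geom_violin()'s ... to stat_ydensity(). The list name bw_plots is illustrative.

```r
library(ggplot2)

# Same data, three bandwidth multipliers: smaller adjust = less smoothing
adjusts <- c(0.5, 1, 2)
bw_plots <- lapply(adjusts, function(a) {
  ggplot(mpg, aes(x = class, y = hwy)) +
    geom_violin(adjust = a) +
    ggtitle(paste("adjust =", a))
})
bw_plots[[1]]  # most detail
bw_plots[[3]]  # smoothest
```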
Try it yourself
Try it: Make a violin plot of mpg$hwy per class, overlay a thin white boxplot, and use scale = "count" to show group sizes. Save to ex_plot.
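One possible solution, written from the exercise wording and the explanation below; the original solution code is not shown here, so details such as outlier.shape = NA are a choice rather than necessarily the author's.

```r
library(ggplot2)

ex_plot <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(scale = "count") +                                # area tracks group size
  geom_boxplot(width = 0.1, fill = "white", outlier.shape = NA)  # thin white box on top
ex_plot
```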
Explanation: scale = "count" makes each violin's area proportional to its sample size. geom_boxplot(width = 0.1) adds a thin embedded boxplot showing the median and IQR. Together they show shape AND summary statistics.
Related ggplot2 functions
After mastering geom_violin(), look at:
- geom_boxplot(): simpler summary; pair with violin
- geom_density(): unmirrored density curve; overlay groups with colour or fill
- geom_dotplot(): stacked dots for small samples
- geom_jitter(): raw points overlay
- ggdist::stat_halfeye(): half-violin / raincloud-style displays
- ggridges::geom_density_ridges(): ridge plots (rotated stacked densities)
For raincloud plots (half-violin + box + jitter combination), the ggdist package extends ggplot2 with publication-quality distribution displays.
FAQ
What is the difference between violin and box plot in ggplot2?
Violin plot shows the SHAPE of the distribution (kernel density). Box plot shows a five-number SUMMARY (median, IQR, whiskers). Violin reveals multimodality and skew that box plots hide; box plot is more compact when comparing many groups.
How do I add a boxplot inside a violin plot in ggplot2?
Layer them: geom_violin() + geom_boxplot(width = 0.1, outlier.shape = NA). The thin boxplot shows median and IQR; the violin shows shape. Hide outliers on the boxplot to avoid double-display with the violin tails.
What does the scale argument do in geom_violin?
scale controls how violin sizes relate. "area" (default) gives all violins equal area. "count" makes each violin's area proportional to its sample size. "width" gives all violins the same maximum width. Use "count" when group sizes differ.
Why does my violin plot look weird with small N?
Kernel density needs many observations to estimate shape. With fewer than ~10 observations per group, the violin is mostly an artifact of smoothing. Use geom_jitter() or geom_dotplot() for small samples; reserve violin for groups with 20+ obs each.
How do I make a horizontal violin plot in ggplot2?
Add coord_flip(): ggplot(df, aes(x = group, y = value)) + geom_violin() + coord_flip(). Or, in ggplot2 3.3.0 and later, swap x and y: aes(x = value, y = group). Both produce horizontal violins.
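Both approaches side by side, assuming mpg:

```r
library(ggplot2)

# Option 1: swap the aesthetics; orientation is inferred from the discrete y axis
p_h1 <- ggplot(mpg, aes(x = hwy, y = class)) +
  geom_violin()

# Option 2: build a vertical violin, then flip the coordinates
p_h2 <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin() +
  coord_flip()

p_h1
p_h2
```

Swapping the aesthetics keeps x and y meaning what they say on screen, which makes later scale_x_*() and scale_y_*() calls less confusing than with coord_flip().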