ggplot2 geom_boxplot() in R: Box Plots With Examples
The geom_boxplot() function in ggplot2 draws a box plot summarizing a distribution: median (center line), 25th and 75th percentiles (box edges), whiskers (reaching to the most extreme values within 1.5x IQR of the box), and outliers (points beyond the whiskers). It is a compact way to compare many distributions at once.
ggplot(df, aes(x = group, y = value)) + geom_boxplot() # basic
ggplot(df, aes(x = group, y = value, fill = group)) + geom_boxplot() # filled
ggplot(df, aes(x = group, y = value)) + geom_boxplot(notch = TRUE) # with notches
ggplot(df, aes(x = group, y = value)) + geom_boxplot() + geom_jitter(width = 0.1)
ggplot(df, aes(x, y)) + geom_boxplot(outlier.color = "red") # color outliers
ggplot(df, aes(x = reorder(group, value), y = value)) + geom_boxplot() # sorted
ggplot(df, aes(x = group, y = value)) + geom_boxplot() + coord_flip() # horizontal
Need explanation? Read on for examples and pitfalls.
What geom_boxplot() does in one sentence
geom_boxplot() draws a Tukey-style summary per group: the median, the 25th and 75th percentiles, whiskers, and outliers. The box spans the middle 50% of values; the line inside is the median; each whisker extends to the most extreme value within 1.5 times the interquartile range (IQR) of the box edge; points beyond the whiskers are drawn individually as outliers.
Box plots are among the most compact ways to compare many distributions in one chart. They reveal central tendency, spread, skewness, and outliers without asking the reader to interpret bin choices.
Syntax
geom_boxplot() needs one continuous variable, usually mapped to y. To draw one box per group, also map a discrete variable to x. Without an x mapping, all values go into one box.
The full signature:
geom_boxplot(mapping = NULL, data = NULL, stat = "boxplot", position = "dodge2",
..., outlier.colour = NULL, outlier.color = NULL, outlier.fill = NULL,
outlier.shape = 19, outlier.size = 1.5, outlier.stroke = 0.5,
outlier.alpha = NULL, notch = FALSE, notchwidth = 0.5, varwidth = FALSE,
na.rm = FALSE, orientation = NA, show.legend = NA, inherit.aes = TRUE)
varwidth = TRUE for proportional widths. With varwidth = TRUE, each box's width is proportional to the square root of the group's sample size. Smaller groups get narrower boxes, conveying group size visually alongside distribution.
Six common patterns
1. Basic boxplot per group
The simplest case: one box per class. Median, IQR, whiskers, and outliers all visible.
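A minimal sketch of this pattern, assuming the mpg dataset that ships with ggplot2 (class is the vehicle type, hwy is highway miles per gallon). The object name p_basic is only illustrative.

```r
library(ggplot2)

# One box per vehicle class; hwy is the continuous variable being summarized
p_basic <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

p_basic
```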
2. Filled boxes with color
fill = class colors each box. guides(fill = "none") removes the legend (redundant with the x axis).
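A short sketch of the filled variant, again assuming the mpg dataset from ggplot2; p_fill is an illustrative name.

```r
library(ggplot2)

# Map fill to the grouping variable so each box gets its own color
p_fill <- ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  geom_boxplot() +
  guides(fill = "none")  # the x axis already names each class

p_fill
```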
3. Notched box plot
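notch = TRUE adds a "notch" around the median. Each notch extends 1.58 * IQR / sqrt(n) from the median, which gives a roughly 95% interval for comparing medians. Non-overlapping notches between groups suggest medians differ significantly. Notches are approximate; for formal tests, use a Wilcoxon rank-sum test.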
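The sketch below draws notched boxes for the mpg dataset and then runs a Wilcoxon rank-sum test on two classes as a formal follow-up. The choice of "compact" versus "midsize" and the names p_notch, two_classes, and wt are assumptions made for illustration. Very small groups can produce notches that extend past the box, which ggplot2 reports when the plot is drawn.

```r
library(ggplot2)

# Notched boxes: compare notch overlap between neighboring classes
p_notch <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(notch = TRUE)

p_notch

# Formal comparison of two groups (illustrative pair of classes)
two_classes <- subset(mpg, class %in% c("compact", "midsize"))
# exact = FALSE uses the normal approximation, which tolerates tied values
wt <- wilcox.test(hwy ~ class, data = two_classes, exact = FALSE)
wt$p.value
```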
4. Box plot with jittered points overlay
geom_jitter overlays raw points; outlier.shape = NA hides boxplot outliers (avoiding double-display since jitter shows them too). Useful when group sizes are small.
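A sketch of the overlay, assuming the mpg dataset. The jitter width, alpha, and seed are illustrative choices; height = 0 keeps every point at its true y value.

```r
library(ggplot2)

set.seed(42)  # jitter is random; a seed makes the plot reproducible

p_jitter <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(outlier.shape = NA) +                   # hide boxplot outliers
  geom_jitter(width = 0.15, height = 0, alpha = 0.5)   # raw points, x offset only

p_jitter
```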
5. Highlight outliers in red
outlier.color and outlier.size style the outlier points without affecting the box.
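A sketch assuming the mpg dataset; the size value of 2 and the name p_out are illustrative.

```r
library(ggplot2)

# Only the outlier points change; box, median line, and whiskers keep defaults
p_out <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(outlier.color = "red", outlier.size = 2)

p_out
```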
6. Sorted box plot (by median)
reorder(class, hwy, FUN = median) orders factor levels by ascending median hwy. Combined with coord_flip(), this produces a clean ranked horizontal box plot with the highest-median class at the top.
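A sketch of the sorted plot, assuming the mpg dataset. reorder() sorts levels in ascending order of the statistic, so after coord_flip() the class with the highest median ends up at the top. The labs() call is an optional touch that replaces the long default axis title.

```r
library(ggplot2)

p_sorted <- ggplot(mpg, aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = "class")  # default title would be the full reorder(...) expression

p_sorted
```

To put the highest median first instead, negate the value inside reorder(), for example reorder(class, -hwy, FUN = median).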
If the shape of each distribution matters, use geom_violin() or overlay raw data with geom_jitter(). Box plots are best for ranking medians and spotting outliers across many groups.
geom_boxplot() vs base R boxplot()
Both produce nearly identical visualizations. ggplot2's version integrates with grouping, faceting, and theming.
| Task | ggplot2 | Base R |
|---|---|---|
| Per group | aes(x = grp, y = val) + geom_boxplot() | boxplot(val ~ grp, data = df) |
| With notches | geom_boxplot(notch = TRUE) | boxplot(notch = TRUE) |
| Color by group | aes(fill = grp) | boxplot(col = palette) |
| Sort by median | reorder(grp, val, median) | (manual reorder before plot) |
| Outliers off | outlier.shape = NA | outline = FALSE |
| Overlay points | + geom_jitter() | stripchart(add = TRUE) |
When to use which:
- Use ggplot2 for grouped, faceted, sorted, or themed box plots.
- Use base boxplot() for one-line interactive use.
Common pitfalls
Pitfall 1: outliers shown twice when adding jitter. geom_boxplot() + geom_jitter() plots outliers as part of the boxplot AND as jitter points. Set outlier.shape = NA on the boxplot to hide its outliers.
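Pitfall 2: x is continuous but you want grouped boxes. With a numeric x and no grouping, ggplot2 draws a single box across all x values and warns that the x aesthetic is continuous. If x is numeric and represents groups, convert it to a factor: aes(x = factor(group_num), y = value), or keep the numeric axis and set aes(group = group_num).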
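A sketch of both ways to get one box per group from a numeric variable, using mpg$cyl (numeric, but really a small set of categories) as a stand-in for group_num. The object names are illustrative.

```r
library(ggplot2)

# Option 1: treat the numeric variable as categorical
p_factor <- ggplot(mpg, aes(x = factor(cyl), y = hwy)) +
  geom_boxplot() +
  labs(x = "cyl")

# Option 2: keep a numeric x axis but tell ggplot2 how to group
p_group <- ggplot(mpg, aes(x = cyl, y = hwy, group = cyl)) +
  geom_boxplot()

p_factor
p_group
```

Option 1 spaces the boxes evenly; option 2 places them at their numeric positions on a continuous axis.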
Pitfall 3: comparing very different group sizes silently. A box from 5 observations carries less reliable info than a box from 500. Use varwidth = TRUE to encode group size as box width; or print group counts (stat_summary(geom = "text", fun.data = ...)).
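A sketch that combines both remedies on the mpg dataset: box widths that scale with group size, plus a printed count per group. n_label is a hypothetical helper written for this example, not part of ggplot2; it returns the label text and the y position at which stat_summary() should draw it.

```r
library(ggplot2)

# Hypothetical helper: one row per group with a y position and a label
n_label <- function(y) {
  data.frame(y = max(y) + 1, label = paste0("n = ", length(y)))
}

p_counts <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(varwidth = TRUE) +                            # width ~ sqrt(n)
  stat_summary(fun.data = n_label, geom = "text", size = 3)  # count above each box

p_counts
```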
Try it yourself
Try it: Make a box plot of mpg$hwy per class, with classes sorted by median hwy, horizontal orientation, and steelblue fill. Save to ex_plot.
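One possible solution, sketched under the assumption that ggplot2 and its bundled mpg dataset are available; the axis labels are an optional addition.

```r
library(ggplot2)

ex_plot <- ggplot(mpg, aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_boxplot(fill = "steelblue") +  # constant fill, set outside aes()
  coord_flip() +                      # horizontal orientation
  labs(x = "class", y = "hwy")

ex_plot
```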
Explanation: reorder(class, hwy, FUN = median) orders factor levels by ascending median hwy. After coord_flip(), the highest-median class is at the TOP of the chart (visually). The fill applies to all boxes.
Related ggplot2 functions
After mastering geom_boxplot(), look at:
- geom_violin(): shows distribution shape, not just summary
- geom_jitter(): raw points with random horizontal offset
- geom_dotplot(): stacked dots per observation (small N only)
- stat_summary(): add custom summary points (mean, SE, etc.) on top
- geom_errorbar(): alternative summary display
- coord_flip(): horizontal box plots
For "raincloud" plots (half-violin + box + jitter), see the ggdist package which extends ggplot2 with publication-quality distribution displays.
FAQ
How do I add jittered points to a ggplot2 boxplot?
Combine: geom_boxplot(outlier.shape = NA) + geom_jitter(width = 0.2, alpha = 0.5). Hide boxplot outliers (otherwise they show twice). width = 0.2 keeps jitter inside the box footprint.
What do the whiskers mean in a ggplot2 boxplot?
Each whisker extends from the box edge (the 25th or 75th percentile) to the most extreme value within 1.5 times the interquartile range (IQR) of that edge. Points beyond the whiskers are plotted individually as outliers. This is Tukey's convention.
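The rule can be worked through by hand. The sketch below recomputes the whisker ends and outliers for one group (the "compact" class of mpg, chosen only for illustration) with base R, using the default quantile() definition.

```r
library(ggplot2)  # only needed here for the mpg dataset

x <- mpg$hwy[mpg$class == "compact"]

q1  <- quantile(x, 0.25, names = FALSE)  # lower box edge
q3  <- quantile(x, 0.75, names = FALSE)  # upper box edge
iqr <- q3 - q1

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

inside   <- x >= lower_fence & x <= upper_fence
whiskers <- range(x[inside])  # whisker ends are actual data values
outliers <- x[!inside]        # drawn as individual points

whiskers
outliers
```

Note that a whisker stops at the last observed value inside the fence, not at the fence itself.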
How do I sort boxes by median in ggplot2?
Wrap the x mapping in reorder(): aes(x = reorder(group, value, FUN = median), y = value). The FUN argument controls the sort statistic (median, mean, max, etc.). Default sort is ascending.
How do I make a horizontal boxplot in ggplot2?
Add coord_flip() after the boxplot: ggplot() + geom_boxplot() + coord_flip(). Or, in ggplot2 3.3.0 and later, swap x and y in aes(): aes(x = value, y = group) produces horizontal boxes natively.
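A sketch of the native approach on the mpg dataset, assuming ggplot2 3.3.0 or later; p_horizontal is an illustrative name.

```r
library(ggplot2)

# Continuous variable on x, grouping variable on y: boxes are drawn horizontally
p_horizontal <- ggplot(mpg, aes(x = hwy, y = class)) +
  geom_boxplot()

p_horizontal
```

Unlike coord_flip(), this keeps x and y meaning what they say, which makes later calls such as labs() or facet scales easier to reason about.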
Should I use boxplot or violin plot in ggplot2?
Boxplot is best for comparing MANY groups (5+) where the summary is the focus. Violin plot is better when the SHAPE of each distribution matters (multimodality, skewness). For best of both, overlay: geom_violin() + geom_boxplot(width = 0.1).
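A sketch of the overlay on the mpg dataset. The grey fill and the hidden boxplot outliers are illustrative styling choices, not requirements.

```r
library(ggplot2)

p_violin <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(fill = "grey90") +                    # distribution shape
  geom_boxplot(width = 0.1, outlier.shape = NA)     # narrow summary box inside

p_violin
```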