Bubble Chart in R: Add a Third Variable to Your Scatter Plot
A bubble chart is a scatter plot where a third numeric variable is encoded as the size of each point — letting you visualize three dimensions in a single 2D plot without faceting.
Introduction
A scatter plot shows two variables. A bubble chart shows three. That third variable — population size, revenue, number of observations — is encoded as the area of the circle (the "bubble"), giving your data an extra dimension without adding complexity for the reader.
The classic use case is Hans Rosling's famous Gapminder visualization: GDP per capita on the x-axis, life expectancy on the y-axis, population as bubble size, and continent as color. Four variables, one chart, an immediately readable story.
ggplot2 makes bubble charts straightforward with geom_point() — the only difference from a scatter plot is that you also map a numeric variable to the size aesthetic. But the devil is in the details: using the wrong size scale, skipping transparency, or adding labels naively can make a bubble chart unreadable. This post walks you through every step.
How do you create a basic bubble chart in R?
A bubble chart is just geom_point() with size mapped to a variable inside aes(). Let's use the built-in mtcars dataset, where we'll show wt (weight) on x, mpg on y, and bubble size proportional to hp (horsepower).
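A minimal sketch of that chart, assuming ggplot2 is installed. The fill color, alpha value, and axis labels are illustrative choices, not requirements:

```r
library(ggplot2)

# wt on x, mpg on y, bubble size driven by hp
p_basic <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  scale_size(range = c(2, 14), name = "Horsepower") +
  labs(
    title = "Fuel economy vs. weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  ) +
  theme_minimal()

p_basic
```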
scale_size(range = c(2, 14)) maps the smallest hp value to size 2 and the largest to size 14. These are ggplot2's own size units, not pixels. Without it, the default range of c(1, 6) is narrow, so all bubbles look nearly the same size.
Try it: Change range = c(2, 14) to range = c(1, 20) and see how it affects readability. Is the range too dramatic now?
What is the difference between scale_size() and scale_size_area()?
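This is the most commonly misunderstood part of bubble charts. The two size scales you will use most often in ggplot2 behave differently:
scale_size() stretches the data range across the size range you supply: the smallest value gets the lower limit of range and the largest value gets the upper limit.
scale_size_area() makes bubble area proportional to the data value, with a value of zero mapped to size zero.
Neither of them maps values to the radius; ggplot2 has a separate scale_radius() for that.
Why does this matter? We judge bubbles by area, not radius. With a radius mapping, a value twice as large gets a bubble that looks four times bigger (because area = π r²). scale_size() distorts in a different way: with range = c(2, 14) on hp, the 335 hp car gets a size 7 times that of the 52 hp car, and therefore an area dozens of times larger, although its horsepower is only about 6.4 times larger. Only scale_size_area() keeps area in proportion to the value.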
Rule of thumb:
- Use scale_size_area() when your size variable represents a count or total (population, revenue, observations). This is the perceptually honest choice.
- Use scale_size() when you want manual control over the visual range and perceptual accuracy is less critical.
scale_size_area() also maps zero to an actual zero-sized point (invisible), which is useful when zero is a meaningful value.
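To see the difference in numbers rather than by eye, the following sketch builds the same plot with each scale and inspects the sizes ggplot2 assigns. It assumes layer_data() returns rows in the same order as mtcars, and it uses size squared as a stand-in for drawn area:

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(wt, mpg, size = hp)) +
  geom_point(alpha = 0.6)

p_area  <- base_plot + scale_size_area(max_size = 14)
p_range <- base_plot + scale_size(range = c(2, 14))

# layer_data() exposes the size value computed for every row
size_area  <- layer_data(p_area)$size
size_range <- layer_data(p_range)$size

# With scale_size_area(), size^2 per unit of hp is one constant
area_per_hp <- size_area^2 / mtcars$hp
range(area_per_hp)

# With scale_size(), the same ratio varies from car to car
range(size_range^2 / mtcars$hp)
```

If the first range() call returns two identical numbers and the second does not, the area scale is the only one of the two that keeps bubbles proportional to horsepower.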
Try it: Swap scale_size_area(max_size = 14) for scale_size(range = c(2, 14)) and compare the two plots. Notice which bubbles look more proportionally correct.
How do you add a fourth variable with color?
Color is the natural fourth channel. Add a categorical or continuous variable to the color aesthetic to encode group membership or gradient intensity.
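A sketch of the four-variable version, assuming a helper column named cyl_f that holds cylinders as a factor. The hex colors are arbitrary picks for illustration:

```r
library(ggplot2)

cars <- mtcars
cars$cyl_f <- factor(cars$cyl)   # categorical version of cyl for discrete colors

p_color <- ggplot(cars, aes(x = wt, y = mpg, size = hp, color = cyl_f)) +
  geom_point(alpha = 0.7) +
  scale_size_area(max_size = 14, name = "Horsepower") +
  scale_color_manual(
    values = c("4" = "#1b9e77", "6" = "#d95f02", "8" = "#7570b3"),
    name = "Cylinders"
  ) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()

p_color
```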
Four variables, one chart: weight (x), mpg (y), horsepower (size), cylinders (color). The pattern becomes immediately readable — 8-cylinder cars cluster in the bottom-right (heavy and inefficient), 4-cylinder cars in the top-left.
Try it: Change color = cyl_f to color = hp (continuous) and swap scale_color_manual() for scale_color_viridis_c(). How does encoding the same variable (hp) as both size and color change what you notice?
How do you add labels to bubble charts without overlapping?
Labels on bubble charts overlap easily because bubbles are large and placed at arbitrary positions. geom_text() gives you labels but no overlap control. Use ggrepel::geom_text_repel() to automatically push labels away from each other and from the bubbles.
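A sketch of repelled labels, assuming the ggrepel package is installed. The size mapping is placed inside geom_point() so the label layer does not inherit it, and the max.overlaps value is an illustrative guess:

```r
library(ggplot2)
library(ggrepel)

cars <- mtcars
cars$car <- rownames(cars)   # car names live in the row names

p_labels <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point(aes(size = hp), alpha = 0.6, color = "steelblue") +
  geom_text_repel(aes(label = car), size = 2.8, max.overlaps = 20) +
  scale_size_area(max_size = 14, name = "Horsepower") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()

p_labels
```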
geom_text_repel() draws connecting lines from each label to its point when the label has been pushed away. The max.overlaps argument controls how aggressively labels are placed — increase it if some labels disappear.
Try it: Replace geom_text_repel() with plain geom_text(aes(label = rownames(mtcars)), size = 2.8) and see the difference. This is why ggrepel exists.
How do you handle overplotting in bubble charts?
When bubbles overlap, smaller ones get hidden behind larger ones. Two fixes work together:
- Alpha transparency — lets you see buried bubbles through the ones on top.
- Reorder by size descending — plot large bubbles first so small ones render on top.
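Both fixes, plus a white underlay layer, might be combined as in this sketch. The alpha values and the helper column cyl_f are assumptions chosen for illustration:

```r
library(ggplot2)

cars <- mtcars
cars$cyl_f <- factor(cars$cyl)

# ggplot2 draws rows in order, so sorting largest-first leaves small bubbles on top
cars_sorted <- cars[order(-cars$hp), ]

p_overplot <- ggplot(cars_sorted, aes(x = wt, y = mpg, size = hp)) +
  geom_point(color = "white", alpha = 0.4) +      # pale underlay layer
  geom_point(aes(color = cyl_f), alpha = 0.6) +   # colored bubbles on top
  scale_size_area(max_size = 14, name = "Horsepower") +
  labs(color = "Cylinders", x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()

p_overplot
```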
The double geom_point() trick: the first call draws a white-bordered ghost (using color = "white" with low alpha), the second draws the filled colored bubble on top. The white border acts as a visual separator between touching bubbles — a classic bubble chart technique.
Try it: Remove the first geom_point() call (the white border layer) and compare. The effect is subtle but makes crowded regions much easier to read.
Complete Example: A Polished Bubble Chart
Here's a full production-ready bubble chart with proper sizing, labeling, theme, and annotation.
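One way such a chart could look, sketched on mtcars and assuming ggplot2 and ggrepel are installed. The labeling thresholds, colors, and annotation text are illustrative choices:

```r
library(ggplot2)
library(ggrepel)

cars <- mtcars
cars$car   <- rownames(cars)
cars$cyl_f <- factor(cars$cyl)
cars <- cars[order(-cars$hp), ]   # largest bubbles drawn first

# Label only the most powerful and the most efficient cars
cars$label <- ifelse(cars$hp > 240 | cars$mpg > 30, cars$car, NA)

p_final <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point(aes(size = hp, color = cyl_f), alpha = 0.65) +
  geom_text_repel(aes(label = label), size = 3, na.rm = TRUE) +
  scale_size_area(max_size = 14, name = "Horsepower") +
  scale_color_manual(
    values = c("4" = "#1b9e77", "6" = "#d95f02", "8" = "#7570b3"),
    name = "Cylinders"
  ) +
  annotate(
    "text", x = 5.4, y = 33, hjust = 1, size = 3.5, color = "grey30",
    label = "Heavier cars get fewer\nmiles per gallon"
  ) +
  labs(
    title = "Weight, fuel economy, and power in mtcars",
    subtitle = "Bubble area is proportional to horsepower",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  ) +
  theme_minimal(base_size = 12) +
  theme(panel.grid.minor = element_blank())

p_final
```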
Common Mistakes and How to Fix Them
Mistake 1: Using a size scale that is not proportional to the value
❌ Using scale_size() for counts stretches the data range across the size range, so the largest categories look disproportionately bigger than the smallest.
✅ Use scale_size_area() when size represents a count or total.
Mistake 2: Not sorting data before plotting
❌ Plotting in default order buries small bubbles under large ones.
✅ Sort descending by size so large bubbles render first.
Mistake 3: Using geom_text() for labels on busy charts
❌ Plain geom_text() draws labels at exact coordinates, producing unreadable overlaps.
✅ Use ggrepel::geom_text_repel() to push labels apart automatically.
Mistake 4: Skipping transparency
❌ Without alpha, overlapping bubbles are completely opaque, and you lose the information underneath.
✅ Set alpha in geom_point() (for example alpha = 0.6) so buried bubbles show through.
Mistake 5: Encoding too many variables
Four variables (x, y, size, color) is the comfortable maximum for a bubble chart. Piling more on top of color (shape, a label on every point, a facet grid) creates cognitive overload. Simplify: pick the most important story.
Practice Exercises
Exercise 1: Gapminder-style bubble chart
Using the built-in LifeCycleSavings dataset (savings rate, population age shares, per-capita disposable income, and its growth rate), create a bubble chart with:
- x = dpi (per-capita disposable income)
- y = sr (savings rate)
- size = pop75 (percentage of the population over 75, a proxy for aging)
- Sorted so large bubbles are behind small ones
- scale_size_area(max_size = 12)
- A clean minimal theme with a descriptive title
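One possible solution, sketched under the assumption that a fixed fill color and the labels below are acceptable. Other reasonable answers exist:

```r
library(ggplot2)

savings <- LifeCycleSavings
savings$country <- rownames(savings)

# Largest pop75 first, so big bubbles are drawn behind small ones
savings <- savings[order(-savings$pop75), ]

p_ex1 <- ggplot(savings, aes(x = dpi, y = sr, size = pop75)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  scale_size_area(max_size = 12, name = "% of population over 75") +
  labs(
    title = "Savings rate vs. disposable income",
    x = "Per-capita disposable income",
    y = "Savings rate"
  ) +
  theme_minimal()

p_ex1
```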
Exercise 2: Label the extremes
Extend Exercise 1 to label only the 5 countries with the highest pop75 value using ggrepel::geom_text_repel(). All other labels should be NA.
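One possible solution, assuming ggrepel is installed. The label column name is a hypothetical helper, and the data preparation from Exercise 1 is repeated so the block runs on its own:

```r
library(ggplot2)
library(ggrepel)

savings <- LifeCycleSavings
savings$country <- rownames(savings)

# Sorting by pop75 (largest first) also puts the five target countries in rows 1 to 5
savings <- savings[order(-savings$pop75), ]

savings$label <- NA_character_
savings$label[1:5] <- savings$country[1:5]

p_ex2 <- ggplot(savings, aes(x = dpi, y = sr)) +
  geom_point(aes(size = pop75), alpha = 0.6, color = "steelblue") +
  geom_text_repel(aes(label = label), size = 3, na.rm = TRUE) +
  scale_size_area(max_size = 12, name = "% of population over 75") +
  labs(
    title = "Savings rate vs. disposable income",
    x = "Per-capita disposable income",
    y = "Savings rate"
  ) +
  theme_minimal()

p_ex2
```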
Summary
| Task | Code |
|---|---|
| Basic bubble chart | geom_point(aes(size = var), alpha = 0.6) |
| Scale by area (perceptually correct) | scale_size_area(max_size = 14) |
| Scale to a manual size range | scale_size(range = c(2, 14)) |
| Add color dimension | aes(color = group_var) + scale_color_manual() |
| Label without overlap | ggrepel::geom_text_repel() |
| Fix overplotting | Sort data descending by size; use alpha; add white border layer |
| Selective labels | aes(label = ifelse(condition, name, NA)) with na.rm = TRUE in the label layer |
When to use bubble charts:
- You have 3-4 numeric or mixed variables to show simultaneously
- The size variable has a natural zero (counts, totals, populations)
- Your audience can read size differences accurately (differences must be substantial — humans struggle to distinguish 10% size differences)
When NOT to use bubble charts:
- More than 20-30 data points (overplotting becomes unmanageable)
- Size differences are subtle (a bar chart communicates magnitude more precisely)
- All three variables matter equally for a decision (consider parallel coordinates instead)
FAQ
What is the difference between a bubble chart and a scatter plot? A scatter plot shows two variables (x and y). A bubble chart adds a third — the size of each point encodes a numeric variable. Color can add a fourth.
Should I use scale_size() or scale_size_area()? Use scale_size_area() when size represents a count or total (population, revenue, frequency): it keeps bubble area proportional to the value, and area is what we perceive. Use scale_size() when you need manual control over the visual range and perceptual accuracy matters less.
Why are some labels missing when I use geom_text_repel()? ggrepel gives up on placing labels that would overlap too much. Increase max.overlaps (e.g., max.overlaps = 30) or reduce the number of labels by setting most to NA and only labeling notable points.
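How do I make bubble sizes appear in the legend correctly? Set breaks in the size scale, for example scale_size_area(max_size = 14, breaks = c(100, 200, 300)), so the legend shows representative data values drawn at their true sizes. Avoid guide_legend(override.aes = list(size = ...)) for a size legend: it changes the key sizes without changing the labels, so the legend no longer matches the bubbles.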
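A minimal sketch of controlling the size legend through the scale's breaks argument. The break values are illustrative picks that fall inside the hp range of mtcars:

```r
library(ggplot2)

legend_breaks <- c(100, 200, 300)   # data values to show as legend keys

p_legend <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  scale_size_area(
    max_size = 14,
    breaks = legend_breaks,
    name = "Horsepower"
  ) +
  theme_minimal()

p_legend
```

Because the keys come from the scale itself, each legend bubble is drawn at the size the scale assigns to that value.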
Can I create a 3D bubble chart in R? ggplot2 doesn't support true 3D. For interactive 3D, use the plotly package with plot_ly(type = "scatter3d"). For static, faceting or color coding is more readable than faked 3D perspective.