ggplot2 position_jitter() in R: Reduce Point Overplotting
The position_jitter() function in ggplot2 adds controlled random noise to a geom's x and y coordinates so overlapping points become visible. Pass it to a geom's position = argument to spread points apart; the underlying data values stay unchanged.
```r
geom_point(position = position_jitter())                              # default jitter
geom_point(position = position_jitter(width = 0.2, height = 0))       # x only
geom_point(position = position_jitter(width = 0, height = 0.1))       # y only
geom_point(position = position_jitter(seed = 42))                     # reproducible
geom_boxplot() + geom_point(position = position_jitter(width = 0.1))  # box + raw
geom_point(position = position_jitterdodge(jitter.width = 0.1))       # grouped
geom_point(position = "jitter")                                       # string shortcut
```
Need explanation? Read on for examples and pitfalls.
What position_jitter() does in one sentence
position_jitter() is a position adjustment that adds uniform random noise to x and y. Because it is a position adjustment rather than a geom, it works with any geom, not only with points. It is a tool you hand to a geom through position =, not a layer you add on its own.
The default geom_point() plots exact coordinates, which hides overlapping observations when many points share the same x or y value. position_jitter() perturbs each coordinate independently so the cloud spreads out and the density becomes visible.
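To make the overplotting concrete, this sketch counts how many distinct positions ggplot2 actually draws for the class and hwy columns of mpg, with and without jitter. It assumes ggplot2 is installed and reads the built coordinates with layer_data(); n_positions() is a hypothetical helper defined only for this sketch, and the width, height, and seed values are arbitrary.

```r
library(ggplot2)

# Hypothetical helper: count distinct (x, y) positions in a layer's built data.
n_positions <- function(d) {
  nrow(unique(data.frame(x = as.numeric(d$x), y = as.numeric(d$y))))
}

# Exact coordinates: many cars share the same class and hwy value.
exact <- layer_data(ggplot(mpg, aes(class, hwy)) + geom_point())

# Same data with a small, reproducible jitter on both axes.
jittered <- layer_data(
  ggplot(mpg, aes(class, hwy)) +
    geom_point(position = position_jitter(width = 0.2, height = 0.2, seed = 1))
)

# Fewer distinct positions than rows means some points are hidden under others.
c(rows = nrow(mpg), exact = n_positions(exact), jittered = n_positions(jittered))
```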
Syntax
The function takes three arguments: width, height, and seed. All three control how the noise is generated.
The full signatures are:
```r
position_jitter(width = NULL, height = NULL, seed = NA)
position_jitterdodge(jitter.width = NULL, jitter.height = 0,
                     dodge.width = 0.75, seed = NA)
```
- width: amount of horizontal noise, in x-axis units; each point moves by up to ±width. Defaults to 40 percent of the resolution of the data; set to 0 to disable.
- height: amount of vertical noise, in y-axis units. Same default rule as width.
- seed: integer for reproducibility. Without a seed, every render produces a slightly different cloud.
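The default rule can be checked directly. This is a sketch, assuming the mpg data: class sits on integer x positions and hwy holds whole numbers, so both axes have a resolution of 1 and the default noise should stay within ±0.4 on each axis. The seed is arbitrary.

```r
library(ggplot2)

# No width or height given: both default to 40% of the data resolution.
p <- ggplot(mpg, aes(class, hwy)) +
  geom_point(position = position_jitter(seed = 123))
d <- layer_data(p)

# Rounding recovers each original position because the noise is below 0.5,
# so the difference is the noise that was added.
x_noise <- as.numeric(d$x) - round(as.numeric(d$x))
y_noise <- as.numeric(d$y) - round(as.numeric(d$y))
range(x_noise)
range(y_noise)
```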
Set seed whenever the plot ships in a report. Default jitter changes on every redraw, which causes pointless visual diffs in version control and confuses readers comparing two saved images of the same data.
Six common patterns
1. Plain jitter on a discrete x axis
With seven car classes on the x axis and many cars per class, plain geom_point() would draw vertical stripes of overlapping dots. A small horizontal jitter (width = 0.2) reveals each class's density and shape.
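A minimal sketch of this pattern, assuming the mpg data that ships with ggplot2 and hwy on the y axis. The height = 0 and the seed are choices made here so the recorded mileages stay exact and the cloud is reproducible; the pattern itself only needs width = 0.2.

```r
library(ggplot2)

# Seven car classes on a discrete x axis; hwy is whole-number mileage.
p1 <- ggplot(mpg, aes(x = class, y = hwy)) +
  # width = 0.2 spreads points sideways within each class slot;
  # height = 0 keeps the hwy values exactly as recorded.
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1))
p1
```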
2. Jitter only the y axis
cyl only takes four discrete values (4, 5, 6, 8), so points stack on those horizontal lines. Jittering height only spreads them vertically without distorting the continuous displ axis.
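A sketch of the y-only variant, assuming displ on x and cyl on y from mpg. The height of 0.15 is an illustrative value, small enough that points stay close to their cylinder count.

```r
library(ggplot2)

# cyl has only four values, so points pile up on four horizontal lines.
p2 <- ggplot(mpg, aes(x = displ, y = cyl)) +
  # width = 0 leaves displ untouched; height = 0.15 separates stacked points.
  geom_point(position = position_jitter(width = 0, height = 0.15, seed = 1))
p2
```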
3. Reproducible jitter with seed
Passing a seed makes the jitter deterministic: building the same plot twice produces the exact same cloud. Without seed, the cloud changes randomly every time, which makes side-by-side comparisons unreliable.
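One way to verify reproducibility is to build the plot twice and compare the jittered coordinates. In this sketch, jitter_plot() is a hypothetical helper that only exists to rebuild the same plot with a chosen seed; seed = NA is the documented default, which draws fresh noise.

```r
library(ggplot2)

# Hypothetical helper: the same jittered scatter with a configurable seed.
jitter_plot <- function(seed) {
  ggplot(mpg, aes(class, hwy)) +
    geom_point(position = position_jitter(width = 0.2, height = 0, seed = seed))
}

# Fixed seed: two independent builds put every point in the same place.
fixed_a <- layer_data(jitter_plot(42))
fixed_b <- layer_data(jitter_plot(42))
identical(fixed_a$x, fixed_b$x)

# Default seed = NA: each build draws new noise, so the clouds differ.
random_a <- layer_data(jitter_plot(NA))
random_b <- layer_data(jitter_plot(NA))
identical(random_a$x, random_b$x)
```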
4. Box plot with raw points overlaid
A box plot shows the summary; the jittered points show the raw distribution. Hiding outliers on the box (outlier.shape = NA) avoids drawing them twice, since the jitter layer already includes them.
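A sketch of the box-plus-points layout, assuming class and hwy from mpg. The alpha value and the narrow width of 0.1 are illustrative choices so the points do not hide the box.

```r
library(ggplot2)

p4 <- ggplot(mpg, aes(x = class, y = hwy)) +
  # outlier.shape = NA hides the box's own outlier dots,
  # because the point layer already draws every observation.
  geom_boxplot(outlier.shape = NA) +
  geom_point(position = position_jitter(width = 0.1, height = 0, seed = 1),
             alpha = 0.5)
p4
```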
5. Grouped categories with position_jitterdodge
position_jitterdodge() first dodges the groups (here by drv) and then jitters within each dodged slot. Plain position_jitter would mix the three drivetrains into one cloud per class.
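A sketch of the grouped case, assuming drv is mapped to colour so position_jitterdodge() has a group to dodge by. dodge.width = 0.75 is the documented default, spelled out here for clarity; jitter.height keeps its default of 0, so hwy is not altered.

```r
library(ggplot2)

p5 <- ggplot(mpg, aes(x = class, y = hwy, colour = drv)) +
  # Dodge the three drv groups side by side within each class,
  # then jitter each point slightly inside its own dodged slot.
  geom_point(position = position_jitterdodge(jitter.width = 0.1,
                                             dodge.width = 0.75,
                                             seed = 1))
p5
```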
6. The string shortcut
Passing position = "jitter" is shorthand for position_jitter() with default width and height. Quick to type, but you give up control over the exact noise amount and the seed.
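A sketch comparing the shortcut with the constructor form, assuming class and hwy from mpg. The layer created from the string holds a jitter position with default settings, while the tuned version fixes width, height, and seed.

```r
library(ggplot2)

# String shortcut: default width and height (40% of resolution), no seed.
p6 <- ggplot(mpg, aes(class, hwy)) + geom_point(position = "jitter")
class(p6$layers[[1]]$position)

# Constructor form: same kind of adjustment, but every parameter is explicit.
p6_tuned <- ggplot(mpg, aes(class, hwy)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1))
```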
Jitter never rewrites your data: the values stored in mpg never change. The position adjustment is applied only when the plot is built, which is why other layers in the same plot still see the original values.
position_jitter vs alternatives
Pick the position adjustment that matches the question your plot answers.
| Adjustment | Effect | Use when |
|---|---|---|
| position_jitter() | Random noise on x and y | Discrete or rounded data overlaps |
| geom_jitter() | Shortcut layer wrapping the above | One-off scatter, no other geoms |
| position_jitterdodge() | Dodge then jitter | Grouped points by fill or color |
| position_dodge() | Side-by-side shift | Compare groups without noise |
| position_stack() | Stack vertically | Composition of a total |
| geom_point(alpha = 0.3) | Transparency, no noise | Continuous data, very dense |
| geom_hex() or geom_bin2d() | Density bins | Tens of thousands of points |
Decision rule: if you want raw observations on a discrete axis, reach for position_jitter. If you want them inside groups, reach for position_jitterdodge. If you have continuous overplotting, prefer transparency or 2D binning over noise.
geom_jitter() is a shortcut for geom_point(position = position_jitter(...)); both perform the same operation. Use geom_jitter() for terse code and position_jitter() when you need jitter on a non-point geom such as geom_text() or geom_segment().
Common pitfalls
Pitfall 1: jittering continuous data. If both x and y are continuous and merely dense, noise hides structure rather than reveals it. Use alpha, geom_hex(), or geom_density_2d() instead.
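A sketch of the two noise-free alternatives, assuming the diamonds data that ships with ggplot2 (53,940 rows, continuous carat and price). geom_bin2d() is used instead of geom_hex() because geom_hex() needs the hexbin package installed; the alpha value and bins = 50 are illustrative.

```r
library(ggplot2)

# Alternative 1: transparency. No point moves; dense regions simply darken.
p_alpha <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.05)

# Alternative 2: 2D binning. Points are replaced by a count per rectangle.
p_bins <- ggplot(diamonds, aes(carat, price)) +
  geom_bin2d(bins = 50)

# Inspect the per-bin counts that the binned layer actually draws.
binned <- layer_data(p_bins)
summary(binned$count)
```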
Pitfall 2: too much width or height. The default is 40 percent of the data resolution, applied in both directions. On a discrete x axis where categories sit at integer positions, each point can move up to ±0.4, so every category's cloud spans 0.8 units, neighboring clouds nearly touch, and the gaps between categories disappear. Set width explicitly (0.1 to 0.25 is usually safe).
Pitfall 3: forgetting seed in published plots. Every redraw shuffles the noise. A reviewer comparing yesterday's PNG to today's PNG will see "changes" that are pure RNG. Fix this by passing a seed.
Try it yourself
Try it: Plot mpg highway mileage by class with a transparent boxplot underneath and jittered points on top, using a width of 0.15 and seed 7. Save the plot to ex_jitter.
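One possible solution. It assumes "transparent" means an unfilled box (fill = NA) and adds height = 0 so the hwy values stay exact; both are choices made here, not requirements of the exercise.

```r
library(ggplot2)

ex_jitter <- ggplot(mpg, aes(x = class, y = hwy)) +
  # Unfilled box underneath, with its own outlier dots hidden.
  geom_boxplot(fill = NA, outlier.shape = NA) +
  # Jittered raw points on top: width 0.15 and a fixed seed of 7.
  geom_point(position = position_jitter(width = 0.15, height = 0, seed = 7))
ex_jitter
```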
Explanation: Hiding the boxplot outliers prevents double-plotting, and the fixed seed = 7 makes the cloud reproducible across renders.
Related ggplot2 functions
After mastering position_jitter, look at:
- position_jitterdodge(): dodge by group, then jitter inside each group
- position_dodge(): side-by-side groups without noise
- position_stack(): stack bars or areas on top of each other
- geom_jitter(): the convenience layer for jittered points
- geom_hex(): density bins for continuous overplotting
See the official reference at ggplot2.tidyverse.org for the full signature.
FAQ
What does position_jitter do in ggplot2?
position_jitter() adds uniform random noise to a geom's x and y coordinates so overlapping observations spread apart. It is a position adjustment, meaning you pass it through a geom's position = argument. The underlying data is unchanged; only the rendered positions move. Use it for discrete or rounded data where many points share the same coordinates and would otherwise overplot.
What is the difference between geom_jitter and position_jitter?
geom_jitter() is a convenience wrapper for geom_point(position = position_jitter(...)). They produce identical output for scatter plots. Use geom_jitter() when the layer is points alone. Use position_jitter() when you want jittered text labels, segments, error bars, or any non-point geom, since those layers do not have their own geom_jitter equivalent.
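A sketch of both claims, assuming the first 15 rows of mpg and a single position object reused across layers. Because the seed is fixed, geom_jitter() and geom_point() produce the same coordinates, and a geom_text() layer with the same position places each label on its point. The size and vjust values are illustrative.

```r
library(ggplot2)

small <- head(mpg, 15)
# One position object with a fixed seed, reused by several layers.
jit <- position_jitter(width = 0.2, height = 0.2, seed = 7)

# geom_jitter() with this position matches geom_point() with the same position.
via_geom_jitter <- layer_data(ggplot(small, aes(class, hwy)) + geom_jitter(position = jit))
via_geom_point  <- layer_data(ggplot(small, aes(class, hwy)) + geom_point(position = jit))
all.equal(via_geom_jitter[c("x", "y")], via_geom_point[c("x", "y")])

# Non-point geom: with the same seed, each label gets the same offset as its point.
labelled <- ggplot(small, aes(class, hwy, label = model)) +
  geom_point(position = jit) +
  geom_text(position = jit, size = 3, vjust = -0.8)
built <- ggplot_build(labelled)$data
all.equal(built[[1]][c("x", "y")], built[[2]][c("x", "y")])
```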
How do I set the jitter width and height?
Pass width and height to position_jitter(): e.g. position_jitter(width = 0.2, height = 0). Values are in x-axis and y-axis units, not pixels. For a discrete axis where categories sit at integer positions, width = 0.2 keeps points well inside their slot. Set either argument to 0 to disable noise on that dimension.
Why does my jittered plot change every time?
position_jitter() draws fresh random numbers on every render unless you pass a seed. Set seed = 42 (or any integer) to lock the cloud in place, which is essential for reports, screenshots, and version-controlled plots. The seed argument also exists on position_jitterdodge().
Can I use position_jitter with geom_boxplot or geom_violin?
Not directly on the boxplot or violin itself, since those geoms compute summaries and have no need to jitter. The common pattern is to add a separate geom_point(position = position_jitter(...)) layer on top of the boxplot or violin. That gives you both the summary geom and the raw observations in one plot.
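A sketch of the violin version of this pattern, assuming drv and hwy from mpg. The width, alpha, and seed are illustrative values.

```r
library(ggplot2)

p_violin <- ggplot(mpg, aes(x = drv, y = hwy)) +
  # Summary layer: estimated hwy density for each drivetrain.
  geom_violin() +
  # Raw observations, spread sideways and kept at their exact hwy values.
  geom_point(position = position_jitter(width = 0.1, height = 0, seed = 1),
             alpha = 0.4)
p_violin
```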