ggplot2 Log Scale in R: When & How to Transform Axes (with Examples)
A log scale compresses wide-range data so that multiplicative differences appear as equal visual steps. In ggplot2, scale_x_log10(), scale_y_log10(), and coord_trans() each apply log transformations differently.
Introduction
Real-world data often spans orders of magnitude. GDP ranges from millions to trillions. Diamond prices range from $300 to $19,000. Gene expression values can differ by a factor of 10,000. On a linear axis, small values get crushed into a thin band at the bottom while outliers stretch the axis.
A log scale fixes this by converting multiplicative relationships into additive ones. A tenfold increase always covers the same visual distance, whether you go from 10 to 100 or from 1,000 to 10,000. This makes patterns in skewed data visible.
In this tutorial, you will learn three ggplot2 approaches to log scales, when to pick each one, how to format labels so readers understand your axis, and how to handle the tricky case of zeros and negative values.
When Should You Use a Log Scale?
Not every skewed dataset needs a log scale. Log scales are the right choice when the data has a multiplicative structure, not just when it looks skewed. Here is how to diagnose it.
A histogram tells you immediately if your data spans orders of magnitude. Let's check diamond prices from the built-in diamonds dataset.
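As a first diagnostic, a plain histogram can be sketched as follows. The bin count, colors, and labels are arbitrary choices for illustration, not settings taken from the original example.

```r
library(ggplot2)

# Histogram of diamond prices on the default (linear) x-axis
p_linear <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 50, fill = "steelblue", color = "white") +
  labs(title = "Diamond prices (linear scale)", x = "Price (USD)", y = "Count")

p_linear
```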
The histogram shows a strong right skew. Most diamonds cost under $5,000, but some reach $18,000+. The tail stretches the axis so far that the shape of the bulk is hard to read.
Now apply a log scale to the x-axis. The distribution's shape becomes much clearer.
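A minimal sketch of the same histogram with a log10 x-axis is shown here. Only scale_x_log10() is essential; the bin count and styling are assumptions.

```r
library(ggplot2)

# Same histogram, but the bins are computed on log10(price)
p_log <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 50, fill = "steelblue", color = "white") +
  scale_x_log10() +
  labs(title = "Diamond prices (log10 scale)", x = "Price (USD, log10 scale)", y = "Count")

p_log
```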
On the log scale, you can see that diamond prices are roughly log-normal. The bulk sits between $1,000 and $5,000, and the distribution is nearly symmetric after the transformation.
Use a log scale when your data shows any of these signs:
- Values span 2+ orders of magnitude (e.g., 10 to 10,000)
- The relationship is multiplicative (percentages, growth rates, ratios)
- A histogram shows a long right tail that compresses most data points
- You are comparing quantities on fundamentally different scales (e.g., countries by GDP)
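The first sign in the list can be checked numerically. The helper below, orders_of_magnitude(), is a hypothetical convenience function written for this illustration; it is not part of ggplot2 or scales.

```r
# Hypothetical helper: how many powers of ten does a positive vector span?
orders_of_magnitude <- function(x) {
  x <- x[!is.na(x) & x > 0]  # log10 is only defined for positive values
  log10(max(x) / min(x))
}

orders_of_magnitude(c(10, 250, 10000))  # 3
```

A result of about 2 or more suggests a log scale is worth trying, though the histogram is still the better guide to the shape of the data.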
Try it: Check whether airquality$Ozone (with NAs removed) is a good candidate for a log scale. Create a histogram and look for right skew and multi-order-of-magnitude spread.
Explanation: The range spans about 2 orders of magnitude and the histogram is right-skewed, so a log scale would help visualization.
How Does scale_y_log10() Transform Your Plot?
The scale_y_log10() function transforms the data before any statistical calculations happen. This is the most common and usually the correct choice.
When you add scale_y_log10() to a plot, ggplot2 applies log10 to every y-value before computing statistics like geom_smooth(), geom_boxplot(), or stat_summary(). The statistical layers then operate in log-space.
Let's see this with a scatter plot of diamond price versus carat weight.
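A sketch of that scatter plot, assuming a linear smooth (method = "lm") and arbitrary styling choices, might look like this:

```r
library(ggplot2)

# scale_y_log10() is applied before geom_smooth() runs,
# so lm() effectively fits log10(price) ~ carat
p_semilog <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  scale_y_log10() +
  labs(x = "Carat", y = "Price (USD, log10 scale)")

p_semilog
```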
The smooth line fits a linear model in log-space. This means it captures the multiplicative relationship between carat and price. A straight line in log-space represents exponential growth in the original data.
Note: scale_x_log10() and scale_y_log10() are shorthand for scale_x_continuous(trans = "log10") and scale_y_continuous(trans = "log10"). The longer form gives you more control over breaks and labels.
Notice that the y-axis labels still show the original dollar values (1000, 5000, 10000), not the log-transformed values (3, 3.7, 4). ggplot2 back-transforms the labels automatically, which keeps the axis readable.
Try it: Apply scale_x_log10() to plot carat on a log scale too. This double-log plot should make the relationship nearly linear.
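One possible solution, sketched with the same assumed styling as a typical price-versus-carat scatter:

```r
library(ggplot2)

# Both axes are log10-transformed before the linear model is fit
p_loglog <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Carat (log10 scale)", y = "Price (USD, log10 scale)")

p_loglog
```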
Explanation: On a log-log scale, a power-law relationship (price ~ carat^n) appears as a straight line.
How Does coord_trans() Differ from Scale Functions?
The coord_trans() function transforms the axes after statistical calculations. The statistics are computed on the original (untransformed) data, and only the visual positions are changed.
This distinction matters a lot when you use geom_smooth() or any stat layer. Let's compare the two approaches side by side.
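The comparison can be sketched as two plots built from the same base. The subset carat > 0.5 is an assumption borrowed from the ggplot2 documentation's coord_trans() example: it keeps the raw-price linear fit positive, so it can be drawn on a log axis.

```r
library(ggplot2)

d <- subset(diamonds, carat > 0.5)
base <- ggplot(d, aes(x = carat, y = price)) + geom_point(alpha = 0.1)

# Scale transformation: log10 is applied first, then lm() fits log10(price) ~ carat
p_scale <- base +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  scale_y_log10() +
  labs(title = "scale_y_log10(): fit in log-space")

# Coordinate transformation: lm() fits price ~ carat, then the axis is warped
p_coord <- base +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "blue") +
  coord_trans(y = "log10") +
  labs(title = "coord_trans(): fit in original units")

p_scale
p_coord
```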
The red line fits a linear model to log10(price) ~ carat. In log-space, this line is straight.
The blue line was fit to the raw price values (linear space), then the coordinate system curved it. The line bends because a linear fit in dollar-space becomes curved when displayed on a log axis.
Use scale_y_log10() when you want statistics computed in log-space.
When should you use coord_trans() instead? It is useful when you want log-spaced visual positions but need the statistical computations to remain in the original units. For example, geom_bar() stacking and position_dodge() work more predictably with coord_trans().
Here is a quick decision rule:
| Scenario | Use |
|---|---|
| Scatter + smooth in log-space | scale_y_log10() |
| Bar chart with log-spaced axis | coord_trans(y = "log10") |
| Boxplot comparing groups on log scale | scale_y_log10() |
| Need original-unit axis labels | Both work (auto back-transform) |
Try it: Create a scatter of carat vs price with coord_trans() applied to both x and y axes. Compare how it looks to the scale version.
Explanation: The point positions look the same because both methods apply log10 to the coordinates. The difference only shows when you add statistical layers.
How Do You Label Log-Scaled Axes Clearly?
Default log-scale labels in ggplot2 show the original values (100, 1000, 10000), which is usually fine. But sometimes you need exponent notation, formatted numbers, or custom breaks to make the axis easier to read.
The scales package provides labeling functions that pair perfectly with log scales. Let's start with exponent notation using label_log().
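A minimal sketch of exponent labels follows. The breaks and limits are assumptions chosen so that three full powers of ten appear on the axis, and label_log() requires a reasonably recent version of scales.

```r
library(ggplot2)
library(scales)

p_exp <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  scale_y_log10(
    breaks = 10^(2:4),             # 100, 1000, 10000
    limits = c(100, 20000),        # wide enough to include the 10^2 break
    labels = label_log(base = 10)  # render labels as powers of ten
  ) +
  labs(x = "Carat", y = "Price (USD)")

p_exp
```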
The axis now shows 10^2, 10^3, and 10^4. This is standard for scientific publications where readers expect powers-of-ten notation.
For a general audience, comma-formatted labels are more readable. Use label_comma() or label_dollar() to keep the original units.
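A sketch with dollar-formatted labels is shown here. The break values are assumptions picked to give evenly readable steps on the log axis.

```r
library(ggplot2)
library(scales)

p_dollar <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  scale_y_log10(
    breaks = c(500, 1000, 2000, 5000, 10000),
    labels = label_dollar()  # "$" prefix plus comma separators
  ) +
  labs(x = "Carat", y = "Price")

p_dollar
```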
The y-axis shows $500, $1,000, $2,000, $5,000, $10,000. The spacing between labels reflects the log scale, but the numbers are in familiar dollar format.
For log scales, adding minor tick marks between the major gridlines helps readers estimate intermediate values. Use guide_axis_logticks() to add these.
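A sketch of log tick marks on the y-axis follows. Note that guide_axis_logticks() was added in ggplot2 3.5.0, so this assumes that version or later.

```r
library(ggplot2)
library(scales)

p_ticks <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  scale_y_log10(labels = label_dollar()) +
  guides(y = guide_axis_logticks()) +  # log-spaced tick marks on the y-axis
  labs(x = "Carat", y = "Price")

p_ticks
```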
The minor ticks show the characteristic log-scale pattern: tightly spaced near the top of each decade, wider near the bottom. This visual cue tells readers the axis is not linear.
Try it: Create a scatter plot of diamonds price vs carat with a log10 y-axis. Format the y-axis labels as dollars and set breaks at 300, 1000, 3000, and 10000.
Explanation: label_dollar() adds the "$" prefix and comma separators. The breaks argument controls which values get labeled.
How Do You Handle Zeros and Negative Values on a Log Scale?
Log transformations have a mathematical limitation: log(0) is negative infinity, and the log of a negative number is undefined for real numbers. When your data contains zeros or negative values, scale_y_log10() cannot place those points on the axis, and the only notice you get is a warning.
This is a common problem with count data, where zero counts are meaningful. Let's see what happens.
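To reproduce the problem, here is a sketch with a small made-up data frame. The name df_zeros matches the exercises in this section, but the counts themselves are hypothetical values invented for this illustration.

```r
library(ggplot2)

# Hypothetical counts: eight categories, two of which (A and E) are zero
df_zeros <- data.frame(
  category = LETTERS[1:8],
  value    = c(0, 5, 40, 300, 0, 2500, 12000, 50000)
)

sum(df_zeros$value == 0)  # 2 zeros that a log10 axis cannot represent

# log10(0) is -Inf, so A and E have no valid position on this axis
p_drop <- ggplot(df_zeros, aes(x = category, y = value)) +
  geom_col(fill = "steelblue") +
  scale_y_log10()

p_drop
```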
Categories A and E have zero counts, so they disappear entirely. A warning such as "Transformation introduced infinite values in continuous y-axis" tells you what happened (the exact wording depends on your ggplot2 version), but it is easy to miss.
The pseudo_log_trans() function from the scales package solves this. It behaves like a log scale for large values but transitions smoothly to a linear scale near zero.
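A sketch of the pseudo-log fix follows, using hypothetical counts (eight categories, zeros in A and E) and assumed break values. It uses the trans argument as elsewhere in this tutorial; newer ggplot2 versions also accept transform.

```r
library(ggplot2)
library(scales)

# Hypothetical counts: eight categories, two of which (A and E) are zero
df_zeros <- data.frame(
  category = LETTERS[1:8],
  value    = c(0, 5, 40, 300, 0, 2500, 12000, 50000)
)

p_pseudo <- ggplot(df_zeros, aes(x = category, y = value)) +
  geom_col(fill = "steelblue") +
  scale_y_continuous(
    trans  = pseudo_log_trans(base = 10),  # finite at 0, log-like for large values
    breaks = c(0, 10, 100, 1000, 10000, 50000)
  )

p_pseudo
```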
Now all eight categories are visible, including the zeros. The axis transitions smoothly from linear near zero to logarithmic for larger values.
Check for zeros before applying a log scale: use sum(data$value == 0) to count them.
Another approach is to add a small constant before taking the log. This is called the "log plus one" or log1p transformation.
Adding 1 shifts all values up so that zeros become 1, and log10(1) = 0. This is simple but changes the interpretation: the axis no longer shows the true counts.
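The shift described above can be sketched as follows, again with hypothetical counts. The axis title states the shift explicitly so readers are not misled about what is plotted.

```r
library(ggplot2)

# Hypothetical counts: eight categories, two of which (A and E) are zero
df_zeros <- data.frame(
  category = LETTERS[1:8],
  value    = c(0, 5, 40, 300, 0, 2500, 12000, 50000)
)

# Plot value + 1 so that zeros map to log10(1) = 0 instead of -Inf
p_plus1 <- ggplot(df_zeros, aes(x = category, y = value + 1)) +
  geom_col(fill = "steelblue") +
  scale_y_log10() +
  labs(y = "Count + 1 (log10 scale)")

p_plus1
```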
| Method | Handles zeros? | Handles negatives? | Axis interpretation |
|---|---|---|---|
| scale_y_log10() | No (drops them) | No | True log10 values |
| pseudo_log_trans() | Yes | Yes | Approximate log, linear near 0 |
| log(value + 1) | Yes | No | Shifted values |
Try it: Create a bar chart of df_zeros using pseudo_log_trans(base = 10) and add label_comma() to the y-axis so the labels show original counts.
Explanation: pseudo_log_trans() keeps zeros visible, and label_comma() formats the axis labels as readable numbers.
Common Mistakes and How to Fix Them
Mistake 1: Using coord_trans when you want log-space statistics
When you add a smooth line with coord_trans(), the model is fit to the raw values, not to the log-transformed values. The curve you see is misleading.
Wrong:
Why it is wrong: The linear model was fit to raw prices (not log-prices). The line curves only because of the coordinate warp, not because the model learned a log relationship.
Correct:
Mistake 2: Forgetting that zeros produce warnings and blank points
When data contains zeros and you use scale_y_log10(), ggplot2 cannot place those points on the axis, and only a warning tells you so.
Wrong:
Why it is wrong: Two data points vanish without an obvious visual cue. Readers see three points instead of five.
Correct:
Mistake 3: Using xlim() or ylim() to set limits on a log-scaled plot
The xlim() and ylim() functions replace the scale, which removes scale_y_log10(). Use coord_cartesian() or set limits inside the scale function.
Wrong:
Why it is wrong: ylim() creates a new linear scale that replaces your scale_y_log10(). You lose the log transformation entirely.
Correct:
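A minimal sketch of the fix, with the problematic call kept as a comment for contrast. The limit values 500 and 10000 are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point(alpha = 0.1)

# Problematic: ylim() adds a new linear y scale that replaces scale_y_log10()
# p + scale_y_log10() + ylim(500, 10000)

# Fix: set the limits inside the log scale itself.
# With the default out-of-bounds handling, values outside the limits are dropped.
p_limits <- p + scale_y_log10(limits = c(500, 10000))

p_limits
```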
Mistake 4: Mixing log10 and natural log without realizing
scale_y_log10() uses base 10, but log_trans() defaults to natural log (base e). Mixing them leads to confusing axis labels.
Wrong:
Why it is wrong: The labels show values like 403, 1097, 2981 which are not round numbers in base 10. Readers expect 100, 1000, 10000 on a "log scale" axis.
Correct:
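A sketch of keeping the base explicit, so the axis breaks fall on powers of ten rather than powers of e:

```r
library(ggplot2)
library(scales)

p <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point(alpha = 0.1)

# Natural log: breaks land on powers of e
round(exp(6:8))  # 403 1097 2981

# Stating the base explicitly gives the familiar base-10 axis,
# equivalent to scale_y_log10()
p_base10 <- p + scale_y_continuous(trans = log_trans(base = 10))

p_base10
```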
Practice Exercises
Exercise 1: Full scatter plot with log axes and professional labels
Create a scatter plot of diamonds using carat on the x-axis and price on the y-axis. Apply log10 scales to both axes. Add a linear smooth line. Format the y-axis with dollar labels and the x-axis with label_number(). Add log tick marks to the y-axis using guide_axis_logticks().
Explanation: scale_x_log10() and scale_y_log10() transform both axes. label_dollar() formats the y-axis as currency. guide_axis_logticks() adds minor tick marks showing the log spacing. The linear smooth in log-log space captures the power-law relationship.
Exercise 2: Visualize data with zeros using pseudo-log and compare
Create a data frame with 10 categories and counts that include at least two zeros and values ranging from 0 to 50,000. Make two bar charts side by side: one with scale_y_log10() (showing the zero problem) and one with pseudo_log_trans() (showing the fix). Give each plot an informative title.
Explanation: scale_y_log10() drops the three zero-count bars because log10(0) is undefined. pseudo_log_trans() transitions smoothly near zero, keeping all bars visible.
Putting It All Together
Here is a complete, polished example that starts with raw data, diagnoses whether a log scale is needed, applies it, and formats the output for publication.
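The diagnosis step can be sketched with a quick range check on the price column:

```r
library(ggplot2)

price_range <- range(diamonds$price)  # 326 18823
ratio <- price_range[2] / price_range[1]

round(ratio)             # 58
round(log10(ratio), 2)   # 1.76 orders of magnitude
```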
The price range spans a factor of 58, nearly two orders of magnitude. A log scale will help.
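One way to sketch the final plot is shown here, assuming color is mapped to cut with one linear fit per grade. The breaks, titles, and theme are illustrative choices rather than the original settings.

```r
library(ggplot2)
library(scales)

p_final <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.1, size = 0.8) +
  # One linear fit per cut grade, computed in log-log space
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_log10(labels = label_number(accuracy = 0.1)) +
  scale_y_log10(
    breaks = c(500, 1000, 2000, 5000, 10000),
    labels = label_dollar()
  ) +
  labs(
    title = "Diamond price vs. carat by cut (log-log scale)",
    x = "Carat (log10 scale)",
    y = "Price (log10 scale)",
    color = "Cut"
  ) +
  theme_minimal()

p_final
```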
Each cut quality follows a roughly parallel trend on the log-log scale. This means the price-carat power-law relationship holds across all cut grades, with better cuts sitting at a higher intercept.
Summary
| Method | When to use | Effect on stats | Handles zeros? |
|---|---|---|---|
| scale_y_log10() | Most common choice. Scatter, boxplot, histogram with log-space stats | Transforms before stat layers | No |
| scale_y_continuous(trans = "log10") | Same as above, with more control over breaks/labels | Same as scale_y_log10 | No |
| coord_trans(y = "log10") | Bar charts, stacked geoms, when stats must stay in original units | Visual only, stats unchanged | No |
| pseudo_log_trans() | Data with zeros or negative values | Linear near 0, log for large values | Yes |
| log(value + 1) | Quick fix for zeros (shifts interpretation) | Manual pre-transformation | Zeros only |
Key takeaways:
- Use a log scale when data spans 2+ orders of magnitude or has multiplicative structure
- scale_y_log10() is the default choice. It transforms data before statistics are calculated.
- coord_trans() only changes the visual coordinates. Statistics are computed on raw data.
- Always check for zeros before applying a log scale. Use pseudo_log_trans() if zeros matter.
- Format labels with label_dollar(), label_comma(), or label_log() depending on your audience
- Add guide_axis_logticks() for minor tick marks that signal "this is a log axis"
FAQ
Can I use log2 or natural log instead of log10?
Yes. Use scale_y_continuous(trans = "log2") for base-2 or scale_y_continuous(trans = "log") for natural log (base e). You can also use log_trans(base = 2) from the scales package. Base-10 is the most common because readers intuit powers of 10 easily.
What happens to zero values with scale_y_log10()?
ggplot2 cannot place them on the axis and prints a warning such as "Transformation introduced infinite values in continuous y-axis" (the exact wording depends on your ggplot2 version). The points vanish from the plot. Use pseudo_log_trans() or add a small constant to keep them visible.
Should I log-transform the data column or the axis?
Transform the axis with scale_y_log10(), not the data. Axis transformation keeps the original values in your data frame and shows readable back-transformed labels. If you use mutate(log_price = log10(price)), the axis labels show log values (2, 3, 4) instead of dollars.
How do I add minor grid lines on a log scale?
Add guide_axis_logticks() via the guides() function: guides(y = guide_axis_logticks()). For minor grid lines specifically, set minor_breaks in the scale function: scale_y_log10(minor_breaks = c(200, 500, 2000, 5000)).