ggplot2 Exercises: 15 Chart-Building Practice Problems (With Solutions)
Fifteen hands-on ggplot2 exercises covering scatter plots, bar charts, line charts, heatmaps, facets, themes, and more, each with a worked solution and runnable code.
Reading about ggplot2 is useful. Writing ggplot2 code without looking at notes is how you actually learn it. These 15 problems cover the full core ggplot2 toolkit: geoms, aesthetics, scales, facets, themes, and coordinate systems. They use built-in R datasets, so no data download is needed.
The 15 problems are grouped into three sections of five. Section 1 builds single-geom charts one concept at a time. Section 2 mixes facets, transformed scales, and combined geoms. Section 3 stitches multiple layers, custom themes, and annotations into the kind of polished charts you build on the job. Every problem comes with an expected result, two progressive hints, and a solution with an explanation.
All code runs in one shared R session, so the packages are loaded once in a setup block and the exercises do not repeat it. Store each answer in an ex_-prefixed name (ex_1_1, ex_1_2, and so on) so you do not overwrite anything by accident. Every exercise uses only base R datasets (mtcars, iris, airquality) or datasets bundled with ggplot2 (mpg, diamonds, economics, economics_long), so you can run them anywhere; Exercise 3.4 additionally needs the ggrepel package.
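A minimal setup block, assuming the three packages the exercises rely on are installed: ggplot2 for plotting, dplyr for the summarise steps, and ggrepel for the labels in Exercise 3.4.

```r
# Load once per session; the exercises below assume these are attached
library(ggplot2)  # plotting
library(dplyr)    # group_by(), summarise(), mutate(), filter(), slice_max()
library(ggrepel)  # geom_text_repel(), used in Exercise 3.4
```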
Section 1. Single-geom charts: scatter, bar, and line basics
These five problems build one chart with one or two geoms at a time.
Exercise 1.1: Basic scatter plot colored by group
Task: Using iris, create a scatter plot of Sepal.Length (x) versus Petal.Length (y), with points colored by Species. Set point size to 3 and transparency (alpha) to 0.7, add a manual color palette, axis labels, a title, and theme_minimal(). Save the plot object to ex_1_1.
Expected result:
A scatter plot of sepal length versus petal length with three colored point clusters, one per iris species: setosa clearly separated, versicolor and virginica adjacent and partly overlapping.
Difficulty: Beginner
Hint 1: The grouping variable that decides each point's color belongs inside the aesthetic mapping, not in the geom.
Hint 2: Put color = Species inside aes(), draw points with geom_point(size = 3, alpha = 0.7), then add scale_color_manual(), labs(), and theme_minimal().
Explanation: Mapping color = Species inside aes() ties the color to the data, so ggplot2 draws one color per species and adds a legend automatically. scale_color_manual() overrides the default palette with three colorblind-safe hues. Arguments given to geom_point() outside aes() (size, alpha) are fixed values, not data-driven. Setosa forms a clearly separate cluster while versicolor and virginica partly overlap, which is one reason iris is a classic classification demo.
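One possible solution, sketched below. The hex codes (the Okabe-Ito colorblind-safe set), the title, and the axis labels are assumptions rather than the original answer's exact values.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own

# Assumed colorblind-safe palette, keyed by species name
species_pal <- c(setosa = "#E69F00", versicolor = "#56B4E9", virginica = "#009E73")

ex_1_1 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  # size and alpha are fixed settings, so they sit outside aes()
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(values = species_pal) +
  labs(
    title = "Sepal length vs petal length by species",
    x = "Sepal length (cm)",
    y = "Petal length (cm)",
    color = "Species"
  ) +
  theme_minimal()

ex_1_1
```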
Exercise 1.2: Scatter plot with a regression trend line
Task: Using mtcars, create a scatter plot of wt (x) versus mpg (y), with points colored by cyl treated as a factor. Add a linear regression smooth line (method = "lm") with a 95% confidence band, a manual color palette, and labels. Save the plot object to ex_1_2.
Expected result:
A scatter plot of car weight versus miles per gallon with points colored by cylinder count and a straight downward-sloping regression line with a shaded confidence band.
Difficulty: Beginner
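Hint 1: Treat the cylinder count as a discrete category before mapping it to color; the trend line is its own layer added on top of the points.
Hint 2: Use color = factor(cyl) inside geom_point(aes(...)) rather than in the global aes(), so the smooth fits one overall line instead of one per cylinder group, then add geom_smooth(method = "lm", formula = y ~ x, se = TRUE).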
Explanation: factor(cyl) converts the numeric cylinder count into a discrete variable, so ggplot2 uses a categorical color scale instead of a continuous gradient. geom_smooth(method = "lm") fits an ordinary least-squares line; se = TRUE draws the shaded 95% confidence band. The line slopes downward because heavier cars get fewer miles per gallon.
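A sketch of one solution. The key design choice is mapping color only in geom_point(): if color = factor(cyl) sat in the global aes(), geom_smooth() would inherit the grouping and fit three separate lines. The palette and labels are assumptions.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own

ex_1_2 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # color mapped in this layer only, so the smooth below fits one overall line
  geom_point(aes(color = factor(cyl)), size = 3) +
  # OLS fit with its 95% confidence band (se = TRUE)
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "grey20") +
  scale_color_manual(values = c("4" = "#009E73", "6" = "#0072B2", "8" = "#D55E00")) +
  labs(
    title = "Heavier cars get fewer miles per gallon",
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders"
  )

ex_1_2
```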
Exercise 1.3: Ordered horizontal bar chart
Task: From the mpg dataset, compute the average highway MPG (hwy) for each manufacturer, then build a horizontal bar chart ordered from highest to lowest average MPG, all bars in a single color. Save the plot object to ex_1_3.
Expected result:
A horizontal bar chart of average highway MPG per manufacturer, with bars sorted longest at the top down to shortest.
Difficulty: Beginner
Hint 1: Summarise the data to one row per manufacturer first. A bar chart only sorts cleanly once the category is a factor whose levels are ordered by the value you want to sort on.
Hint 2: Use group_by() + summarise(), then reorder(manufacturer, avg_hwy), plot with geom_col(), and flip to horizontal with coord_flip().
Explanation: geom_col() draws bars whose heights are the values in the data, unlike geom_bar(), which counts rows. reorder(manufacturer, avg_hwy) turns manufacturer into a factor whose levels are sorted by average MPG, so the bars come out ranked. coord_flip() swaps the axes to make a horizontal chart, which keeps the manufacturer labels readable.
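A sketch of one solution. The summary table name ex_1_3_data, the fill color, and the labels are illustrative choices. After coord_flip(), the last factor level (the highest average) is drawn at the top.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own
library(dplyr)

# One row per manufacturer: mean highway MPG
ex_1_3_data <- mpg %>%
  group_by(manufacturer) %>%
  summarise(avg_hwy = mean(hwy), .groups = "drop")

ex_1_3 <- ggplot(ex_1_3_data, aes(x = reorder(manufacturer, avg_hwy), y = avg_hwy)) +
  # geom_col() uses avg_hwy directly as the bar length
  geom_col(fill = "steelblue") +
  # Horizontal layout; the highest-ranked level ends up at the top
  coord_flip() +
  labs(title = "Average highway MPG by manufacturer", x = NULL, y = "Average highway MPG")

ex_1_3
```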
Exercise 1.4: Grouped bar chart with dodged bars
Task: From mtcars, build a grouped bar chart of average mpg for each combination of cyl (x-axis) and am (transmission, used for fill). Treat am as a factor labeled Automatic / Manual, and separate the grouped bars with position_dodge(). Save the plot object to ex_1_4.
Expected result:
A grouped bar chart with one pair of side-by-side bars (automatic and manual) for each cylinder count, comparing average MPG.
Difficulty: Beginner
Hint 1: The variable that splits each x-position into multiple bars must be a discrete factor mapped to fill, and the bars need to be told to sit next to each other rather than stack.
Hint 2: Map fill = am (as a factor), then set position = position_dodge(0.8) inside geom_col().
Explanation: geom_col() defaults to stacking bars with the same x-value; position_dodge(0.8) places them side by side instead, so you can compare automatic against manual within each cylinder group. factor(am, labels = ...) swaps the raw 0/1 codes for readable labels. scale_fill_manual() assigns a fixed color to each transmission type.
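A sketch of one solution. It summarises first, so geom_col() gets one row per cylinder-transmission pair, and maps x to factor(cyl) so the three cylinder counts are evenly spaced categories. The colors and labels are assumptions.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own
library(dplyr)

ex_1_4_data <- mtcars %>%
  # Replace the raw 0/1 codes with readable labels
  mutate(am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual"))) %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop")

ex_1_4 <- ggplot(ex_1_4_data, aes(x = factor(cyl), y = avg_mpg, fill = am)) +
  # Dodge puts the two transmission bars side by side instead of stacking them
  geom_col(position = position_dodge(0.8), width = 0.7) +
  scale_fill_manual(values = c(Automatic = "#E69F00", Manual = "#0072B2")) +
  labs(title = "Average MPG by cylinders and transmission",
       x = "Cylinders", y = "Average MPG", fill = "Transmission")

ex_1_4
```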
Exercise 1.5: Time series line chart with a reference line
Task: From the economics dataset, plot unemploy (number unemployed, recorded in thousands) over date as a line chart. Shade the area below the line with geom_area(), add a dashed horizontal reference line at the mean of unemploy, annotate that line with its value, and label the y-axis in thousands with comma-separated ticks. Save the plot object to ex_1_5.
Expected result:
A line chart of US unemployment over time with the area under the line shaded and a dashed horizontal line marking the long-run mean.
Difficulty: Beginner
Hint 1: A horizontal reference line is a separate layer that needs a single y-value, and the shaded fill below the line is also its own geom drawn before the line.
Hint 2: Add geom_area(alpha = 0.2) then geom_line(), and place the reference with geom_hline(yintercept = mean(economics$unemploy)).
Explanation: geom_area() fills the region between the line and the x-axis; drawing it before geom_line() keeps the line crisp on top. geom_hline() adds a constant horizontal reference at yintercept. annotate() places a one-off text label at fixed coordinates without needing a data column. scale_y_continuous(labels = ...) reformats the axis tick text. Because unemploy is already recorded in thousands, comma-separated ticks under an axis title that says "thousands" read correctly, whereas a "K" suffix on the raw values would understate them a thousandfold.
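A sketch of one solution, assuming the scales package (installed with ggplot2) for comma-formatted labels. Because unemploy is already in thousands, the ticks are comma-separated and the axis title carries the unit. The colors, label placement, and title are illustrative.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own

# unemploy is already measured in thousands of people
unemploy_mean <- mean(economics$unemploy)

ex_1_5 <- ggplot(economics, aes(x = date, y = unemploy)) +
  # Area first, so the line is drawn on top of it
  geom_area(fill = "steelblue", alpha = 0.2) +
  geom_line(color = "steelblue") +
  geom_hline(yintercept = unemploy_mean, linetype = "dashed", color = "grey30") +
  # One-off label at the left edge, nudged just above the reference line
  annotate("text", x = min(economics$date), y = unemploy_mean,
           label = paste0("Mean: ", scales::comma(unemploy_mean), " thousand"),
           hjust = 0, vjust = -0.5, size = 3.5) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "US unemployment over time", x = NULL, y = "Unemployed (thousands)")

ex_1_5
```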
Section 2. Facets, transformed scales, and combined geoms
These five problems combine two or more concepts: faceting, scale transforms, and layered geoms.
Exercise 2.1: Faceted line chart with a labeller
Task: From airquality, plot daily Temp (y) against Day (x) as a line, one panel per Month arranged in a single row with facet_wrap(). Replace the numeric month codes with month names using a labeller, and add a dashed LOESS smooth in each panel. Save the plot object to ex_2_1.
Expected result:
A row of five line-chart panels, one per month from May to September, each showing daily temperature with a dashed smoothed trend.
Difficulty: Intermediate
Hint 1: Each month should get its own panel, and the panels need readable names rather than the raw numeric codes.
Hint 2: Use facet_wrap(~ Month, nrow = 1) and pass labeller = labeller(Month = ...) with a named vector of month names.
Explanation: facet_wrap(~ Month) splits the data into one panel per month; nrow = 1 forces them onto a single row for side-by-side comparison. The labeller argument with a named vector swaps the numeric facet titles for month abbreviations. geom_smooth(method = "loess") adds a locally weighted trend to each panel independently.
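A sketch of one solution. The named vector's names must match the Month values as text ("5" to "9"). The colors, titles, and the choice of se = FALSE are assumptions. airquality holds New York readings from May to September 1973.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own

# Names are the numeric Month codes; values are the facet titles
month_names <- c("5" = "May", "6" = "Jun", "7" = "Jul", "8" = "Aug", "9" = "Sep")

ex_2_1 <- ggplot(airquality, aes(x = Day, y = Temp)) +
  geom_line(color = "grey40") +
  # The LOESS trend is fitted separately within each panel
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE,
              linetype = "dashed", color = "firebrick") +
  facet_wrap(~ Month, nrow = 1, labeller = labeller(Month = month_names)) +
  labs(title = "Daily temperature, May to September 1973",
       x = "Day of month", y = "Temperature (F)")

ex_2_1
```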
Exercise 2.2: Histogram with an overlaid density curve
Task: From diamonds, plot a histogram of price on a log10-transformed x-axis. Overlay a density curve, and put both on the same vertical scale by mapping the histogram y-aesthetic to after_stat(density). Save the plot object to ex_2_2.
Expected result:
A histogram of diamond price on a log10 x-axis with a smooth density curve overlaid on the same density scale.
Difficulty: Intermediate
Hint 1: A raw histogram counts rows while a density curve integrates to one, so the histogram has to be rescaled to density before the two can share an axis.
Hint 2: Map y = after_stat(density) inside geom_histogram(), add geom_density(), and transform the x-axis with scale_x_log10().
Explanation: By default geom_histogram() plots counts on the y-axis, which dwarfs a density curve. Mapping y = after_stat(density) rescales the bars so their total area is 1, matching the density curve. scale_x_log10() compresses the heavily right-skewed price range so the distribution shape is visible. after_stat() is the modern replacement for the older ..density.. syntax.
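A sketch of one solution, assuming ggplot2 3.4 or later (for the linewidth argument) and the scales package for dollar labels. The bin count and colors are arbitrary choices. Because scale_x_log10() transforms the data before the stats run, both the histogram and the density are computed on log10(price), so they stay on a common scale.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own

ex_2_2 <- ggplot(diamonds, aes(x = price)) +
  # Rescale bar heights to density so the total bar area is 1
  geom_histogram(aes(y = after_stat(density)), bins = 50,
                 fill = "grey75", color = "white") +
  geom_density(color = "firebrick", linewidth = 1) +
  # Both layers are computed on the log10 scale
  scale_x_log10(labels = scales::dollar) +
  labs(title = "Distribution of diamond prices",
       x = "Price (log10 scale)", y = "Density")

ex_2_2
```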
Exercise 2.3: Boxplot with jittered points
Task: From iris, create a boxplot of Sepal.Width for each Species, then overlay the individual observations with geom_jitter(). Color by species with a custom palette and hide the legend. Save the plot object to ex_2_3.
Expected result:
A boxplot of sepal width per iris species with the raw data points jittered on top of each box.
Difficulty: Intermediate
Hint 1: Drawing the raw points on top of the boxes needs a second geom that spreads overlapping points horizontally so they do not pile up on one line.
Hint 2: Add geom_jitter(width = 0.15, height = 0) after geom_boxplot() (height = 0 keeps the measured values unchanged), and set outlier.shape = NA so outliers are not drawn twice.
Explanation: geom_jitter() is geom_point() with a small random shift added to each point. width sets the horizontal spread that separates points that would otherwise stack on a single category line, and height = 0 switches off the default vertical jitter so the y-values stay exact. Setting outlier.shape = NA on the boxplot stops outliers being plotted twice, once by the box and once by the jitter. The boxplot fill alpha = 0.3 keeps the points visible through the boxes. theme(legend.position = "none") drops the redundant legend, since the x-axis already names the species.
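A sketch of one solution. The palette reuses the assumed Okabe-Ito hues from Exercise 1.1, and the labels are illustrative. Jitter is random on every draw, so point positions shift slightly each time the plot is rendered.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own

species_pal <- c(setosa = "#E69F00", versicolor = "#56B4E9", virginica = "#009E73")

ex_2_3 <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  # outlier.shape = NA: the jitter layer already draws every point
  geom_boxplot(aes(fill = Species), alpha = 0.3, outlier.shape = NA) +
  # Horizontal jitter only; height = 0 keeps the measured widths exact
  geom_jitter(aes(color = Species), width = 0.15, height = 0, size = 2, alpha = 0.7) +
  scale_fill_manual(values = species_pal) +
  scale_color_manual(values = species_pal) +
  labs(title = "Sepal width by species", x = NULL, y = "Sepal width (cm)") +
  theme_minimal() +
  theme(legend.position = "none")

ex_2_3
```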
Exercise 2.4: Heatmap with a sequential color scale
Task: From airquality, build a heatmap of average daily temperature, with Month on the rows and week-of-month on the columns. Compute the week as ceiling(Day / 7), summarise the mean Temp per month-and-week cell, and draw the cells with geom_tile() using a sequential blue gradient. Save the plot object to ex_2_4.
Expected result:
A grid heatmap with months as rows and weeks as columns, each cell shaded from light to dark blue by average temperature.
Difficulty: Intermediate
Hint 1: A heatmap needs the data pre-aggregated to one value per cell, then a geom that draws rectangles and a continuous fill scale that runs from light to dark.
Hint 2: Summarise to one row per month-week, draw with geom_tile(), and color with scale_fill_gradient(low = ..., high = ...).
Explanation: geom_tile() draws a colored rectangle for every x-y combination, which is exactly what a heatmap is. The data must be aggregated first: here group_by() + summarise() collapses each month-week pair to one mean temperature. scale_fill_gradient() maps that continuous value onto a two-point color ramp. The factor() on Month_lab fixes the row order to calendar order rather than alphabetical.
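A sketch of one solution using base R's month.abb for the row labels. The two hex endpoints of the blue ramp are assumptions. A discrete y-axis draws its first level at the bottom, so the levels are reversed to put May on the top row.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own
library(dplyr)

ex_2_4_data <- airquality %>%
  mutate(
    Week = ceiling(Day / 7),  # days 1-7 -> week 1, ..., days 29-31 -> week 5
    # Calendar order, reversed so May is the top row
    Month_lab = factor(month.abb[Month], levels = rev(month.abb[5:9]))
  ) %>%
  group_by(Month_lab, Week) %>%
  summarise(avg_temp = mean(Temp), .groups = "drop")

ex_2_4 <- ggplot(ex_2_4_data, aes(x = factor(Week), y = Month_lab, fill = avg_temp)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "#deebf7", high = "#08519c") +
  labs(title = "Average temperature by month and week",
       x = "Week of month", y = NULL, fill = "Avg temp (F)")

ex_2_4
```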
Exercise 2.5: Faceted scatter plot with per-facet smoothing
Task: From mpg, create scatter plots of displ (x) versus hwy (y), faceted by drv (drive type) in a vertical layout using facet_grid(). Add a LOESS smooth to each facet and give the facets descriptive labels with a labeller. Save the plot object to ex_2_5.
Expected result:
Three stacked scatter-plot panels, one per drive type, each plotting engine displacement against highway MPG with its own smoothed trend.
Difficulty: Intermediate
Hint 1: To stack the panels vertically, the faceting formula must put the variable on the row side and leave the column side empty.
Hint 2: Use facet_grid(drv ~ .) and pass labeller = labeller(drv = ...) with a named vector of drive-type labels.
Explanation: facet_grid(drv ~ .) lays panels out in a grid where rows are drv levels and the dot means "no column variable", giving a single vertical stack. Because facet_grid() builds a strict row-by-column layout, the three panels share one x-axis along the bottom, which makes comparing the drive types straightforward. The labeller swaps the cryptic codes (4, f, r) for readable labels.
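A sketch of one solution. The drive-type wording in drv_labels is an assumption. LOESS on the smaller rear-wheel-drive panel may print numerical warnings because of repeated displacement values, but the plot still renders.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own

# Names must match the drv codes in mpg
drv_labels <- c("4" = "Four-wheel drive", "f" = "Front-wheel drive", "r" = "Rear-wheel drive")

ex_2_5 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  # One LOESS fit per panel
  geom_smooth(method = "loess", formula = y ~ x, color = "steelblue") +
  # Rows = drv levels, "." = no column variable, giving a vertical stack
  facet_grid(drv ~ ., labeller = labeller(drv = drv_labels)) +
  labs(title = "Engine size vs highway MPG by drive type",
       x = "Displacement (litres)", y = "Highway MPG")

ex_2_5
```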
Section 3. Layered, themed, and annotated plots
These five problems stitch multiple layers, custom themes, and annotations into the kind of polished charts you build on the job.
Exercise 3.1: Lollipop chart with a conditional color rule
Task: From mtcars, compute the average qsec (quarter-mile time) per cylinder count. Build a horizontal lollipop chart ordered by qsec, with each lollipop colored by whether its value is above or below the overall mean, plus a dashed reference line at that mean. Save the plot object to ex_3_1.
Expected result:
A horizontal lollipop chart of average quarter-mile time per cylinder count, with stems and dots colored by whether they sit above or below the mean reference line.
Difficulty: Advanced
Hint 1: A lollipop is a stem plus a dot, and the stem must run from the reference value to each data value rather than from zero.
Hint 2: Draw the stems with geom_segment() (from the mean to the value), add geom_point() for the heads, and flip with coord_flip().
Explanation: A lollipop chart is a leaner alternative to a bar chart: geom_segment() draws the thin stem from the mean reference line to each value, and geom_point() caps it with a dot. Mapping color = above (a logical) splits the lollipops into two colors via scale_color_manual(). coord_flip() makes the chart horizontal so the category labels stay readable.
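A sketch of one solution. It assumes "overall mean" means the mean qsec across all 32 cars (qsec_mean is a hypothetical helper name). The colors are illustrative. The stem's y starts at the reference value, so each lollipop visibly points away from the dashed line.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own
library(dplyr)

qsec_mean <- mean(mtcars$qsec)  # assumed: mean over all cars

ex_3_1_data <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg_qsec = mean(qsec), .groups = "drop") %>%
  mutate(
    cyl = reorder(factor(cyl), avg_qsec),  # categories ordered by their value
    above = avg_qsec > qsec_mean           # logical flag that drives the color
  )

ex_3_1 <- ggplot(ex_3_1_data, aes(x = cyl, y = avg_qsec, color = above)) +
  geom_hline(yintercept = qsec_mean, linetype = "dashed", color = "grey50") +
  # Stem runs from the mean to each value, not from zero
  geom_segment(aes(xend = cyl, y = qsec_mean, yend = avg_qsec), linewidth = 1) +
  geom_point(size = 4) +
  scale_color_manual(values = c("TRUE" = "#0072B2", "FALSE" = "#D55E00"),
                     name = "Above overall mean") +
  coord_flip() +
  labs(title = "Average quarter-mile time by cylinder count",
       x = "Cylinders", y = "Average qsec (seconds)")

ex_3_1
```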
Exercise 3.2: Fully custom dark theme
Task: Recreate the iris scatter plot from Exercise 1.1, but apply a fully custom dark theme: a dark grey background (#2b2b2b), white text, no gridlines, white axis lines, and the legend at the bottom. Build the theme as a reusable theme() object. Save the plot object to ex_3_2.
Expected result:
The iris sepal-versus-petal scatter plot rendered on a dark grey background with white text, white axis lines, no gridlines, and the legend below the plot.
Difficulty: Advanced
Hint 1: Every non-data element of a plot (backgrounds, text, axis lines, legend position) is controlled by one function that takes element specifications.
Hint 2: Build a theme() object setting panel.background, plot.background, panel.grid, axis.line, the text elements, and legend.position.
Explanation: theme() controls every non-data element. Each argument takes an element_*() function: element_rect() for filled boxes, element_line() for lines, element_text() for text, and element_blank() to remove an element entirely. Assigning the result to dark_theme makes it a reusable object you can add to any plot, which is how you enforce a consistent house style across a report.
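A sketch of one solution. Beyond what the task lists, it also darkens the legend background and keys, which would otherwise stay light. It sets axis.text explicitly because the default theme_grey() gives it its own grey color that the general text setting does not override.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own

dark_bg <- "#2b2b2b"

# Reusable theme object: add it to any plot with `+ dark_theme`
dark_theme <- theme(
  plot.background   = element_rect(fill = dark_bg, color = NA),
  panel.background  = element_rect(fill = dark_bg, color = NA),
  legend.background = element_rect(fill = dark_bg, color = NA),
  legend.key        = element_rect(fill = dark_bg, color = NA),
  panel.grid        = element_blank(),                # no gridlines
  axis.line         = element_line(color = "white"),
  axis.ticks        = element_line(color = "white"),
  text              = element_text(color = "white"),
  axis.text         = element_text(color = "white"),  # override theme_grey's grey30
  legend.position   = "bottom"
)

ex_3_2 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(values = c(setosa = "#E69F00", versicolor = "#56B4E9",
                                virginica = "#009E73")) +
  labs(title = "Sepal length vs petal length by species",
       x = "Sepal length (cm)", y = "Petal length (cm)") +
  dark_theme

ex_3_2
```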
Exercise 3.3: Stacked area chart over time
Task: Filter economics_long to the unemploy and pop variables, then plot a stacked area chart of their normalized values (value01) over date, using geom_area() with position = "stack" and a two-color palette. Save the plot object to ex_3_3.
Expected result:
A stacked area chart over time with two colored bands, unemployment and population, layered on top of each other.
Difficulty: Advanced
Hint 1: A stacked area chart needs the data in long format with one column naming the series, and the bands have to be told to pile up rather than overlap.
Hint 2: Filter to the two variables, then geom_area(aes(fill = variable), position = "stack").
Explanation: economics_long is the tidy (long) form of economics, with a variable column naming each series, which is the shape ggplot2 wants for a multi-series chart. geom_area(position = "stack") piles the bands on top of one another, so the top edge shows the sum of the two series. value01 is each series rescaled to the 0-1 range, which keeps the two bands on a comparable scale; because of that rescaling, the stacked total is a visual aid rather than a meaningful quantity.
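A sketch of one solution. The series column is an assumed convenience for readable legend labels (mapping fill = variable directly works too). The colors are illustrative.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own
library(dplyr)

ex_3_3_data <- economics_long %>%
  filter(variable %in% c("unemploy", "pop")) %>%
  # Readable legend labels (hypothetical helper column)
  mutate(series = ifelse(variable == "pop", "Population", "Unemployment"))

ex_3_3 <- ggplot(ex_3_3_data, aes(x = date, y = value01, fill = series)) +
  # Bands pile up rather than overlap
  geom_area(position = "stack", alpha = 0.85) +
  scale_fill_manual(values = c(Population = "#0072B2", Unemployment = "#E69F00")) +
  labs(title = "Population and unemployment (normalized, stacked)",
       x = NULL, y = "Normalized value (0-1 each)", fill = NULL)

ex_3_3
```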
Exercise 3.4: Annotated scatter plot labeling extreme points
Task: From mtcars, create a scatter plot of wt versus mpg, label the three most and three least fuel-efficient cars with ggrepel::geom_text_repel(), and add a linear geom_smooth(). Build a label column that is the car name only for those six rows and NA everywhere else. Save the plot object to ex_3_4.
Expected result:
A scatter plot of car weight versus MPG with a regression line and text labels marking only the three most and three least efficient cars.
Difficulty: Advanced
Hint 1: Labeling only the extremes means most rows get no label, and the labels must be nudged apart so they do not overlap each other or the points.
Hint 2: Pick the extremes with slice_max() / slice_min() (with with_ties = FALSE, because Honda Civic and Lotus Europa tie at 30.4 mpg), build a label column with ifelse(), and place labels with geom_text_repel(na.rm = TRUE).
Explanation: Labeling every point would clutter the chart, so a label column is built that holds the car name only for the six extreme rows and NA for the rest. geom_text_repel() from the ggrepel package draws those labels and automatically nudges them apart so they do not collide; na.rm = TRUE quietly skips the NA rows. box.padding controls how far labels are pushed off their points.
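A sketch of one solution. Car names live in the row names of mtcars, so they are copied into a car column first. with_ties = FALSE keeps exactly three cars at each end despite the 30.4 mpg tie. The box.padding value and colors are assumptions.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own
library(dplyr)
library(ggrepel)

ex_3_4_data <- mtcars
ex_3_4_data$car <- rownames(mtcars)

# Three most and three least fuel-efficient cars, ties broken by row order
top3    <- slice_max(ex_3_4_data, mpg, n = 3, with_ties = FALSE)$car
bottom3 <- slice_min(ex_3_4_data, mpg, n = 3, with_ties = FALSE)$car

# Car name for the six extremes, NA everywhere else
ex_3_4_data$label <- ifelse(ex_3_4_data$car %in% c(top3, bottom3), ex_3_4_data$car, NA)

ex_3_4 <- ggplot(ex_3_4_data, aes(x = wt, y = mpg)) +
  geom_point(color = "grey40") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "steelblue") +
  # na.rm = TRUE quietly skips the unlabeled rows
  geom_text_repel(aes(label = label), na.rm = TRUE, box.padding = 0.5) +
  labs(title = "Most and least fuel-efficient cars",
       x = "Weight (1,000 lbs)", y = "Miles per gallon")

ex_3_4
```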
Exercise 3.5: Multi-layer violin, box, and jitter plot
Task: From airquality, build a polished chart of the distribution of Temp by Month. Layer three geoms: a violin plot for the distribution shape, a narrow white boxplot inside it, and jittered points on top. Color by month with a sequential palette, label it cleanly, and apply a custom theme. Save the plot object to ex_3_5.
Expected result:
One violin per month with a slim white boxplot nested inside and semi-transparent jittered points overlaid, showing the temperature distribution.
Difficulty: Advanced
Hint 1: Three geoms drawn over the same x-y mapping stack from back to front, so the widest shape goes down first and the narrowest detail last.
Hint 2: Layer geom_violin(), then a narrow geom_boxplot(width = 0.1), then geom_jitter(), all sharing aes(x = Month_f, y = Temp).
Explanation: Layer order matters: the wide geom_violin() is drawn first as the background, the slim geom_boxplot(width = 0.1) sits inside it to show quartiles, and geom_jitter() adds the raw points on top. The violin shows the full distribution shape, the box gives the summary statistics, and the points show sample size and outliers, three views of the same data in one chart. scale_*_brewer() applies a ColorBrewer sequential palette and guide = "none" drops the redundant legend.
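A sketch of one solution. The YlOrRd ColorBrewer palette, the theme tweaks, and the titles are assumptions. Month_f is built from base R's month.abb so the months stay in calendar order.

```r
library(ggplot2)  # repeated from the setup block so this runs on its own
library(dplyr)

ex_3_5_data <- airquality %>%
  # Month codes 5-9 become calendar-ordered labels
  mutate(Month_f = factor(month.abb[Month], levels = month.abb[5:9]))

ex_3_5 <- ggplot(ex_3_5_data, aes(x = Month_f, y = Temp)) +
  # Back layer: the full distribution shape
  geom_violin(aes(fill = Month_f), color = NA, alpha = 0.6) +
  # Middle layer: slim white box with the quartiles
  geom_boxplot(width = 0.1, fill = "white", outlier.shape = NA) +
  # Front layer: raw observations, jittered horizontally only
  geom_jitter(aes(color = Month_f), width = 0.1, height = 0, alpha = 0.5, size = 1.5) +
  scale_fill_brewer(palette = "YlOrRd", guide = "none") +
  scale_color_brewer(palette = "YlOrRd", guide = "none") +
  labs(title = "Daily temperature by month, 1973", x = NULL, y = "Temperature (F)") +
  theme_minimal(base_size = 12) +
  theme(panel.grid.major.x = element_blank(),
        plot.title = element_text(face = "bold"))

ex_3_5
```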
Summary
The 15 problems together exercise the full core ggplot2 toolkit: geoms, aesthetics, scales, facets, themes, and coordinate systems.
| # | Topic | Geoms / Concepts |
|---|---|---|
| 1.1 | Scatter plot | geom_point(), color aesthetics |
| 1.2 | Scatter + trend | geom_smooth(method = "lm") |
| 1.3 | Ordered bar chart | geom_col(), reorder(), coord_flip() |
| 1.4 | Grouped bar chart | position_dodge(), factor fill |
| 1.5 | Time series | geom_line(), geom_area(), geom_hline() |
| 2.1 | Faceted lines | facet_wrap(), labeller |
| 2.2 | Histogram + density | after_stat(density), geom_density() |
| 2.3 | Boxplot + jitter | geom_boxplot(), geom_jitter() |
| 2.4 | Heatmap | geom_tile(), scale_fill_gradient() |
| 2.5 | Faceted scatter | facet_grid(), per-facet LOESS |
| 3.1 | Lollipop chart | geom_segment(), geom_point() |
| 3.2 | Custom theme | theme() elements, dark background |
| 3.3 | Stacked area | geom_area(position = "stack") |
| 3.4 | Annotated scatter | geom_text_repel(), conditional labels |
| 3.5 | Multi-layer violin | geom_violin(), geom_boxplot(), geom_jitter() |
If you built Sections 1 and 2 without peeking, you are comfortable with everyday ggplot2. If you built Section 3 too, you are ready for layered, publication-grade charts.
FAQ
Q: Why do my points all show the same color even though I set color? You probably set color outside aes(). A fixed value like geom_point(color = "blue") paints every point blue. To color by a variable, the mapping must go inside aes(), as in aes(color = Species).
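A quick way to see the difference, as a minimal sketch using iris: the first plot sets one fixed color, and the second maps color to a column.

```r
library(ggplot2)

# Fixed setting: every point is blue and no legend is drawn
p_fixed <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point(color = "blue")

# Mapping: color follows the Species column and a legend is added
p_mapped <- ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) +
  geom_point()
```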
Q: What is the difference between geom_bar() and geom_col()? geom_bar() counts the rows in each x category and uses the count as the bar height. geom_col() uses a y-value you already computed. If you have summarised your data, use geom_col().
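A minimal sketch with mpg showing that both produce the same bars when geom_col() is given precomputed counts. dplyr::count() does the counting in the second version.

```r
library(ggplot2)
library(dplyr)

# geom_bar(): ggplot2 counts the rows in each class itself
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# geom_col(): you supply the heights, here counts computed beforehand
class_counts <- mpg %>% count(class)
p_col <- ggplot(class_counts, aes(x = class, y = n)) + geom_col()
```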
Q: When should I use facet_wrap() versus facet_grid()? facet_wrap() lays panels out in a flowing grid based on one variable and is best for many levels. facet_grid() arranges panels by one variable on rows and another on columns, keeping axes aligned, and is best for a true two-way layout.
Q: My geom_smooth() prints a message about the formula. How do I silence it? Pass the formula explicitly, for example geom_smooth(method = "lm", formula = y ~ x). Every solution above does this so the message never appears.
Q: How do I save a ggplot to a file? Use ggsave("plot.png", plot = ex_1_1, width = 8, height = 5, dpi = 300). If you omit the plot argument, ggsave() saves the most recently displayed plot.
r-statistics.co