dplyr Exercises: 15 Data Manipulation Practice Problems
Practice all core dplyr verbs with 15 exercises: filter, select, mutate, group_by, summarise, arrange, joins, and across. Each has an interactive solution you can run.
Easy (1-5)
Exercise 1: Filter and Select
Find all 4-cylinder cars with mpg > 25. Show mpg, cyl, hp, wt.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
mtcars |> filter(cyl == 4, mpg > 25) |> select(mpg, cyl, hp, wt)
▶ Run
↺ Reset
Exercise 2: Mutate and Arrange
Add a power-to-weight column (hp/wt), sort descending.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
mtcars |> mutate(pwr = round(hp/wt, 1)) |> arrange(desc(pwr)) |> select(mpg, hp, wt, pwr) |> head(8)
▶ Run
↺ Reset
Exercise 3: Group and Summarise
Mean mpg, mean hp, and count per cylinder group.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
mtcars |> group_by(cyl) |> summarise(n=n(), avg_mpg=round(mean(mpg),1), avg_hp=round(mean(hp),0), .groups="drop")
▶ Run
↺ Reset
Exercise 4: Count with Proportion
Count iris flowers per Species, add percentage.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
iris |> count(Species) |> mutate(pct = round(n/sum(n)*100, 1))
▶ Run
↺ Reset
Exercise 5: Rename to snake_case
Convert iris column names to snake_case, move Species first.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
iris |> rename_with(~ tolower(gsub("\\.", "_", .x))) |> select(species, everything()) |> head(4)
▶ Run
↺ Reset
Medium (6-10)
Exercise 6: Above-Average Filter
Cars with above-average mpg AND below-average weight.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
mtcars |> filter(mpg > mean(mpg), wt < mean(wt)) |> select(mpg, hp, wt) |> arrange(desc(mpg))
▶ Run
↺ Reset
Exercise 7: Grouped Ranking
Rank cars by mpg within each cyl group (best = 1). Show top 3 per group.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
mtcars |> mutate(car=rownames(mtcars)) |> group_by(cyl) |> mutate(rank=rank(-mpg)) |> filter(rank<=3) |> select(car,cyl,mpg,rank) |> arrange(cyl,rank) |> ungroup()
▶ Run
↺ Reset
Exercise 8: across() Summary
Per Species: mean and sd of all numeric columns.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
iris |> group_by(Species) |> summarise(across(where(is.numeric), list(mean=~round(mean(.x),2), sd=~round(sd(.x),2)), .names="{.col}_{.fn}"), .groups="drop")
▶ Run
↺ Reset
Exercise 9: case_when Categories
Label cars: mpg>25 "Economy", 15-25 "Standard", <15 "Gas Guzzler".
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
mtcars |> mutate(type=case_when(mpg>25~"Economy", mpg>=15~"Standard", TRUE~"Gas Guzzler")) |> count(type, sort=TRUE)
▶ Run
↺ Reset
Exercise 10: Join Two Tables
Join employees with departments. Find unassigned employees.
library(dplyr)
employees <- data.frame(name=c("Alice","Bob","Carol","David"), dept=c("Eng","Mkt","Eng","HR"))
departments <- data.frame(dept=c("Eng","Mkt","Sales"), budget=c(500,300,200))
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
employees <- data.frame(name=c("Alice","Bob","Carol","David"), dept=c("Eng","Mkt","Eng","HR"))
departments <- data.frame(dept=c("Eng","Mkt","Sales"), budget=c(500,300,200))
cat("All employees + budgets:\n")
left_join(employees, departments, by="dept")
cat("\nUnassigned:\n")
anti_join(employees, departments, by="dept")
▶ Run
↺ Reset
Hard (11-15)
Exercise 11: Top N Per Group
2 heaviest cars per cylinder group.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
mtcars |> mutate(car=rownames(mtcars)) |> group_by(cyl) |> slice_max(wt, n=2) |> select(car,cyl,wt,mpg) |> ungroup()
▶ Run
↺ Reset
Exercise 12: Percentage of Group Total
Each car's hp as % of its cyl group's total hp.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
mtcars |> mutate(car=rownames(mtcars)) |> group_by(cyl) |> mutate(hp_pct=round(hp/sum(hp)*100,1)) |> select(car,cyl,hp,hp_pct) |> arrange(cyl,desc(hp_pct)) |> ungroup() |> head(10)
▶ Run
↺ Reset
Exercise 13: Multi-Step Pipeline
Manual cars → add kpl → group by cyl → mean kpl → sort desc.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
mtcars |> filter(am==1) |> mutate(kpl=round(mpg*0.425,2)) |> group_by(cyl) |> summarise(n=n(), avg_kpl=round(mean(kpl),2), .groups="drop") |> arrange(desc(avg_kpl))
▶ Run
↺ Reset
Exercise 14: Conditional Summary
Per iris Species: count, flag if mean Sepal.Length > 6.
library(dplyr)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
iris |> group_by(Species) |> summarise(n=n(), avg_sl=round(mean(Sepal.Length),2), long=mean(Sepal.Length)>6, .groups="drop")
▶ Run
↺ Reset
Exercise 15: Stratified 30% Sample
Random 30% from each Species. Count per group.
library(dplyr)
set.seed(42)
▶ Run
↺ Reset
Click to reveal solution
```r
library(dplyr)
set.seed(42)
iris |> group_by(Species) |> slice_sample(prop=0.3) |> ungroup() |> count(Species)
▶ Run
↺ Reset
What's Next?