dplyr Exercises: 15 Data Manipulation Practice Problems

Practice all core dplyr verbs with 15 exercises: filter, select, mutate, group_by, summarise, arrange, joins, and across. Each has an interactive solution you can run.

Easy (1-5)

Exercise 1: Filter and Select

Find all 4-cylinder cars with mpg > 25. Show mpg, cyl, hp, wt.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) mtcars |> filter(cyl == 4, mpg > 25) |> select(mpg, cyl, hp, wt)

  

Exercise 2: Mutate and Arrange

Add a power-to-weight column (hp/wt), sort descending.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) mtcars |> mutate(pwr = round(hp/wt, 1)) |> arrange(desc(pwr)) |> select(mpg, hp, wt, pwr) |> head(8)

  

Exercise 3: Group and Summarise

Mean mpg, mean hp, and count per cylinder group.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) mtcars |> group_by(cyl) |> summarise(n=n(), avg_mpg=round(mean(mpg),1), avg_hp=round(mean(hp),0), .groups="drop")

  

Exercise 4: Count with Proportion

Count iris flowers per Species, add percentage.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) iris |> count(Species) |> mutate(pct = round(n/sum(n)*100, 1))

  

Exercise 5: Rename to snake_case

Convert iris column names to snake_case, move Species first.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) iris |> rename_with(~ tolower(gsub("\\.", "_", .x))) |> select(species, everything()) |> head(4)

  

Medium (6-10)

Exercise 6: Above-Average Filter

Cars with above-average mpg AND below-average weight.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) mtcars |> filter(mpg > mean(mpg), wt < mean(wt)) |> select(mpg, hp, wt) |> arrange(desc(mpg))

  

Exercise 7: Grouped Ranking

Rank cars by mpg within each cyl group (best = 1). Show top 3 per group.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) mtcars |> mutate(car=rownames(mtcars)) |> group_by(cyl) |> mutate(rank=rank(-mpg)) |> filter(rank<=3) |> select(car,cyl,mpg,rank) |> arrange(cyl,rank) |> ungroup()

  

Exercise 8: across() Summary

Per Species: mean and sd of all numeric columns.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) iris |> group_by(Species) |> summarise(across(where(is.numeric), list(mean=~round(mean(.x),2), sd=~round(sd(.x),2)), .names="{.col}_{.fn}"), .groups="drop")

  

Exercise 9: case_when Categories

Label cars: mpg>25 "Economy", 15-25 "Standard", <15 "Gas Guzzler".

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) mtcars |> mutate(type=case_when(mpg>25~"Economy", mpg>=15~"Standard", TRUE~"Gas Guzzler")) |> count(type, sort=TRUE)

  

Exercise 10: Join Two Tables

Join employees with departments. Find unassigned employees.

library(dplyr) employees <- data.frame(name=c("Alice","Bob","Carol","David"), dept=c("Eng","Mkt","Eng","HR")) departments <- data.frame(dept=c("Eng","Mkt","Sales"), budget=c(500,300,200))


  
Click to reveal solution ```r
library(dplyr) employees <- data.frame(name=c("Alice","Bob","Carol","David"), dept=c("Eng","Mkt","Eng","HR")) departments <- data.frame(dept=c("Eng","Mkt","Sales"), budget=c(500,300,200)) cat("All employees + budgets:\n") left_join(employees, departments, by="dept") cat("\nUnassigned:\n") anti_join(employees, departments, by="dept")

  

Hard (11-15)

Exercise 11: Top N Per Group

2 heaviest cars per cylinder group.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) mtcars |> mutate(car=rownames(mtcars)) |> group_by(cyl) |> slice_max(wt, n=2) |> select(car,cyl,wt,mpg) |> ungroup()

  

Exercise 12: Percentage of Group Total

Each car's hp as % of its cyl group's total hp.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) mtcars |> mutate(car=rownames(mtcars)) |> group_by(cyl) |> mutate(hp_pct=round(hp/sum(hp)*100,1)) |> select(car,cyl,hp,hp_pct) |> arrange(cyl,desc(hp_pct)) |> ungroup() |> head(10)

  

Exercise 13: Multi-Step Pipeline

Manual cars → add kpl → group by cyl → mean kpl → sort desc.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) mtcars |> filter(am==1) |> mutate(kpl=round(mpg*0.425,2)) |> group_by(cyl) |> summarise(n=n(), avg_kpl=round(mean(kpl),2), .groups="drop") |> arrange(desc(avg_kpl))

  

Exercise 14: Conditional Summary

Per iris Species: count, flag if mean Sepal.Length > 6.

library(dplyr)


  
Click to reveal solution ```r
library(dplyr) iris |> group_by(Species) |> summarise(n=n(), avg_sl=round(mean(Sepal.Length),2), long=mean(Sepal.Length)>6, .groups="drop")

  

Exercise 15: Stratified 30% Sample

Random 30% from each Species. Count per group.

library(dplyr) set.seed(42)


  
Click to reveal solution ```r
library(dplyr) set.seed(42) iris |> group_by(Species) |> slice_sample(prop=0.3) |> ungroup() |> count(Species)

  

What's Next?