R for Data Science Exercises: 50 Practice Problems

Fifty practice problems mapped to R for Data Science topics: visualize, transform, tidy, import, strings, dates, iteration, and models. Each problem comes with a solution.

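Run this once before any exercise:
library(tidyverse)
library(broom)
# Section 5 uses lubridate, which tidyverse 2.0.0+ attaches; on older versions also run library(lubridate)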

  

Section 1. Visualize (8 problems)

Exercise 1.1: Scatter plot

Difficulty: Beginner. Scatter of mpg dataset displ vs hwy.

Solution:
ggplot(mpg, aes(displ, hwy)) + geom_point()

  

Exercise 1.2: Color by class

Difficulty: Beginner. Same scatter, colored by class.

Solution:
ggplot(mpg, aes(displ, hwy, color = class)) + geom_point()

  

Exercise 1.3: Facet by drv

Difficulty: Intermediate. Same scatter, facet by drv.

Solution:
ggplot(mpg, aes(displ, hwy)) + geom_point() + facet_wrap(~ drv)

  

Exercise 1.4: Add smoother

Difficulty: Intermediate. Add a smoother (default) to the displ-vs-hwy scatter.

Solution:
ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth()

  

Exercise 1.5: Bar chart of cut

Difficulty: Beginner. Bar chart of diamonds$cut counts.

Solution:
ggplot(diamonds, aes(cut)) + geom_bar()

  

Exercise 1.6: Histogram with binwidth

Difficulty: Intermediate. Histogram of diamonds$carat with binwidth = 0.1.

Solution:
ggplot(diamonds, aes(carat)) + geom_histogram(binwidth = 0.1)

  

Exercise 1.7: Position dodge for grouped bars

Difficulty: Intermediate. Bar of diamonds cut with fill = clarity, dodged side by side.

Solution:
ggplot(diamonds, aes(cut, fill = clarity)) + geom_bar(position = "dodge")

  

Exercise 1.8: coord_flip for horizontal bars

Difficulty: Intermediate. Make the diamonds$cut bar chart horizontal.

Solution:
ggplot(diamonds, aes(cut)) + geom_bar() + coord_flip()

  

Section 2. Transform (8 problems)

Exercise 2.1: Filter

Difficulty: Beginner. mtcars rows with mpg > 25.

Solution:
mtcars |> filter(mpg > 25)

  

Exercise 2.2: Arrange

Difficulty: Beginner. Sort mtcars by hp descending.

Solution:
mtcars |> arrange(desc(hp))

  

Exercise 2.3: Select with helpers

Difficulty: Intermediate. Select all iris columns starting with "Sepal".

Solution:
iris |> select(starts_with("Sepal"))

  

Exercise 2.4: Mutate

Difficulty: Beginner. Add kpl (kilometres per litre) = mpg * 0.425 to mtcars.

Solution:
mtcars |> mutate(kpl = mpg * 0.425)
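
The 0.425 factor follows from the unit definitions. As a rough check, assuming mpg is measured in US gallons (as the mtcars help page states), it can be derived as below; the variable names are illustrative only.

```r
# Exact unit definitions
km_per_mile <- 1.609344
litres_per_us_gallon <- 3.785411784

# 1 mile per US gallon expressed in kilometres per litre
kpl_per_mpg <- km_per_mile / litres_per_us_gallon
round(kpl_per_mpg, 4)
```

So mpg * 0.425 uses the factor rounded to three decimals, which is precise enough for practice data.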

  

Exercise 2.5: Summarise per group

Difficulty: Intermediate. Mean and SD of mpg per cyl.

Solution:
mtcars |>
  group_by(cyl) |>
  summarise(mean = mean(mpg), sd = sd(mpg))

  

Exercise 2.6: Count

Difficulty: Beginner. Count diamonds per cut.

Solution:
diamonds |> count(cut)

  

Exercise 2.7: case_when

Difficulty: Intermediate. Bin diamonds price into "budget", "mid", "premium".

Solution:
diamonds |>
  mutate(tier = case_when(
    price < 1000 ~ "budget",
    price < 5000 ~ "mid",
    TRUE ~ "premium"
  ))

  

Exercise 2.8: Pipeline

Difficulty: Intermediate. Filter ideal-cut, group by clarity, summarise mean price, sort desc.

Solution:
diamonds |>
  filter(cut == "Ideal") |>
  group_by(clarity) |>
  summarise(mean_price = mean(price)) |>
  arrange(desc(mean_price))

  

Section 3. Tidy (8 problems)

Exercise 3.1: pivot_longer

Difficulty: Beginner. Pivot a wide quarterly table to long.

Solution:
wide <- tibble(region = c("US", "EU"), Q1 = c(100, 80), Q2 = c(120, 90))
wide |> pivot_longer(Q1:Q2, names_to = "quarter", values_to = "sales")

  

Exercise 3.2: pivot_wider

Difficulty: Beginner. Pivot a long table back to wide.

Solution:
long <- tibble(
  region = c("US", "US", "EU", "EU"),
  quarter = c("Q1", "Q2", "Q1", "Q2"),
  sales = c(100, 120, 80, 90)
)
long |> pivot_wider(names_from = quarter, values_from = sales)

  

Exercise 3.3: separate

Difficulty: Intermediate. Split "Last, First" into two columns.

Solution:
tibble(name = c("Smith, Alice", "Jones, Bob")) |>
  separate_wider_delim(name, delim = ", ", names = c("last", "first"))

  

Exercise 3.4: unite

Difficulty: Intermediate. Combine year, month, day to ISO date string.

Solution:
tibble(year = 2024, month = "01", day = "15") |>
  unite("date", year, month, day, sep = "-")

  

Exercise 3.5: drop_na

Difficulty: Beginner. Drop airquality rows where Ozone is NA.

Solution:
airquality |> drop_na(Ozone)

  

Exercise 3.6: fill

Difficulty: Intermediate. Carry forward NA in a region column.

Solution:
tibble(region = c("US",NA,NA,"EU"), value = 1:4) |> fill(region)

  

Exercise 3.7: complete

Difficulty: Intermediate. Add the missing region-month combinations (months 1 to 3), filling sales with 0.

Solution:
df <- tibble(region = c("US", "EU"), month = c(1, 1), sales = c(100, 80))
df |> complete(region, month = 1:3, fill = list(sales = 0))

  

Exercise 3.8: nest

Difficulty: Advanced. Nest iris by Species.

Solution:
iris |> group_by(Species) |> nest()

  

Section 4. Import & wrangle (6 problems)

Exercise 4.1: Read a CSV

Difficulty: Beginner. Read a small CSV with read_csv.

Solution:
write_csv(mtcars, "demo.csv")
df <- read_csv("demo.csv")
head(df)

  

Exercise 4.2: Custom NA values

Difficulty: Intermediate. Read CSV treating "N/A" and "" as NA.

Solution:
read_csv("demo.csv", na = c("","NA","N/A"))

  

Exercise 4.3: Force a column type

Difficulty: Intermediate. Read CSV forcing id to character.

Solution:
# Inline data with an id column; read as character, the leading zeros survive
read_csv(I("id,score\n007,12\n042,15"), col_types = cols(id = col_character()))

  

Exercise 4.4: Write CSV

Difficulty: Beginner. Write mtcars to CSV.

Solution:
write_csv(mtcars, "out.csv")

  

Exercise 4.5: parse_number

Difficulty: Intermediate. Convert "$1,234.50" to 1234.5.

Solution:
parse_number("$1,234.50")

  

Exercise 4.6: Read multiple files

Difficulty: Advanced. Read 3 CSVs into one tibble with map_dfr.

Solution:
# set_names() names each path by itself, so .id stores the file name rather than an index
files <- set_names(c("a.csv", "b.csv", "c.csv"))
combined <- map_dfr(files, read_csv, .id = "source")
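
The solution assumes a.csv, b.csv, and c.csv already exist in the working directory. A self-contained sketch of the same idea follows, assuming purrr 1.0.0 or later, where map_dfr() is superseded in favour of map() plus list_rbind(). The temporary directory and the id/value columns are made up for the demo.

```r
library(tidyverse)

# Hypothetical demo files, written to a temporary directory
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
files <- file.path(demo_dir, c("a.csv", "b.csv", "c.csv"))
walk2(files, 1:3, \(path, i) write_csv(tibble(id = i, value = i * 10), path))

# Name each path by its file name, read every file, then row-bind,
# keeping the list names in a "source" column
combined <- files |>
  set_names(basename) |>
  map(\(f) read_csv(f, show_col_types = FALSE)) |>
  list_rbind(names_to = "source")

combined
```

With readr 2.0.0 or later, read_csv(files, id = "source") is another option; it stores the full path rather than the base name.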

  

Section 5. Strings & dates (6 problems)

Exercise 5.1: Detect substring

Difficulty: Beginner. Filter emails containing "gmail.com".

Solution:
emails <- c("a@gmail.com", "b@yahoo.com", "c@gmail.com")
emails[str_detect(emails, fixed("gmail.com"))]  # fixed() matches the dot literally
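
A pattern passed to str_detect() is treated as a regular expression by default, so an unescaped "." matches any character. The sketch below, using a made-up input vector, shows where a regex match and a literal substring match can disagree.

```r
library(stringr)

# Hypothetical input: the second string only resembles a Gmail address
x <- c("a@gmail.com", "a@gmailxcom")

regex_hits   <- str_detect(x, "gmail.com")         # "." matches any character
literal_hits <- str_detect(x, fixed("gmail.com"))  # "." must be a literal dot
escaped_hits <- str_detect(x, "gmail\\.com")       # regex with the dot escaped

regex_hits
literal_hits
escaped_hits
```

For the three addresses in the exercise all variants return the same rows; the difference only matters for inputs like the second string here.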

  

Exercise 5.2: Replace by regex

Difficulty: Intermediate. Strip non-digits from "(415) 555-1234".

Solution:
str_replace_all("(415) 555-1234", "\\D", "")

  

Exercise 5.3: Pad with zeros

Difficulty: Beginner. Format ID 42 as "000042".

Solution:
str_pad("42", width = 6, pad = "0")

  

Exercise 5.4: Parse mixed dates

Difficulty: Intermediate. Parse "01/15/2024" and "2024-02-20" together.

Solution:
parse_date_time(c("01/15/2024","2024-02-20"), orders = c("mdy","ymd"))

  

Exercise 5.5: Extract month

Difficulty: Beginner. Get month name from a date.

Solution:
month(as.Date("2024-04-15"), label = TRUE, abbr = FALSE)

  

Exercise 5.6: Floor to month start

Difficulty: Intermediate. Snap dates to month-start.

Solution:
floor_date(as.Date(c("2024-01-15","2024-02-20")), "month")

  

Section 6. Iteration & functions (8 problems)

Exercise 6.1: map_dbl

Difficulty: Beginner. Square each of 1:5.

Solution:
map_dbl(1:5, ~ .x^2)

  

Exercise 6.2: map_dfr

Difficulty: Intermediate. For each n in 1:3, return a tibble with n and squared.

Solution:
map_dfr(1:3, ~ tibble(n = .x, sq = .x^2))

  

Exercise 6.3: map2

Difficulty: Intermediate. Element-wise x^y for two vectors.

Solution:
map2_dbl(c(2,3,4), c(1,2,3), ~ .x^.y)

  

Exercise 6.4: keep

Difficulty: Intermediate. Keep list elements with mean > 5.

Solution:
lst <- list(c(1, 2, 3), c(7, 8, 9), c(4, 5))
keep(lst, ~ mean(.x) > 5)

  

Exercise 6.5: Write a simple function

Difficulty: Beginner. Function returning x squared.

Solution:
sq <- function(x) x^2
sq(5)

  

Exercise 6.6: Default arguments

Difficulty: Intermediate. Function with default n = 10.

Solution:
fn <- function(x, n = 10) x * n
fn(5)

  

Exercise 6.7: Anonymous function

Difficulty: Intermediate. R 4.1+ shorthand for an anonymous function.

Solution:
sapply(1:5, \(x) x^2)

  

Exercise 6.8: safely

Difficulty: Advanced. Wrap a risky function so it never throws.

Solution:
safe_log <- safely(log)
result <- safe_log("a")  # log("a") is an error; log(-1) would only warn and return NaN
list(result$result, result$error)
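
One subtlety: safely() captures errors, not warnings. The sketch below contrasts an input that errors with one that only warns; the otherwise = NA_real_ fallback is a choice made for this illustration.

```r
library(purrr)

safe_log <- safely(log, otherwise = NA_real_)

# log("a") raises an error: $error is filled in and $result falls back to `otherwise`
bad <- safe_log("a")
is.null(bad$error)
bad$result

# log(-1) only warns and returns NaN, so safely() records no error
warned <- suppressWarnings(safe_log(-1))
is.null(warned$error)
is.nan(warned$result)
```

When warnings also need to be captured rather than suppressed, purrr::quietly() is the counterpart to look at.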

  

Section 7. Models (6 problems)

Exercise 7.1: Simple linear model

Difficulty: Beginner. Fit mpg ~ wt.

Solution:
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

  

Exercise 7.2: Read model summary

Difficulty: Intermediate. R-squared and slope p-value from the model.

Solution:
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)
list(r2 = s$r.squared, slope_p = s$coefficients["wt", "Pr(>|t|)"])

  

Exercise 7.3: Tidy with broom

Difficulty: Intermediate. Use broom::tidy to extract coefficients.

Solution:
fit <- lm(mpg ~ wt + hp, data = mtcars)
broom::tidy(fit)

  

Exercise 7.4: Many models per group

Difficulty: Advanced. Fit lm per cyl, return slopes.

Solution:
mtcars |>
  group_by(cyl) |>
  group_modify(~ broom::tidy(lm(mpg ~ wt, data = .x))) |>
  filter(term == "wt")

  

Exercise 7.5: Predict

Difficulty: Intermediate. Predict mpg for wt = 3.

Solution:
fit <- lm(mpg ~ wt, data = mtcars)
predict(fit, newdata = data.frame(wt = 3))

  

Exercise 7.6: Train/test split + RMSE

Difficulty: Advanced. 70/30 split, train lm, compute test RMSE.

Solution:
set.seed(1)
n <- nrow(mtcars)
idx <- sample(seq_len(n), floor(0.7 * n))  # 70% of the rows for training
tr <- mtcars[idx, ]
te <- mtcars[-idx, ]
fit <- lm(mpg ~ wt + hp, data = tr)
sqrt(mean((te$mpg - predict(fit, newdata = te))^2))  # RMSE on the held-out rows

  

What to do next

After 50 R4DS-aligned problems you have a solid working toolkit. Natural follow-ups:

  • dplyr-Exercises, ggplot2-Exercises, tidyverse-Exercises, Data-Wrangling-Exercises (all shipped): more depth on each topic.
  • EDA-Exercises (shipped): applied data-exploration drills.
  • Linear-Regression-Exercises (shipped): modeling beyond the R4DS basics.