R for Sports Analytics Exercises: 20 Practice Problems

Twenty practice problems for sports analytics in R, covering rankings, ratings, Elo, win probability, player metrics, and season summaries. Each exercise includes a solution.

Run this once before any exercise:
library(dplyr)
library(tibble)
library(tidyr)
library(ggplot2)

  

Exercise 1: Standings from results

Difficulty: Intermediate.

Solution:
games <- tibble(
  home = c("A","B","C","A","B"),
  away = c("B","C","A","C","A"),
  home_score = c(2,1,3,0,2),
  away_score = c(1,1,2,1,2)
)
# Score each game from both sides (3 = win, 1 = draw, 0 = loss), then total by team.
games |>
  transmute(team = home, points = ifelse(home_score > away_score, 3, ifelse(home_score == away_score, 1, 0))) |>
  bind_rows(
    games |>
      transmute(team = away, points = ifelse(away_score > home_score, 3, ifelse(away_score == home_score, 1, 0)))
  ) |>
  group_by(team) |>
  summarise(pts = sum(points)) |>
  arrange(desc(pts))

  

Exercise 2: Win percentage

Difficulty: Beginner.

Solution:
wins <- 25; losses <- 15
wins / (wins + losses)  # share of games won

  

Exercise 3: Pythagorean expectation

Difficulty: Advanced.

Solution:
runs_for <- 720; runs_against <- 650
# Expected win share from runs scored and allowed, using exponent 2.
runs_for^2 / (runs_for^2 + runs_against^2)
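The solution above returns a win share. As a minimal sketch, the same formula can be wrapped in a function and scaled to a season length. The name `pythag_wpct`, the `exponent` argument, and the 162-game season are assumptions made for illustration, not part of the exercise.

```r
# Hypothetical helper: Pythagorean win expectation with a configurable exponent.
pythag_wpct <- function(runs_for, runs_against, exponent = 2) {
  runs_for^exponent / (runs_for^exponent + runs_against^exponent)
}

wpct <- pythag_wpct(720, 650)   # same inputs as the exercise
expected_wins <- wpct * 162     # assumes a 162-game season
round(wpct, 3)                  # 0.551
```

Keeping the exponent as an argument makes it easy to try values other than 2 against real standings.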

  

Exercise 4: Top scorers

Difficulty: Beginner.

Solution:
df <- tibble(player = letters[1:5], goals = c(20, 15, 30, 12, 25))
# Sort by goals, highest first, and keep the top three.
df |>
  arrange(desc(goals)) |>
  head(3)

  

Exercise 5: Z-score for player stat

Difficulty: Intermediate.

Solution:
goals <- c(20, 15, 30, 12, 25, 18, 22)
# Standardize each value; sd() uses the sample (n - 1) denominator.
(goals - mean(goals)) / sd(goals)
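As a quick sanity check on the manual calculation above, base R's `scale()` centers and scales a vector the same way by default. The sketch below compares the two; the variable names are illustrative.

```r
goals <- c(20, 15, 30, 12, 25, 18, 22)

z_manual <- (goals - mean(goals)) / sd(goals)
z_scale  <- as.numeric(scale(goals))  # scale() returns a one-column matrix

all.equal(z_manual, z_scale)          # TRUE
```

A z-score vector built this way has mean 0 and standard deviation 1, which is a useful property to test when the input comes from real data.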

  

Exercise 6: Goals per game

Difficulty: Beginner.

Solution:
df <- tibble(player = c("a","b"), goals = c(20, 25), games = c(30, 28))
# Rate stat: goals divided by games played.
df |>
  mutate(g_per_game = goals / games)

  

Exercise 7: Elo update

Difficulty: Advanced.

Solution:
elo_update <- function(rA, rB, sA, K = 32) {
  eA <- 1 / (1 + 10^((rB - rA) / 400))  # expected score for A
  rA + K * (sA - eA)                    # sA is the actual score: 1 = win, 0 = loss
}
elo_update(1500, 1500, 1)

  

Exercise 8: Apply Elo across season

Difficulty: Advanced.

Solution:
games <- tibble(
  team1 = c("A","B","A"),
  team2 = c("B","C","C"),
  winner = c("A","C","A")
)
ratings <- c(A = 1500, B = 1500, C = 1500)
# Replay the games in order, updating both teams after each one.
for (i in seq_len(nrow(games))) {
  r1 <- ratings[games$team1[i]]; r2 <- ratings[games$team2[i]]
  s1 <- as.integer(games$winner[i] == games$team1[i])  # 1 if team1 won
  e1 <- 1 / (1 + 10^((r2 - r1)/400))                   # expected score for team1
  ratings[games$team1[i]] <- r1 + 32*(s1 - e1)
  ratings[games$team2[i]] <- r2 + 32*((1-s1) - (1-e1))
}
ratings
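The loop above can be packaged as a reusable function. The sketch below is one possible arrangement, not the only one: `run_elo`, `start`, and `K` are hypothetical names, draws are not handled, and a base `data.frame` is used so the block runs without extra packages.

```r
# Hypothetical helper: replay games in order and return the final ratings.
# Assumes every game has a winner (no draws).
run_elo <- function(games, start = 1500, K = 32) {
  teams <- unique(c(games$team1, games$team2))
  ratings <- setNames(rep(start, length(teams)), teams)
  for (i in seq_len(nrow(games))) {
    t1 <- games$team1[i]
    t2 <- games$team2[i]
    s1 <- as.numeric(games$winner[i] == t1)  # 1 if team1 won, else 0
    e1 <- 1 / (1 + 10^((ratings[[t2]] - ratings[[t1]]) / 400))
    delta <- K * (s1 - e1)
    ratings[[t1]] <- ratings[[t1]] + delta   # team1 moves by delta ...
    ratings[[t2]] <- ratings[[t2]] - delta   # ... and team2 by the opposite amount
  }
  ratings
}

games <- data.frame(
  team1  = c("A", "B", "A"),
  team2  = c("B", "C", "C"),
  winner = c("A", "C", "A"),
  stringsAsFactors = FALSE
)
final <- run_elo(games)
sum(final)  # stays at 3 * 1500 (up to floating-point error): each update is zero-sum
```

Checking that the rating total is conserved is a cheap way to catch sign errors in the update step.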

  

Exercise 9: Win probability from Elo

Difficulty: Intermediate.

Solution:
rA <- 1600; rB <- 1500
# Probability that A beats B, given the rating gap.
1 / (1 + 10^((rB - rA) / 400))
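To see how the formula above behaves, it helps to wrap it in a function and evaluate it for several rating gaps. This is a small sketch; the name `elo_win_prob` is made up for illustration.

```r
# Hypothetical helper: probability that the team rated rA beats the team rated rB.
elo_win_prob <- function(rA, rB) 1 / (1 + 10^((rB - rA) / 400))

pA <- elo_win_prob(1600, 1500)
pB <- elo_win_prob(1500, 1600)
pA + pB  # the two probabilities sum to 1

# The function is vectorized: gaps of 0, 100, 200 and 400 rating points.
round(elo_win_prob(1500 + c(0, 100, 200, 400), 1500), 3)
# 0.500 0.640 0.760 0.909
```

Only the rating difference matters, so 1600 vs 1500 gives the same probability as 2100 vs 2000.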

  

Exercise 10: Head-to-head record

Difficulty: Intermediate.

Solution:
games <- tibble(
  t1 = c("A","B","A","A"),
  t2 = c("B","A","B","B"),
  winner = c("A","A","B","A")
)
# Keep A-vs-B games regardless of listing order, then count wins per team.
games |>
  filter((t1 == "A" & t2 == "B") | (t1 == "B" & t2 == "A")) |>
  count(winner)

  

Exercise 11: Streak detection

Difficulty: Advanced.

Solution:
res <- c("W","W","L","W","W","W","L")
# Run-length encoding: consecutive identical results and their lengths.
rle(res)
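`rle()` returns the runs but not the streak statistics themselves. As a minimal sketch, the longest winning streak and the current streak can be read off the `lengths` and `values` components; the variable names are illustrative.

```r
res <- c("W", "W", "L", "W", "W", "W", "L")
runs <- rle(res)

# Longest run of wins; the leading 0 covers a sequence with no wins at all.
longest_win <- max(c(0, runs$lengths[runs$values == "W"]))

# Current streak: the last run, formatted as result + length.
current <- paste0(tail(runs$values, 1), tail(runs$lengths, 1))

longest_win  # 3
current      # "L1"
```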

  

Exercise 12: Plus-minus per player

Difficulty: Advanced.

Solution:
events <- tibble(
  player = c("a","b","c","a"),
  on_court = c(TRUE, TRUE, FALSE, TRUE),
  score_change = c(2, 2, 0, -1)
)
# Sum the score changes that happened while each player was on court.
events |>
  filter(on_court) |>
  group_by(player) |>
  summarise(plus_minus = sum(score_change))

  

Exercise 13: Home-field advantage estimate

Difficulty: Advanced.

Solution:
games <- tibble(
  home_pts = c(100, 95, 110, 88, 105),
  away_pts = c(95, 92, 100, 90, 102)
)
# Average home margin: a simple estimate of home-field advantage in points.
mean(games$home_pts - games$away_pts)

  

Exercise 14: Player percentile

Difficulty: Intermediate.

Solution:
goals <- c(20, 15, 30, 12, 25, 18, 22, 28)
# Share of players with 25 goals or fewer.
ecdf(goals)(25)
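`ecdf()` counts values at or below the cutoff, so the player scoring 25 is included in their own percentile. The sketch below contrasts that with a strictly-below definition; which one to report is a convention choice, and the variable names are illustrative.

```r
goals <- c(20, 15, 30, 12, 25, 18, 22, 28)

at_or_below    <- mean(goals <= 25)  # same value as ecdf(goals)(25)
strictly_below <- mean(goals < 25)   # excludes the player with exactly 25

at_or_below     # 0.75
strictly_below  # 0.625
```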

  

Exercise 15: Team form (last 5 games)

Difficulty: Intermediate.

Solution:
set.seed(42)  # make the random results reproducible
games <- tibble(
  team = rep("A", 10),
  result = sample(c("W","L","D"), 10, replace = TRUE),
  date = Sys.Date() - 9:0
)
# Take the five most recent games and tally the results.
games |>
  arrange(desc(date)) |>
  head(5) |>
  count(result)

  

Exercise 16: Expected goals (xG) aggregate

Difficulty: Advanced.

Solution:
shots <- tibble(
  player = c("a","a","b","b","b"),
  xg = c(0.1, 0.3, 0.05, 0.4, 0.2)
)
# Total xG per player is the sum of the xG of their shots.
shots |>
  group_by(player) |>
  summarise(total_xg = sum(xg))

  

Exercise 17: Goals vs xG (efficiency)

Difficulty: Advanced.

Solution:
players <- tibble(player = c("a","b"), goals = c(5, 3), xg = c(4.2, 5.1))
# Positive values mean more goals than expected; negative means fewer.
players |>
  mutate(over_perform = goals - xg)

  

Exercise 18: Visualize standings

Difficulty: Intermediate.

Solution:
standings <- tibble(team = c("A","B","C","D"), pts = c(30, 22, 18, 12))
# Horizontal bar chart; reorder() sorts the teams by points.
ggplot2::ggplot(standings, ggplot2::aes(reorder(team, pts), pts)) +
  ggplot2::geom_col() +
  ggplot2::coord_flip()

  

Exercise 19: Compare positions

Difficulty: Intermediate.

Solution:
players <- tibble(
  pos = c("F","M","D","F","M","D"),
  goals = c(20, 5, 1, 18, 8, 2)
)
# Average goals by position.
players |>
  group_by(pos) |>
  summarise(mean_goals = mean(goals))

  

Exercise 20: Days rest between games

Difficulty: Intermediate.

Solution:
games <- tibble(team = "A", date = as.Date(c("2024-09-01","2024-09-05","2024-09-12")))
# Days since the previous game; the first game has no predecessor, so rest is NA.
games |>
  mutate(rest = as.integer(date - lag(date)))
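The solution above covers a single team whose games are already in date order. With several teams in one table, `lag()` would otherwise pull the previous row from a different team, so a sketch of the multi-team case sorts and groups first. The second team and its dates are made-up illustration data.

```r
library(dplyr)
library(tibble)

# Illustrative schedule for two teams, deliberately interleaved.
games <- tibble(
  team = c("A", "B", "A", "B", "A"),
  date = as.Date(c("2024-09-01", "2024-09-02", "2024-09-05",
                   "2024-09-09", "2024-09-12"))
)

rest_by_team <- games |>
  arrange(team, date) |>                         # order games within each team
  group_by(team) |>                              # lag() now stays inside a team
  mutate(rest = as.integer(date - lag(date))) |>
  ungroup()

rest_by_team
```

Each team's first game gets `NA`, because there is no earlier game to measure from.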

  

What to do next

  • EDA-Exercises: explore your dataset.
  • Linear-Regression-Exercises: model on player stats.