arrange() sorts rows by column values. slice() picks rows by position. slice_max() and slice_min() get the top or bottom N rows by a value — with optional grouping for "top N per group."
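As a quick illustration of the ungrouped case, here is a minimal sketch. It assumes the built-in mtcars data used in the examples in this section; the names cars_named, by_mpg, first_three, and top_two are made up for illustration.

```r
library(dplyr)

# mtcars stores the car names as row names, so copy them into a column
cars_named <- mtcars |>
  mutate(car = rownames(mtcars))

# arrange(): sort rows by a column (ascending by default)
by_mpg <- cars_named |> arrange(mpg)

# slice(): pick rows by position, here the first three rows of the sorted data
first_three <- by_mpg |> slice(1:3)

# slice_max(): rows holding the 2 largest mpg values, no explicit sort needed
top_two <- cars_named |> slice_max(mpg, n = 2)
```

Sorting and then slicing by position works, but slice_max() and slice_min() state the intent directly and handle ties for you.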
The real power comes from combining group_by() with the slice functions: each slice then runs inside its own group, returning the top or bottom N rows per group.
library(dplyr)
# Top 2 most efficient cars PER cylinder group
mtcars |>
  mutate(car = rownames(mtcars)) |>
  group_by(cyl) |>
  slice_max(mpg, n = 2) |>
  select(car, cyl, mpg) |>
  ungroup()
library(dplyr)
# Heaviest car per cylinder group
mtcars |>
  mutate(car = rownames(mtcars)) |>
  group_by(cyl) |>
  slice_max(wt, n = 1) |>
  select(car, cyl, wt, mpg) |>
  ungroup()
Stratified Random Sample
library(dplyr)
set.seed(123)
# 2 random cars per cylinder group
mtcars |>
  mutate(car = rownames(mtcars)) |>
  group_by(cyl) |>
  slice_sample(n = 2) |>
  select(car, cyl, mpg) |>
  ungroup()
Handling Ties
By default, slice_max() and slice_min() keep ties, so you might get more than N rows. Set with_ties = FALSE to get exactly N (or fewer, if the data or group has fewer than N rows).
library(dplyr)
# With ties (default): all 11 four-cylinder cars tie at the minimum, so this returns 11
mtcars |> slice_min(cyl, n = 3) |> nrow()
# Without ties: exactly 3
mtcars |> slice_min(cyl, n = 3, with_ties = FALSE) |> nrow()
Practice Exercises
Exercise 1: Top Performers by Group
Find the 2 most powerful cars (highest hp) in each cylinder group.
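One possible solution is sketched below, assuming the exercise uses mtcars as in the earlier examples; the names cars_named, top_hp, and top_hp_exact are illustrative. Because slice_max() keeps ties by default, a group can return more than two rows, so the second pipeline adds with_ties = FALSE to force exactly two per group.

```r
library(dplyr)

cars_named <- mtcars |>
  mutate(car = rownames(mtcars))

# Two highest-hp cars per cylinder group, ties included (default)
top_hp <- cars_named |>
  group_by(cyl) |>
  slice_max(hp, n = 2) |>
  select(car, cyl, hp) |>
  ungroup()

# Same query, but exactly two rows per group
top_hp_exact <- cars_named |>
  group_by(cyl) |>
  slice_max(hp, n = 2, with_ties = FALSE) |>
  select(car, cyl, hp) |>
  ungroup()
```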
**Explanation:** `slice_sample(prop = 0.6)` after `group_by(Species)` draws 60% of the rows from each species. Because every group is sampled at the same rate, the result is a stratified sample that preserves the class proportions.
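The stratified sample described in this explanation can be sketched as follows. The `Species` column suggests the built-in `iris` data, which is an assumption here, as are the seed value and the name `iris_sample`.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the draw is reproducible

# Draw 60% of the rows from each species
iris_sample <- iris |>
  group_by(Species) |>
  slice_sample(prop = 0.6) |>
  ungroup()

# Rows per species in the sample; iris has 50 rows per species,
# so every species should contribute the same number of rows
species_counts <- count(iris_sample, Species)
species_counts
```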
Summary
| Function | Purpose | Example |
|---|---|---|
| arrange(col) | Sort ascending | arrange(mpg) |
| arrange(desc(col)) | Sort descending | arrange(desc(mpg)) |
| slice(rows) | Pick by position | slice(1:5) |
| slice_head(n=) | First N | slice_head(n = 3) |
| slice_tail(n=) | Last N | slice_tail(n = 3) |
| slice_max(col, n=) | Top N by value | slice_max(mpg, n = 5) |
| slice_min(col, n=) | Bottom N by value | slice_min(hp, n = 3) |
| slice_sample(n=) | Random N | slice_sample(n = 10) |
| slice_sample(prop=) | Random % | slice_sample(prop = 0.2) |
FAQ
Is arrange() stable (preserves original order for ties)?
Yes. dplyr's arrange is a stable sort — rows with identical sort values keep their original relative order.
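A small check of this behavior, as a sketch on mtcars: the helper column original_pos and the names sorted_cars and stable_check are made up for illustration. It records each row's starting position, sorts by cyl (which has many ties), and then tests whether positions are still increasing within each cyl value.

```r
library(dplyr)

# Record each row's position before sorting
cars_named <- mtcars |>
  mutate(car = rownames(mtcars), original_pos = row_number())

# Sort by a column with many tied values
sorted_cars <- cars_named |> arrange(cyl)

# For a stable sort, original_pos stays increasing within each cyl value
stable_check <- sorted_cars |>
  group_by(cyl) |>
  summarise(kept_order = !is.unsorted(original_pos))

stable_check
```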
What replaced top_n()?
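slice_max() and slice_min() superseded top_n() in dplyr 1.0. top_n() still works, but the newer functions have a clearer API: slice_max(col, n = 5) puts the column first, whereas the old top_n(5, col) put the count first, an argument order many users found confusing.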
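To make the difference in argument order concrete, here is a side-by-side sketch on mtcars; the names cars_named, old_way, and new_way are illustrative. Both calls select the same cars, though the row order of the two results can differ.

```r
library(dplyr)

cars_named <- mtcars |>
  mutate(car = rownames(mtcars))

# Older style: count first, column second
old_way <- cars_named |> top_n(5, mpg)

# Current style: column first, n as a named argument
new_way <- cars_named |> slice_max(mpg, n = 5)

# Compare the selected cars regardless of row order
setequal(old_way$car, new_way$car)
```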
Can I sort by a computed expression?
Yes: arrange(desc(hp / wt)) sorts by power-to-weight ratio without creating a new column first.
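For instance, a minimal sketch on mtcars (the name ranked is illustrative):

```r
library(dplyr)

# Sort by power-to-weight ratio, highest first, without storing the ratio
ranked <- mtcars |>
  mutate(car = rownames(mtcars)) |>
  arrange(desc(hp / wt)) |>
  select(car, hp, wt)

head(ranked, 3)
```

The ratio is computed only for sorting, so the result contains just the columns you select. If you also want to see the ratio, add it with mutate() before sorting.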