dplyr slice() in R: Select Rows by Position
The slice() family in dplyr selects rows by position, by value, or by random sampling. Use slice() for explicit indices, slice_head() / slice_tail() for first or last N, slice_max() / slice_min() for top by value, and slice_sample() for random rows.
slice(df, 1:5)                        # rows 1 to 5 by position
slice_head(df, n = 5)                 # first 5 rows
slice_tail(df, n = 5)                 # last 5 rows
slice_max(df, mpg, n = 5)             # top 5 by mpg
slice_min(df, mpg, n = 5)             # bottom 5 by mpg
slice_sample(df, n = 5)               # random 5 rows
slice_max(df, mpg, n = 1, by = cyl)   # top mpg per cyl group
Need explanation? Read on for examples and pitfalls.
What slice() does in one sentence
The slice family selects rows by POSITION or VALUE, not by condition. slice() itself takes integer positions; the variants slice_head, slice_tail, slice_min, slice_max, slice_sample cover the most common positional patterns.
Unlike filter() (which selects by logical condition), slice operates on row indices. Unlike head() and tail() (base R, work on any object), slice is data-frame-specific and pipe-friendly.
Syntax
Each slice variant has its own minimal arguments. slice(df, indices) for explicit row numbers. slice_head(df, n=5) or slice_head(df, prop=0.1) for first 5 rows or first 10%. Same n/prop arguments for slice_tail, slice_max, slice_min, slice_sample.
The full signatures:
slice(.data, ..., .by = NULL, .preserve = FALSE)
slice_head(.data, ..., n, prop, by = NULL)
slice_tail(.data, ..., n, prop, by = NULL)
slice_max(.data, order_by, ..., n, prop, by = NULL, with_ties = TRUE, na_rm = FALSE)
slice_min(.data, order_by, ..., n, prop, by = NULL, with_ties = TRUE, na_rm = FALSE)
slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)
Use n for an absolute count and prop for a fraction. slice_head(df, n = 5) returns 5 rows. slice_head(df, prop = 0.1) returns the first 10% of rows. They are mutually exclusive; pick one.
Seven common patterns
1. Rows by explicit positions
slice() takes a vector of positions. Use 1:5 for a range, c(1, 3, 5) for specific rows, or -1 to drop the first row.
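A minimal sketch of the three index styles described above, assuming the built-in mtcars data set (32 rows) stands in for df:

```r
library(dplyr)

# Rows 1 to 5 by position
first_five <- slice(mtcars, 1:5)

# Specific rows 1, 3 and 5
picked <- slice(mtcars, c(1, 3, 5))

# Everything except the first row
without_first <- slice(mtcars, -1)

nrow(first_five)     # 5
nrow(picked)         # 3
nrow(without_first)  # 31
```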
2. First or last N rows
slice_head() returns the first N (in row order). slice_tail() returns the last N. Use n= or prop=.
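For illustration, here is how n and prop might be used on mtcars (an assumed example data set, not necessarily the one the original examples used):

```r
library(dplyr)

head_n    <- slice_head(mtcars, n = 3)        # first 3 rows
tail_n    <- slice_tail(mtcars, n = 3)        # last 3 rows
head_prop <- slice_head(mtcars, prop = 0.25)  # first 25% of 32 rows

nrow(head_prop)  # 8
```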
3. Top N by value with slice_max
slice_max() sorts by order_by (here, mpg) and returns the top N. Includes ties by default (with_ties = TRUE).
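A short sketch on mtcars (assumed here as the example data) that also shows the tie behavior: two cars share third place at 30.4 mpg, so asking for the top 3 returns 4 rows unless with_ties = FALSE.

```r
library(dplyr)

# Top 3 by mpg; rows tied at the cutoff are kept by default
top_mpg <- mtcars |> slice_max(mpg, n = 3)

# Exactly 3 rows, dropping the extra tied row
top_mpg_strict <- mtcars |> slice_max(mpg, n = 3, with_ties = FALSE)

nrow(top_mpg)         # 4
nrow(top_mpg_strict)  # 3
```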
4. Bottom N with slice_min
slice_min() is the inverse of slice_max. Same arguments.
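As a sketch, the five least fuel-efficient cars in mtcars (again an assumed example data set) could be selected like this:

```r
library(dplyr)

# Bottom 5 by mpg, smallest values first
bottom_mpg <- mtcars |> slice_min(mpg, n = 5)

nrow(bottom_mpg)      # 5
min(bottom_mpg$mpg)   # 10.4
```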
5. Random sample of rows
slice_sample() picks rows uniformly at random. set.seed() makes the result reproducible. Use replace = TRUE for sampling with replacement.
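The sketch below illustrates both points on mtcars (assumed example data). The seed value 42 is arbitrary; any fixed integer gives repeatable draws.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only there to make the draw repeatable
sample_a <- slice_sample(mtcars, n = 5)

set.seed(42)  # same seed again, so the same rows are drawn
sample_b <- slice_sample(mtcars, n = 5)

identical(sample_a, sample_b)  # TRUE

# With replacement, the sample can be larger than the data itself
boot <- slice_sample(mtcars, n = 50, replace = TRUE)
nrow(boot)  # 50
```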
6. Top per group
by = cyl groups for the slice operation. Each group returns its top N. The result is automatically ungrouped.
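A minimal per-group sketch on mtcars (assumed example data). Note that the by argument needs dplyr 1.1.0 or later; on older versions the group_by() form is required.

```r
library(dplyr)

# Highest-mpg car within each cylinder count
best_per_cyl <- mtcars |> slice_max(mpg, n = 1, by = cyl)

nrow(best_per_cyl)           # 3, one row per cyl value
is_grouped_df(best_per_cyl)  # FALSE, the result carries no grouping
```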
7. Drop rows by negative position
Negative indices drop those positions. slice(df, -(1:5)) is equivalent to tail(df, n = nrow(df) - 5).
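To check the equivalence stated above, a small sketch on mtcars (assumed example data) compares the two results on one column:

```r
library(dplyr)

# Drop the first 5 rows, keep the remaining 27
dropped <- slice(mtcars, -(1:5))

# Base R counterpart from the text
base_tail <- tail(mtcars, n = nrow(mtcars) - 5)

nrow(dropped)                           # 27
identical(dropped$mpg, base_tail$mpg)   # TRUE
```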
slice_max(df, x, n=5) is sort-and-take in one step. The near-equivalent arrange(df, desc(x)) |> head(5) works but is two operations, and it always returns exactly 5 rows, so ties at the cutoff are dropped. slice_max is clearer in pipelines and signals intent: "top 5 by x", not "sort, then take 5".
slice() variants vs base R
Base R uses bracket subsetting and head() / tail(). dplyr's slice family unifies positional row access with explicit, pipe-friendly verbs.
| Task | dplyr slice | Base R |
|---|---|---|
| Specific positions | slice(df, c(1,3,5)) | df[c(1,3,5), ] |
| First N | slice_head(df, n=5) | head(df, 5) |
| Last N | slice_tail(df, n=5) | tail(df, 5) |
| Top N by value | slice_max(df, x, n=5) | df[order(-df$x)[1:5], ] |
| Random N | slice_sample(df, n=5) | df[sample(nrow(df), 5), ] |
| Top per group | slice_max(df, x, n=1, by=g) | (multiple awkward steps) |
When to use which:
- Use slice variants in any dplyr pipeline.
- Use base R head()/tail() for quick one-liners or for objects that are not data frames.
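To make two rows of the comparison table concrete, this sketch runs the dplyr and base R forms side by side on mtcars (assumed example data, with mpg standing in for x):

```r
library(dplyr)

# Top 5 by value: dplyr vs base R
top_dplyr <- slice_max(mtcars, mpg, n = 5)
top_base  <- mtcars[order(-mtcars$mpg)[1:5], ]

# First 5 rows: dplyr vs base R
first_dplyr <- slice_head(mtcars, n = 5)
first_base  <- head(mtcars, 5)

# Both approaches pick the same values here
identical(sort(top_dplyr$mpg), sort(top_base$mpg))  # TRUE
identical(first_dplyr$mpg, first_base$mpg)          # TRUE
```

The two top-5 results agree here because no tie straddles the fifth place in mpg; when one does, slice_max() keeps the extra tied rows while the base R form cuts off at exactly 5.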
Common pitfalls
Pitfall 1: confusing slice with filter. slice(df, 1:5) returns rows by POSITION (the first 5). filter(df, x %in% 1:5) returns rows where x is 1, 2, 3, 4, or 5 (a CONDITION on values). Different operations.
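The difference is easy to see on a tiny data frame. The toy column below is hypothetical and exists only for this illustration:

```r
library(dplyr)

# Hypothetical toy data: 7 rows, values in no particular order
df <- data.frame(x = c(10, 3, 5, 8, 1, 2, 7))

by_position  <- slice(df, 1:5)          # the first 5 rows, whatever they contain
by_condition <- filter(df, x %in% 1:5)  # rows whose x value is 1, 2, 3, 4 or 5

by_position$x   # 10 3 5 8 1
by_condition$x  # 3 5 1 2
```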
Pitfall 2: slice does not arrange first. slice_head(df, n=5) returns the first 5 rows in their CURRENT order, not the 5 largest or smallest values. To get the top 5 by a value, use slice_max(), or call arrange() first and then slice_head().
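A quick sketch on mtcars (assumed example data) shows the gap between "first 5 rows" and "top 5 by value":

```r
library(dplyr)

# First 5 rows in stored order; nothing is sorted
first_rows <- slice_head(mtcars, n = 5)
first_rows$mpg  # 21.0 21.0 22.8 21.4 18.7

# Top 5 by mpg, either directly or by sorting first
top_direct <- slice_max(mtcars, mpg, n = 5)
top_sorted <- mtcars |> arrange(desc(mpg)) |> slice_head(n = 5)

max(top_direct$mpg)  # 33.9
```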
slice_max(with_ties = TRUE) can return MORE than n rows when ties exist. slice_max(mtcars, mpg, n = 3) returns 4 rows because two cars tie for third place at 30.4 mpg. To always return exactly n rows, set with_ties = FALSE. The default is TRUE because dropping ties silently is usually wrong.
Pitfall 3: forgetting set.seed() before slice_sample. Random sampling is non-reproducible without a seed. Call set.seed(N) before slice_sample() whenever you need consistent results across runs.
Try it yourself
Try it: Use the slice family to get the 3 cars with the LOWEST qsec (fastest quarter mile). Save to ex_fastest.
Explanation: slice_min(qsec, n = 3) orders rows by qsec ascending and returns the first 3. This is "fastest" because lower qsec means quicker quarter mile.
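One way to write the solution described in the explanation, assuming the exercise refers to the built-in mtcars data set:

```r
library(dplyr)

# Three lowest quarter-mile times; lower qsec means a quicker car
ex_fastest <- mtcars |> slice_min(qsec, n = 3)

nrow(ex_fastest)  # 3
ex_fastest$qsec   # 14.50 14.60 15.41
```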
Related dplyr functions
After mastering slice, look at:
- head(), tail(): base R equivalents for any object (not just data frames)
- top_n(): legacy, superseded by slice_max() / slice_min()
- sample_n(), sample_frac(): legacy, superseded by slice_sample()
- filter(): select rows by logical condition (not position)
- arrange(): sort rows (often paired with slice_head for top-N)
- distinct(): deduplicate rows
For per-group top-N, slice_max(by = group) is cleaner than group_by() |> slice_max() |> ungroup().
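The two forms can be compared directly. This sketch uses mtcars (assumed example data) and needs dplyr 1.1.0 or later for the by argument:

```r
library(dplyr)

# Per-operation grouping: one call, result is not grouped
with_by <- mtcars |> slice_max(mpg, n = 1, by = cyl)

# Classic form: group, slice, then remove the grouping explicitly
with_group_by <- mtcars |>
  group_by(cyl) |>
  slice_max(mpg, n = 1) |>
  ungroup()

# Both select the same cars (row order may differ between the two forms)
setequal(with_by$mpg, with_group_by$mpg)  # TRUE
```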
FAQ
What is the difference between slice and filter in dplyr?
slice() selects rows by POSITION (row number or value rank). filter() selects rows by CONDITION (logical expression). slice(df, 1:5) returns rows 1 to 5; filter(df, x %in% 1:5) returns rows where column x equals 1, 2, 3, 4, or 5.
How do I get the top N rows in dplyr?
Use slice_max(df, x, n = 5) to get top 5 by column x. For first 5 rows in the current order (not by value), use slice_head(df, n = 5).
How do I sample rows randomly in dplyr?
Use slice_sample(df, n = 5) for 5 random rows or slice_sample(df, prop = 0.1) for a random 10% sample. Set replace = TRUE for sampling with replacement. Call set.seed() beforehand if you need reproducible results.
Can slice_max return more than n rows?
Yes, with ties. The default with_ties = TRUE keeps all rows tied at the cutoff. slice_max(mtcars, mpg, n = 3) returns 4 rows because two cars share third place at 30.4 mpg. Set with_ties = FALSE to always return exactly n rows.
What replaced top_n() and sample_n() in dplyr?
top_n() is superseded by slice_max() (top by value) and slice_min() (bottom by value). sample_n() and sample_frac() are superseded by slice_sample(). The new names are clearer about what is being selected.