dplyr slice_sample() in R: Random Rows From a Tibble
The slice_sample() function in dplyr returns a random sample of n rows (or a fraction) from a data frame, optionally per group, with or without replacement. It supersedes the older sample_n() and sample_frac().
```r
slice_sample(df, n = 5)                     # 5 random rows
slice_sample(df, prop = 0.1)                # 10% of rows
slice_sample(df, n = 3, by = cyl)           # 3 per group (stratified)
slice_sample(df, n = 5, replace = TRUE)     # bootstrap
slice_sample(df, n = 5, weight_by = w)      # weighted sample
df |> group_by(g) |> slice_sample(n = 3)    # equivalent grouped form
set.seed(42); slice_sample(df, n = 5)       # reproducible
```
Need explanation? Read on for examples and pitfalls.
What slice_sample() does in one sentence
slice_sample(.data, n) returns a random sample of n rows; slice_sample(.data, prop = 0.1) returns 10% of the rows. On a grouped tibble (or with by = g), the sampling happens within each group.
This is the modern, group-aware random sampler. For new code, prefer it over sample_n(), which is superseded.
Syntax
slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE). Pass n or prop, not both. If neither is supplied, n = 1 is used.
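As a minimal sketch of the n-versus-prop rule, the snippet below uses the built-in mtcars data (32 rows) as a stand-in for your own data frame. The object names are illustrative only.

```r
library(dplyr)

# n and prop are alternatives: each call below uses exactly one of them.
by_count <- slice_sample(mtcars, n = 5)        # 5 rows
by_share <- slice_sample(mtcars, prop = 0.25)  # 25% of 32 rows = 8 rows

# Supplying both is an error; tryCatch() captures it so the script keeps running.
both <- tryCatch(
  slice_sample(mtcars, n = 5, prop = 0.25),
  error = function(e) "error"
)
both
```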
Call set.seed() before sampling for reproducibility. Random samples without a seed differ across runs, which makes debugging and report regeneration harder.
Five common patterns
1. Random n rows
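A minimal sketch of this pattern, assuming the built-in mtcars data frame; five_cars is an illustrative name.

```r
library(dplyr)

set.seed(42)                               # fixed seed so the same rows come back each run
five_cars <- slice_sample(mtcars, n = 5)   # 5 rows drawn without replacement
nrow(five_cars)                            # 5
```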
2. Random fraction
20% of 32 rows is 6.4, which is rounded down to 6 rows.
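The fraction pattern might look like this on mtcars (32 rows); fifth is an illustrative name.

```r
library(dplyr)

set.seed(42)
fifth <- slice_sample(mtcars, prop = 0.2)  # 20% of the rows
nrow(fifth)                                # 6, because 0.2 * 32 = 6.4 is rounded down
```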
3. Stratified sample (per group)
3 random cars per cylinder group, regardless of group size.
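A sketch of the stratified pattern on mtcars, assuming dplyr 1.1.0 or later for the by argument. On older versions, group_by(cyl) before slice_sample() gives the same per-group behavior.

```r
library(dplyr)

set.seed(42)
# by = cyl samples within each cylinder group (requires dplyr >= 1.1.0).
per_cyl <- slice_sample(mtcars, n = 3, by = cyl)

count(per_cyl, cyl)   # one line per cyl value (4, 6, 8), each with n = 3
```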
4. Bootstrap (with replacement)
Standard bootstrap: same row count as original, with duplicates allowed.
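A sketch of a single bootstrap resample. The id column is added here only to make repeated rows easy to spot; it is not part of mtcars.

```r
library(dplyr)

# Tag each row with an id so duplicates in the resample are visible.
cars <- mutate(mtcars, id = row_number())

set.seed(42)
boot <- slice_sample(cars, n = nrow(cars), replace = TRUE)

nrow(boot)            # 32, same size as the original
n_distinct(boot$id)   # typically fewer than 32, because some rows are drawn more than once
```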
5. Weighted sample
Rows with higher mpg are more likely to be picked. Useful for importance sampling.
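A sketch of weighted sampling on mtcars. The second, larger draw with replacement is only there to illustrate the effect of the weights: its average mpg should sit above the unweighted average.

```r
library(dplyr)

set.seed(42)
# Selection probability is proportional to mpg.
weighted <- slice_sample(mtcars, n = 5, weight_by = mpg)

# Illustration of the bias toward high-mpg rows: a large weighted draw
# (with replacement) compared against the plain mean.
big_draw <- slice_sample(mtcars, n = 5000, weight_by = mpg, replace = TRUE)
mean(big_draw$mpg)   # weighted draw
mean(mtcars$mpg)     # unweighted baseline
```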
slice_sample() replaces two older functions: sample_n() (n random rows) and sample_frac() (fraction). Both have been superseded since dplyr 1.0.0: they still work but receive no new features. The new function unifies them, and from dplyr 1.1.0 it also accepts by for stratified sampling.
slice_sample() vs sample_n() vs sample()
Three sampling functions in R, with different scope.
| Function | Package | Per group | Status |
|---|---|---|---|
| slice_sample(n) | dplyr | Yes | Recommended |
| sample_n(n) | dplyr | Yes | Superseded since 1.0 |
| base::sample(x, size) | base | No | Vector sampling, not data frames |
When to use which:
- slice_sample for data frames in dplyr pipelines.
- sample for sampling from vectors or generating random indices.
- Avoid sample_n in new code.
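To make the difference in scope concrete, here is a sketch that draws five rows from mtcars both ways. The variable names are illustrative.

```r
library(dplyr)

# base::sample() works on vectors: here it draws 5 row indices,
# which are then used to subset the data frame by hand.
set.seed(42)
idx <- sample(nrow(mtcars), size = 5)
base_rows <- mtcars[idx, ]

# slice_sample() takes the data frame directly and fits into a pipeline.
set.seed(42)
dplyr_rows <- mtcars |> slice_sample(n = 5)
```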
A practical workflow
The "stratified sample" pattern is the most common slice_sample use case. Examples:
- Train/test split that keeps class proportions: slice_sample(prop = 0.8, by = class)
- Sample customers per region equally: slice_sample(n = 100, by = region)
- Bootstrap CI estimation: loop slice_sample(n = nrow(df), replace = TRUE) 1,000 times
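The bootstrap item can be sketched as follows. This assumes the statistic of interest is the mean of mpg in mtcars and uses a simple percentile interval; both choices are illustrative, not prescribed by dplyr.

```r
library(dplyr)

set.seed(42)
n_boot <- 1000

# One replicate: resample all rows with replacement, then compute the mean.
boot_means <- vapply(
  seq_len(n_boot),
  function(i) {
    resample <- slice_sample(mtcars, n = nrow(mtcars), replace = TRUE)
    mean(resample$mpg)
  },
  numeric(1)
)

# Percentile interval: the middle 95% of the bootstrap means.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

For heavier workloads, dedicated resampling packages are likely to be faster than calling slice_sample() in a loop.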
For one-off samples, slice_sample(n = 5) is the quick interactive tool. For production analysis, set the seed and document the sample size.
Common pitfalls
Pitfall 1: forgetting to set the seed. slice_sample() is non-deterministic. Reports, tests, and docs should call set.seed(42) (or any fixed integer) right before it to get reproducible output.
Pitfall 2: per-group surprise. On a grouped tibble, slice_sample(n = 5) returns 5 rows PER GROUP. Often what you want for stratification, sometimes not. Use ungroup() first if you mean a global sample.
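A sketch of the difference, using mtcars grouped by cyl (three groups with 11, 7, and 14 rows):

```r
library(dplyr)

set.seed(42)
grouped <- group_by(mtcars, cyl)

# On the grouped data, n applies to each group separately.
per_group <- slice_sample(grouped, n = 5)

# After ungroup(), n applies to the whole table.
overall <- grouped |> ungroup() |> slice_sample(n = 5)

nrow(per_group)   # 15: 5 rows for each of the 3 cyl groups
nrow(overall)     # 5
```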
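slice_sample(n = X) does not error when a group has fewer than X rows and replace = FALSE: it silently returns every row of that group. A group with only 2 rows contributes 2 rows for n = 5, so group sizes in the result can be unequal. To get exactly X rows per group, either set replace = TRUE or filter groups by size first.
Reproducibility and seeds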
Random samples that feed a report, plot, or test should be reproducible across runs. Always call set.seed(N) immediately before slice_sample() in those cases. Code paths that need different samples should use different seeds (e.g., set.seed(1) for one resample, set.seed(2) for another) so that each is reproducible on its own. For a train/test split, draw the training rows once and take the test set from the rows that remain; sampling twice with different seeds can put the same row in both sets. R's RNG state is global, so any operation between set.seed() and slice_sample() that consumes randomness will desync the result. Keep them adjacent.
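The effect of the seed, and of a stray random draw in between, can be sketched like this. The runif(1) call stands in for any code that consumes random numbers.

```r
library(dplyr)

set.seed(42)
first_run <- slice_sample(mtcars, n = 5)

set.seed(42)
second_run <- slice_sample(mtcars, n = 5)

identical(first_run, second_run)   # TRUE: same seed, same data, same rows

# A random draw between set.seed() and slice_sample() advances the RNG,
# so the sample is usually no longer the one obtained above.
set.seed(42)
invisible(runif(1))
shifted <- slice_sample(mtcars, n = 5)
```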
Try it yourself
Try it: Take a stratified random sample of 2 rows per cyl group from mtcars. Set seed 42 for reproducibility. Save to ex_strat.
Explanation: slice_sample(n = 2, by = cyl) picks 2 random rows per cyl group. Equal-count stratified sample.
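One possible solution, sketched under the assumption of dplyr 1.1.0 or later (for the by argument):

```r
library(dplyr)

set.seed(42)
ex_strat <- slice_sample(mtcars, n = 2, by = cyl)

count(ex_strat, cyl)   # 2 rows for each of the three cyl groups
```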
Related slice functions
After mastering slice_sample, look at:
- slice_head() / slice_tail(): positional first/last n
- slice_max() / slice_min(): top/bottom n by value
- slice(): specific row indexes
- sample_n() / sample_frac(): superseded; avoid in new code
- rsample package: train/test splits with class balance, k-fold CV
- base::sample(): vector sampling
For machine-learning splits, the rsample package provides purpose-built helpers such as initial_split() and vfold_cv().
FAQ
What is the difference between slice_sample and sample_n?
sample_n() has been superseded since dplyr 1.0.0: it still works but is no longer developed. slice_sample() is the replacement: clearer name, supports prop, weight_by, and by arguments, group-aware.
How do I do a reproducible random sample in R?
Call set.seed(42) (or any fixed integer) immediately before slice_sample(). With the same data, the same seed returns the same rows.
How do I sample with replacement (bootstrap) in dplyr?
Pass replace = TRUE: slice_sample(df, n = nrow(df), replace = TRUE). The size matches the original; rows can repeat.
How do I do a weighted random sample?
Pass weight_by = column: slice_sample(df, n = 5, weight_by = price) makes rows with higher price more likely to be picked.
How do I do a stratified sample per group?
slice_sample(df, n = 3, by = group_col) returns 3 random rows per group. For a fraction per group, use prop = 0.1 instead of n.
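A sketch of both variants on mtcars, assuming dplyr 1.1.0 or later for by. The last call also shows what happens when a group is smaller than the requested n without replacement: the whole group is returned.

```r
library(dplyr)

set.seed(42)

# Fixed count per group: 3 rows for each cyl value.
by_count <- slice_sample(mtcars, n = 3, by = cyl)

# Fixed share per group: 50% of each group, rounded down.
# The cyl groups have 11, 7, and 14 rows, so they contribute 5, 3, and 7.
by_share <- slice_sample(mtcars, prop = 0.5, by = cyl)

# Requesting more rows than a group has returns the whole group.
# cyl 6 has only 7 rows, so it contributes 7 instead of 10.
too_many <- slice_sample(mtcars, n = 10, by = cyl)
count(too_many, cyl)
```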