dplyr reframe() in R: Summarise With Variable Output Length
The reframe() function, added in dplyr 1.1.0, generalizes summarise() to allow a variable number of output rows per group. summarise() is meant to return exactly one row per group: multi-row results were allowed in dplyr 1.0 but are deprecated since 1.1.0. reframe() permits any number of output rows per group.
```r
df |> group_by(g) |> reframe(q = quantile(x, c(.25, .5, .75)))  # 3 rows per group
df |> reframe(top = head(sort(x, decreasing = TRUE), 3))          # up to 3 rows in total
df |> group_by(g) |> reframe(seq = seq(min(x), max(x)))           # row count varies by group
df |> group_by(g) |> summarise(mean = mean(x))                    # exactly 1 row per group
df |> group_by(g) |> reframe(out = some_fn(x))                    # any number of rows per group
```
Need explanation? Read on for examples and pitfalls.
What reframe() does in one sentence
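reframe(.data, ...) works like summarise() but does not require each expression to return a single value per group. Within a group, every expression must return the same number of values or exactly one value, which is recycled. That common length becomes the group's row count, so per-group output can have any number of rows. The result is always an ungrouped data frame.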
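The length rule can be checked directly. The sketch below uses the built-in mtcars data, and its object names are illustrative. It shows a length-1 result recycled next to a length-3 result, and a 3-versus-5 mismatch being caught as an error.

```r
library(dplyr)

# n() returns one value per group and is recycled to the length of the
# three-value quantile() result, so every cyl group contributes 3 rows.
recycled <- mtcars |>
  reframe(n = n(), q = quantile(mpg, c(0.25, 0.5, 0.75)), .by = cyl)

# Lengths 3 and 5 cannot be recycled to a common size, so reframe() errors
# instead of padding the shorter result.
mismatch <- tryCatch(
  mtcars |> reframe(a = 1:3, b = 1:5, .by = cyl),
  error = function(e) e
)
inherits(mismatch, "error")
```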
reframe was introduced in dplyr 1.1 because summarise's "1 row per group" rule was too restrictive for common patterns like "per-group quantiles" or "per-group top-N".
Syntax
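reframe(.data, ..., .by = NULL). It takes the same arguments as summarise(), including the per-call .by grouping argument, but relaxes the output-length rule. There is no .groups argument, because reframe() always returns an ungrouped result.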
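A minimal example of the syntax, assuming the built-in mtcars data. It shows the same per-group quartile call once with group_by() and once with .by. The object names are illustrative.

```r
library(dplyr)

# Three quartiles for each of the three cyl groups (4, 6, 8).
by_group <- mtcars |>
  group_by(cyl) |>
  reframe(mpg_q = quantile(mpg, c(0.25, 0.5, 0.75)))

# The same computation with .by instead of group_by().
by_arg <- mtcars |>
  reframe(mpg_q = quantile(mpg, c(0.25, 0.5, 0.75)), .by = cyl)

by_group
```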
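Three quantiles for each of the three cyl groups gives 9 rows. summarise() expects one value per group, so the same call there triggers a deprecation warning in dplyr 1.1+ (and an error in versions before 1.0).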
Use reframe() when each group's output has multiple rows or a variable number of rows. For exactly 1 row per group, summarise() is still the right tool.
Five common patterns
1. Per-group quantiles
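A sketch of the per-group quantile pattern, assuming mtcars and illustrative probabilities of 0.1, 0.5 and 0.9. A prob column is stored next to each value so that the rows describe themselves.

```r
library(dplyr)

# Keep the probability next to each quantile; names = FALSE drops the "10%"-style names.
hp_quantiles <- mtcars |>
  reframe(
    prob = c(0.1, 0.5, 0.9),
    hp_q = quantile(hp, c(0.1, 0.5, 0.9), names = FALSE),
    .by  = cyl
  )
hp_quantiles
```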
2. Top n per group (alternative)
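A sketch comparing the two routes on mtcars, with illustrative object names. reframe() returns only the top values. slice_max() returns the full rows. slice_max() keeps ties by default, so a group can return more than n rows. Here with_ties = FALSE caps each group at exactly n.

```r
library(dplyr)

# reframe(): the 3 largest mpg values per cyl group, without the rest of each row.
top_values <- mtcars |>
  reframe(top_mpg = head(sort(mpg, decreasing = TRUE), 3), .by = cyl)

# slice_max(): the whole rows behind those values, at most 3 per group.
top_rows <- mtcars |>
  slice_max(mpg, n = 3, by = cyl, with_ties = FALSE)
```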
For top-n, slice_max(mpg, n = 3, by = cyl) is cleaner because it returns the WHOLE row, not just the values.
3. Generated sequences per group
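A sketch of a per-group sequence, assuming mtcars. It generates every carb value from each cyl group's minimum carb to its maximum. The carb ranges differ across groups, so the row counts differ too.

```r
library(dplyr)

# One row per integer between each group's min(carb) and max(carb).
carb_seq <- mtcars |>
  group_by(cyl) |>
  reframe(carb_value = seq(min(carb), max(carb)))

# Rows per group: cyl 4 gives 2, cyl 6 gives 6, cyl 8 gives 7.
count(carb_seq, cyl)
```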
Each cyl group produces a different number of seq values.
4. Multi-stat output
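A sketch of labelled multi-stat output, assuming mtcars. The stat labels are illustrative. The label vector and the value vector both have length 4, so each cyl group yields 4 rows.

```r
library(dplyr)

# Four labelled statistics per cyl group: one row per statistic.
mpg_stats <- mtcars |>
  reframe(
    stat  = c("min", "median", "mean", "max"),
    value = c(min(mpg), median(mpg), mean(mpg), max(mpg)),
    .by   = cyl
  )
mpg_stats
```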
4 rows per cyl group, one per stat.
5. summarise vs reframe demonstration
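A sketch of the contrast on mtcars. The code sets the lifecycle package's lifecycle_verbosity option so that the deprecation warning fires on every call instead of being throttled. Because a future dplyr release might turn the warning into an error, the demo captures either condition.

```r
library(dplyr)

# Make lifecycle deprecation warnings fire every time for this demo.
options(lifecycle_verbosity = "warning")

# summarise(): one value per group, so one row per group (3 rows).
one_row <- mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))

# reframe(): three values per group, so three rows per group (9 rows).
many_rows <- mtcars |>
  group_by(cyl) |>
  reframe(mpg_q = quantile(mpg, c(0.25, 0.5, 0.75)))

# summarise() with the same multi-value expression is deprecated since 1.1.0,
# so capture the warning (or error) it raises instead of a result.
summarise_multi <- tryCatch(
  mtcars |>
    group_by(cyl) |>
    summarise(mpg_q = quantile(mpg, c(0.25, 0.5, 0.75))),
  warning = function(w) w,
  error   = function(e) e
)
class(summarise_multi)
```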
reframe() vs summarise() vs slice_max()
Three approaches to "per-group output with multiple rows".
| Function | Output rows per group | Best for |
|---|---|---|
| summarise() | Exactly 1 (multi-row output deprecated since dplyr 1.1.0) | Aggregations |
| reframe() | Any number | Multi-row aggregations like quantiles |
| slice_max(col, n) | Up to n (more if ties are kept) | Top n by column, returning whole rows |
When to use which:
- summarise() for aggregation: mean(), sd(), n().
- reframe() for variable-row output like quantiles or sequences.
- slice_max() / slice_min() for "top n by column" specifically.
A practical workflow
The "per-group quantile table" pattern is reframe's killer use case.
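A sketch of a per-category percentile table, assuming the built-in iris data with Species as the category. The 5th, 25th, 50th, 75th and 95th percentiles are an illustrative choice. The final pivot_wider() step from tidyr puts the species side by side for comparison.

```r
library(dplyr)
library(tidyr)

# Five percentiles of Sepal.Length for each Species: 5 rows per category.
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
pct_long <- iris |>
  reframe(
    percentile   = probs * 100,
    sepal_length = quantile(Sepal.Length, probs, names = FALSE),
    .by          = Species
  )

# One column per species makes the distributions easy to compare.
pct_wide <- pct_long |>
  pivot_wider(names_from = Species, values_from = sepal_length)
pct_wide
```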
This returns 5 rows per category (one per percentile), which is useful for comparing distributions across categories.
Common pitfalls
Pitfall 1: reframe is dplyr 1.1+. Older dplyr versions don't have reframe. Workaround: list-column + tidyr::unnest.
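A sketch of the list-column workaround, assuming mtcars. The list wraps the quantile vector so that summarise() sees one value per group, and tidyr::unnest() then expands it into rows. On dplyr 1.1.0 or later, the code also builds the reframe() equivalent for comparison.

```r
library(dplyr)
library(tidyr)

# Pre-1.1 pattern: one list element per group, then unnest into rows.
old_style <- mtcars |>
  group_by(cyl) |>
  summarise(mpg_q = list(quantile(mpg, c(0.25, 0.5, 0.75)))) |>
  unnest(mpg_q)

# On dplyr 1.1.0 or later, reframe() does the same in one step.
if (packageVersion("dplyr") >= "1.1.0") {
  new_style <- mtcars |>
    group_by(cyl) |>
    reframe(mpg_q = quantile(mpg, c(0.25, 0.5, 0.75)))
}
```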
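Pitfall 2: forgetting that summarise() still expects 1 row per group. If summarise() code that "used to work" now warns or errors, there are two likely causes. The expression's output length may have changed, or you may have upgraded from dplyr 1.0.x, which allowed multi-row summarise(), to 1.1+, which deprecates it. Either way, switch to reframe().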
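Pitfall 3: mismatched lengths across expressions. Within each group, reframe() follows the tidyverse recycling rules: every expression must return the same number of values, or exactly one value, which is recycled. If one expression returns 3 values and another returns 5, reframe() raises an error rather than expanding to the longer result.
Why reframe was added in dplyr 1.1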
Before dplyr 1.0, the only way to produce variable-length per-group output was to wrap results in a list and unnest them, which was verbose. dplyr 1.0 let summarise() return several rows per group, but that blurred its one-row-per-group contract. dplyr 1.1 therefore moved the behaviour into a dedicated verb and deprecated it in summarise(). The dplyr team added reframe specifically for cases like quantiles, ranks, and per-group sequences where each group naturally produces multiple values. The decision reflects an explicit recognition that a strict "1 row per group" rule is useful for safety but too restrictive for several common analytical patterns. The split between summarise (strict) and reframe (flexible) lets each function communicate clear intent: "I expect one row per group" or "I expect possibly many".
Try it yourself
Try it: Compute the 25th, 50th, and 75th percentile of mpg per cyl group using reframe. Save to ex_quartiles.
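One possible solution, assuming the built-in mtcars data. The quartile labels are an illustrative addition that the exercise does not require.

```r
library(dplyr)

ex_quartiles <- mtcars |>
  group_by(cyl) |>
  reframe(
    quartile = c("Q1", "median", "Q3"),
    mpg      = quantile(mpg, c(0.25, 0.5, 0.75), names = FALSE)
  )
ex_quartiles
```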
Explanation: reframe() allows 3 rows per cyl group (one per quartile). summarise() expects one row per group, so in dplyr 1.1+ the same call only runs with a deprecation warning.
Related dplyr functions
After mastering reframe, look at:
- summarise(): 1 row per group
- slice_max() / slice_min(): top/bottom n by column
- tidyr::unnest(): flatten list-columns
- quantile(): percentile values
- cur_data_all(): the pre-1.1 way to access the current group's data (deprecated in 1.1.0 in favour of pick())
- pick(): select columns inside reframe() / summarise()
For older dplyr (<1.1), the equivalent pattern is summarise(out = list(quantile(x, ...))) |> tidyr::unnest(out).
FAQ
What does reframe do in dplyr?
reframe(.data, ...) is like summarise() but allows expressions to return any number of rows per group, not just 1.
What is the difference between reframe and summarise?
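summarise() is designed to return exactly 1 row per group; multi-row results were allowed in dplyr 1.0 but are deprecated as of 1.1.0. reframe() allows variable-length output and always returns an ungrouped data frame. For multi-row per-group computations like quantiles, use reframe().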
When was reframe introduced?
In dplyr 1.1.0 (Jan 2023). Older versions used summarise(x = list(...)) |> tidyr::unnest(x) as a workaround.
Can I mix summarise and reframe in a pipeline?
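Yes. Use summarise() for fixed-output aggregations and reframe() for variable-output ones; their syntax is otherwise the same. reframe() always returns an ungrouped data frame, so regroup with group_by() or .by before any later per-group step.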
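A sketch of a mixed pipeline, assuming mtcars, with illustrative object names. reframe() produces two quartiles per group and returns an ungrouped result. The pipeline then regroups, and summarise() reduces each pair of quartiles to one interquartile range per group.

```r
library(dplyr)

# reframe() drops the grouping even though its input was grouped.
reframed <- mtcars |>
  group_by(cyl) |>
  reframe(mpg_q = quantile(mpg, c(0.25, 0.75)))
is_grouped_df(reframed)

# Regroup explicitly, then go back to one row per group with summarise().
iqr_by_cyl <- reframed |>
  group_by(cyl) |>
  summarise(mpg_iqr = diff(mpg_q))
iqr_by_cyl
```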
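Why does summarise() warn or error when my expression returns several values per group?
summarise() expects one value per group, and the expression returns more (e.g., quantile() returns 3 values for 3 probs). dplyr 1.1+ flags this with a deprecation warning, and versions before 1.0 raised an error. Switch to reframe() to allow it.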