dplyr row_number() in R: Assign Sequential Row Indexes
The row_number() function in dplyr returns sequential integer ranks 1, 2, 3, ... where TIES are broken by FIRST APPEARANCE. It is the most common ranking function in dplyr.
    row_number()                      # 1, 2, 3, ... over current group
    row_number(x)                     # rank by x (ties: first appearance)
    row_number(desc(x))               # rank by x descending
    df |> mutate(rn = row_number())
    df |> group_by(g) |> mutate(rn = row_number())
    df |> filter(row_number() <= 3)   # top 3 per group (after sort)
Need explanation? Read on for examples and pitfalls.
What row_number() does in one sentence
row_number() (no arg) returns 1, 2, 3, ... in row order; row_number(x) returns the rank of each element of x with ties broken by FIRST APPEARANCE. Inside group_by(), numbering restarts in each group.
Use it when you want unique, consecutive integers with no tied ranks.
Syntax
row_number(x). The argument is optional inside dplyr verbs. With no arg, it returns 1..n for the rows of the current group. With an arg, it ranks by that vector.
Use row_number() (no arg) inside mutate() to add a sequential id column. Combined with arrange(), it gives you a stable position index after sorting.
Five common patterns
1. Add a sequential index
Sequential id, useful for joining or referencing rows.
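A minimal sketch of this pattern, assuming a small made-up tibble called orders (the data and column names are illustrative, not from any real dataset):

```r
library(dplyr)

# Hypothetical data: three orders in arbitrary order
orders <- tibble(item = c("pen", "ink", "pad"), price = c(3, 12, 5))

# Number the rows in their current order
orders_id <- orders |> mutate(id = row_number())

# Sort first so the index reflects position after sorting by price
orders_sorted <- orders |>
  arrange(price) |>
  mutate(pos = row_number())
```

In orders_sorted, pos follows the price order rather than the original row order.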
2. Rank by a column
Ties are broken by row order; tied values do NOT share a rank.
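As an illustration, here is a sketch on a made-up scores tibble (names and values are assumptions) where two rows share the same score:

```r
library(dplyr)

# Hypothetical data: bob and cat are tied on score
scores <- tibble(
  name  = c("ann", "bob", "cat", "dan"),
  score = c(10, 20, 20, 5)
)

# Rank ascending by score; the tie is broken by row order
ranked <- scores |> mutate(rk = row_number(score))
ranked$rk
# 2 3 4 1
```

bob appears before cat, so bob gets rank 3 and cat gets rank 4.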
3. Rank descending
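Wrapping the column in desc() flips the direction so the largest value gets rank 1. A small sketch on made-up data (the scores tibble is an assumption for illustration):

```r
library(dplyr)

# Hypothetical data: bob and cat are tied on the highest score
scores <- tibble(
  name  = c("ann", "bob", "cat", "dan"),
  score = c(10, 20, 20, 5)
)

# Highest score gets rank 1; ties are still broken by row order
ranked_desc <- scores |> mutate(rk = row_number(desc(score)))
ranked_desc$rk
# 3 1 2 4
```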
4. Per-group row numbers
User a's visits are numbered 1-3; user b's restart at 1.
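One way this might look, assuming a hypothetical visits tibble with interleaved rows for users a and b:

```r
library(dplyr)

# Hypothetical data: user a has three visits, user b has two
visits <- tibble(
  user = c("a", "a", "b", "a", "b"),
  page = c("home", "docs", "home", "blog", "docs")
)

# Numbering restarts at 1 within each user
numbered <- visits |>
  group_by(user) |>
  mutate(visit_no = row_number()) |>
  ungroup()

numbered$visit_no
# 1 2 1 3 2
```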
5. Top n per group (filter)
Sort by mpg desc within each group, then keep rows where row_number is 1 or 2.
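A sketch of that description using the built-in mtcars data, assuming the groups are the cyl values:

```r
library(dplyr)

# Top 2 cars by mpg within each cylinder count
top2 <- mtcars |>
  group_by(cyl) |>
  arrange(desc(mpg), .by_group = TRUE) |>  # sort within each group
  filter(row_number() <= 2) |>             # keep positions 1 and 2
  ungroup()
```

mtcars has three cyl values, so top2 has six rows, two per group.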
row_number() always produces UNIQUE integer ranks. No ties. Two rows with the same value get adjacent ranks (e.g., 3 and 4). For shared ranks on ties, use min_rank() (1, 2, 2, 4) or dense_rank() (1, 2, 2, 3) instead.
row_number() vs min_rank() vs dense_rank() vs rank()
Four ranking functions in R, with different tie-handling.
| Function | Ties | Output for c(10, 20, 20, 5) |
|---|---|---|
| row_number(x) | Broken by row order | 2, 3, 4, 1 |
| min_rank(x) | Tied values share min rank | 2, 3, 3, 1 |
| dense_rank(x) | Tied values share rank, no gaps | 2, 3, 3, 1 (same here) |
| base::rank(x) | Tied values get average | 2, 3.5, 3.5, 1 |
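The table can be reproduced directly on a plain vector. The second vector y is an extra made-up example, added because min_rank() and dense_rank() only differ when something ranks after a tie:

```r
library(dplyr)

x <- c(10, 20, 20, 5)
rn <- row_number(x)   # 2 3 4 1
mr <- min_rank(x)     # 2 3 3 1
dr <- dense_rank(x)   # 2 3 3 1
br <- rank(x)         # 2.0 3.5 3.5 1.0

# A value after the tie exposes the gap
y <- c(10, 20, 20, 30)
mr_y <- min_rank(y)   # 1 2 2 4
dr_y <- dense_rank(y) # 1 2 2 3
```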
When to use which:
- row_number for unique sequential IDs.
- min_rank for "leaderboard with ties".
- dense_rank to avoid gaps after ties.
- rank if you need average-tie behavior (rare in dplyr).
A practical workflow
The "top n per group" pattern is row_number's signature use case.
Top 5 by score in each category. In modern dplyr (1.1+), slice_max(score, n = 5, by = category) is the cleaner equivalent; add with_ties = FALSE to get exactly 5 rows per group, because slice_max keeps tied rows by default.
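A sketch of the workflow, assuming a hypothetical results tibble with category and score columns (the data is made up so that each category has seven rows):

```r
library(dplyr)

# Hypothetical data: two categories, seven scores each
results <- tibble(
  category = rep(c("x", "y"), each = 7),
  score    = c(3, 7, 1, 6, 2, 5, 4, 14, 11, 17, 12, 16, 13, 15)
)

# filter + row_number version
top5 <- results |>
  group_by(category) |>
  arrange(desc(score), .by_group = TRUE) |>
  filter(row_number() <= 5) |>
  ungroup()

# slice_max version (dplyr 1.1+); with_ties = FALSE caps each group at 5 rows
top5_slice <- results |>
  slice_max(score, n = 5, by = category, with_ties = FALSE)
```

Both versions keep the same ten rows here; the row order of the two results is not guaranteed to match.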
For "first occurrence per group":
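Same as slice_min(timestamp, n = 1, by = user, with_ties = FALSE). Without with_ties = FALSE, every row tied for the earliest timestamp is returned.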
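One way to write the first-occurrence pattern, assuming a hypothetical events tibble with user and timestamp columns:

```r
library(dplyr)

# Hypothetical data: event dates per user, not in chronological order
events <- tibble(
  user = c("a", "b", "a", "b", "a"),
  timestamp = as.Date(c(
    "2024-03-02", "2024-03-05", "2024-03-01", "2024-03-04", "2024-03-03"
  ))
)

# Keep the earliest event of each user
first_seen <- events |>
  group_by(user) |>
  filter(row_number(timestamp) == 1) |>
  ungroup()
```

Here row_number(timestamp) ranks within each user, so no explicit arrange() is needed before the filter.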
Common pitfalls
Pitfall 1: forgetting arrange. row_number() is positional. Without arrange(), the numbers reflect whatever order rows happen to be in.
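A small sketch of the difference, on a made-up tibble whose rows are deliberately out of order:

```r
library(dplyr)

# Hypothetical data stored in no particular order
df <- tibble(name = c("c", "a", "b"), value = c(30, 10, 20))

# Without arrange(): numbers follow storage order
unsorted <- df |> mutate(rn = row_number())

# With arrange(): numbers follow the sorted order
sorted <- df |> arrange(value) |> mutate(rn = row_number())
```

In unsorted, row "c" gets 1 even though it has the largest value; in sorted, "a" gets 1.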
Pitfall 2: per-group reset can surprise. On a grouped tibble, row_number restarts at each group boundary. If you wanted a global row index, ungroup first or use mutate(id = 1:n()) outside group_by.
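To see both behaviors side by side, here is a sketch on a hypothetical tibble that adds a global index before grouping and a per-group index after:

```r
library(dplyr)

# Hypothetical data: two groups
df <- tibble(g = c("x", "x", "y", "y", "y"))

indexed <- df |>
  mutate(global_id = row_number()) |>  # ungrouped: 1..5
  group_by(g) |>
  mutate(group_id = row_number()) |>   # grouped: restarts per group
  ungroup()

indexed$global_id
# 1 2 3 4 5
indexed$group_id
# 1 2 1 2 3
```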
Pitfall 3: row_number(x) and row_number() differ semantically. With no arg, it numbers rows 1..n in current order. With an arg, it RANKS by that column. Easy to confuse.
Try it yourself
Try it: Rank mtcars cars by hp descending and keep only the top 3 PER cyl group. Save to ex_top3.
Explanation: Sort by hp desc within each cyl, keep rows where row_number <= 3. slice_max is the cleaner alternative in dplyr 1.1+.
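One possible solution that follows the explanation above (other approaches, such as slice_max, work as well):

```r
library(dplyr)

ex_top3 <- mtcars |>
  group_by(cyl) |>
  arrange(desc(hp), .by_group = TRUE) |>  # sort by hp descending within cyl
  filter(row_number() <= 3) |>            # keep the first three rows per group
  ungroup()
```

Each of the three cyl groups in mtcars has at least three cars, so ex_top3 has nine rows.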
Related dplyr functions
After mastering row_number, look at:
- min_rank(): ties share min rank
- dense_rank(): ties share rank, no gaps
- percent_rank() / cume_dist(): relative-position rankings
- ntile(): split rows into n bins
- slice_max() / slice_min(): top/bottom n by column (cleaner than filter + row_number)
- cur_group_rows(): row indices of the current group's rows in the full data
For modern dplyr code, prefer slice_max(col, n = k) over arrange(desc(col)) |> filter(row_number() <= k). The n argument must be named, and slice_max keeps tied rows unless you pass with_ties = FALSE.
FAQ
What does row_number do in dplyr?
row_number() returns 1, 2, 3, ... sequential integers. row_number(x) ranks by x with ties broken by first appearance.
What is the difference between row_number, min_rank, and dense_rank?
row_number always produces unique ranks (ties broken by row order). min_rank gives ties the same rank but leaves gaps (1, 2, 2, 4). dense_rank gives ties the same rank with no gaps (1, 2, 2, 3).
How do I get top n rows per group with row_number?
df |> group_by(g) |> arrange(desc(col)) |> filter(row_number() <= n). In dplyr 1.1+, df |> slice_max(col, n = n, by = g) is cleaner; pass with_ties = FALSE to match the filter version exactly when values are tied.
Why does row_number reset on grouped tibbles?
Because dplyr applies it per group. To get a global row index, call ungroup() first or use mutate(id = 1:n()) outside the grouping.
What is the difference between row_number() and 1:n()?
1:n() is also valid inside dplyr verbs and produces the same result whenever there is at least one row. row_number() is a window function with full dplyr support; 1:n() is shorter but breaks on empty input, where 1:0 evaluates to c(1, 0) instead of an empty vector.
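A quick check of that comparison on a made-up three-row tibble, plus the base R edge case with a zero length (seq_len() is shown here as the usual safe base R idiom, not as a claim about dplyr internals):

```r
library(dplyr)

# Hypothetical data with three rows
df <- tibble(x = c("a", "b", "c"))

both <- df |> mutate(a = row_number(), b = 1:n())
same <- identical(both$a, both$b)
# TRUE

# With a length of zero, the colon operator counts down instead of returning nothing
colon_empty <- 1:0         # c(1, 0)
seq_empty   <- seq_len(0)  # integer(0)
```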