dplyr n() in R: Count Rows Inside summarise or mutate
The n() function in dplyr returns the number of rows in the CURRENT group when called inside summarise(), mutate(), or filter(). It is the standard "group size" expression.
df |> summarise(count = n())                    # one row, total count
df |> group_by(g) |> summarise(count = n())     # one row per group
df |> group_by(g) |> mutate(group_size = n())   # group size on every row
df |> group_by(g) |> filter(n() > 5)            # keep groups with more than 5 rows
df |> count(g)                                  # shortcut for group_by + summarise(n = n())
n_distinct(df$x)                                # different: count of unique values
Need explanation? Read on for examples and pitfalls.
What n() does in one sentence
n() returns the size of the current group as an integer; it can ONLY be called inside dplyr verbs such as summarise(), mutate(), and filter(). Outside those contexts, it errors.
This is the canonical way to express "how many rows are in this group" inside a dplyr pipeline.
Syntax
n(). No arguments. Must be called inside a dplyr verb on a (possibly grouped) tibble.
Use count(df, g) when you only need a count by column. It is a shortcut for df |> group_by(g) |> summarise(n = n()) |> ungroup(): same result, less typing.
Five common patterns
1. Total count
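A minimal sketch of this pattern, assuming the built-in mtcars data set (chosen here for illustration). With no group_by(), n() treats the whole data frame as a single group:

```r
library(dplyr)

# No grouping, so n() counts every row of the data frame
total <- mtcars |> summarise(count = n())
total
# a one-row data frame with count = 32
```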
For a scalar without the tibble wrapper, use nrow(mtcars).
2. Count per group
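A short sketch, again assuming mtcars and grouping by its cyl column. The count() call at the end is included only to show that the shortcut returns the same numbers:

```r
library(dplyr)

# One output row per cyl value; n() is evaluated once per group
by_cyl <- mtcars |>
  group_by(cyl) |>
  summarise(count = n())
by_cyl
# cyl 4 has 11 rows, cyl 6 has 7 rows, cyl 8 has 14 rows

# count() gives the same counts in one call (its default column name is n)
by_cyl_short <- mtcars |> count(cyl)
```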
3. Add count column without summarising
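A sketch of the mutate() form, assuming mtcars. The column name group_size is an arbitrary choice, and the ungroup() at the end is optional but avoids carrying the grouping into later steps:

```r
library(dplyr)

# mutate() keeps all 32 rows and repeats each group's size on its rows
with_size <- mtcars |>
  group_by(cyl) |>
  mutate(group_size = n()) |>
  ungroup()

with_size |> select(mpg, cyl, group_size) |> head()
```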
For a one-step idiom, add_count(df, cyl) does the same.
4. Filter groups by size
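A sketch assuming mtcars grouped by cyl, with a size threshold of 10 picked for illustration:

```r
library(dplyr)

# Keep only rows that belong to groups with at least 10 rows
big_groups <- mtcars |>
  group_by(cyl) |>
  filter(n() >= 10) |>
  ungroup()

sort(unique(big_groups$cyl))  # the 6-cylinder group (7 rows) is dropped
nrow(big_groups)              # 11 + 14 = 25 rows remain
```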
n() inside filter() is evaluated per group: on a data frame grouped with group_by(), filter(n() >= 10) keeps every row from groups of size 10 or more.
5. Combine n() with other aggregates
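A sketch assuming mtcars; the output column names (count, mean_mpg, sd_mpg, total_hp) are illustrative choices, not fixed conventions:

```r
library(dplyr)

# n() sits alongside any other summary function in the same summarise() call
cyl_stats <- mtcars |>
  group_by(cyl) |>
  summarise(
    count    = n(),
    mean_mpg = mean(mpg),
    sd_mpg   = sd(mpg),
    total_hp = sum(hp)
  )
cyl_stats
```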
n() coexists with mean, sd, sum, etc., inside summarise.
n() only makes sense inside a dplyr verb applied to a (possibly grouped) data frame or tibble. Outside, it errors with a message such as "Must only be used inside dplyr verbs" (the exact wording depends on the dplyr version). This restriction lets dplyr supply the correct group size at evaluation time.
n() vs nrow() vs n_distinct() vs count()
Five counting functions in R, each with a different scope.
| Function | Counts | Scope | Where |
|---|---|---|---|
| n() | Rows | Current group | Inside dplyr verbs |
| nrow(df) | Rows | Entire data frame | Anywhere |
| n_distinct(x) | Unique values | Vector | Anywhere |
| count(df, g) | Rows | Per group | Top-level dplyr |
| length(x) | Elements | Vector | Anywhere |
When to use which:
- n() inside summarise/mutate/filter for group size.
- nrow(df) outside dplyr, for the total row count as a scalar.
- n_distinct(col) for unique value counts.
- count(df, g) for a one-step count by column.
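To see the differences side by side, here is a small sketch that runs each function on mtcars (an illustrative choice of data):

```r
library(dplyr)

total_rows   <- nrow(mtcars)             # rows in the whole data frame: 32
vec_length   <- length(mtcars$cyl)       # elements in one vector: 32
unique_cyl   <- n_distinct(mtcars$cyl)   # unique values in that vector: 3
rows_per_cyl <- count(mtcars, cyl)       # rows per cyl value, as a data frame

# n() only works inside a verb; here it reports each group's size
per_group <- mtcars |>
  group_by(cyl) |>
  summarise(rows = n(), unique_gears = n_distinct(gear))
per_group
```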
A practical workflow
The most common n() usage is inside summarise alongside other aggregates.
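As a sketch of such a block, assume a hypothetical sales table with category, item, and price columns. The data and all column names below are invented for illustration:

```r
library(dplyr)

# Hypothetical data, invented for this example
sales <- tibble(
  category = c("fruit", "fruit", "fruit", "veg", "veg"),
  item     = c("apple", "apple", "pear", "kale", "leek"),
  price    = c(1.0, 1.2, 2.0, 3.0, 2.5)
)

sales_summary <- sales |>
  group_by(category) |>
  summarise(
    n_rows    = n(),              # rows in this category
    avg_price = mean(price),      # average price
    sd_price  = sd(price),        # standard deviation of price
    n_items   = n_distinct(item)  # number of different items
  )
sales_summary
```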
This produces a per-category summary with row count, average, SD, and unique-item count in one block. n() captures the group size; n_distinct gets unique counts.
Common pitfalls
Pitfall 1: calling n() outside dplyr. A bare n() call, or a pipe such as mtcars |> n(), errors. n() must be called inside summarise(), mutate(), filter(), or another dplyr verb.
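The failure is easy to reproduce. In this sketch, tryCatch() captures the error so the script keeps running; the replacement string is an arbitrary placeholder, since the real message text varies by dplyr version:

```r
library(dplyr)

# Calling n() at the top level fails; capture the error instead of stopping
res <- tryCatch(n(), error = function(e) "n() failed outside a verb")
res

# The same call works once it is wrapped in a verb
ok <- mtcars |> summarise(n = n())
ok$n
```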
Pitfall 2: confusing n() with n_distinct(). n() counts rows and takes no column; n_distinct(col) counts the unique values in col.
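A toy comparison makes the distinction concrete. The data below is hypothetical and includes a repeated value and a missing value on purpose; a non-NA count via sum(!is.na(x)) is shown alongside for contrast:

```r
library(dplyr)

# Hypothetical toy data: group "a" has a duplicate and an NA
toy <- tibble(g = c("a", "a", "a", "b"), x = c(1, 1, NA, 2))

toy_counts <- toy |>
  group_by(g) |>
  summarise(
    rows     = n(),             # every row, NA or not
    distinct = n_distinct(x),   # unique values; NA counts as a value by default
    non_na   = sum(!is.na(x))   # rows where x is present
  )
toy_counts
# group "a": rows = 3, distinct = 2, non_na = 2
```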
n() does NOT take arguments. It is a zero-argument function that gets the current group size from dplyr's internal state. If you want to count NON-NA values in a column, use sum(!is.na(col)), NOT n().
Try it yourself
Try it: For each cyl group, compute the count, the mean of mpg, and the count of unique gear values. Save to ex_summary.
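One possible solution, assuming the exercise refers to the built-in mtcars data. The output column names count, mean_mpg, and n_gear are arbitrary choices:

```r
library(dplyr)

ex_summary <- mtcars |>
  group_by(cyl) |>
  summarise(
    count    = n(),              # rows per cyl group
    mean_mpg = mean(mpg),        # average mpg per cyl group
    n_gear   = n_distinct(gear)  # number of different gear values
  )
ex_summary
```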
Explanation: n() gives the row count, mean(mpg) gives the average mpg, and n_distinct(gear) gives the number of unique gear values, each computed per cyl group.
Related dplyr functions
After mastering n(), look at:
- n_distinct(x): count unique values
- count(df, g): shortcut for group_by + summarise(n = n())
- tally(): shortcut for summarise(n = n()) when already grouped
- add_count() / add_tally(): keep all rows + add count
- cur_group_id(): integer ID of the current group
- cur_group_rows(): row indices within current group
For unique-value counts, n_distinct(col) is the direct counterpart.
FAQ
What does n() do in dplyr?
n() returns the number of rows in the current group when called inside summarise(), mutate(), or filter(). It can only be used inside dplyr verbs.
What is the difference between n() and nrow() in R?
nrow(df) returns total rows in the data frame. n() returns the size of the CURRENT group inside a dplyr verb. They differ on grouped tibbles.
What is the difference between n() and n_distinct()?
n() counts ROWS. n_distinct(col) counts UNIQUE values in a column. n() ignores any column; n_distinct works on a specific vector.
Why does my n() call error with 'Must only be used inside dplyr verbs'?
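Because n() can only be evaluated inside a dplyr verb; it cannot be called as a standalone function. Wrap it in a verb, for example summarise(n = n()) or mutate(group_size = n()). Newer dplyr releases phrase the message in terms of "data-masking verbs", but the cause is the same.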
Can I use n() inside filter?
Yes: filter(n() >= 5) keeps every row from groups of size 5+. Inside filter, n() is the current group's size.