dplyr add_count() in R: Add Group Count Without Summarising
The add_count() function in dplyr adds a column with the per-group row count to every row, WITHOUT collapsing the data. It is mutate()-style; count() is summarise()-style.
    df |> add_count(g)                                  # adds n column per group
    df |> add_count(g, sort = TRUE)                     # sort desc by n
    df |> add_count(g, name = "group_size")             # rename added column
    df |> add_count(g, wt = price)                      # weighted (sum of price)
    df |> group_by(g) |> mutate(n = n()) |> ungroup()   # equivalent
    df |> count(g)                                      # collapses (different result!)
    df |> add_count(g) |> filter(n > 5)                 # use n in downstream filter
Need explanation? Read on for examples and pitfalls.
What add_count() does in one sentence
For an ungrouped data frame, add_count(df, ...) is shorthand for df |> group_by(...) |> mutate(n = n()) |> ungroup(). It adds a column n containing the per-group row count, WITHOUT collapsing rows.
This is the "mutate" cousin of count(). Use it when you need the count alongside the original data, not as a summary.
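The equivalence can be checked directly. This is a minimal sketch, assuming the built-in mtcars data set stands in for df and cyl for the grouping column; the variable names are illustrative only.

```r
library(dplyr)

# Assumption: mtcars is the example data, cyl the grouping column.
with_add_count <- mtcars |> add_count(cyl)

longhand <- mtcars |>
  group_by(cyl) |>
  mutate(n = n()) |>
  ungroup()

# The container class may differ (data.frame vs tibble),
# so compare the count column and the row count.
identical(with_add_count$n, longhand$n)
nrow(with_add_count) == nrow(mtcars)
```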
Syntax
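add_count(x, ..., wt = NULL, sort = FALSE, name = NULL). x is the data frame, ... are the grouping columns, wt is an optional column to sum instead of counting rows, sort = TRUE orders the result by descending count, and name sets the name of the added column (default n).
Use add_count() whenever you need to filter or sort by group size. Without it, you have to count, then join back. add_count() does it in one step and keeps all rows.
Five common patterns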
1. Add count column
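A minimal sketch of the basic call, assuming the built-in mtcars data set as example data (the name cars_n is illustrative):

```r
library(dplyr)

# Add the size of each cyl group to every row.
cars_n <- mtcars |> add_count(cyl)

nrow(cars_n)                          # 32: no rows were collapsed
head(cars_n[, c("mpg", "cyl", "n")])  # each car now carries its group size
```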
2. Filter to common groups
A clean idiom for "filter rare categories".
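One way this idiom might look, assuming mtcars as example data, gear as the categorical column, and an arbitrary threshold of 10 chosen only for illustration:

```r
library(dplyr)

common_gears <- mtcars |>
  add_count(gear) |>
  filter(n >= 10) |>   # keep rows whose gear value occurs at least 10 times
  select(-n)           # drop the helper column once it has done its job

sort(unique(common_gears$gear))  # gear == 5 (only 5 cars) is gone
```

Dropping n at the end is optional; keep it if later steps need the group size.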
3. Sort by group size
sort = TRUE arranges rows so the largest group appears first.
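A short sketch of the sorting behaviour, again assuming mtcars as example data:

```r
library(dplyr)

sorted <- mtcars |> add_count(cyl, sort = TRUE)

# Rows are ordered by descending n, so groups appear largest first.
unique(sorted$cyl)   # 8, 4, 6 (14, 11 and 7 cars respectively)
```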
4. Custom column name
Useful when n would clash with an existing column.
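A sketch with hypothetical data in which n already means something else (orders, customer, and orders_per_customer are made-up names):

```r
library(dplyr)

# Hypothetical orders table: n is the number of items in each order.
orders <- tibble(
  customer = c("a", "a", "b"),
  n        = c(2, 5, 1)
)

named <- orders |> add_count(customer, name = "orders_per_customer")
named
```

The existing n column is left as it was, and the count lands in the explicitly named column.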
5. Weighted count
total_weight = sum of wt per cyl group. Same idea as tally(wt = ...).
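A sketch of the weighted form, assuming mtcars as example data, where wt is the car weight column. The cross-check against a plain grouped sum is added here for illustration.

```r
library(dplyr)

# Sum the wt column within each cyl group instead of counting rows.
weighted <- mtcars |> add_count(cyl, wt = wt, name = "total_weight")

# Cross-check: the same quantity computed with group_by() + mutate().
check <- mtcars |>
  group_by(cyl) |>
  mutate(total_weight = sum(wt)) |>
  ungroup()

all.equal(weighted$total_weight, check$total_weight)
```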
add_count() is the dplyr idiom for "I need group size as a column". Without it, you have to count, then join: two steps that are easy to misorder. add_count() is one verb and stays inside the pipeline.
add_count() vs count() vs add_tally()
Three ways to handle group counts in dplyr.
| Function | Style | Pre-grouped? | Output rows |
|---|---|---|---|
| add_count(df, g) | mutate (keep rows) | No (does grouping) | Original count |
| count(df, g) | summarise (collapse) | No (does grouping) | One per group |
| add_tally() | mutate (keep rows) | Yes (already grouped) | Original count |
| tally() | summarise (collapse) | Yes (already grouped) | One per group |
When to use which:
- add_count(g): most common; adds a count column in one step.
- count(g): when you only need the summary table.
- add_tally(): when the data is already grouped and you want to keep all rows.
- tally(): when the data is already grouped and you want a summary.
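The row counts of the four variants can be compared side by side. A small sketch, assuming mtcars as example data:

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

nrow(add_count(mtcars, cyl))  # 32: all rows kept
nrow(count(mtcars, cyl))      # 3: one row per cyl value
nrow(add_tally(by_cyl))       # 32: all rows kept
nrow(tally(by_cyl))           # 3: one row per cyl value
```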
A practical workflow
The classic "filter rare values" pattern uses add_count + filter.
This is the cleanest basic pattern for cleaning categorical variables. The equivalent without add_count is to count, filter the summary table, and join it back to the original data.
Same result, more code, easier to make a mistake.
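Both versions can be sketched as follows, assuming mtcars as example data, carb as the categorical column, and an illustrative threshold of 5. Using semi_join() for the join-back step is one reasonable choice, not the only one.

```r
library(dplyr)

# One step: add_count() + filter().
one_step <- mtcars |>
  add_count(carb) |>
  filter(n >= 5) |>
  select(-n)

# Longhand: build the summary, filter it, then join back.
keep <- mtcars |>
  count(carb) |>
  filter(n >= 5)

two_step <- mtcars |>
  semi_join(keep, by = "carb")

nrow(one_step)   # 27
nrow(two_step)   # 27
```

The longhand version needs an intermediate object and a correct join key, which is where mistakes tend to creep in.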
Common pitfalls
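Pitfall 1: column name clash. If your data already has an n column, add_count() does not overwrite it: it stores the counts in nn instead and prints a message, which can break downstream code that expects n to hold the count. Use name = "..." to choose an explicit name.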
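A quick check of what happens when an n column already exists, assuming a recent dplyr release (1.0 or later) and a hypothetical tibble df:

```r
library(dplyr)

# Hypothetical data that already contains a column called n.
df <- tibble(g = c("x", "x", "y"), n = c(10, 20, 30))

default_name <- df |> add_count(g)                       # message: counts stored in nn
explicit     <- df |> add_count(g, name = "group_size")  # no ambiguity

names(default_name)
names(explicit)
```

Passing name explicitly avoids depending on the fallback naming and keeps later steps readable.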
Pitfall 2: confusing add_count with count. count(df, g) returns ONE row per group; add_count(df, g) returns ALL rows with n added. Easy to type the wrong one.
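add_count() groups only for the duration of the count, so its output has the same grouping as its input: ungrouped data comes back ungrouped. add_tally() counts within whatever grouping the data already has and keeps it. If your data is already grouped by the columns you want to count, use add_tally(). If you want a single-step group + count on ungrouped data, use add_count().
Try it yourself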
Try it: Add a column showing each cylinder group's size, then filter to rows where the group has at least 10 cars. Save to ex_common.
Explanation: add_count(cyl) adds an n column with each row's cyl group size. filter(n >= 10) keeps only the rows from groups of at least 10.
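One possible solution, assuming the exercise uses the built-in mtcars data set:

```r
library(dplyr)

ex_common <- mtcars |>
  add_count(cyl) |>     # n = number of cars sharing this row's cyl value
  filter(n >= 10)       # keep groups with at least 10 cars

table(ex_common$cyl)    # only the 4- and 8-cylinder groups remain
```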
Related count functions
After mastering add_count, look at:
- count(): collapse-style sister of add_count
- add_tally(): same as add_count but for already-grouped data
- tally(): collapse-style; already-grouped data
- n(): per-group row count inside mutate / summarise
- n_distinct(): count of unique values
- cur_group_rows(): row indexes within current group
For more complex group-aware mutations, use group_by() |> mutate() directly.
FAQ
What is the difference between add_count and count in dplyr?
count(df, g) collapses to one row per group with column n. add_count(df, g) keeps ALL rows and adds n as a new column. Use add_count when you need the count alongside the original data.
How do I add a count column to a data frame without summarising?
Use add_count(df, group_col). It adds n with the per-group row count and returns all rows.
What is the difference between add_count and add_tally?
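add_count(df, g) does the grouping for you. add_tally(df) takes no grouping columns and counts within whatever grouping df already has from group_by(); on ungrouped data it adds the total row count to every row. Otherwise they are identical.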
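The difference shows up mainly in the grouping of the result. A sketch assuming mtcars as example data and a recent dplyr release (1.0 or later); the object names are illustrative.

```r
library(dplyr)

by_cyl <- mtcars |> group_by(cyl)

a <- by_cyl |> add_tally()      # counts within the existing cyl groups
b <- mtcars |> add_count(cyl)   # groups by cyl only while counting

identical(a$n, b$n)             # same counts either way

group_vars(a)                        # "cyl": grouping is kept
group_vars(b)                        # character(0): input was ungrouped
group_vars(add_count(by_cyl, gear))  # "cyl": the input's grouping is restored
```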
How do I filter to common categories with add_count?
df |> add_count(category) |> filter(n >= threshold) keeps only rows whose category has at least threshold members.
Can I weight add_count?
Yes: add_count(df, g, wt = price) sums price per group instead of counting rows.