dplyr n_distinct() in R: Count Unique Values Fast

The n_distinct() function in dplyr counts the number of unique values in one or more vectors. It is the fast, dplyr-native equivalent of length(unique(x)).

By Selva Prabhakaran · Published May 12, 2026 · Last updated May 12, 2026

⚡ Quick Answer

n_distinct(x)                          # unique values in x
n_distinct(x, na.rm = TRUE)            # exclude NAs
n_distinct(x, y)                       # unique combinations of x and y
df |> summarise(n_unique = n_distinct(col))
df |> group_by(g) |> summarise(n_unique = n_distinct(col))
length(unique(x))                       # base R equivalent (slower)

Need explanation? Read on for examples and pitfalls.

What n_distinct() does in one sentence

n_distinct(..., na.rm = FALSE) returns the integer count of unique values in a vector, or of unique combinations when you pass several vectors. Because it only needs the count, it does not have to return the unique values themselves the way length(unique(x)) does, which can make it faster on large inputs.

It is the standard dplyr function for questions like "how many unique customers, products, or categories?".

Syntax

n_distinct(..., na.rm = FALSE). Pass one or more vectors; counts unique combinations across them.

R: Unique values in a vector

library(dplyr)

x <- c(1, 2, 2, 3, 3, 3, NA)
n_distinct(x)
#> [1] 4   (1, 2, 3, and NA are each counted as distinct)
n_distinct(x, na.rm = TRUE)
#> [1] 3

Tip

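n_distinct(x) can be faster than length(unique(x)) on large vectors. dplyr uses a hash-based approach that avoids materializing the unique vector. On a million-element vector the difference may be noticeable, but it depends on the data and the dplyr version, so measure before optimizing.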

Five common patterns

1. Unique values in one column

R: How many cylinder counts in mtcars?

n_distinct(mtcars$cyl)
#> [1] 3

2. Inside summarise

R: Per-group unique count

mtcars |>
  group_by(cyl) |>
  summarise(n_gears = n_distinct(gear))
#>   cyl n_gears
#>     4       3
#>     6       3
#>     8       2

3. Multi-column combinations

R: Unique (cyl, gear) pairs

n_distinct(mtcars$cyl, mtcars$gear)
#> [1] 8

Pass multiple vectors to count unique combinations.
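One way to sanity-check a multi-column count is to compare it with the number of rows that distinct() returns for the same columns. The sketch below assumes only dplyr and the built-in mtcars data; the variable names n_pairs and pairs_tbl are illustrative.

```r
library(dplyr)

# Count unique (cyl, gear) pairs directly from the two vectors
n_pairs <- n_distinct(mtcars$cyl, mtcars$gear)

# Answer the same question by materializing the unique rows, then counting them
pairs_tbl <- distinct(mtcars, cyl, gear)

n_pairs
#> [1] 8
nrow(pairs_tbl)
#> [1] 8
```

If you only need the number, n_distinct() is enough; the distinct() route is useful when you also want to look at which pairs occur.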

4. Excluding NAs

R: Skip NAs in the count

x <- c("a", "b", NA, "a", NA)
n_distinct(x, na.rm = TRUE)
#> [1] 2
n_distinct(x, na.rm = FALSE)
#> [1] 3   (NA counted as one of the distinct values)

5. Inside mutate (per-group)

R: Add unique count column

mtcars |>
  group_by(cyl) |>
  mutate(n_gear = n_distinct(gear))
#> Each row gets the count of unique gear values in its cyl group
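A per-row count like this is mostly useful as input to a later step. As a hedged illustration (the threshold of 3 and the name multi_gear are arbitrary choices, not part of the original example), the sketch below keeps only the cars whose cylinder group has at least three distinct gear values.

```r
library(dplyr)

# Attach the per-group unique count to every row, then use it as a filter
multi_gear <- mtcars |>
  group_by(cyl) |>
  mutate(n_gear = n_distinct(gear)) |>
  ungroup() |>
  filter(n_gear >= 3)

# cyl 4 (11 cars) and cyl 6 (7 cars) each have 3 distinct gear values;
# cyl 8 has only 2, so its 14 cars are dropped
nrow(multi_gear)
#> [1] 18
```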

Key Insight

n_distinct() differs from n(): it counts UNIQUE values, not ROWS. n() returns the group size; n_distinct(col) returns how many distinct values appear in that column. Easy to confuse but very different semantics.
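Putting the two functions side by side in one summarise() call makes the difference concrete. This is a small sketch on mtcars; the column names rows and n_gears are illustrative.

```r
library(dplyr)

rows_vs_unique <- mtcars |>
  group_by(cyl) |>
  summarise(
    rows    = n(),               # how many cars are in the group
    n_gears = n_distinct(gear)   # how many different gear values they have
  )

rows_vs_unique
#>   cyl  rows n_gears
#>     4    11       3
#>     6     7       3
#>     8    14       2
```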

n_distinct() vs length(unique()) vs distinct() vs n()

Five functions come up for "uniqueness" questions in dplyr / R; they differ in what they return.

Function	Returns	Best for
`n_distinct(x)`	Integer count	Quick count, dplyr summarise
`length(unique(x))`	Integer count	Base R, equivalent but slower
`dplyr::distinct(df, col)`	Filtered tibble	"Show me the unique rows"
`unique(x)`	Vector of unique values	Inspect what those values are
`n()`	Group size (row count)	Different question

When to use which:

n_distinct(x) inside summarise/mutate.
length(unique(x)) for base R; same result, slightly slower.
distinct(df, col) to keep one row per unique value.
unique(x) to see the actual unique values.
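The same small vector run through each option shows what comes back in practice. This is a minimal sketch; the vector x is made up for illustration, and tibble() is used as re-exported by dplyr.

```r
library(dplyr)

x <- c("a", "b", "a", NA)

n_distinct(x)                 # count; NA is included by default
#> [1] 3
length(unique(x))             # base R count
#> [1] 3
unique(x)                     # the unique values themselves
#> [1] "a" "b" NA

# One row per unique value, returned as a tibble
uniq_rows <- distinct(tibble(x = x), x)
nrow(uniq_rows)
#> [1] 3
```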

A practical workflow

The "audit" pattern is a common use of n_distinct() in summary tables.

R: Dataset audit

df |>
  summarise(
    rows = n(),
    unique_users = n_distinct(user_id),
    unique_items = n_distinct(item_id),
    avg_qty = mean(qty),
    .groups = "drop"
  )

A first-pass dataset audit: how many rows, how many unique users, how many unique items. Tells you the table's shape at a glance.

For per-group audits:

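R: Daily session and unique-user counts

df |>
  group_by(date) |>
  summarise(
    sessions = n(),
    unique_users = n_distinct(user_id),
    .groups = "drop"
  )

The result has one row per date: n() gives the number of sessions (rows) and n_distinct(user_id) gives the number of distinct users on that day.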
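The audit snippets above refer to a data frame df that is not defined on this page. To try the pattern end to end, here is a sketch with a small made-up event log; the table events and all of its values are hypothetical, and only the column names follow the examples above.

```r
library(dplyr)

# Hypothetical event log: one row per session
events <- tibble(
  date    = as.Date(c("2026-01-01", "2026-01-01", "2026-01-01",
                      "2026-01-02", "2026-01-02")),
  user_id = c("u1", "u1", "u2", "u2", "u3"),
  item_id = c("i1", "i2", "i1", "i1", "i3"),
  qty     = c(1, 2, 1, 3, 3)
)

# Whole-table audit: 5 rows, 3 users, 3 items, average quantity 2
overall <- events |>
  summarise(
    rows         = n(),
    unique_users = n_distinct(user_id),
    unique_items = n_distinct(item_id),
    avg_qty      = mean(qty)
  )

# Per-day audit: 3 sessions from 2 users, then 2 sessions from 2 users
daily <- events |>
  group_by(date) |>
  summarise(
    sessions     = n(),
    unique_users = n_distinct(user_id),
    .groups = "drop"
  )

overall
daily
```

Note that user u1 appears twice on the first day, which is why that day has three sessions but only two unique users.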

Common pitfalls

Pitfall 1: NA counted as distinct. Default na.rm = FALSE includes NA in the count. n_distinct(c(1, NA, 1)) returns 2 (1 and NA). Add na.rm = TRUE to exclude.

Pitfall 2: passing multiple cols treats them as combinations. n_distinct(x, y) counts unique (x, y) PAIRS, not unique values in either. To count separately, call twice.
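Both pitfalls are easy to reproduce on toy vectors. The sketch below uses made-up inputs x and y purely for illustration.

```r
library(dplyr)

# Pitfall 1: NA is counted unless na.rm = TRUE
with_na    <- n_distinct(c(1, NA, 1))
without_na <- n_distinct(c(1, NA, 1), na.rm = TRUE)
with_na
#> [1] 2
without_na
#> [1] 1

# Pitfall 2: several vectors means combinations, not separate counts
x <- c(1, 1, 2, 2)
y <- c("a", "b", "a", "b")
n_distinct(x, y)   # four distinct (x, y) pairs
#> [1] 4
n_distinct(x)      # counted on its own
#> [1] 2
n_distinct(y)      # counted on its own
#> [1] 2
```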

Warning

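n_distinct() and length(unique(x)) agree on ordinary vectors, but they are not interchangeable in every case. length(unique(x)) has no na.rm argument, so to match n_distinct(x, na.rm = TRUE) you have to drop the NAs yourself, for example with length(unique(x[!is.na(x)])). Pick one approach in a project and stick with it for consistency.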

Performance note

For very large data, n_distinct() can be faster than length(unique(x)): it uses a hash set internally rather than building the full unique vector, which saves memory and time on big inputs. The size of the gap depends on the data type, the number of distinct values, and the dplyr version, so benchmark on your own data rather than assuming a fixed speedup. For everyday data sizes (thousands to hundreds of thousands of rows) both functions feel instant, so pick by readability and consistency. Inside dplyr pipelines, n_distinct() is the idiomatic choice. For data.table users, uniqueN() plays the same role.
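A quick way to check the speed claim on your own machine is to time both calls on the same input. The sketch below uses base R's system.time(); the input size and the number of distinct values are arbitrary choices, and no timings are shown because they vary by machine and package version.

```r
library(dplyr)

set.seed(42)
# Illustrative input: one million integers drawn from 1,000 possible values
big <- sample.int(1000L, size = 1e6, replace = TRUE)

t_dplyr <- system.time(count_dplyr <- n_distinct(big))
t_base  <- system.time(count_base  <- length(unique(big)))

# Both approaches must agree on the count; only the timings may differ
stopifnot(count_dplyr == count_base)

# Elapsed seconds for each approach
t_dplyr[["elapsed"]]
t_base[["elapsed"]]
```

A single run is noisy, so repeat the timing a few times (or use a dedicated benchmarking package) before drawing conclusions.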

Try it yourself

Try it: For each cyl group in mtcars, count the number of unique gear values and the number of unique carb values. Save the result to ex_uniq.

R: Your turn: two unique counts per group

ex_uniq <- mtcars |>
  # your code here

ex_uniq
#> Expected: 3 rows (one per cyl) with n_gear and n_carb columns

R: Solution

ex_uniq <- mtcars |>
  group_by(cyl) |>
  summarise(
    n_gear = n_distinct(gear),
    n_carb = n_distinct(carb),
    .groups = "drop"
  )

ex_uniq
#> # A tibble: 3 x 3
#>     cyl n_gear n_carb
#>       4      3      2
#>       6      3      3
#>       8      2      4

Explanation: n_distinct(gear) counts unique gear values per cyl. Same for carb. Two summary stats per group.

Related dplyr functions

After mastering n_distinct(), look at:

n(): row count of current group
count(df, g): count rows per group
distinct(df, col): keep one row per unique value
unique(x): base R; show the unique values
summarise(): standard aggregation context
group_by(): per-group n_distinct

For "show me the actual unique rows", distinct(df, col, .keep_all = TRUE) is the cleaner tool.

FAQ

What does n_distinct do in dplyr?

n_distinct(x) returns the count of unique values in x as an integer. It works inside summarise() and mutate() and is often faster than length(unique(x)) on large vectors.

What is the difference between n_distinct and length(unique())?

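For an ordinary vector with default settings, both return the same count. n_distinct() is the dplyr-native idiom, is hash-based, and has an na.rm argument. length(unique()) is base R and works anywhere, but you have to remove NAs yourself if you do not want them counted.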

How do I exclude NAs from n_distinct?

Pass na.rm = TRUE: n_distinct(x, na.rm = TRUE). Default counts NA as one of the distinct values.

How do I count unique combinations of multiple columns?

n_distinct(x, y, z) counts unique tuples across the three vectors. Pass each column as a separate argument.

Is n_distinct different from n() in dplyr?

Yes. n() counts ROWS in the current group. n_distinct(col) counts UNIQUE values in a column. Different questions.
