dplyr ntile() in R: Bin Values into N Equal-Count Groups
The ntile() function in dplyr divides a vector into n approximately equal-count quantile bins, returning an integer 1 to n for each value. It is the rank-based binning function for quartiles, deciles, and any equal-count split.
    ntile(1:10, 4)                                        # 4 quartiles
    ntile(x, 10)                                          # 10 deciles
    ntile(desc(x), 4)                                     # reverse direction
    df |> mutate(quartile = ntile(score, 4))
    df |> group_by(g) |> mutate(quartile = ntile(score, 4))
    cut(x, breaks = quantile(x, probs = seq(0, 1, 0.25)),
        include.lowest = TRUE)                            # value-based bins (different)
Need explanation? Read on for examples and pitfalls.
What ntile() does in one sentence
ntile(x, n) returns an integer between 1 and n indicating which quantile bin each value falls into; bins are sized to be approximately equal in COUNT. With 100 rows and n = 4, you get 25 rows per bin.
It is the standard tool for "split this column into quartiles / deciles / quintiles".
Syntax
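ntile(x, n), where x is the vector to bin and n is the number of bins. Returns integers 1..n. Ties are broken by order of appearance, so equal values can land in different bins.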
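A minimal sketch of the call, assuming a hypothetical 12-element vector `x` (the values are illustrative, not from a real dataset):

```r
library(dplyr)

# Hypothetical 12-element vector, deliberately unsorted
x <- c(12, 5, 9, 1, 7, 3, 11, 2, 8, 4, 10, 6)

# Assign each value to one of 4 equal-count bins
bins <- ntile(x, 4)

data.frame(x, bins)
table(bins)  # number of values in each bin
```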
Three values per bin (12 / 4 = 3).
ntile(x, n) always returns integers 1 to n. This makes it perfect for grouping or color-coding by quantile. Combine with factor() to get ordered factor labels.
Five common patterns
1. Quartiles (n = 4)
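As a small sketch, assuming a hypothetical vector `scores` with 8 values:

```r
library(dplyr)

# Hypothetical scores: 8 values, so 4 bins divide them evenly
scores <- c(55, 91, 62, 78, 84, 70, 49, 96)

quartile <- ntile(scores, 4)

data.frame(scores, quartile)
table(quartile)  # number of values per quartile
```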
Each quartile holds 2 values (8 / 4 = 2).
2. Deciles (n = 10)
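A sketch with simulated data; the `income` vector and its distribution are assumptions chosen only for illustration:

```r
library(dplyr)

set.seed(42)  # reproducible illustrative data
income <- round(rlnorm(200, meanlog = 10, sdlog = 0.5))  # hypothetical incomes

decile <- ntile(income, 10)

table(decile)                        # 200 / 10 = 20 values in each decile
top_10_pct <- income[decile == 10]   # the highest-earning tenth
```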
Useful for "top 10%", "next 10%", etc.
3. Inside mutate as a derived column
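One way to sketch this uses the built-in mtcars dataset; the column name `mpg_quartile` is an illustrative choice:

```r
library(dplyr)

# Add a quartile column based on miles per gallon
cars_q <- mtcars |>
  mutate(mpg_quartile = ntile(mpg, 4))

# Count rows per quartile
quartile_counts <- cars_q |> count(mpg_quartile)
quartile_counts
```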
32 rows / 4 = 8 rows per quartile.
4. Reverse direction (highest = bin 1)
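A minimal sketch with a hypothetical 4-element vector, so that each value gets its own bin:

```r
library(dplyr)

x <- c(10, 40, 20, 30)  # hypothetical values

ascending  <- ntile(x, 4)        # 1 4 2 3: smallest value is bin 1
descending <- ntile(desc(x), 4)  # 4 1 3 2: largest value is bin 1
```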
desc() makes the highest value get bin 1.
5. Per-group ntile
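A sketch using mtcars grouped by cylinder count, so that quartiles are computed separately within each group; the column name `mpg_quartile` is illustrative:

```r
library(dplyr)

# Quartiles of mpg computed within each cylinder group
cars_grouped <- mtcars |>
  group_by(cyl) |>
  mutate(mpg_quartile = ntile(mpg, 4)) |>
  ungroup()

# Bin sizes per group; groups whose size is not a multiple of 4 are uneven
cars_grouped |> count(cyl, mpg_quartile)
```

A car in bin 4 here is in the top quartile of its own cylinder group, not of the whole dataset.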
ntile(x, n) produces equal-COUNT bins; cut(x, breaks) with a single number of breaks produces equal-WIDTH bins. They are different. Equal-count is what you want for "top decile" thinking; equal-width is what you want for histogram-like binning. Pick based on whether bin SIZE matters (equal-count) or bin RANGE matters (equal-width).
ntile() vs cut() vs quantile() vs percent_rank()
Four binning / quantile functions in R.
| Function | Output | Bin type | Best for |
|---|---|---|---|
| ntile(x, n) | Integer 1..n | Equal count | Quartile / decile assignment |
| cut(x, breaks) | Factor | Equal width or custom | Histogram-style binning |
| cut(x, quantile(x, probs)) | Factor | Quantile-based | Similar to ntile, but factor output and tied values share a bin |
| quantile(x, probs) | Numeric values | (returns thresholds) | Compute percentile boundaries |
| percent_rank(x) | 0 to 1 | (continuous) | Relative position, not bins |
When to use which:
- ntile for clean integer bin assignment.
- cut if you need factor labels or specific cutpoints.
- quantile to find the actual percentile values (e.g., median = quantile(x, 0.5)).
- percent_rank for continuous relative position.
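To make the contrast concrete, here is a sketch that runs all four functions on the same hypothetical skewed vector (seven small values and one outlier). The variable names are illustrative:

```r
library(dplyr)

x <- c(1, 2, 3, 4, 5, 6, 7, 100)  # hypothetical skewed data

# Equal count: two values per bin, regardless of the outlier
count_bins <- ntile(x, 4)

# Equal width: the range 1..100 is cut into 4 intervals of equal width,
# so 1..7 share the first interval and 100 sits alone in the last
width_bins <- cut(x, breaks = 4)

# Quantile-based factor bins; include.lowest keeps the minimum from becoming NA
quantile_bins <- cut(x, quantile(x, probs = seq(0, 1, 0.25)),
                     include.lowest = TRUE)

# Thresholds rather than bin labels
thresholds <- quantile(x, probs = c(0.25, 0.5, 0.75))

# Continuous relative position from 0 to 1
rel_pos <- percent_rank(x)

table(count_bins)
table(width_bins)
```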
A practical workflow
The "quartile-bucketed analysis" pattern is ntile's main use case.
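A sketch of that pattern, assuming a hypothetical `products` table whose column names (`product`, `price`, `revenue`) and values are invented for illustration:

```r
library(dplyr)

# Hypothetical product data
products <- tibble(
  product = paste0("P", 1:8),
  price   = c(5, 12, 8, 30, 22, 15, 45, 60),
  revenue = c(500, 900, 640, 1500, 1320, 1050, 1800, 2400)
)

by_quartile <- products |>
  mutate(price_quartile = ntile(price, 4)) |>  # bucket by price
  group_by(price_quartile) |>
  summarise(
    n_products    = n(),
    avg_price     = mean(price),
    total_revenue = sum(revenue)
  )

by_quartile
```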
Bucket products by price quartile, then aggregate revenue per quartile. Useful for quick "are higher-priced items more profitable" analyses.
For deciles with named labels:
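One possible sketch wraps the integer bins in an ordered factor. The data frame `df`, its `score` column, and the label names `D1` to `D10` are assumptions for illustration:

```r
library(dplyr)

set.seed(1)  # reproducible illustrative data
df <- tibble(score = runif(50))  # hypothetical scores

df_labeled <- df |>
  mutate(
    decile = ntile(score, 10),
    # Ordered factor so that D1 < D2 < ... < D10 in plots and sorts
    decile_label = factor(
      decile,
      levels  = 1:10,
      labels  = paste0("D", 1:10),
      ordered = TRUE
    )
  )

df_labeled |> count(decile_label)
```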
Common pitfalls
Pitfall 1: ties are split by row order. ntile breaks ties by order of appearance, so two rows with the same value may end up in different bins, and the assignment changes if the rows are reordered. For reproducible results, sort the data by a tie-breaking column first.
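A small sketch of the tie behavior, assuming a hypothetical table where every row has the same `value` and an `id` column serves as the tie-breaker:

```r
library(dplyr)

# Hypothetical data: four tied values, ids in arbitrary order
df <- tibble(id = c("d", "a", "b", "c"), value = c(5, 5, 5, 5))

# Bins follow row order: the first two rows get bin 1, the last two get bin 2
unsorted <- df |> mutate(bin = ntile(value, 2))

# Sorting by value and then id first makes the assignment reproducible
sorted <- df |>
  arrange(value, id) |>
  mutate(bin = ntile(value, 2))

unsorted
sorted
```

Row "d" lands in bin 1 in the unsorted version and bin 2 after sorting, even though its value never changed.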
Pitfall 2: NA propagation. NAs in x become NA in the output and do not count toward bin sizes. Filter NAs before ntile if every row needs a bin.
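A minimal sketch with a hypothetical vector containing one missing value:

```r
library(dplyr)

x <- c(3, NA, 1, 2)  # hypothetical values with a missing entry

with_na    <- ntile(x, 3)              # 3 NA 1 2: the NA stays NA
without_na <- ntile(x[!is.na(x)], 3)   # 3 1 2: NAs dropped before binning
```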
ntile(x, n) does NOT compute true percentiles. Bins are equal-count, not bounded by exact percentile cutpoints. If you need quartile boundaries (Q1 = 25th percentile, etc.), use quantile(x, probs) instead.
Try it yourself
Try it: Bin mtcars$mpg into 5 quintiles (1 = lowest, 5 = highest). Save to ex_quintile.
Solution:
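One possible solution, using the built-in mtcars dataset and the variable name from the exercise:

```r
library(dplyr)

# Bin mpg into 5 equal-count groups (1 = lowest, 5 = highest)
ex_quintile <- ntile(mtcars$mpg, 5)

table(ex_quintile)  # bin sizes
```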
Explanation: ntile(mpg, 5) splits the 32 cars into 5 quintiles. 32 is not divisible by 5, so two bins hold 7 cars and three hold 6.
Related dplyr functions
After mastering ntile, look at:
- cume_dist() / percent_rank(): continuous relative position
- min_rank() / dense_rank(): integer rank
- cut(): base R; bins by value cutpoints
- quantile(): compute percentile boundaries
- ggplot2::cut_number() / cut_interval(): ggplot-style binning
- dplyr::case_when(): custom bin definitions
For ggplot binning, ggplot2::cut_number() is similar to ntile but returns a factor.
FAQ
What does ntile do in dplyr?
ntile(x, n) divides x into n approximately equal-count bins and returns an integer 1..n for each value indicating its bin. With 100 rows and n = 4, each bin has 25 rows.
What is the difference between ntile and cut in R?
ntile(x, n) produces equal-COUNT bins. cut(x, n) produces equal-WIDTH bins. ntile is for quantile assignment; cut is for value-range binning.
How do I get quartiles with ntile?
ntile(x, 4) returns 1, 2, 3, or 4 per value. The smallest 25% are in bin 1; the largest 25% are in bin 4.
Does ntile handle ties?
Yes, but ties are broken by row order, not kept together. Two rows with the same value may end up in different bins. Sort by a tie-breaking column first for reproducible results.
How do I reverse ntile direction?
Wrap in desc(): ntile(desc(x), n). The highest values get bin 1.