dplyr lead() in R: Look at the Next Row's Value
The lead() function in dplyr returns the value from the row N positions AFTER the current row, padding with NA at the end. It is the mirror of lag() and the standard tool for "next-row" comparisons.
    lead(x)                      # next value (n = 1)
    lead(x, n = 2)               # 2 ahead
    lead(x, default = 0)         # fill end with 0 instead of NA
    lead(x, order_by = ts)       # respect timestamp order
    df |> mutate(next_val = lead(value))
    df |> group_by(g) |> mutate(next_val = lead(value))
    lag(x)                       # mirror: previous value
Need explanation? Read on for examples and pitfalls.
What lead() does in one sentence
lead(x, n = 1, default = NA) returns a vector where each position holds the value n rows AFTER the current position; positions past the end are filled with default. It is the natural complement to lag().
It is used for time-series differencing, transition detection, and "what's next" calculations.
Syntax
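lead(x, n = 1, default = NA, order_by = NULL). n is the lead amount; default is the value that fills the trailing slots (NA unless you set it); order_by is an optional vector that defines the order to lead by instead of the current row order. Recent dplyr releases write the signature as default = NULL, which still pads with NA.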
Use lead() and lag() whenever you need to compare consecutive rows. Together they handle "before" and "after" comparisons cleanly without manual indexing.
Five common patterns
1. Next-row value
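A minimal sketch of this pattern, assuming a small made-up tibble `df` with a `value` column (the data are illustrative, not from the original page):

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(day = 1:5, value = c(10, 12, 9, 15, 14))

next_row <- df |>
  mutate(next_val = lead(value))  # the last row has no successor, so it gets NA

next_row$next_val
# 12  9 15 14 NA
```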
2. Differences (forward-looking)
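One way to sketch a forward difference, again on a made-up tibble (column names are assumptions for illustration):

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(day = 1:5, value = c(10, 12, 9, 15, 14))

fwd <- df |>
  mutate(change_to_next = lead(value) - value)  # next value minus current value

fwd$change_to_next
#  2 -3  6 -1 NA
```

The final NA comes from the missing "next" value propagating through the subtraction.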
3. n rows ahead
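A short sketch with `n = 2`, using illustrative data that is not from the original page:

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(day = 1:5, value = c(10, 12, 9, 15, 14))

two_ahead <- df |>
  mutate(in_two = lead(value, n = 2))  # value two rows further down

two_ahead$in_two
#  9 15 14 NA NA
```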
The last 2 positions become NA (default).
4. Custom default
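A minimal sketch of replacing the trailing NA with a sentinel. The choice of 0 is an assumption for illustration; pick a value that makes sense for your data:

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(day = 1:5, value = c(10, 12, 9, 15, 14))

filled <- df |>
  mutate(next_val = lead(value, default = 0))  # trailing slot gets 0, not NA

filled$next_val
# 12  9 15 14  0
```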
5. Per-group lead
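A sketch of a grouped lead, assuming a made-up tibble with a grouping column `g` (all names here are illustrative):

```r
library(dplyr)

# Hypothetical toy data with two groups
df <- tibble(
  g     = c("a", "a", "a", "b", "b"),
  time  = c(1, 2, 3, 1, 2),
  value = c(10, 20, 30, 100, 200)
)

grouped <- df |>
  arrange(g, time) |>                 # make the within-group order explicit
  group_by(g) |>
  mutate(next_val = lead(value)) |>   # lead restarts inside each group
  ungroup()

grouped$next_val
#  20  30  NA 200  NA
```

The last row of group "a" gets NA rather than the first value of group "b".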
Use arrange(time) (or lead(x, order_by = time)) to be explicit about row order.
lead() vs lag() vs diff() vs shift
Four ways to access neighboring values in R.
| Function | Direction | Output | Best for |
|---|---|---|---|
| lead(x) | Next row | Same length, NA at end | "What is next?" |
| lag(x) | Previous row | Same length, NA at start | "What was before?" |
| diff(x) | Difference | Length n-1 | Quick differencing |
| data.table::shift(x) | Either direction | Same length | data.table speed |
When to use which:
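- lead/lag for dplyr pipelines and per-group window operations.
- diff(x) for x[-1] - x[-length(x)] style differencing. The result is one element shorter than x; use c(NA, diff(x)) if you need to keep the original length.
- data.table::shift for very large data and high-performance code.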
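The differences in output shape can be seen in a small sketch on a made-up numeric vector. Only dplyr and base R are run here; in data.table, the forward-looking counterpart would be `shift(x, type = "lead")`.

```r
library(dplyr)

x <- c(3, 7, 4, 10)  # hypothetical values

ahead    <- dplyr::lead(x)   # 7  4 10 NA   (same length, NA at end)
behind   <- dplyr::lag(x)    # NA 3  7  4   (same length, NA at start)
stepwise <- diff(x)          # 4 -3  6      (one element shorter)

# A forward difference via lead() matches diff() padded with a trailing NA
fwd_diff <- ahead - x
```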
A practical workflow
The most common lead pattern is forward differencing or transition detection.
For each event, compute time to the next event. Sorted, grouped, lead handles the lookup.
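A sketch of that workflow, assuming a hypothetical `events` tibble with a `user` grouping column and a POSIXct `time` column (neither appears in the original page):

```r
library(dplyr)

# Hypothetical event log
events <- tibble(
  user = c("u1", "u1", "u1", "u2", "u2"),
  time = as.POSIXct(
    c("2024-01-01 09:00:00", "2024-01-01 09:30:00", "2024-01-01 11:00:00",
      "2024-01-01 10:00:00", "2024-01-01 10:05:00"),
    tz = "UTC"
  )
)

gaps <- events |>
  arrange(user, time) |>   # sort so that "next row" means "next event"
  group_by(user) |>        # never look across users
  mutate(
    mins_to_next = as.numeric(difftime(lead(time), time, units = "mins"))
  ) |>
  ungroup()

gaps$mins_to_next
# 30 90 NA  5 NA
```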
For transition detection, compare adjacent values:
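A minimal sketch, assuming a made-up `states` tibble with a `state` column (names and values are illustrative):

```r
library(dplyr)

# Hypothetical state sequence
states <- tibble(
  t     = 1:6,
  state = c("idle", "idle", "busy", "busy", "idle", "idle")
)

flagged <- states |>
  arrange(t) |>
  mutate(changes_next = lead(state) != state)  # TRUE when the next state differs

flagged$changes_next
# FALSE  TRUE FALSE  TRUE FALSE    NA
```

The last row is NA because there is no following state to compare against.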
Marks rows where the next state differs from the current.
Common pitfalls
Pitfall 1: forgetting to arrange. lead is purely positional. If the data isn't sorted, "next row" is meaningless. Always arrange(time_col) first.
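The effect of row order can be sketched on a deliberately unsorted, made-up tibble. It compares plain `lead()`, `arrange()` followed by `lead()`, and the `order_by` argument:

```r
library(dplyr)

# Hypothetical data stored out of time order
df <- tibble(time = c(3, 1, 2), value = c(30, 10, 20))

# Purely positional: follows stored row order, which is wrong here
unsorted <- df |> mutate(next_val = lead(value))
# next_val: 10 20 NA

# Sort first, then lead
sorted <- df |> arrange(time) |> mutate(next_val = lead(value))
# rows now ordered by time; next_val: 20 30 NA

# Keep the stored row order but lead by time
ordered <- df |> mutate(next_val = lead(value, order_by = time))
# next_val: NA 20 30
```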
Pitfall 2: per-group surprise. On a grouped tibble, lead resets at each group boundary. The LAST row of each group has NA. Often desired but sometimes surprising.
The default fill is NA, which propagates through arithmetic: lead(x) - x gives NA in the last row. Set default = 0 (or another sentinel) if you need a non-NA value at the trailing edge.
Why lead and lag together cover most window needs
Most "look across rows" questions fall into two categories: backward (lag) or forward (lead). A pair of these functions, combined with arrange, group_by, and arithmetic, covers period-over-period change, transition detection, rolling differences, and gap calculation. For more complex windows like "average of last 7 days", you outgrow lag/lead and reach for the slider package. For most everyday time-series and longitudinal analyses though, lag and lead are sufficient. They are stable, well-tested, and produce predictable per-group behavior.
Try it yourself
Try it: For chronological log entries, compute the time gap until the next event for each row. Save to ex_gaps.
Solution:
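One possible solution, sketched on a hypothetical `logs` tibble because the exercise's original data set is not shown here. The column names `time` and `event` are assumptions:

```r
library(dplyr)

# Hypothetical chronological log entries
logs <- tibble(
  time = as.POSIXct(
    c("2024-03-01 08:00:00", "2024-03-01 08:12:00", "2024-03-01 08:45:00"),
    tz = "UTC"
  ),
  event = c("start", "retry", "done")
)

ex_gaps <- logs |>
  arrange(time) |>
  mutate(gap_mins = as.numeric(lead(time) - time, units = "mins"))

ex_gaps$gap_mins
# 12 33 NA
```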
Explanation: lead(time) - time gives the difftime to the next event. Convert to minutes via as.numeric(..., units = "mins"). Last row is NA because there is no "next".
Related dplyr functions
After mastering lead, look at:
- lag(): previous row's value (mirror)
- cumsum(), cummean(), etc.: cumulative aggregates
- group_by(): per-group window operations
- arrange(): sort before lead/lag
- data.table::shift(): equivalent in data.table
- slider::slide_dbl(): rolling-window operations
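For multi-position lookups in one call, slider::slide(x, identity, .before = 0, .after = 5) returns a list that holds, for each row, the current value followed by up to the next five values. slide_dbl() expects a single number per window, so use it with a summary function such as mean rather than identity.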
FAQ
What does lead do in dplyr?
lead(x, n = 1) returns a vector where each position holds the value n rows AHEAD of the current position. Positions past the end are filled with NA (or default).
What is the difference between lead and lag in dplyr?
lead(x) looks at the NEXT row. lag(x) looks at the PREVIOUS row. They are mirror operations and otherwise take identical arguments.
How do I compute a forward difference with lead?
lead(x) - x gives the change from current to next row. The last position is NA because there is no "next".
Why does my lead result have NAs at the end?
Because there is no row after the last position. lead() fills those slots with default, which is NA unless you set it. Pass default = 0 or another value to avoid NA.
How do I lead within groups?
df |> group_by(g) |> mutate(next_val = lead(x)). group_by makes lead reset at each group boundary; the last row per group has NA.