data.table shift() in R: Lag and Lead Columns
The data.table shift() function in R moves the values of a vector or column forward (lag) or backward (lead) by a fixed number of positions. It is the fast, vectorized way to compare each row with an earlier or later one.
```r
shift(x)                      # lag by 1 (the default)
shift(x, type = "lead")       # lead by 1
shift(x, n = 3)               # lag by 3 positions
shift(x, fill = 0)            # fill the gap with 0, not NA
shift(x, n = 1:2)             # list: lag 1 and lag 2 at once
shift(x, type = "cyclic")     # wrap end values to the front
DT[, p := shift(v), by = g]   # lag within each group
```
Need explanation? Read on for examples and pitfalls.
What shift() does
shift() slides every value to a new position. A lag pushes values down so each row sees the one before it. A lead pulls values up so each row sees the one after it. The vector keeps its original length, and the positions left empty at the edge are filled with NA.
This is the building block for period-over-period analysis. To compute a daily change, a growth rate, or a "same value as yesterday" flag, you need each row paired with its neighbour. shift() produces that neighbour as a plain column.
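A minimal sketch of a lag and a lead on a plain numeric vector. The vector x and its values are hypothetical, chosen only so the NA positions are easy to spot.

```r
library(data.table)

# Hypothetical daily values
x <- c(5, 8, 6, 9, 12)

lagged <- shift(x)                 # each position holds the previous value
led    <- shift(x, type = "lead")  # each position holds the next value

print(lagged)  # NA 5 8 6 9
print(led)     # 8 6 9 12 NA

# Pairing each value with its predecessor gives a day-over-day change
change <- x - shift(x)
print(change)  # NA 3 -2 3 3
```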
The lag drops NA into position 1 because no row precedes it. The lead drops NA into the last position for the same reason at the other end.
shift() syntax and arguments
shift() takes one vector or list plus four optional controls. The defaults give a single-step lag filled with NA, so calling shift(x) alone is the most common form.
| Argument | Default | Purpose |
|---|---|---|
| x | required | A vector, or a list of vectors of equal length |
| n | 1L | How many positions to move; accepts a vector for several shifts |
| fill | NA | Value placed in the gap created at the edge |
| type | "lag" | One of "lag", "lead", "shift", or "cyclic" |
| give.names | FALSE | If TRUE, names the output list elements automatically |
The type = "shift" option is an alias that behaves like "lag" for positive n. The "cyclic" option is the odd one out: instead of inserting fill, it wraps the values pushed off one end around to the other.
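A small sketch of two of the less common arguments, assuming a positive n. It checks that type = "shift" returns the same values as the default lag, and that give.names = TRUE returns a named list. The exact names data.table generates are deliberately not hard-coded here; print names(named) to see them.

```r
library(data.table)

x <- c(5, 8, 6, 9, 12)  # hypothetical values

# For positive n, type = "shift" gives the same values as the default lag
same_as_lag <- isTRUE(all.equal(shift(x, type = "shift"), shift(x)))
print(same_as_lag)  # TRUE

# With several shifts, give.names = TRUE labels each list element
named <- shift(x, n = 1:2, give.names = TRUE)
print(names(named))
```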
shift() examples by use case
Four patterns cover almost every real use of shift(). Each adds one argument: a custom fill, several lags at once, a cyclic wrap, and finally a shift inside a grouped data.table.
Replace the NA gap with a fill value
The fill argument controls what lands in the empty edge slot. Passing fill = 0 leaves no NA in the result, which avoids NA propagation in later arithmetic.
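A short sketch contrasting the default NA gap with fill = 0. The series x is hypothetical; the sums show how a single NA propagates through arithmetic while the filled version does not.

```r
library(data.table)

x <- c(5, 8, 6, 9, 12)  # hypothetical daily values

lagged_na <- shift(x)             # NA 5 8 6 9
filled    <- shift(x, fill = 0)   #  0 5 8 6 9

sum(x - lagged_na)   # NA: the missing first lag propagates
sum(x - filled)      # 12: the differences telescope to x[5] - 0
```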
Produce several lags in one call
Pass a vector to n and shift() returns a list. Each list element is one lag, which is handy when a model needs values from the last few periods as separate columns.
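A sketch of several lags at once, first as a bare list and then assigned into new data.table columns with :=. The column names day, v, lag1 and lag2 are hypothetical.

```r
library(data.table)

x <- c(5, 8, 6, 9, 12)  # hypothetical values

lags <- shift(x, n = 1:2)  # a list: element 1 is lag 1, element 2 is lag 2
str(lags)

# Inside a data.table, the returned list fills several new columns in one step
DT <- data.table(day = 1:5, v = x)
DT[, c("lag1", "lag2") := shift(v, n = 1:2)]
print(DT)
```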
Wrap values with a cyclic shift
The cyclic type recycles edge values rather than discarding them. The value pushed off the bottom reappears at the top, so no NA is ever introduced.
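A minimal sketch of a cyclic shift. It assumes a data.table release recent enough to accept type = "cyclic"; older versions reject that value.

```r
library(data.table)  # assumes a version that supports type = "cyclic"

x <- 1:5

one <- shift(x, type = "cyclic")          # 5 1 2 3 4
two <- shift(x, n = 2, type = "cyclic")   # 4 5 1 2 3

# Nothing is discarded, so the result is a rearrangement of x with no NA
anyNA(one)  # FALSE
```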
Shift within each group
Inside a data.table, combine shift() with by to lag per group. This is the single most important pattern: it stops the last value of one group from leaking into the first row of the next.
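A sketch of a grouped lag. The table, its column names (grp, day, val, prev) and values are hypothetical, arranged so that group a fills rows 1 to 3 and group b begins on row 4.

```r
library(data.table)

# Hypothetical data, already in time order within each group
DT <- data.table(
  grp = c("a", "a", "a", "b", "b", "b"),
  day = c(1, 2, 3, 1, 2, 3),
  val = c(12, 15, 19, 7, 10, 14)
)

# by restarts the shift at every group boundary
DT[, prev := shift(val), by = grp]
print(DT)
# prev: NA 12 15 NA 7 10
```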
Notice row 4: group b starts fresh with NA, not the 19 from group a. From there, a change column is one subtraction away.
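A sketch of that subtraction, plus a growth rate built the same way. The data and the change and growth column names are hypothetical.

```r
library(data.table)

DT <- data.table(
  grp = c("a", "a", "a", "b", "b", "b"),
  val = c(12, 15, 19, 7, 10, 14)
)

# Period-over-period change within each group
DT[, change := val - shift(val), by = grp]
# change: NA 3 4 NA 3 4

# Growth as a proportion of the previous value (0.25 means +25%)
DT[, growth := val / shift(val) - 1, by = grp]
print(DT)
```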
shift() works on physical row order, not on any logical ordering. Run setorder(DT, grp, date) first so each lag pulls the genuinely previous period and not a random earlier row.
shift() compared with alternatives
Several functions move data, but shift() is the data.table-native choice. It is fast, works inside DT[...], and returns the same length it received.
| Function | Package | Notes |
|---|---|---|
| shift() | data.table | Lag and lead in one function; vectorized n; group-aware with by |
| lag() / lead() | dplyr | Tidyverse equivalent; one function per direction |
| lag() | stats | Shifts a time series object's index, not its values; rarely what you want |
| head() / tail() with padding | base R | Manual, error-prone, changes length unless you re-pad |
Use shift() whenever the data already lives in a data.table. Reach for dplyr's lag() and lead() only when the surrounding pipeline is tidyverse.
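To make the "manual, error-prone" point in the table concrete, here is a sketch of the base R head()/tail() padding approach next to shift(). The vector x and the values of n are hypothetical.

```r
library(data.table)

x <- c(5, 8, 6, 9, 12)  # hypothetical values
n <- 2

# Manual lag: pad the front with NA and drop the last n values
manual_lag  <- c(rep(NA, n), head(x, -n))
# Manual lead: drop the first n values and pad the back with NA
manual_lead <- c(tail(x, -n), rep(NA, n))

isTRUE(all.equal(manual_lag,  shift(x, n = n)))                 # TRUE
isTRUE(all.equal(manual_lead, shift(x, n = n, type = "lead")))  # TRUE

# The manual version silently changes length once n exceeds length(x)
too_far <- c(rep(NA, 7), head(x, -7))
length(too_far)            # 7, not 5
length(shift(x, n = 7))    # 5, all NA
```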
shift(x) corresponds to dplyr's lag(x), and shift(x, type = "lead") to lead(x). data.table folds both directions into one function controlled by type.
Common pitfalls
Three mistakes account for most shift() bugs. Each has a quick fix.
First, forgetting by on grouped data. Without it, shift() lags across the whole table and the first row of every group inherits the previous group's last value. Always add by = grp.
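A sketch of the leak, using a hypothetical two-group table: the same shift with and without by, compared on the first row of group b.

```r
library(data.table)

# Hypothetical table with two groups of two rows each
DT <- data.table(
  grp = c("a", "a", "b", "b"),
  val = c(1, 2, 10, 20)
)

DT[, prev_wrong := shift(val)]             # no by: crosses the group boundary
DT[, prev_right := shift(val), by = grp]   # by restarts at each group

# Row 3 is the first row of group b
DT[3]  # prev_wrong is 2 (leaked from group a); prev_right is NA
```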
Second, shifting an unsorted table. shift() trusts row order. If the rows are not in time order, the "previous" value is meaningless. Call setorder() before shifting.
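A sketch of sorting before a grouped shift. The rows, dates and column names are hypothetical and deliberately arrive out of order; setorder() sorts the table in place by group and then by date.

```r
library(data.table)

# Hypothetical rows that arrive out of time order
DT <- data.table(
  grp  = c("b", "a", "a", "b", "a"),
  date = as.Date(c("2024-01-02", "2024-01-03", "2024-01-01",
                   "2024-01-01", "2024-01-02")),
  val  = c(20, 30, 10, 15, 25)
)

setorder(DT, grp, date)              # sort in place: group first, then date
DT[, prev := shift(val), by = grp]   # each lag is now the previous date
print(DT)
# val:  10 25 30 15 20
# prev: NA 10 25 NA 15
```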
Third, relying on n = -1 to express a lead. A negative n does flip direction, so the code works, but type = "lead" states the intent far more clearly to the next reader.
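A quick sketch showing that both spellings give the same values on a hypothetical vector, so the choice is purely about readability.

```r
library(data.table)

x <- c(5, 8, 6, 9, 12)  # hypothetical values

via_negative_n <- shift(x, n = -1)          # negative n flips lag into lead
via_type       <- shift(x, type = "lead")   # same values, clearer intent

isTRUE(all.equal(via_negative_n, via_type))  # TRUE
print(via_type)  # 8 6 9 12 NA
```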
Try it yourself
Try it: Lag the mpg column of mtcars by one row, filling the gap with 0. Save the result to ex_lagged.
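One possible solution, written as a short sketch that uses the built-in mtcars data set:

```r
library(data.table)

# Lag mpg by one row and put 0 in the empty first slot
ex_lagged <- shift(mtcars$mpg, n = 1, fill = 0)
head(ex_lagged)  # 0.0 21.0 21.0 22.8 21.4 18.7
```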
Explanation: shift() with n = 1 lags the vector one position, and fill = 0 replaces the leading NA with 0, so ex_lagged has no missing values.
Related data.table functions
These functions pair naturally with shift() for ordered-data work:
- frollmean() computes a rolling mean over a sliding window.
- frollsum() computes a rolling sum for moving totals.
- rleid() numbers consecutive runs of identical values.
- rowid() gives a row counter within each group.
- setorder() sorts a data.table in place so shifts respect time order.
FAQ
What is the difference between shift() and lag() in R?
shift() is the data.table function and lag() is the dplyr (or base stats) function. data.table's shift() handles both directions through its type argument, so shift(x) lags and shift(x, type = "lead") leads. dplyr splits this into two functions, lag() and lead(). The base stats::lag() is different again: it shifts a time series object's time index rather than its values, which surprises most users.
How do I shift a column within groups in data.table?
Add a by clause to the data.table call: DT[, prev := shift(val), by = grp]. The by argument makes shift() restart at every group boundary, so the first row of each group gets NA instead of leaking the previous group's last value. Sort the table with setorder() first if the rows are not already in order.
Why does shift() return NA values?
shift() keeps the vector the same length, so the positions exposed at the edge have nothing to point to. A lag empties the first n rows and a lead empties the last n. Pass the fill argument, for example fill = 0, to replace those NA values with a constant of your choice.
Can shift() move data in both directions?
Yes. Set type = "lead" to pull values backward so each row sees the next one, and keep the default type = "lag" to push values forward. You can also pass a negative n, which reverses direction, but type is the clearer way to express intent in code.
Does shift() work on a whole data.table at once?
shift() accepts a list of equal-length vectors, and a data.table is a list of columns, so shift(DT) lags every column and returns a list. In practice you usually shift specific columns inside DT[, ...] with .SD and .SDcols for clean, named output.
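A sketch of the .SD and .SDcols pattern. The columns price and qty and the "_lag" suffix are hypothetical; shift(.SD) returns a list, and := assigns each element to the matching new column name.

```r
library(data.table)

# Hypothetical table with two numeric columns to lag
DT <- data.table(
  day   = 1:4,
  price = c(10, 11, 13, 12),
  qty   = c(5, 7, 6, 8)
)

cols     <- c("price", "qty")
lag_cols <- paste0(cols, "_lag")   # price_lag, qty_lag

# Lag only the chosen columns and store them under clear names
DT[, (lag_cols) := shift(.SD), .SDcols = cols]
print(DT)
# price_lag: NA 10 11 13
# qty_lag:   NA  5  7  6
```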