data.table frollsum() in R: Fast Rolling Window Sums
data.table frollsum() computes a fast rolling sum over a sliding window, returning a trailing total for every element of a numeric vector. It is the C-optimized way to build moving totals, rolling counts, and windowed aggregates in R.
    frollsum(x, 3)                         # 3-element trailing sum
    frollsum(x, c(3, 7))                   # two windows at once
    frollsum(x, 3, align = "center")       # centered window
    frollsum(x, 3, na.rm = TRUE)           # skip NAs inside the window
    frollsum(x, 3, fill = 0)               # fill incomplete windows with 0
    dt[, s := frollsum(v, 3), by = grp]    # rolling sum per group
    frollsum(x, lens, adaptive = TRUE)     # per-element window sizes
Need explanation? Read on for examples and pitfalls.
What frollsum() does
frollsum() returns a rolling sum, not a single total. For each position in a numeric vector it adds up the current value plus the preceding values that fall inside a window of width n. The output is the same length as the input, so it slots straight into a data.table column. It is written in C, which makes it far faster than a hand-rolled loop or sapply().
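To make the definition concrete, here is a minimal sketch that compares frollsum() with a hand-rolled equivalent. The vector x and the window width are hypothetical sample values, and the sapply() version is only an illustration of the arithmetic, not how data.table implements it.

```r
library(data.table)

x <- c(2, 4, 6, 8, 10)   # hypothetical sample values
n <- 3                   # window width

# Hand-rolled equivalent: sum the n values ending at position i,
# or NA when fewer than n values are available
manual <- sapply(seq_along(x), function(i) {
  if (i < n) NA_real_ else sum(x[(i - n + 1):i])
})

fast <- frollsum(x, n)   # NA NA 12 18 24

length(fast) == length(x)   # TRUE: one result per input element
all.equal(manual, fast)     # TRUE: same values, including the leading NAs
```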
frollsum() syntax and arguments
The signature is compact but every argument changes the window. The core call is frollsum(x, n).
| Argument | Purpose |
|---|---|
| x | Numeric vector, or a list of vectors, to roll over. |
| n | Window width. A single integer, or a vector for several windows at once. |
| fill | Value placed where the window is incomplete (default NA). |
| align | Window position: "right" (trailing), "left" (leading), "center". |
| na.rm | If TRUE, missing values are dropped inside each window. |
| adaptive | If TRUE, n is a per-element vector of window lengths. |
The first n - 1 positions have no full window, so they receive fill. With the default align = "right", the window ends on the current element, meaning each result includes the current row.
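The effect of align and fill is easiest to see side by side. The following is a small sketch on a hypothetical five-element vector; the variable names are illustrative only.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5)   # hypothetical sample values

right  <- frollsum(x, 3)                    # NA NA  6  9 12  (window ends on the current element)
left   <- frollsum(x, 3, align = "left")    #  6  9 12 NA NA  (window starts on the current element)
center <- frollsum(x, 3, align = "center")  # NA  6  9 12 NA  (current element in the middle)
zero   <- frollsum(x, 3, fill = 0)          #  0  0  6  9 12  (incomplete windows filled with 0)
```

Only the position of the fill values changes between alignments; the complete-window sums are the same three numbers in each case.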
frollsum(x, 3) at position 5 sums elements 3, 4, and 5, not 2, 3, and 4. For a trailing total that excludes today, shift the input first with shift(x, 1).
frollsum() examples
Start with a plain vector to see the sliding window. Each result is the sum of the current value and the two before it.
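A minimal sketch of that call, assuming a hypothetical vector holding the values 1 through 6:

```r
library(data.table)

x <- c(1, 2, 3, 4, 5, 6)   # hypothetical sample values

roll3 <- frollsum(x, 3)
roll3
# NA NA  6  9 12 15
```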
The first two slots are NA because a 3-wide window cannot fill yet. Position 3 is 1 + 2 + 3 = 6.
Pass several window widths in one call. frollsum() then returns a list, one vector per window.
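As a sketch, the call below uses two hypothetical widths, 2 and 3, and then shows one way to attach both results to a data.table as columns. The column names s2 and s3 are illustrative.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5, 6)   # hypothetical sample values

res <- frollsum(x, c(2, 3))
length(res)   # 2: one vector per window width
res[[1]]      # NA  3  5  7  9 11  (width 2)
res[[2]]      # NA NA  6  9 12 15  (width 3)

# The list maps directly onto several new columns
dt <- data.table(v = x)
dt[, c("s2", "s3") := frollsum(v, c(2, 3))]
```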
Compute a rolling total per group inside a data.table. Combine frollsum() with by and the walrus := operator to add the column by reference.
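A sketch of the grouped pattern, assuming a hypothetical sales table with a store column and a revenue column, already sorted by date within each store:

```r
library(data.table)

# Hypothetical sales data: two stores, four days each
sales <- data.table(
  store   = rep(c("A", "B"), each = 4),
  revenue = c(10, 20, 30, 40, 5, 15, 25, 35)
)

# Add the rolling 3-day total by reference, one window sequence per store
sales[, roll3 := frollsum(revenue, 3), by = store]
sales$roll3
# NA NA 60 90 NA NA 45 75
```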
The by = store clause restarts the window at each store, so no total ever leaks across groups.
Sum a 0/1 indicator to count events in a window. This is the sum-specific trick frollmean() cannot do: a rolling count.
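A sketch of the rolling count, assuming a hypothetical daily 0/1 event flag and a 4-day window:

```r
library(data.table)

# Hypothetical indicator: 1 if an event happened that day, else 0
events <- c(0, 1, 0, 1, 1, 0, 1)

count4 <- frollsum(events, 4)
count4
# NA NA NA  2  3  2  3
```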
Position 4 reports 2 events in the last four days; position 5 reports 3.
frollsum() vs frollmean and cumsum
Pick the function by the shape of total you need. All three aggregate, but over different spans.
| Function | Window | Returns |
|---|---|---|
| frollsum(x, n) | Fixed n-wide sliding window | Rolling total |
| frollmean(x, n) | Fixed n-wide sliding window | Rolling average |
| cumsum(x) | Expanding window from element 1 | Running total |
Use frollsum() when only the last n observations matter, such as a trailing 7-day revenue figure. Use cumsum() when every past value should keep contributing. Use frollmean() when you want the level rather than the total.
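Running the three functions on the same input shows the difference in span. The vector below is a hypothetical sample.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5)   # hypothetical sample values

rolling_total <- frollsum(x, 3)    # NA NA  6  9 12  (last 3 values only)
rolling_mean  <- frollmean(x, 3)   # NA NA  2  3  4  (same window, averaged)
running_total <- cumsum(x)         #  1  3  6 10 15  (everything so far)
```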
In pandas, the equivalent of frollsum(x, 3) is series.rolling(3).sum(). pandas defaults to a right-aligned window too, so the results line up directly.
Common pitfalls
A single NA wipes out the whole window. With the default na.rm = FALSE, any missing value inside a window makes that result NA. Set na.rm = TRUE to skip it.
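A short sketch of the difference, using a hypothetical vector with one missing value:

```r
library(data.table)

x <- c(1, NA, 3, 4, 5)   # hypothetical sample with one NA

strict  <- frollsum(x, 3)                 # NA NA NA NA 12  (every window touching the NA is lost)
lenient <- frollsum(x, 3, na.rm = TRUE)   # NA NA  4  7 12  (the NA is skipped within each window)
```

The first two positions are still NA with na.rm = TRUE because they come from fill, not from the missing value.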
frollsum(1:3, 5) yields NA NA NA because no window is ever complete. Check length(x) against n before rolling, especially inside by groups where some groups may be short.
Two more traps: forgetting by in a data.table mixes groups into one window, and assuming the window excludes the current row when align = "right" actually includes it.
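The short-input and current-row traps can be checked with a few lines. This is a sketch on hypothetical data; the table dt and its columns grp and v are illustrative names.

```r
library(data.table)

# Window wider than the input: every position gets fill
too_wide <- frollsum(1:3, 5)   # NA NA NA

# Find groups that are shorter than the window before rolling
dt <- data.table(grp = c("a", "a", "a", "a", "b", "b"),
                 v   = c(1, 2, 3, 4, 5, 6))
short <- dt[, .N, by = grp][N < 3]   # group "b" has only 2 rows

# Exclude the current row: lag the input by one, then roll
x <- c(1, 2, 3, 4, 5)
prev3 <- frollsum(shift(x, 1), 3)   # NA NA NA  6  9  (position 5 = 2 + 3 + 4)
```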
Try it yourself
Try it: Build a rolling 3-day sum of daily and store it in ex_roll. The first two values should be NA.
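One possible solution is sketched below. The first three values of daily (4, 8, 6) are taken from the explanation that follows; the remaining values are hypothetical, since the original exercise data is not shown here.

```r
library(data.table)

# First three values match the explanation; the last two are made up
daily <- c(4, 8, 6, 10, 5)

ex_roll <- frollsum(daily, 3)
ex_roll
# NA NA 18 24 21
```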
Explanation: With n = 3 the first two positions lack a full window and return NA. Position 3 sums 4 + 8 + 6 = 18, and the window then slides one element at a time.
Related data.table functions
- frollmean() computes a rolling average over the same sliding window.
- frollapply() runs any custom function over a rolling window.
- shift() lags or leads a column, useful for excluding the current row.
- frollmax() and frollmin() return rolling extremes.
- cumsum() gives an expanding running total instead of a fixed window.
See the official rolling functions reference for the full argument list.
FAQ
What is the difference between frollsum and rollsum?
frollsum() is the data.table function; rollsum() is from the zoo package. They compute the same rolling sum, but frollsum() is written in C, runs faster on large vectors, and supports multiple windows in one call. If you already use data.table, frollsum() avoids a second dependency and integrates cleanly with := and by.
How do I compute a rolling sum by group in R?
Call frollsum() inside a data.table with the by argument: dt[, total := frollsum(value, 3), by = group]. The by clause restarts the window at each group boundary, so totals never leak from one group into the next. This is the standard pattern for per-store or per-customer rolling metrics.
Why does frollsum return NA at the start?
The first n - 1 elements do not have enough preceding values to fill a window of width n, so frollsum() places the fill value there, which is NA by default. Pass fill = 0 to use zero instead, or trim those rows if a partial window is meaningless for your analysis.
Can frollsum use a variable window size?
Yes. Set adaptive = TRUE and pass n as a vector the same length as x, where each element gives the window width for that position. This is useful when the window should grow over time or depend on another column, such as days-since-signup.
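As a sketch, the example below builds a window that grows with the data until it reaches a cap of 3. The vector x and the cap are hypothetical; lens is kept as an integer vector the same length as x.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5)   # hypothetical sample values

# Window width per position: 1, 2, 3, 3, 3 (grows, then capped at 3)
lens <- pmin(seq_along(x), 3L)

grow <- frollsum(x, lens, adaptive = TRUE)
grow
# 1  3  6  9 12
```

Because each early position asks only for as many values as exist, this pattern avoids the leading NA results that a fixed width of 3 would produce.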
Is frollsum faster than a for loop?
Yes, substantially. frollsum() runs its sliding window in compiled C code, so it is typically orders of magnitude faster than an R for loop or sapply() on large vectors. For millions of rows the difference is the gap between milliseconds and seconds.
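To check the speed difference on your own machine, a rough timing sketch like the one below can help. The input size, the window width, and the helper loop_sum() are hypothetical choices for illustration, and the timings depend on hardware and data.table version, so no figures are quoted here.

```r
library(data.table)

set.seed(1)
x <- runif(1e5)   # hypothetical input size
n <- 7            # hypothetical window width

# Plain R loop computing the same trailing sum (illustrative helper)
loop_sum <- function(x, n) {
  out <- rep(NA_real_, length(x))
  for (i in n:length(x)) out[i] <- sum(x[(i - n + 1):i])
  out
}

t_loop  <- system.time(res_loop  <- loop_sum(x, n))
t_froll <- system.time(res_froll <- frollsum(x, n))

all.equal(res_loop, res_froll)   # the two results agree up to floating-point tolerance
t_loop["elapsed"]                # elapsed seconds for the loop
t_froll["elapsed"]               # elapsed seconds for frollsum()
```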