dplyr cumall() in R: Cumulative All-True Across a Vector
The cumall() function in dplyr returns TRUE at each position as long as every logical value up to and including that position is TRUE. Once a FALSE appears, every later position is FALSE. It is the cumulative version of all().
    cumall(c(TRUE, TRUE, FALSE, TRUE))               # T T F F
    cumall(x > 0)                                    # streak of positives from start
    cumall(!is.na(x))                                # take rows until first NA
    df |> filter(cumall(value > threshold))
    df |> arrange(date) |> filter(cumall(complete))
    cumany(c(FALSE, FALSE, TRUE, FALSE))             # opposite: F F T T
Need explanation? Read on for examples and pitfalls.
What cumall() does in one sentence
cumall(x) returns a logical vector where position i is TRUE if and only if x[1] & x[2] & ... & x[i] are all TRUE. As soon as ONE FALSE appears, every later position is FALSE.
This is the cumulative version of all(). It answers: "up to this point, has every value been TRUE?" The opposite is cumany().
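To make the definition concrete, the following minimal sketch compares cumall() with all() applied to each prefix of the vector. The helper name prefix_all is made up for illustration; the prefix loop is only a way to check the definition, not how dplyr implements the function.

```r
library(dplyr)

x <- c(TRUE, TRUE, FALSE, TRUE)

# Running "all so far" as computed by dplyr
running <- cumall(x)
running
# TRUE TRUE FALSE FALSE

# The same thing spelled out: all() over x[1..i] for every i
prefix_all <- vapply(seq_along(x), function(i) all(x[1:i]), logical(1))

identical(running, prefix_all)
# TRUE
```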
Syntax
cumall(x). x is a logical vector. Returns a logical vector of the same length.
filter(cumall(condition)) keeps rows from the start of a (sorted) data frame until the first row where condition is FALSE. It is a powerful idiom for "take rows until X happens".
Five common patterns
1. Streak of positives from start
The streak ends at position 4 where the first negative appears.
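A minimal sketch consistent with that description. The values in x are made up for illustration, chosen so that the first negative sits at position 4.

```r
library(dplyr)

# Illustrative values (not from the original article)
x <- c(3, 1, 4, -2, 5)

streak <- cumall(x > 0)
streak
# TRUE TRUE TRUE FALSE FALSE

# Length of the leading run of positives
sum(streak)
# 3
```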
2. Take rows until first NA
Stops at the first NA. Subsequent rows are dropped even if non-NA.
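A small sketch of this pattern, assuming a hypothetical data frame with an id column and a value column whose third entry is NA:

```r
library(dplyr)

# Hypothetical data: the first NA is in row 3
df <- data.frame(
  id    = 1:5,
  value = c(10, 20, NA, 40, 50)
)

# !is.na(value) is never NA itself, so cumall() yields only TRUE/FALSE here
until_na <- df |> filter(cumall(!is.na(value)))
until_na$id
# 1 2
```

Rows 4 and 5 are dropped even though their values are present, because they come after the first NA.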
3. Take rows below a threshold
Once a value exceeds 0.5 (row 4), all subsequent rows are excluded.
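The sketch below reproduces that behaviour with made-up data in which row 4 is the first value above 0.5. The column names step and value and the variable threshold are assumptions for illustration. A plain filter() is included for contrast.

```r
library(dplyr)

# Hypothetical measurements; row 4 is the first to exceed the threshold
df <- data.frame(
  step  = 1:6,
  value = c(0.1, 0.3, 0.4, 0.7, 0.2, 0.1)
)
threshold <- 0.5

# Keep rows only while every value so far has stayed at or below the threshold
until_exceeded <- df |> filter(cumall(value <= threshold))
until_exceeded$step
# 1 2 3

# A plain filter keeps every row at or below the threshold, wherever it occurs
all_below <- df |> filter(value <= threshold)
all_below$step
# 1 2 3 5 6
```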
4. Per-group cumall
Within each user, take rows until the first FALSE.
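One way this could look, assuming a hypothetical events table with a user column and a logical ok column. Because filter() evaluates its condition within each group, cumall() restarts for every user.

```r
library(dplyr)

# Hypothetical per-user events, already in the desired order within each user
events <- data.frame(
  user = c("a", "a", "a", "b", "b", "b"),
  ok   = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

per_user <- events |>
  group_by(user) |>
  filter(cumall(ok)) |>   # cumall() is computed separately for each user
  ungroup()

table(per_user$user)
# a: 1 row, b: 2 rows
```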
5. Sorted timeline
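A minimal sketch of the sorted-timeline pattern, using a hypothetical log with date and complete columns stored out of order. Sorting first is what makes "until the first FALSE" mean "until the first incomplete day".

```r
library(dplyr)

# Hypothetical log, deliberately unsorted
log_df <- data.frame(
  date     = as.Date(c("2024-01-03", "2024-01-01", "2024-01-04", "2024-01-02")),
  complete = c(FALSE, TRUE, TRUE, TRUE)
)

leading_complete <- log_df |>
  arrange(date) |>             # put rows in chronological order first
  filter(cumall(complete))     # then keep the leading run of complete days

leading_complete$date
# "2024-01-01" "2024-01-02"
```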
cumall() and cumany() are mirrors. cumall(x) is "every value up to here is TRUE"; cumany(x) is "at least one value up to here is TRUE". Together they let you express "until X happens" and "since X happened" idioms cleanly.
cumall() vs cumany() vs all() vs cumsum()
Five cumulative or aggregate functions that work on logical vectors in R.
| Function | Output | Returns | Best for |
|---|---|---|---|
| cumall(x) | Same-length logical | Streak of TRUE from start | "Take until first FALSE" |
| cumany(x) | Same-length logical | TRUE once any TRUE seen | "Mark from first TRUE on" |
| all(x) | Single boolean | TRUE if every value TRUE | Final check |
| any(x) | Single boolean | TRUE if any value TRUE | Final check |
| cumsum(x) | Same-length numeric | Running sum | Count or accumulate |
When to use which:
- cumall(cond) to keep rows up to the first failing condition.
- cumany(cond) to keep rows from the first matching condition onward.
- all / any for end-of-vector summaries.
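As a quick side-by-side illustration, here are the five functions applied to one small example vector:

```r
library(dplyr)

x <- c(TRUE, TRUE, FALSE, TRUE)

res_cumall <- cumall(x)   # TRUE TRUE FALSE FALSE
res_cumany <- cumany(x)   # TRUE TRUE TRUE TRUE
res_all    <- all(x)      # FALSE
res_any    <- any(x)      # TRUE
res_cumsum <- cumsum(x)   # 1 2 2 3 (running count of TRUE values)
```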
A practical workflow
The "take until / drop after" pattern is the killer use case for cumall + filter.
Returns rows from the start until the first non-compliant event, in chronological order. Common in audit / SLA workflows.
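A sketch of what such a workflow might look like. The table name sla_events and the columns timestamp and compliant are hypothetical, invented here for illustration.

```r
library(dplyr)

# Hypothetical SLA log, stored out of chronological order
sla_events <- data.frame(
  timestamp = as.POSIXct(
    c("2024-03-01 12:00:00", "2024-03-01 09:00:00",
      "2024-03-01 10:00:00", "2024-03-01 11:00:00"),
    tz = "UTC"
  ),
  compliant = c(TRUE, TRUE, TRUE, FALSE)
)

before_breach <- sla_events |>
  arrange(timestamp) |>          # chronological order
  filter(cumall(compliant))      # stop at the first non-compliant event

format(before_breach$timestamp, "%H:%M")
# "09:00" "10:00"
```

The 12:00 event is compliant but is still excluded, because it happens after the first breach at 11:00.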
The mirror pattern uses cumany:
Returns rows from the first triggered event onward.
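A matching sketch for the mirror pattern, again with a hypothetical table (alerts) and hypothetical columns (timestamp, triggered):

```r
library(dplyr)

# Hypothetical alert log
alerts <- data.frame(
  timestamp = as.POSIXct(
    c("2024-03-01 09:00:00", "2024-03-01 10:00:00",
      "2024-03-01 11:00:00", "2024-03-01 12:00:00"),
    tz = "UTC"
  ),
  triggered = c(FALSE, FALSE, TRUE, FALSE)
)

from_trigger <- alerts |>
  arrange(timestamp) |>
  filter(cumany(triggered))      # keep the first TRUE row and everything after it

format(from_trigger$timestamp, "%H:%M")
# "11:00" "12:00"
```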
Common pitfalls
Pitfall 1: order matters. cumall reads left-to-right, so the data must be in the desired order. Sort with arrange(...) first.
Pitfall 2: NA handling. cumall(c(TRUE, NA, TRUE)) returns c(TRUE, NA, NA). Once an NA appears, the streak of TRUE ends, and filter() drops rows where the result is NA. Filter NAs before cumall if you don't want this.
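The following sketch illustrates the NA behaviour on small made-up inputs, including the case where a FALSE follows the NA, and shows one way to remove NAs first. The data frame and its columns id and ok are hypothetical.

```r
library(dplyr)

na_only  <- cumall(c(TRUE, NA, TRUE))
na_only
# TRUE NA NA

# A later FALSE settles the answer: NA & FALSE is FALSE
na_false <- cumall(c(TRUE, NA, FALSE, TRUE))
na_false
# TRUE NA FALSE FALSE

# In a filter, rows where the condition is NA are dropped
df <- data.frame(id = 1:4, ok = c(TRUE, NA, TRUE, TRUE))

with_na <- df |> filter(cumall(ok))
with_na$id
# 1

# Removing NA rows first lets the streak continue past them
without_na <- df |> filter(!is.na(ok)) |> filter(cumall(ok))
without_na$id
# 1 3 4
```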
filter(cumall(cond)) keeps rows until the first FALSE; filter(cond) keeps all rows where cond is TRUE. The difference is subtle, but the semantics are completely different. Make sure cumall is what you want.
Try it yourself
Try it: From a sequence of integer scores, keep only the rows up to (but not including) the first time the score drops below 50. Save to ex_streak.
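One possible solution. The scores data frame is made up for illustration; its fourth score is 45, so it fits the explanation that follows.

```r
library(dplyr)

# Hypothetical scores; the first value below 50 is in row 4
scores <- data.frame(
  attempt = 1:6,
  score   = c(72L, 65L, 58L, 45L, 80L, 90L)
)

ex_streak <- scores |> filter(cumall(score >= 50))
ex_streak$attempt
# 1 2 3
```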
Explanation: cumall(score >= 50) is TRUE for rows 1-3, FALSE from row 4 onward (because score=45 fails). Filter keeps the TRUE rows.
Related dplyr functions
After mastering cumall, look at:
- cumany(): mirror; cumulative any-TRUE
- cummean(): running mean
- cumsum(), cumprod(), cummin(), cummax(): base R cumulatives
- lead() / lag(): shift values for transition detection
- rle(): run-length encoding for streak counting
- slider package: rolling-window operations
For more flexible cumulative operations (rolling window, custom function), the slider package generalizes cumulative patterns.
FAQ
What does cumall do in dplyr?
cumall(x) returns a logical vector where each position is TRUE if every value up to and including that position is TRUE. As soon as one FALSE appears, every later position is FALSE.
What is the difference between cumall and all in R?
all(x) returns ONE logical value: TRUE if every element of x is TRUE. cumall(x) returns a vector the same length as x: TRUE up to the first FALSE, then all FALSE.
What is the difference between cumall and cumany?
cumall is the running version of all (every value so far is TRUE). cumany is the running version of any (at least one value so far is TRUE). They are mirrors.
How do I take rows until a condition fails?
df |> filter(cumall(condition)). Sort the data first if order matters. The filter keeps rows until the first FALSE.
Does cumall handle NA?
Once an NA appears in the input, the output is NA from that position on (because TRUE & NA is NA) until a FALSE appears; from that FALSE onward the output is FALSE (because NA & FALSE is FALSE). Filter NAs first if this is unwanted.