data.table .SDcols in R: Apply Functions to Many Columns
The .SDcols argument in data.table picks which columns .SD exposes inside j, so one lapply call can summarize or rewrite many columns selected by name, index, regex, or a type predicate like is.numeric.
library(data.table)
DT <- as.data.table(mtcars)
cols <- c("mpg", "hp")
DT[, lapply(.SD, mean), .SDcols = c("mpg", "hp")]             # by name
DT[, lapply(.SD, mean), .SDcols = is.numeric]                 # by type predicate
DT[, lapply(.SD, mean), .SDcols = patterns("^d")]             # by regex
DT[, lapply(.SD, mean), .SDcols = !c("vs", "am")]             # by exclusion
DT[, lapply(.SD, mean), .SDcols = mpg:hp]                     # by range
DT[, lapply(.SD, mean), by = cyl, .SDcols = c("mpg", "hp")]   # combined with by
DT[, (cols) := lapply(.SD, round, 1), .SDcols = cols]         # update in place
Need an explanation? Read on for examples and pitfalls.
What .SDcols does in one sentence
.SDcols is the column filter for .SD. Inside DT[i, j, by, .SDcols = ...], the argument tells data.table which columns to include in the Subset of Data that j works on. Without .SDcols, .SD contains every column except those named in by.
The filter accepts many input shapes: a character vector, an integer index, a column range, a regex via patterns(), or even a function like is.numeric. That flexibility is what makes .SDcols the standard idiom for applying one operation across many related columns.
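To see the filter in action, the sketch below counts how many columns .SD holds in each group. It does this once without .SDcols and once with it. It assumes the built-in mtcars data (11 columns) grouped by cyl. The result names all_cols and two_cols are illustrative.

```r
library(data.table)

DT <- as.data.table(mtcars)

# No .SDcols: .SD holds every column except the by column (cyl), so 10 of 11
all_cols <- DT[, .(n_cols = ncol(.SD)), by = cyl]

# With .SDcols: .SD holds only the two named columns
two_cols <- DT[, .(n_cols = ncol(.SD)), by = cyl, .SDcols = c("mpg", "hp")]

print(all_cols)
print(two_cols)
```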
Syntax
.SDcols only has meaning inside the DT[i, j, by] bracket call. It pairs with .SD and never appears on its own.
.SDcols accepts any of these selectors:
| Selector form | Example | When to use |
|---|---|---|
| Character vector | c("mpg", "hp") | Explicit list of columns |
| Integer index | 2:4 or -1 | Position-based, drop one column |
| Column range | mpg:hp | Contiguous block by name |
| Logical negation | !c("vs", "am") | Drop a few, keep the rest |
| Regex helper | patterns("^d") | Match a prefix or suffix |
| Predicate function | is.numeric | Type-based selection |
The selected columns become available as .SD inside j. The columns named in by are excluded automatically and do not need to be removed by hand.
.SDcols is an argument, not a function, so it does not take parentheses of its own. Write .SDcols = is.numeric, not .SDcols(is.numeric). R reads the latter as a call to a function named .SDcols instead of as the argument, and it fails.
Examples by use case
Each selector form below solves a different column-picking problem. Load data.table and convert mtcars to a data.table once; the examples reuse DT throughout the section.
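A minimal version of that setup, assuming the data.table package is installed:

```r
library(data.table)

# Convert mtcars once; as.data.table() drops the row names (car models) by default
DT <- as.data.table(mtcars)

dim(DT)  # 32 rows, 11 columns
```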
Select columns by name. Pass a character vector of column names; the result has one row per group when combined with by, or one row total when by is omitted.
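The sketch below runs on the DT built from mtcars. The names overall and by_cyl are illustrative.

```r
library(data.table)
DT <- as.data.table(mtcars)

# No by: one row in total, the mean of each named column over the whole table
overall <- DT[, lapply(.SD, mean), .SDcols = c("mpg", "hp")]

# With by: one row per cylinder count; the by column comes first in the result
by_cyl <- DT[, lapply(.SD, mean), by = cyl, .SDcols = c("mpg", "hp")]

print(overall)
print(by_cyl)
```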
Select columns by regex. The patterns() helper accepts one or more regex strings and returns the matching column names. Useful when columns share a prefix or suffix.
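The sketch below uses two single-pattern selectors on mtcars, one anchored at the start of the name and one at the end. The object names are illustrative.

```r
library(data.table)
DT <- as.data.table(mtcars)

# "^d" matches names starting with d: disp and drat in mtcars
d_means <- DT[, lapply(.SD, mean), .SDcols = patterns("^d")]

# "t$" matches names ending in t: drat and wt in mtcars
t_means <- DT[, lapply(.SD, mean), .SDcols = patterns("t$")]

print(d_means)
print(t_means)
```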
Select columns by type. Pass a predicate function like is.numeric; data.table evaluates it on each column and keeps the ones returning TRUE. This is the cleanest way to aggregate every numeric column without hard-coding names.
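Every column in mtcars is numeric, so it would not show the filter doing anything. The sketch below uses a small hypothetical table, mixed, with one column of each common type. Note that is.numeric returns TRUE for both double and integer columns.

```r
library(data.table)

# Hypothetical mixed-type table, used only to make the type filter visible
mixed <- data.table(
  id   = c("a", "b", "c"),     # character: dropped
  x    = c(1, 2, 3),           # double: kept
  y    = c(4L, 5L, 6L),        # integer: kept, is.numeric() is TRUE for integers
  flag = c(TRUE, FALSE, TRUE)  # logical: dropped
)

num_means <- mixed[, lapply(.SD, mean), .SDcols = is.numeric]
print(num_means)
```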
Update many columns in place. Combine .SDcols with := to overwrite the chosen columns; this avoids copying the whole table.
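The sketch below assumes you want wt and qsec from mtcars rounded to one decimal; the choice of columns is illustrative. The := operator modifies DT itself, so nothing needs to be assigned back.

```r
library(data.table)
DT <- as.data.table(mtcars)

cols <- c("wt", "qsec")

# Overwrite only the chosen columns by reference; the other columns are untouched
DT[, (cols) := lapply(.SD, round, 1), .SDcols = cols]

head(DT[, .(wt, qsec)])
```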
DT[, (cols) := ...] updates the columns named in cols. Without the parentheses, data.table treats cols as the literal name of a single column. The parentheses force evaluation of the variable.
.SDcols decouples column selection from the operation. The same lapply(.SD, fn) pattern handles name lists, regex matches, and type predicates without rewriting j. Treat .SDcols as the where and j as the what.
Common pitfalls
Forgetting that by columns are excluded. If cyl appears in by, it will not be in .SD even when .SDcols = is.numeric would otherwise match it. Add it back explicitly in j if you need it in the output.
Confusing .SDcols = cols with .SDcols = "cols". The first uses the variable's value (a vector of names). The second selects the literal column called cols, which probably does not exist. Always use the bare variable.
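The difference is easy to check. In this sketch, tryCatch captures the error from the quoted version so the script keeps running. The names ok and bad are illustrative.

```r
library(data.table)
DT <- as.data.table(mtcars)

cols <- c("mpg", "hp")

# Bare variable: its value, c("mpg", "hp"), selects the columns
ok <- DT[, .SD, .SDcols = cols]

# Quoted name: looks for a column literally called "cols", which mtcars lacks
bad <- tryCatch(
  DT[, .SD, .SDcols = "cols"],
  error = function(e) conditionMessage(e)
)

print(names(ok))
print(bad)  # the error message, not a data.table
```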
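Expecting fn to see other columns in lapply(.SD, fn). lapply calls fn once per column of .SD and passes that single column as its argument, so fn has no direct view of the rest of the table. lapply(.SD, sum) works. lapply(.SD, function(x) x + .SD$other) is fragile: .SD$other is NULL unless other is also listed in .SDcols, and if it is listed, other gets transformed along with the rest. When columns interact, pull the shared column out explicitly, or reach for Map/mapply or an explicit loop.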
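The sketch below shows one way to handle interacting columns. It assumes you want mpg and hp divided by wt. The approach is to include the shared column in .SDcols, pull it out once, and let the function refer to it by name. The Map() variant does the same thing with parallel iteration. The column choices and object names are illustrative.

```r
library(data.table)
DT <- as.data.table(mtcars)

# wt is listed in .SDcols so it is available, then pulled out once as w;
# only mpg and hp are passed through lapply, so wt itself is not transformed
per_wt <- DT[, {
  w <- .SD[["wt"]]
  lapply(as.list(.SD)[c("mpg", "hp")], function(x) x / w)
}, .SDcols = c("mpg", "hp", "wt")]

# Map() walks its list arguments in parallel; the length-1 list(w) is recycled
per_wt2 <- DT[, Map(function(x, w) x / w,
                    as.list(.SD)[c("mpg", "hp")],
                    list(.SD[["wt"]])),
              .SDcols = c("mpg", "hp", "wt")]

head(per_wt)
```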
patterns() only works inside melt, dcast, and .SDcols. Calling patterns("^d") at the top level throws an error. It is a data.table helper recognized only by these contexts.
Try it yourself
Try it: Use iris to compute the mean of every column whose name ends in Length, grouped by Species. Save the result to ex_means.
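One possible solution, matching the explanation that follows. The intermediate name iris_dt is illustrative.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# patterns("Length$") keeps Sepal.Length and Petal.Length; by = Species groups the rows
ex_means <- iris_dt[, lapply(.SD, mean), by = Species, .SDcols = patterns("Length$")]
print(ex_means)
```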
Explanation: patterns("Length$") matches column names ending in Length, so .SD exposes only Sepal.Length and Petal.Length. lapply(.SD, mean) then runs once per group.
Related data.table keywords
- .SD: the Subset of Data itself. .SDcols decides which columns are in it.
- .N: the row count of the current group. Pairs well with .SD for picking the last row.
- by: the grouping keys. Columns named here are excluded from .SD.
- :=: in-place assignment. Combine with .SDcols to update many columns at once.
- patterns(): regex selector that data.table recognizes inside .SDcols, melt, and dcast.
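Here is a sketch of the .N pairing mentioned in the list: .SD[.N] takes the last row of each group's subset. The columns chosen are illustrative.

```r
library(data.table)
DT <- as.data.table(mtcars)

# Last row per cylinder group, trimmed to two columns by .SDcols
last_rows <- DT[, .SD[.N], by = cyl, .SDcols = c("mpg", "hp")]
print(last_rows)
```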
FAQ
What is the difference between .SD and .SDcols in data.table?
.SD is the data, .SDcols is the filter. .SD evaluates inside j to a data.table holding the current group's rows. .SDcols is the argument that tells data.table which columns to put into .SD before j runs. You almost always use them together: .SDcols shapes .SD, and j operates on .SD.
Can I use .SDcols without lapply?
Yes. .SDcols works with any j expression that references .SD. Common shapes include DT[, .SD, .SDcols = cols] for selecting columns, DT[, head(.SD, 2), .SDcols = cols] for the first two rows of a subset, and DT[, .SD[1L], by = grp, .SDcols = cols] for the first row per group. lapply is the most common partner but not the only one.
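Here are the three shapes above, sketched on mtcars with cols <- c("mpg", "hp") as an assumed selection:

```r
library(data.table)
DT <- as.data.table(mtcars)
cols <- c("mpg", "hp")

picked    <- DT[, .SD, .SDcols = cols]                # column selection only
first_two <- DT[, head(.SD, 2), .SDcols = cols]       # first two rows of the subset
first_grp <- DT[, .SD[1L], by = grp_col <- NULL, .SDcols = cols][0]  # placeholder removed below
first_grp <- DT[, .SD[1L], by = cyl, .SDcols = cols]  # first row per cylinder group

print(first_grp)
```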
How do I pick columns by type with .SDcols?
Pass a predicate function. .SDcols = is.numeric keeps numeric columns; .SDcols = is.character keeps character columns. data.table calls the function on each candidate column and keeps the ones returning TRUE. The function must return a single logical value per column.
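A predicate does not have to be a built-in function. The sketch below assumes a data.table release that accepts function selectors, which is the same feature that makes is.numeric work. It passes an anonymous function that keeps numeric columns whose mean exceeds 100. The threshold is arbitrary and chosen only for illustration.

```r
library(data.table)
DT <- as.data.table(mtcars)

# Anonymous predicate: must return a single TRUE/FALSE for each column
# (the threshold of 100 is an arbitrary illustrative choice)
big_avg <- DT[, lapply(.SD, mean),
              .SDcols = function(x) is.numeric(x) && mean(x) > 100]

names(big_avg)  # columns whose average exceeds 100
```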
Does .SDcols accept negative selection?
Yes. Two shapes work: .SDcols = !c("vs", "am") (logical negation of a name vector) and .SDcols = -c(8, 9) (negative integer index). Both drop the listed columns and keep the rest. The first is safer because it survives column reordering.
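Both forms can be compared directly on mtcars, where vs and am sit at positions 8 and 9:

```r
library(data.table)
DT <- as.data.table(mtcars)

by_name  <- DT[, .SD, .SDcols = !c("vs", "am")]  # logical negation of names
by_index <- DT[, .SD, .SDcols = -c(8, 9)]        # negative positions (vs, am in mtcars)

identical(names(by_name), names(by_index))  # both keep the other 9 columns
```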
What does .SDcols stand for?
The SD portion refers to the Subset of Data symbol .SD, and cols is short for columns. So .SDcols reads as "the columns that go into Subset of Data". The leading dot follows the data.table convention for special symbols that only exist inside the bracket call.