data.table setDT() in R: Convert by Reference
The data.table setDT() function converts a list or data.frame into a data.table by reference, modifying the object in place with no copy. It is the fastest way to upgrade an existing object to a data.table.
```r
setDT(df)                           # data.frame to data.table in place
setDT(my_list)                      # named or unnamed list to data.table
setDT(df, keep.rownames = TRUE)     # keep row names as column "rn"
setDT(df, keep.rownames = "id")     # row names into a named column
setDT(df, key = "grp")              # convert and set a key at once
setDT(df)[, sum(x), by = grp]       # use the result in a compound call
setDF(dt)                           # reverse it: data.table back to data.frame
```
Need explanation? Read on for examples and pitfalls.
What setDT() does in one sentence
setDT() coerces an object to a data.table without copying it. You pass a list or a data.frame, and the same object becomes a data.table in memory. Unlike most R functions, which return a modified copy and leave the original alone, setDT() changes its input directly. This makes it both fast and memory-light, since no second copy of the data ever exists.
The function belongs to data.table's family of set* functions, all of which modify by reference. The payoff shows on large objects: converting a million-row data.frame with as.data.table() briefly holds two copies in memory, while setDT() holds only one.
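One way to see the "no copy" behavior is to compare the memory address of a column before and after the conversion. The sketch below uses data.table's address() helper on a small made-up data.frame; the names df, addr_before, and addr_after are illustrative.

```r
library(data.table)

# Illustrative data; rnorm() yields ordinary numeric vectors
df <- data.frame(x = rnorm(5), y = rnorm(5))

addr_before <- address(df$x)   # where column x lives in memory
setDT(df)                      # convert in place
addr_after <- address(df$x)

# TRUE when the column vector was reused rather than copied
identical(addr_before, addr_after)
```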
Syntax
The signature is short, but each argument changes the result. Here is the full call:
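As a sketch, the signature below is reconstructed from the four arguments this section describes and the defaults given in the data.table documentation. The data.frame df is a made-up example.

```r
library(data.table)

# Signature with its documented defaults:
# setDT(x, keep.rownames = FALSE, key = NULL, check.names = FALSE)

# The same call written out in full on a small illustrative data.frame
df <- data.frame(grp = c("b", "a"), x = 1:2)
setDT(df, keep.rownames = FALSE, key = NULL, check.names = FALSE)
class(df)   # "data.table" "data.frame"
```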
The arguments are:
- x: the list or data.frame to convert. A data.table passed in is returned unchanged.
- keep.rownames: if TRUE, row names are moved into a new column called rn. Pass a string to name that column yourself. The default, FALSE, drops row names.
- key: a character vector of column names to set as the data.table key during conversion. Same effect as calling setkey() afterward.
- check.names: if TRUE, syntactically invalid column names are repaired, the same way data.frame() does it.
setDT() returns the converted object invisibly. That return value lets you chain straight into a data.table query, as shown later.
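A minimal sketch of chaining on that return value, assuming a made-up data.frame called sales with columns grp and x:

```r
library(data.table)

# Illustrative data, not from the article
sales <- data.frame(grp = c("a", "b", "a", "b"), x = c(1, 2, 3, 4))

# Convert in place and aggregate in the same expression
totals <- setDT(sales)[, .(total = sum(x)), by = grp]
totals
```

Here sales ends up as a data.table as a side effect, and totals holds one row per group.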
Examples by use case
Start with the most common case: a data.frame. The mtcars dataset is a base R data.frame, so it is a good test subject.
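A minimal sketch of that conversion. The deep copy via copy() is an addition here, made so that the built-in mtcars object itself is not altered for the rest of the session:

```r
library(data.table)

df <- copy(mtcars)   # deep copy of the built-in data.frame
class(df)            # "data.frame"

setDT(df)            # no assignment needed
class(df)            # "data.table" "data.frame"
```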
Notice that df itself changed class. No assignment was needed, because setDT() modified df by reference.
setDT() also converts plain lists. A named list becomes columns by name; an unnamed list gets V1, V2, and so on.
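A short sketch of both list cases; the list contents are made up for illustration:

```r
library(data.table)

# Named list: element names become column names
named <- list(id = 1:3, score = c(90, 85, 77))
setDT(named)
names(named)     # "id" "score"

# Unnamed list: columns get default names
unnamed <- list(1:3, c("a", "b", "c"))
setDT(unnamed)
names(unnamed)   # "V1" "V2"
```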
Use keep.rownames when the row names carry data. A plain setDT() would discard the car names in mtcars, so keep them as a column.
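A sketch of both forms of keep.rownames, working on deep copies of mtcars so the built-in dataset stays untouched. The column name "model" is an arbitrary choice for illustration:

```r
library(data.table)

# TRUE: row names land in a column called "rn"
cars <- copy(mtcars)
setDT(cars, keep.rownames = TRUE)
head(cars[, .(rn, mpg)], 3)

# A string: row names land in a column with that name
cars_named <- copy(mtcars)
setDT(cars_named, keep.rownames = "model")
names(cars_named)[1]   # "model"
```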
Set a key during the conversion to skip a step. Passing key sorts the table and marks the key column in one call.
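A sketch of converting and keying in one call, again on a deep copy of mtcars. The choice of cyl as the key column is illustrative:

```r
library(data.table)

keyed <- copy(mtcars)
setDT(keyed, key = "cyl")

key(keyed)          # "cyl"
unique(keyed$cyl)   # 4 6 8, rows are now sorted by the key
```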
Compare with as.data.table() and setDF()
setDT() is one of three conversion routes, and they differ on copying. Pick the function that matches whether you need the original preserved.
| Function | Direction | Copies? | Use when |
|---|---|---|---|
| setDT() | list/data.frame to data.table | No, by reference | You no longer need the original object |
| as.data.table() | any object to data.table | Yes, returns a copy | You must keep the original intact |
| setDF() | data.table to data.frame | No, by reference | You need a plain data.frame back |
The decision rule is simple. If keeping the source object as it was matters, use as.data.table(), which leaves the input alone. If the source object is disposable and speed matters, use setDT().
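The two routes can be compared side by side. This sketch uses a made-up data.frame; dt_copy is a hypothetical name for the copy that as.data.table() returns:

```r
library(data.table)

df <- data.frame(grp = c("a", "b", "a"), x = 1:3)

# Copy-based route: df is left as it was
dt_copy <- as.data.table(df)
class(df)        # "data.frame"
class(dt_copy)   # "data.table" "data.frame"

# By-reference route: df itself is converted
setDT(df)
class(df)        # "data.table" "data.frame"
```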
After setDT(df), there is no separate data.table; df is the data.table. as.data.table(df) instead produces a second object and leaves df as a data.frame. That single distinction explains every behavior difference between the two.

Common pitfalls
setDT() modifies the caller's variable, which surprises people. Because it works by reference, the change is visible everywhere the object is referenced, even outside the current function.
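A small sketch of the surprise, using made-up names. twin is a plain assignment, while safe is a deep copy made with copy() for contrast:

```r
library(data.table)

df <- data.frame(x = 1:3)
twin <- df         # second name for the same object
safe <- copy(df)   # independent deep copy

setDT(df)          # only df is passed in

class(twin)        # "data.table" "data.frame"
class(safe)        # "data.frame"
```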
Here twin was never passed to setDT(), yet it changed too, because it pointed at the same object.
To protect the original, run safe <- copy(df) before setDT(df), or use as.data.table(df) instead. A plain safe <- df does not protect you, since both names still point to one object.

A second trap is calling setDT() on something that is already a data.table. That is harmless, since it simply returns the object, but the missing assignment can mislead readers into thinking nothing happened.
Because setDT() returns the object, you can write setDT(df)[, sum(x), by = grp] in a single line, converting and querying together.

Try it yourself
Try it: Convert the airquality data.frame to a data.table by reference and set Month as its key. Save the result to ex_aq.
Solution:
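One possible solution, sketched under the assumption that a working copy is acceptable. The intermediate name aq and the deep copy are additions here, made so the built-in airquality dataset is not modified:

```r
library(data.table)

aq <- copy(airquality)               # working copy of the built-in data.frame
ex_aq <- setDT(aq, key = "Month")    # convert by reference and set the key

key(ex_aq)   # "Month"
```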
Explanation: Passing key = "Month" to setDT() converts the object and sets the key in one call, so a separate setkey() is not needed.
Related data.table functions
setDT() works alongside the rest of the conversion and setup toolkit. Explore these next:
- as.data.table(): copy-based conversion when the original must survive.
- setDF(): the reverse trip, data.table back to data.frame by reference.
- fread(): read a file directly into a data.table, no conversion step needed.
- setkey(): set or change the key on an existing data.table.
- rbindlist(): bind a list of data.tables or data.frames into one.
See the official setDT reference for the complete argument list.
FAQ
What is the difference between setDT() and as.data.table()?
setDT() converts by reference, modifying the object in place with no copy, and accepts only lists, data.frames, and data.tables. as.data.table() returns a new copy and leaves the original unchanged, and it accepts more input types such as vectors and matrices. Use setDT() when the original object is disposable and speed matters; use as.data.table() when you must keep the source intact.
Does setDT() modify the original data frame?
Yes. setDT() changes the object in place, so the variable you passed in becomes a data.table without any assignment. Any other variable pointing at the same object also reflects the change. If you need the original data.frame preserved, call copy() on it first or use as.data.table() instead.
Can setDT() convert a list to a data.table?
Yes. setDT() accepts both named and unnamed lists. A named list becomes columns named after the list elements. An unnamed list gets default column names V1, V2, and so on. Every element must be the same length, just as columns in a table must align.
How do I convert a data.table back to a data.frame?
Use setDF(), the mirror image of setDT(). It coerces a data.table back to a plain data.frame by reference, again with no copy. This is useful when passing data to a function or package that does not understand data.table syntax.
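A minimal sketch of the reverse trip, using a made-up data.table:

```r
library(data.table)

dt <- data.table(id = 1:3, score = c(90, 85, 77))
class(dt)   # "data.table" "data.frame"

setDF(dt)   # convert back in place, no assignment needed
class(dt)   # "data.frame"
```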
Is setDT() faster than as.data.table()?
For large objects, yes. setDT() skips the full copy that as.data.table() makes, so it uses roughly half the peak memory and less time. On small objects the difference is negligible, but on multi-million-row tables the in-place conversion is a clear win.