data.table setorder() in R: Sort Rows by Reference
The data.table setorder() function sorts a data.table by one or more columns by reference, ascending or descending, without making a copy of the data.
setorder(dt, mpg)                         # ascending sort
setorder(dt, -mpg)                        # descending sort
setorder(dt, cyl, -mpg)                   # cyl up, mpg down
setorderv(dt, "mpg")                      # column name as a string
setorderv(dt, c("cyl","mpg"), c(1,-1))    # vector of columns and directions
setorder(df, mpg)                         # works on a data.frame too
setorder(dt, Ozone, na.last = TRUE)       # push NA values to the end
Need explanation? Read on for examples and pitfalls.
What setorder() does in one sentence
setorder() reorders the rows of a data.table in place. You pass a data.table and one or more column names, and the rows are sorted by those columns without any copy being made. The change is made on the object you passed, so no assignment is needed; the function also returns that table invisibly.
By default the sort is ascending. Prefix a column with a minus sign to sort it descending. Because the work happens by reference, setorder() is both fast and memory-light on large tables, which is why data.table provides it instead of relying on base R's order().
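As a minimal sketch of what "by reference" means in practice, the snippet below sorts a small table and checks that the object is still the same one in memory. The table `scores` and its columns are made up for illustration; `address()` is data.table's helper for inspecting an object's memory address.

```r
library(data.table)

# Hypothetical toy table, used only for illustration
scores <- data.table(id = c("a", "b", "c"), score = c(3, 1, 2))
addr_before <- address(scores)   # memory address before sorting

setorder(scores, score)          # no assignment: scores itself is reordered

scores$id                        # ids now follow ascending score

# The table was rearranged in place rather than replaced by a new object
same_object <- identical(addr_before, address(scores))
same_object
```

Note that the call is not assigned to anything; the sorted rows are visible through the original name.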
Syntax
setorder() takes bare column names; setorderv() takes a character vector. The two functions do the same job and differ only in how you name the sort columns.
The arguments are:
- x: the data.table or data.frame to sort. It is modified in place, not copied.
- ...: one or more unquoted column names for setorder(). Prefix a name with - to sort it descending.
- cols: a character vector of column names for setorderv(). Use this when the names live in a variable.
- order: for setorderv(), 1 for ascending or -1 for descending. Pass a vector to set a direction per column.
- na.last: where to place NA values. FALSE (the default) sends them to the front; TRUE sends them to the end.
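To make the relationship between the two call styles concrete, here is a small sketch that sorts two copies of the same table, once with bare names and once with a character vector plus an order vector. The table and its column names (`grp`, `val`) are assumptions made for the example.

```r
library(data.table)

# Two independent copies of the same hypothetical table
dt_bare <- data.table(grp = c(2, 1, 2, 1), val = c(10, 40, 30, 20))
dt_char <- copy(dt_bare)

# Bare column names; the minus prefix marks val as descending
setorder(dt_bare, grp, -val)

# Character vector of names with a matching vector of directions
setorderv(dt_char, cols = c("grp", "val"), order = c(1, -1))

# Both calls should leave the rows in the same order
all.equal(dt_bare, dt_char)
dt_bare
```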
Examples by use case
Start by turning a data.frame into a data.table. The mtcars dataset becomes a data.table with as.data.table(), keeping the row names in a model column.
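A conversion along these lines would do it; the object name `dt` is simply the name used throughout this page.

```r
library(data.table)

# as.data.table() makes a copy of mtcars, so the built-in dataset is untouched.
# keep.rownames = "model" stores the row names in a column called model.
dt <- as.data.table(mtcars, keep.rownames = "model")

head(dt[, .(model, mpg, cyl)], 3)
```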
Sort ascending, then descending, on a single column. A plain column name sorts low to high; a - prefix flips it.
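A short sketch of both directions on the mtcars table follows. The helper variables `lowest` and `highest` exist only to show which value ends up in the first row.

```r
library(data.table)

dt <- as.data.table(mtcars, keep.rownames = "model")

setorder(dt, mpg)        # ascending: smallest mpg first
lowest <- dt$mpg[1]

setorder(dt, -mpg)       # descending: largest mpg first
highest <- dt$mpg[1]

c(lowest = lowest, highest = highest)
```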
Sort several columns with mixed directions. List the columns in priority order and prefix each one you want descending.
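As an illustration, the sketch below sorts by cyl ascending and, within each cyl value, by mpg descending.

```r
library(data.table)

dt <- as.data.table(mtcars, keep.rownames = "model")

# cyl is the first priority (ascending); ties are broken by mpg, high to low
setorder(dt, cyl, -mpg)

dt[1:5, .(model, cyl, mpg)]
```

The first rows are the 4-cylinder cars, starting with the most fuel-efficient one.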
Use setorderv() when the column names sit in a variable. This is the form you reach for inside functions and loops.
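One way this might look inside a function is sketched below. `sort_by_cols()` is a hypothetical helper written for this example, not part of data.table.

```r
library(data.table)

# Hypothetical wrapper: cols is a character vector, directions holds 1 or -1
sort_by_cols <- function(x, cols, directions = rep(1L, length(cols))) {
  setorderv(x, cols, order = directions)  # sorts the caller's table in place
  invisible(x)
}

dt <- as.data.table(mtcars, keep.rownames = "model")

my_cols <- c("gear", "hp")                # names held in a variable
sort_by_cols(dt, my_cols, directions = c(1L, -1L))

dt[1:3, .(model, gear, hp)]
```

Because the sort happens by reference, the helper changes the table that was passed in; callers who need the original order would pass a copy().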
setorder(dt, my_var) looks for a column literally named my_var, which usually does not exist. setorderv(dt, my_var) reads the names the variable holds. Reach for setorderv() in any function or loop.

setorder() also works on a plain data.frame. Since data.table 1.9.5 you can sort a base data.frame in place, row names included.
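The data.frame case can be sketched with a small hand-built frame. The frame below is hypothetical; it is created fresh rather than assigned from a built-in dataset so that the in-place sort cannot touch a shared object.

```r
library(data.table)

# Small hypothetical data.frame with meaningful row names
df <- data.frame(score = c(3, 1, 2),
                 row.names = c("third", "first", "second"))

setorder(df, score)   # sorted in place

class(df)             # still a plain data.frame
rownames(df)          # row names travel with their rows
```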
setorder() vs setkey(), order() and arrange()
setorder() is one of several sorting tools, and they split along two questions. Decide whether the sort should happen by reference and whether you also need a stored key.
| Function | Sorts by reference? | Direction | Stores a key? |
|---|---|---|---|
| setorder() | Yes | Ascending or descending | No |
| setkey() | Yes | Ascending only | Yes |
| order() | No, returns indices | Ascending or descending | No |
| dplyr::arrange() | No, copies the data | Ascending or descending | No |
The decision rule is short. Use setorder() for a fast in-place sort in any direction. Use setkey() when you will subset or join on those columns repeatedly and want the key stored. Use order() when you need the sort indices rather than a sorted table, for example to reorder a related vector. Use arrange() when you are working in a dplyr pipeline and want a fresh copy.
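The differences between the data.table and base R tools can be seen side by side in a small sketch. The table and the `related` vector are made up for the example, and arrange() is left out to avoid a dplyr dependency.

```r
library(data.table)

dt <- data.table(id = c("a", "b", "c"), x = c(2, 3, 1))

setorder(dt, -x)                   # in place, descending
key_after_setorder <- key(dt)      # NULL: setorder() stores nothing

idx <- order(dt$x)                 # base R returns positions; dt is not touched
related <- c("B", "A", "C")[idx]   # reuse the indices on a related vector

setkey(dt, x)                      # in place, ascending only
key_after_setkey <- key(dt)        # "x": the key is recorded on the table

dt
```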
setorder() rearranges the rows of the existing object rather than building a sorted duplicate. That saves time and memory on large tables, but it also means there is no separate "sorted version" to fall back on. The original order is gone once the call returns.

For readers coming from pandas, setorder(dt, mpg) is the counterpart of df.sort_values('mpg', inplace=True), and setorder(dt, -mpg) maps to df.sort_values('mpg', ascending=False, inplace=True).

Common pitfalls
setorder() changes every name that points to the same table. Assigning a data.table to a new variable does not copy it, so sorting through one name sorts the other.
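A minimal sketch of the aliasing behaviour, using made-up table names, might look like this. It also shows copy() as the way to get an independent table.

```r
library(data.table)

dt_a <- data.table(x = c(3, 1, 2))

dt_alias <- dt_a          # a second name for the same object, not a copy
setorder(dt_alias, x)
dt_a$x                    # dt_a is sorted too

dt_b <- copy(dt_a)        # an independent table
setorder(dt_b, -x)
dt_b$x                    # only dt_b is reversed
dt_a$x                    # dt_a keeps its order
```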
dt_b <- copy(dt_a) makes an independent table, so sorting dt_b leaves dt_a untouched. Without copy(), both names share one object and both get sorted.

setorder() cannot sort by a computed expression. It only accepts column names. To sort by something like a ratio, use base order() inside the data.table i argument.
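For a computed sort key, a sketch along the following lines should work. The ratio mpg / wt is just an example expression. Unlike setorder(), this form returns a new sorted table and leaves the original row order alone.

```r
library(data.table)

dt <- as.data.table(mtcars, keep.rownames = "model")

# order() in i accepts an expression; the result is a sorted copy
by_ratio <- dt[order(-(mpg / wt))]   # highest mpg-per-weight first

by_ratio[1:3, .(model, mpg, wt)]
dt$model[1]                          # dt itself is still in its original order
```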
setorder() sends NA values to the front by default. This is the opposite of base order(), which keeps NA last. Set na.last = TRUE to match the familiar behaviour.
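The NA placement can be checked on the airquality data, whose Ozone column contains missing values. The flag variables below are only there to make the outcome explicit.

```r
library(data.table)

aq <- as.data.table(airquality)

setorder(aq, Ozone)                       # default: na.last = FALSE
na_first <- is.na(aq$Ozone[1])            # NA rows sit at the top

setorder(aq, Ozone, na.last = TRUE)
na_at_end <- is.na(aq$Ozone[nrow(aq)])    # NA rows now sit at the bottom

# Base order() puts NA last by default
base_sorted <- airquality$Ozone[order(airquality$Ozone)]
base_na_last <- is.na(base_sorted[length(base_sorted)])

c(na_first = na_first, na_at_end = na_at_end, base_na_last = base_na_last)
```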
Try it yourself
Try it: Convert the airquality data.frame to a data.table and sort it by Temp descending. Save the result to ex_dt.
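One possible solution, assuming the data.table package is installed, could look like this:

```r
library(data.table)

# Convert first (this makes a copy), then sort the copy in place
ex_dt <- as.data.table(airquality)
setorder(ex_dt, -Temp)

head(ex_dt, 3)
```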
Explanation: The - prefix on Temp sorts the column from high to low. Because setorder() works by reference, ex_dt itself is reordered and no copy is created.
Related data.table functions
setorder() works alongside the rest of data.table's ordering toolkit. Explore these next:
- setorderv(): the vector-input version of setorder(), for programmatic use.
- setkey(): sort by reference and also store the columns as a key.
- setcolorder(): reorder the columns of a data.table, not the rows.
- frank(): compute fast ranks of values without reordering rows.
- order(): return sort indices, useful inside the data.table i argument.
See the official setorder reference for the complete argument list.
FAQ
What does setorder() do in data.table in R?
setorder() sorts the rows of a data.table by one or more columns. The sort happens by reference, meaning the existing object is rearranged in place and no copy is created. By default the sort is ascending, and the function returns the table invisibly so you can chain a query after it.
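Chaining on the invisible return value might look like the sketch below, which sorts and then immediately takes the top three rows. The name `top3` is made up for the example.

```r
library(data.table)

dt <- as.data.table(mtcars, keep.rownames = "model")

# setorder() returns dt, so a query can follow directly in square brackets
top3 <- setorder(dt, -mpg)[1:3, .(model, mpg)]

top3
```

Note that dt itself stays sorted after this call; only `top3` is a new object.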
How do I sort a data.table in descending order?
Prefix the column name with a minus sign: setorder(dt, -mpg) sorts mpg from high to low. For setorderv(), pass order = -1 instead, as in setorderv(dt, "mpg", -1). You can mix directions across columns, for example setorder(dt, cyl, -mpg) sorts cyl ascending and mpg descending.
What is the difference between setorder() and setkey()?
Both sort a data.table by reference, but setkey() also records the columns as the table's key and only sorts ascending. setorder() stores nothing and can sort in either direction. Use setkey() when you will subset or join on those columns repeatedly; use setorder() for a one-time sort.
Does setorder() create a copy of the data.table?
No. setorder() reorders the rows of the existing object in place, which is what "by reference" means. If another variable points to the same table, it is sorted too. Use copy() first when you need to keep the original row order.
How do I sort by a column name stored in a variable?
Use setorderv(), which accepts a character vector of column names. For example, cols <- c("cyl", "mpg"); setorderv(dt, cols) sorts by both columns. Pass an order vector such as c(1, -1) to set the direction of each column independently.