data.table haskey() in R: Check If a Table Has a Key
data.table haskey() returns TRUE if a data.table has a key set and FALSE if it does not. It is the fast, side-effect-free way to test key status before a keyed join or subset.
    haskey(DT)                      # TRUE if DT has a key set
    key(DT)                         # the key column names, or NULL
    setkey(DT, id)                  # set a key on column id
    setkey(DT, NULL)                # remove the key
    if (!haskey(DT)) setkey(DT, id) # set the key only if missing
    haskey(as.data.table(mtcars))   # FALSE for a freshly built table
    indices(DT)                     # secondary indexes (haskey ignores these)
Need an explanation? Read on for examples and pitfalls.
What haskey() does in one sentence
haskey() is a yes/no key probe. It takes one data.table and returns a single logical value: TRUE when a primary key is attached, FALSE when none is. Internally it is just !is.null(key(x)), so it never sorts, copies, or modifies your table. That makes it cheap to call inside loops, package code, and assertions.
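The equivalence with the manual NULL check can be seen in a minimal sketch. It assumes the data.table package is installed; the table name dt and the choice of mtcars are illustrative only.

```r
library(data.table)

dt <- as.data.table(mtcars)
setkey(dt, cyl)

# haskey() and the manual NULL check agree on a keyed table
haskey(dt)            # TRUE
!is.null(key(dt))     # TRUE

# The probe only reads an attribute: the key is unchanged afterwards
key(dt)               # "cyl"
```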
A key in data.table is an attribute recording which columns the table is sorted by. Keyed tables support fast binary-search joins and the DT[.(value)] subset syntax. Before you rely on either, haskey() tells you whether the key you expect is actually present.
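As a rough illustration of what the key attribute enables, the sketch below keys a copy of mtcars by cyl and then uses the keyed subset syntax. The dataset and the value 6 are assumptions chosen for the example.

```r
library(data.table)

dt <- as.data.table(mtcars)
setkey(dt, cyl)            # sorts dt by cyl and records the key

attr(dt, "sorted")         # "cyl": the attribute that stores the key
six <- dt[.(6)]            # keyed subset: rows where cyl == 6
nrow(six)                  # 7 rows in mtcars have 6 cylinders
```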
Syntax
The signature has one argument. You pass a data.table and get a logical back.
haskey(x) accepts a single object x. A freshly built data.table has no key, so haskey() returns FALSE until you call setkey() or setkeyv(). There are no other arguments and no options to configure.
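A short sketch of the call before and after a key is set. The tables dt and dt2 and their columns are made up for illustration.

```r
library(data.table)

dt <- data.table(id = c(3L, 1L, 2L), value = c("c", "a", "b"))
haskey(dt)              # FALSE: no key on a freshly built table

setkey(dt, id)          # key set by unquoted column name
haskey(dt)              # TRUE

dt2 <- data.table(id = c(3L, 1L, 2L), value = c("c", "a", "b"))
setkeyv(dt2, "id")      # same effect, column name passed as a string
haskey(dt2)             # TRUE
```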
Passing a data.frame or other non-data.table object returns FALSE rather than raising an error, because no sorted attribute exists. Treat a FALSE result as "not keyed", not as proof the input is a data.table.
Examples by use case
Most real uses fall into four patterns. Each example below uses a built-in dataset so you can run it directly.
A direct check on a converted table:
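One way this check might look, using mtcars as the built-in dataset (an assumption; any data.frame would do). The last line also shows that a plain data.frame yields FALSE.

```r
library(data.table)

dt <- as.data.table(mtcars)   # conversion does not attach a key
haskey(dt)                    # FALSE

setkey(dt, cyl)
haskey(dt)                    # TRUE

haskey(mtcars)                # FALSE: a plain data.frame carries no key
```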
A guard clause that sets a key only when one is missing, which avoids re-sorting an already-keyed table:
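A minimal sketch of such a guard, assuming iris as the dataset and Species as the intended key column.

```r
library(data.table)

dt <- as.data.table(iris)

# First pass: no key yet, so setkey() runs and sorts by Species
if (!haskey(dt)) setkey(dt, Species)
key(dt)                       # "Species"

# Second pass: the guard is FALSE, so the table is left untouched
if (!haskey(dt)) setkey(dt, Sepal.Length)
key(dt)                       # still "Species"
```

Note that the guard only tests whether some key exists. If the specific key columns matter, compare key(dt) against the expected names instead.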
Comparing haskey() with key() to see what each returns:
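The sketch below puts the two calls side by side, before and after a two-column key is set. The columns cyl and gear are an arbitrary choice for the example.

```r
library(data.table)

dt <- as.data.table(mtcars)
haskey(dt)                    # FALSE
key(dt)                       # NULL

setkey(dt, cyl, gear)         # two-column key
haskey(dt)                    # TRUE: still a single logical
key(dt)                       # c("cyl", "gear")
```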
Confirming that a secondary index does not count as a key:
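A small sketch of that check, assuming an index on cyl and no primary key.

```r
library(data.table)

dt <- as.data.table(mtcars)
setindex(dt, cyl)             # secondary index; rows are not reordered

haskey(dt)                    # FALSE: an index is not a key
indices(dt)                   # "cyl"
```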
haskey() vs key(), setkey() and indices()
Four functions touch keys, and each answers a different question. Pick by what you need back.
| Function | Returns | Use it to |
|---|---|---|
| haskey(DT) | TRUE / FALSE | Test whether any key exists |
| key(DT) | Character vector or NULL | Read which columns form the key |
| setkey(DT, col) | The table, invisibly | Set or remove the key (sorts the data) |
| indices(DT) | Character vector or NULL | List secondary indexes, not the key |
Decision rule: use haskey() for a boolean branch, and key() when you need the column names themselves. haskey() is the cheaper call when the names do not matter.
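In function or package code, the boolean branch often takes the form of an assertion. The helper lookup_cyl() below is hypothetical (it is not part of data.table) and only sketches the idea.

```r
library(data.table)

# Hypothetical helper (not part of data.table): keyed lookup with a guard
lookup_cyl <- function(dt, value) {
  stopifnot(haskey(dt))       # boolean check: is any key present?
  dt[.(value)]                # keyed subset syntax relies on the key
}

dt <- as.data.table(mtcars)
setkey(dt, cyl)
res <- lookup_cyl(dt, 4)
nrow(res)                     # 11 four-cylinder cars in mtcars

# An unkeyed table stops at the assertion
err <- tryCatch(lookup_cyl(as.data.table(mtcars), 4), error = function(e) e)
inherits(err, "error")        # TRUE
```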
An assertion such as stopifnot(haskey(dt)) fails fast with a clear message at the point of the check, instead of letting a later keyed subset or join fail or behave unexpectedly.
Common pitfalls
Three mistakes account for most haskey() bugs.
Treating the return value as column names. haskey() gives a logical, never a string, so comparing it to a column name never matches:
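A sketch of the mistake and a likely fix, assuming a table keyed on cyl.

```r
library(data.table)

dt <- as.data.table(mtcars)
setkey(dt, cyl)

haskey(dt) == "cyl"           # FALSE: TRUE is coerced to "TRUE", never "cyl"
identical(key(dt), "cyl")     # TRUE: compare key() when names matter
```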
Expecting haskey() to detect secondary indexes. A table built with setindex() but no setkey() returns FALSE; check indices() for those.
Assuming that sorted rows imply a key. Subsetting with DT[order(...)] or calling setorder() reorders rows without setting a key, so haskey() still returns FALSE afterward.
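The reordering case can be sketched as follows; the columns cyl and mpg are arbitrary choices for the example.

```r
library(data.table)

dt <- as.data.table(mtcars)

setorder(dt, cyl)             # reorders rows by reference, no key attached
haskey(dt)                    # FALSE

sorted <- dt[order(mpg)]      # returns a sorted copy, also unkeyed
haskey(sorted)                # FALSE

setkey(dt, cyl)               # setkey() is what marks the table as keyed
haskey(dt)                    # TRUE
```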
Try it yourself
Try it: Convert airquality to a data.table, then set a key on Month only if the table does not already have one. Save the result to ex_dt.
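One possible solution, sketched under the assumption that data.table is installed; other approaches that end with a key on Month would work as well.

```r
library(data.table)

ex_dt <- as.data.table(airquality)

# Set the key only when none is present
if (!haskey(ex_dt)) setkey(ex_dt, Month)

haskey(ex_dt)                 # TRUE
key(ex_dt)                    # "Month"
```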
Explanation: haskey() returns FALSE for the freshly converted table, so the if branch runs and setkey() sorts ex_dt by Month. A second call to haskey() now returns TRUE.
Related data.table functions
These functions pair naturally with haskey() when managing keys.
- setkey() sets the primary key and physically sorts the table.
- key() returns the key column names so you can act on them.
- setindex() and indices() manage fast secondary indexes.
- setorder() reorders rows without attaching a key.
- setkeyv() sets a key from a character vector of column names.
FAQ
What does haskey() return in R?
haskey() returns a single logical value: TRUE if the data.table has a primary key attached and FALSE otherwise. It never returns column names or NULL. Because it only reads the table's sorted attribute, the call is instant and has no side effects, which makes it safe to use inside loops, assertions, and conditional branches.
What is the difference between haskey() and key() in data.table?
haskey() answers "is there a key?" with TRUE or FALSE. key() answers "which columns form the key?" by returning a character vector, or NULL when no key is set. Use haskey() for a boolean branch and key() when you need the actual column names. haskey() is effectively shorthand for !is.null(key(x)).
Does haskey() detect secondary indexes?
No. haskey() only reports the primary key created by setkey(). A table with one or more secondary indexes from setindex() but no primary key returns FALSE. To check for secondary indexes, call indices(), which lists their column names, or returns NULL when none exist.
How do I remove a key so haskey() returns FALSE?
Call setkey(DT, NULL). This drops the sorted attribute without changing the row order, so a following haskey(DT) call returns FALSE. The data stays sorted in memory, but data.table no longer treats it as keyed, and keyed subsetting like DT[.(value)] will no longer work until you set a key again.
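A brief sketch of removing a key, assuming a table keyed on cyl beforehand.

```r
library(data.table)

dt <- as.data.table(mtcars)
setkey(dt, cyl)
haskey(dt)                    # TRUE

setkey(dt, NULL)              # drop the key; rows stay in cyl order
haskey(dt)                    # FALSE
is.unsorted(dt$cyl)           # FALSE: the data itself is still sorted
```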
For the full key reference, see the data.table setkey documentation.