R Subsetting: One Definitive Rule for [], [[]], $, and @, No More Guessing

R has four subsetting operators ([, [[, $, and @), and they do not all return the same kind of thing. The single rule that unifies them: [ keeps the container; [[, $, and @ extract the contents inside it.

This guide explains every operator with runnable base-R examples, a decision flowchart, and the five classic mistakes that trip up almost every R user, so you never have to guess which operator to reach for again.

Why does R have four subsetting operators?

R distinguishes between taking a slice of a container and reaching inside to pull out one item. That single distinction is the reason mtcars[1] gives you a one-column data frame while mtcars[[1]] gives you a plain numeric vector: same object, two very different results. Before the rules and gotchas, let's see all four operators in action on one data frame so the difference is concrete, not abstract.

The code below takes a column of mtcars three different ways, then builds a tiny S4 object to show the fourth operator. Read the #> outputs and compare the types, not just the values.

R: Four operators on one data frame

    # Four operators, one object: notice what each one returns
    car_df <- mtcars[1:3, 1:3]        # small slice to make output readable

    col_slice <- car_df["mpg"]        # [ : same container (data frame)
    col_vec   <- car_df[["mpg"]]      # [[: extracts contents (numeric)
    col_vec2  <- car_df$mpg           # $ : shorthand for [["mpg"]]

    class(col_slice)
    #> [1] "data.frame"
    class(col_vec)
    #> [1] "numeric"
    identical(col_vec, col_vec2)
    #> [1] TRUE

    # And the fourth operator, @, works on S4 objects:
    setClass("Point", representation(x = "numeric", y = "numeric"))
    p1 <- new("Point", x = 3, y = 4)
    p1@x
    #> [1] 3

  

car_df["mpg"] came back as a one-column data.frame, but car_df[["mpg"]] and car_df$mpg both returned the raw numeric vector underneath. $ is just a friendlier way to spell [[. And @ is the S4 equivalent of $: it pulls a named slot out of a formally-defined object. Four operators, one unifying idea, which is what the rest of the post makes rock-solid.

Try it: Using the car_df above, pull the cyl column out twice, once so the result is still a data frame, and once so it's a plain numeric vector. Check class() on both.

R: Exercise: keep vs extract on cyl

    # Try it: keep-vs-extract on the cyl column
    ex_keep <- NULL      # your code here, should be a data.frame
    ex_extract <- NULL   # your code here, should be numeric

    class(ex_keep)
    #> Expected: "data.frame"
    class(ex_extract)
    #> Expected: "numeric"

  
R: Keep-vs-extract solution

    ex_keep <- car_df["cyl"]
    ex_extract <- car_df[["cyl"]]

    class(ex_keep)
    #> [1] "data.frame"
    class(ex_extract)
    #> [1] "numeric"

  

Explanation: [ preserves the data frame; [[ reaches in and hands you the column itself.

How does [ differ from [[ in R?

The single-bracket operator [ always returns the same kind of object you started with. Feed it a vector, you get a vector back; feed it a list, you get a list back; feed it a data frame, you get a data frame back. The double-bracket [[, by contrast, drills one level deeper and hands you whatever was stored inside a single position.

Think of a list as a train. x[1:2] is a shorter train, still a train, still carrying cargo. x[[1]] is the cargo itself, taken out of the first car. That mental image settles most of the confusion around R subsetting.

Let's start with an atomic vector where the distinction is subtle, then move to a list where it becomes dramatic.

R: [ on an atomic vector

    # [ on an atomic vector: returns an atomic vector
    grades <- c(math = 92, english = 85, science = 78, history = 88)

    grades[c(1, 3)]                  # by position
    #>    math science
    #>      92      78
    grades[c("math", "science")]     # by name
    #>    math science
    #>      92      78
    grades[-2]                       # drop index 2
    #>    math science history
    #>      92      78      88
    grades[grades > 80]              # logical
    #>    math english history
    #>      92      85      88

  

All four styles (positive integers, negative integers, names, logicals) return a numeric vector, just like grades. The container didn't change. Now watch what happens when the container is a list.

R: [ vs [[ on a list

    # [ vs [[ on a list: the difference is now dramatic
    cfg <- list(host = "localhost", port = 5432, tags = c("db", "prod"))

    cfg[1]               # still a list
    #> $host
    #> [1] "localhost"
    class(cfg[1])
    #> [1] "list"

    cfg[[1]]             # the element itself
    #> [1] "localhost"
    class(cfg[[1]])
    #> [1] "character"

    cfg[[3]][1]          # chain: extract the vector, then subset it
    #> [1] "db"

  

cfg[1] is a one-item list. Forgiving functions such as paste0() will quietly coerce it, but stricter ones will not: strsplit(cfg[1], "o") stops with "non-character argument". cfg[[1]] is the raw string "localhost", ready to use. This is a common reason R beginners get "non-character argument" errors: they used single brackets where they needed double brackets.
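A minimal sketch of that type mismatch, assuming strsplit() as the strict function (our choice for illustration; other functions that demand a character vector behave similarly). The one-item list is rejected, while the extracted string works.

```r
# Rebuild the config list from the example above so this block is self-contained
cfg <- list(host = "localhost", port = 5432, tags = c("db", "prod"))

# cfg[1] is still a list, so a function that demands a character vector fails.
# tryCatch() captures the error message instead of stopping the script.
err_msg <- tryCatch(
  strsplit(cfg[1], "c"),
  error = function(e) conditionMessage(e)
)
err_msg
#> [1] "non-character argument"

# cfg[[1]] is the string itself, so the same call succeeds
parts <- strsplit(cfg[[1]], "c")[[1]]
parts
#> [1] "lo"     "alhost"
```

The error text shown is what an English-locale R session prints; the point is the type of the input, not the wording of the message.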

Container vs content, the one rule

Figure 1: [ returns a smaller train (still a list); [[ returns what's inside a single car (the object).

Data frames behave the same way, because under the hood, a data frame is a list of columns. iris[2] is a one-column data frame; iris[[2]] is the numeric vector of sepal widths.

R: Container vs content on data frames

    # Data frames follow the same container-vs-content rule
    iris_slice <- iris[2]        # one-column data frame
    class(iris_slice)
    #> [1] "data.frame"

    sepal_len <- iris[[1]]       # first column as a plain vector
    class(sepal_len)
    #> [1] "numeric"
    length(sepal_len)
    #> [1] 150

  

Remember that the next time you pass an iris$Species-style column into a function: it's already a vector, not a data frame. The rule only cares whether you used [ (keep the container) or [[ (extract the content).
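The claim that a data frame is a list of columns can be checked directly. The sketch below uses only base R and the built-in iris data; the variable name col_classes is ours.

```r
# A data frame is a list whose elements are equal-length columns
is.list(iris)
#> [1] TRUE
length(iris) == ncol(iris)      # the "length" of a data frame counts its columns
#> [1] TRUE

# Because it is a list, list tools iterate over columns
col_classes <- vapply(iris, function(col) class(col)[1], character(1))
col_classes
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#>    "numeric"    "numeric"    "numeric"    "numeric"     "factor"
```

This is why [[ and $ behave on data frames exactly as they do on lists: each extracts one list element, which happens to be a column.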

Key Insight
One rule governs all four operators. [ returns an object of the same kind as the input (the one caveat: matrix and two-index data frame subsetting can drop to a vector unless you pass drop = FALSE). [[, $, and @ reach one level deeper to hand you the contents. If you can answer "do I want to preserve the shape, or get the raw element?", you've picked the right operator.

Try it: Given cfg above, write an expression that returns the string "prod" (the second tag). You'll need to chain operators.

R: Exercise: extract prod from cfg

    # Try it: extract "prod" from cfg
    ex_prod <- NULL   # your code here
    ex_prod
    #> Expected: "prod"

  
R: Extract-prod solution

    ex_prod <- cfg[["tags"]][2]
    ex_prod
    #> [1] "prod"

  

Explanation: cfg[["tags"]] extracts the character vector stored under tags. Then [2] picks its second element, standard atomic-vector subsetting.

When should you use $ instead of [[ in R?

$ is [[ with sugar on top. df$col is essentially df[["col", exact = FALSE]]: it looks up a named element, just like [[, but it saves you the brackets and the quote marks. For interactive analysis where you're typing column names by hand, $ is the obvious choice and almost everyone uses it.

There are two moments, however, when $ will bite you. First, $ cannot take a computed name: a variable holding a column name won't work. Second, $ does partial matching by default: if you ask for a name that doesn't exist but is a prefix of exactly one name that does, a list silently returns the partial match instead of NULL. This is a real, hard-to-debug source of bugs.

R: $ sugar, partial match, and traps

    # $ is sugar for [["..."]], until it isn't
    settings <- list(timeout = 30, retries = 3)

    settings$timeout          # exact match, fine
    #> [1] 30
    settings$timeo            # PARTIAL match: "timeo" is a prefix of exactly one name
    #> [1] 30
    settings[["timeo"]]       # [[ with exact matching, returns NULL
    #> NULL

    col_name <- "timeout"
    settings$col_name         # $ looks for an element called "col_name", not the value of col_name
    #> NULL
    settings[[col_name]]      # [[ evaluates col_name first, correct
    #> [1] 30

  

The partial-match settings$timeo returned 30 even though no slot is called timeo. Switching to [[ makes that bug impossible: [[ matches exactly, and it happily takes a variable as the key. Rule of thumb: use $ when you know the name at write-time and reading df$col makes the code clearer; use [[ whenever the name lives in a variable, or whenever you want R to fail loudly on a typo.
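As a sketch of the "name lives in a variable" case, here is a small helper that receives a column name as a string. The function col_mean and the vector wanted are hypothetical names introduced for illustration, not part of any package.

```r
# Hypothetical helper: the column name arrives as a string argument
col_mean <- function(df, col) {
  # [[ evaluates `col`; df$col would look for a column literally named "col"
  mean(df[[col]])
}

col_mean(mtcars, "mpg")
#> [1] 20.09062

# Same idea applied to several names held in a character vector
wanted <- c("mpg", "hp")
means <- vapply(wanted, function(nm) col_mean(mtcars, nm), numeric(1))
names(means)
#> [1] "mpg" "hp"
```

Writing mean(df$col) inside col_mean() would return NA with a warning for every input, because mtcars has no column called "col".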

Warning
$ does silent partial matching and does not evaluate variables. Inside functions and loops, where column names are often passed as arguments, always use [[. The handful of extra characters buys you correctness and makes bugs throw errors instead of returning plausible-but-wrong values.
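The sketch below probes partial matching a little further. The list opts and its names are made up for illustration; exact = FALSE is a documented argument of [[, and warnPartialMatchDollar is a documented base R option.

```r
# Illustrative list; the names are invented for this sketch
opts <- list(verbose = TRUE, version = "1.2", retries = 3)

opts$ret                        # unique prefix of "retries": partial match
#> [1] 3
opts$ver                        # prefix of BOTH "verbose" and "version": ambiguous
#> NULL
opts[["ret"]]                   # [[ matches exactly by default
#> NULL
opts[["ret", exact = FALSE]]    # opt in to $-style partial matching
#> [1] 3

# Ask R to warn whenever $ falls back to a partial match
old <- options(warnPartialMatchDollar = TRUE)
warned <- tryCatch({ opts$ret; FALSE }, warning = function(w) TRUE)
options(old)                    # restore the previous setting
warned
#> [1] TRUE
```

Note that an ambiguous prefix returns NULL rather than picking one of the candidates, so adding a new element to a list can change what an abbreviated $ lookup returns.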

Try it: Rewrite settings$retries two different ways, once with [[ using a literal string, and once with [[ using a variable called key.

R: Exercise: two [[ rewrites of $retries

    # Try it: two [[ rewrites of settings$retries
    key <- "retries"
    ex_a <- NULL   # your code here, literal string
    ex_b <- NULL   # your code here, variable key

    c(ex_a, ex_b)
    #> Expected: [1] 3 3

  
R: Retries-rewrite solution

    ex_a <- settings[["retries"]]
    ex_b <- settings[[key]]

    c(ex_a, ex_b)
    #> [1] 3 3

  

Explanation: [[ works with both a literal string and a variable holding a string. $ only works with the literal form.

What does the @ operator do in S4 objects?

S4 is R's stricter object system. Unlike an ordinary list, an S4 object has formally declared slots with fixed names and types, and you reach them with @ instead of $. You'll run into S4 constantly once you step past base R: the Matrix package, Bioconductor, lubridate's Period class, and many formal-model packages all use it. The mechanics are refreshingly small.

Below, we define a tiny Segment class with two numeric slots, build an instance, and access the slots with @. S4 also prevents you from setting a slot to the wrong type, which is a large part of why it exists.

R: Define an S4 Segment with @ slots

    # Defining and using an S4 class
    setClass("Segment", representation(start = "numeric", end = "numeric"))
    seg1 <- new("Segment", start = 3, end = 10)

    seg1@start             # read a slot
    #> [1] 3
    seg1@end               # read another
    #> [1] 10
    slotNames(seg1)        # inspect all slot names
    #> [1] "start" "end"

    # slot() is the programmatic equivalent of @
    slot(seg1, "start")
    #> [1] 3

  

seg1@start reads the start slot directly. slotNames() shows you every declared slot, and slot(seg1, name) is the variable-friendly version of @, the same reason you'd prefer [[col_name]] over $col_name in functions. If you try seg1@start <- "three", R throws an error because the class declared start as numeric.
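The type check on slot assignment can be observed without halting a script. This sketch redefines the same Segment class so it runs on its own; the variable outcome is ours.

```r
library(methods)   # normally attached already; loaded explicitly to be safe

setClass("Segment", representation(start = "numeric", end = "numeric"))
seg1 <- new("Segment", start = 3, end = 10)

# Assigning the wrong type to a slot is an error; capture it rather than stop
outcome <- tryCatch({
  seg1@start <- "three"
  "assigned"
}, error = function(e) "rejected")
outcome
#> [1] "rejected"

seg1@start           # the original value is untouched
#> [1] 3

seg1@start <- 5      # a numeric value is accepted
seg1@start
#> [1] 5
```

An ordinary list offers no such guard: assigning a string to a list element that used to hold a number succeeds without comment.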

Tip
Prefer slot(object, name) inside functions. Just like [[ beats $ when the name is stored in a variable, slot() beats @ when you're writing reusable code. For one-off interactive work, @ is faster to type and reads cleaner.
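A minimal sketch of slot() in reusable code, assuming the same Segment class. The helper slots_as_list and the variable target are hypothetical names chosen for illustration.

```r
library(methods)

setClass("Segment", representation(start = "numeric", end = "numeric"))
seg <- new("Segment", start = 3, end = 10)

# Hypothetical helper: read every slot of an object into a named list
slots_as_list <- function(obj) {
  nms <- slotNames(obj)
  out <- lapply(nms, function(nm) slot(obj, nm))   # slot() takes the name as a string
  names(out) <- nms
  out
}

str(slots_as_list(seg))

# `slot<-` is the programmatic counterpart of `obj@name <- value`
target <- "end"
slot(seg, target) <- 12
seg@end
#> [1] 12
```

Because the slot names come from slotNames(), the helper works for any S4 object without hard-coding a single @ access.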

Try it: Create a second Segment with start = 10 and end = -2 and extract its end slot two ways, once with @, once with slot().

RExercise: read end slot two ways
# Try it: two ways to read the end slot ex_seg <- new("Segment", start = 10, end = -2) ex_end_at <- NULL # your code here, use @ ex_end_slot <- NULL # your code here, use slot() c(ex_end_at, ex_end_slot) #> Expected: [1] -2 -2

  
Click to reveal solution
REnd-slot solution
ex_end_at <- ex_seg@end ex_end_slot <- slot(ex_seg, "end") c(ex_end_at, ex_end_slot) #> [1] -2 -2

  

Explanation: Both return the same thing. @ is a literal-name shortcut; slot() accepts a variable, which makes it the right choice inside functions.

What's the one rule that unifies [, [[, $, and @?

Here is the whole rule, compressed to one sentence: [ keeps the container; [[, $, and @ extract one element out of it. Everything else is a detail about which container the operator works on, and how it names the element.

Operator | Works on | What it returns | Name form
[ | vectors, lists, data frames, matrices | same container, possibly smaller | position, name, logical
[[ | vectors, lists, data frames, environments | the element itself | single position or name (variable OK)
$ | lists, data frames, environments | the element itself | literal name only
@ | S4 objects | the slot itself | literal slot name only

The flowchart below turns the table into three questions you can ask in your head before typing the operator.

Decision flowchart for R subsetting operators

Figure 2: Three questions, keep the container? S4 object? literal name?, pick the right operator every time.
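The three questions in the caption can be written down as plain logic. This is only a sketch: suggest_operator is a hypothetical helper, and the slot() branch for S4 objects with a computed name is our addition, based on the article's advice about slot().

```r
# Hypothetical helper that encodes the three questions in order:
# keep the container? S4 object? literal name?
suggest_operator <- function(keep_container, is_s4 = FALSE, literal_name = TRUE) {
  if (keep_container) return("[")
  if (is_s4) return(if (literal_name) "@" else "slot()")
  if (literal_name) "$" else "[["
}

suggest_operator(keep_container = TRUE)
#> [1] "["
suggest_operator(keep_container = FALSE, literal_name = FALSE)
#> [1] "[["
suggest_operator(keep_container = FALSE, is_s4 = TRUE)
#> [1] "@"
```

In real code the is_s4 answer can come from isS4(x), which is part of base R.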

Let's apply the rule to every data structure in one block so you can see it hold up.

R: One rule across all four structures

    # One rule, four structures

    # 1. Atomic vector
    grades[["math"]]           # extract -> numeric scalar
    #> [1] 92
    grades["math"]             # keep -> named vector of length 1
    #> math
    #>   92

    # 2. List
    cfg[["port"]]              # extract -> numeric
    #> [1] 5432
    cfg["port"]                # keep -> one-item list
    #> $port
    #> [1] 5432

    # 3. Data frame (iris already in base R)
    iris[["Species"]][1:3]     # extract column, then slice it
    #> [1] setosa setosa setosa
    #> Levels: setosa versicolor virginica

    # 4. S4 object
    seg1@start                 # extract -> numeric
    #> [1] 3

  

Same rule, four structures, no exceptions. Once that clicks, R's subsetting stops being a lookup table of special cases and becomes one idea: container or content?

Try it: Use cfg from earlier. Return a one-item list whose single element is the host value, without redefining cfg. Then return the host value as a plain character.

R: Exercise: keep vs extract on cfg$host

    # Try it: keep vs extract on cfg$host
    ex_keep_host <- NULL      # your code here, should be a list
    ex_extract_host <- NULL   # your code here, should be character

    class(ex_keep_host)
    #> Expected: "list"
    class(ex_extract_host)
    #> Expected: "character"

  
R: Host-keep-extract solution

    ex_keep_host <- cfg["host"]
    ex_extract_host <- cfg[["host"]]

    class(ex_keep_host)
    #> [1] "list"
    class(ex_extract_host)
    #> [1] "character"

  

Explanation: [ preserves the list; [[ drills into it. The same rule you saw with data frames.

What are the most common R subsetting mistakes (and how do you fix them)?

Every long-time R user has a personal scar collection from subsetting. These five mistakes are among the most common, and each fix is a small edit once you know what to look for.

R: Five common subsetting mistakes

    # Mistake 1: single-bracket when you needed double
    df_x <- data.frame(a = 1:3, b = 4:6)
    mean(df_x["a"])          # warning: argument is not numeric or logical
    #> [1] NA
    mean(df_x[["a"]])        # correct
    #> [1] 2

    # Mistake 2: partial matching bite
    lst_x <- list(alpha = 1, beta = 2)
    lst_x$alp                # returns 1 silently (unique partial match on "alpha")
    #> [1] 1
    lst_x[["alp"]]           # returns NULL, the safe form
    #> NULL

    # Mistake 3: chaining [ when you meant [[
    lst_x[1][1]              # one-item list, then same one-item list
    #> $alpha
    #> [1] 1
    lst_x[[1]]               # the number 1
    #> [1] 1

    # Mistake 4: using $ with a variable
    col <- "a"
    df_x$col                 # looks for a column literally named "col"
    #> NULL
    df_x[[col]]              # evaluates col -> "a" -> correct
    #> [1] 1 2 3

    # Mistake 5: forgetting drop = FALSE on a matrix
    m <- matrix(1:12, nrow = 3)
    class(m[1, ])                  # drops to a vector (often unwanted)
    #> [1] "integer"
    class(m[1, , drop = FALSE])    # stays a matrix
    #> [1] "matrix" "array"

  

Each of these has the same shape: R gave you something plausible, usually without complaint, that was the wrong type for the next step. The fix is always to ask the question from the decision flowchart (container or content?) and match the operator to the answer.
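Mistake 5 has a data-frame counterpart worth seeing once. The sketch below rebuilds the small df_x from the block above and compares the two-index form with and without drop = FALSE.

```r
df_x <- data.frame(a = 1:3, b = 4:6)

class(df_x[, "a"])                 # two-index form with one column drops to a vector
#> [1] "integer"
class(df_x[, "a", drop = FALSE])   # drop = FALSE keeps the data frame
#> [1] "data.frame"
class(df_x["a"])                   # list-style single index never drops
#> [1] "data.frame"
```

In code that selects a variable number of columns, drop = FALSE (or the single-index form) keeps the result a data frame even when only one column is chosen.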

Warning
x[1, 2] is not the same as x[1][2]. The first picks row 1, column 2 from a matrix or data frame; the second takes a one-element subset and then asks that subset for a second element it doesn't have. On a matrix or vector the second form returns NA, and on a data frame it fails with "undefined columns selected", which makes it a frequent source of bugs. When in doubt, read the expression out loud: each [ ] is one step.
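A short sketch of the difference on a matrix and on a data frame. The objects m and df are small throwaway examples, and the error is captured with tryCatch() so the block runs to the end.

```r
m <- matrix(1:12, nrow = 3)
m[1, 2]        # row 1, column 2
#> [1] 4
m[1][2]        # m[1] is the single value 1; it has no second element
#> [1] NA

df <- data.frame(a = 1:3, b = 4:6)
df[1, 2]       # row 1, column 2
#> [1] 4

# df[1] is a one-column data frame, so asking it for column 2 is an error
df_result <- tryCatch(df[1][2], error = function(e) conditionMessage(e))
df_result
#> [1] "undefined columns selected"
```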

Try it: Write one line that extracts the mean of column a from df_x using [[, and one line that does the same with $.

R: Exercise: two mean-extractors on column a

    # Try it: two mean-extractors
    ex_m1 <- NULL   # your code here, use [[
    ex_m2 <- NULL   # your code here, use $

    c(ex_m1, ex_m2)
    #> Expected: [1] 2 2

  
R: Mean-extractor solution

    ex_m1 <- mean(df_x[["a"]])
    ex_m2 <- mean(df_x$a)

    c(ex_m1, ex_m2)
    #> [1] 2 2

  

Explanation: Both extract the column as a numeric vector first, then pass it to mean(). Calling mean() on a one-column data frame returns NA with a warning in modern R; [[ and $ both give you the vector it needs.

Practice Exercises

Two capstone exercises that combine what you've learned. Use distinct variable names (my_*) so your solutions don't overwrite tutorial variables.

Exercise 1: Same data, two shapes

Extract the third row of mtcars twice: once as a one-row data.frame, and once as a named numeric vector. Save the first to my_row_df and the second to my_row_vec. Verify with class() that they differ.

R: Exercise: third row two shapes

    # Exercise 1: row extraction, two shapes
    # Hint: rows and columns both obey the container-vs-content rule.
    # A row as a data frame uses [ ; a row as a named vector uses unlist() on it.
    my_row_df <- NULL
    my_row_vec <- NULL

    class(my_row_df)
    class(my_row_vec)

  
R: Row-extraction solution

    my_row_df <- mtcars[3, ]              # still a data frame
    my_row_vec <- unlist(mtcars[3, ])     # flatten to a named numeric vector

    class(my_row_df)
    #> [1] "data.frame"
    class(my_row_vec)
    #> [1] "numeric"
    my_row_vec
    #>    mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb
    #>  22.80   4.00 108.00  93.00   3.85   2.32  18.61   1.00   1.00   4.00   1.00

  

Explanation: Row subsetting with mtcars[3, ] keeps the container (data.frame). unlist() collapses the one-row data frame into a flat named numeric vector, useful when a function expects a plain vector of parameters.
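One caveat, sketched on the built-in iris data: unlist() coerces everything to a single type, so it is lossless for an all-numeric table like mtcars but not for a row that mixes numbers with a factor. The names row_vec and row_list are ours.

```r
# iris mixes numeric columns with a factor; unlist() forces one common type,
# so the factor is reduced to its integer code
row_vec <- unlist(iris[1, ])
row_vec
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#>          5.1          3.5          1.4          0.2          1.0

# Keeping the row as a list preserves each column's own type
row_list <- as.list(iris[1, ])
class(row_list[["Species"]])
#> [1] "factor"
```

When the columns have different types, a list (or the one-row data frame itself) is usually the safer shape to pass around.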

Exercise 2: Nested list, [[ only

Build my_inv <- list(fruit = list(count = 12, unit = "kg")). Extract the numeric 12 using only [[ and integer/character keys, no $, no unlist(). Save the result to my_count and verify is.numeric(my_count) is TRUE.

R: Exercise: drill into a nested list

    # Exercise 2: drill into a nested list with [[ only
    my_inv <- list(fruit = list(count = 12, unit = "kg"))
    my_count <- NULL

    is.numeric(my_count)
    #> Expected: TRUE
    my_count
    #> Expected: 12

  
R: Nested-list solution

    my_inv <- list(fruit = list(count = 12, unit = "kg"))
    my_count <- my_inv[["fruit"]][["count"]]

    is.numeric(my_count)
    #> [1] TRUE
    my_count
    #> [1] 12

  

Explanation: Each [[ drops you one level deeper. The first reaches into the outer list and returns the inner list; the second reaches into the inner list and returns the number. Chained [[ is how you walk nested lists without partial matching or $ surprises.
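Base R also accepts a vector inside a single [[ on a list and applies it recursively, one level per element. The sketch below reuses the my_inv list from the exercise and shows that it is equivalent to the chained form.

```r
my_inv <- list(fruit = list(count = 12, unit = "kg"))

# A vector inside [[ is applied recursively, one level per element
my_inv[[c("fruit", "count")]]
#> [1] 12
my_inv[[c(1, 2)]]                 # same idea with positions
#> [1] "kg"

identical(my_inv[[c("fruit", "count")]], my_inv[["fruit"]][["count"]])
#> [1] TRUE
```

The chained form is often easier to read; the vector form is handy when the path into the list is itself stored in a variable.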

Complete Example

Here is an end-to-end workflow that uses every operator the post introduced. We take the built-in airquality data set, pull the May rows, extract the temperature column as a plain vector, summarise it, and then package the summary as a tiny S4 result object so downstream code knows exactly what to expect.

R: End-to-end airquality subsetting workflow

    # Complete example: from data frame to S4 summary

    # 1. Keep the container ([), filter to May
    aq_may <- airquality[airquality$Month == 5, ]
    class(aq_may)
    #> [1] "data.frame"
    nrow(aq_may)
    #> [1] 31

    # 2. Extract a column ([[), temperature as a plain vector
    aq_temp <- aq_may[["Temp"]]
    class(aq_temp)            # Temp is stored as whole numbers
    #> [1] "integer"

    # 3. Use $ for a quick second column (literal name, interactive style)
    aq_wind <- aq_may$Wind

    # 4. Summarise
    aq_mean_temp <- mean(aq_temp)
    aq_mean_wind <- mean(aq_wind)

    # 5. Wrap in an S4 "WeatherSummary" so @ makes sense
    setClass("WeatherSummary",
             representation(month = "character",
                            mean_temp = "numeric",
                            mean_wind = "numeric"))
    may_summary <- new("WeatherSummary",
                       month = "May",
                       mean_temp = aq_mean_temp,
                       mean_wind = aq_mean_wind)

    may_summary@month
    #> [1] "May"
    may_summary@mean_temp
    #> [1] 65.54839
    may_summary@mean_wind
    #> [1] 11.62258

  

Walking through: airquality[airquality$Month == 5, ] uses [ to keep the data-frame shape while filtering rows. aq_may[["Temp"]] uses [[ to extract the numeric column for downstream arithmetic. aq_may$Wind uses $ because we're typing an interactive literal. Finally, the WeatherSummary S4 object uses @ to expose fields with guaranteed types, a pattern you'll see in real packages that return structured results.

Summary

Operator | Mental model | Use when
[ | Keep the container, possibly smaller | Slicing rows/columns, returning the same type
[[ | Extract one element | You want the value itself; name can be a variable
$ | Extract one element, literal name | Interactive code where the column name is hard-coded
@ | Extract an S4 slot, literal name | Reading a formally declared slot
slot(obj, name) | @ with a variable name | Inside functions that pass slot names as arguments

Key Insight
Memorise one sentence and the rest is details. [ keeps the container; [[, $, and @ extract the contents. Everything you ever need to know about R subsetting follows from that single idea, plus the discipline to prefer [[ and slot() whenever the name lives in a variable.

