r-statistics.co by Selva Prabhakaran


R Subsetting: One Definitive Rule for [], [[]], $, and @ — No More Guessing

R has three subsetting operators: [] returns a subset of the same type, [[]] extracts a single element, and $ is shorthand for [[]] by name. That's the rule. This tutorial shows you exactly how each one works on every R data structure.

Subsetting in R confuses everyone at first. You have [], [[]], $, and they all seem to do similar things but return different results. The confusion ends here. By the end of this tutorial, you'll know exactly which operator to use and what it returns — for vectors, lists, data frames, and matrices.

The One Rule

Here it is:

| Operator | What it does | Returns |
|---|---|---|
| x[i] | Subsets: keeps the container | Same type as x |
| x[[i]] | Extracts: removes the container | The element itself |
| x$name | Extracts by name: shorthand for x[["name"]] | The element itself |

Think of it with a train analogy:

  • x[1] gives you the first train car (still a train — same type)
  • x[[1]] gives you the cargo inside the first car (the element itself)
  • x$name gives you the cargo inside the car labeled "name"

# The one rule in action
my_list <- list(numbers = 1:5, text = "hello", flag = TRUE)

# [] returns a list (same type)
sub <- my_list[1]
cat("[] type:", class(sub), "\n")
cat("[] content:\n"); str(sub)

# [[]] returns the element itself
elem <- my_list[[1]]
cat("\n[[]] type:", class(elem), "\n")
cat("[[]] content:", elem, "\n")

# $ same as [[]] by name
dollar <- my_list$numbers
cat("\n$ type:", class(dollar), "\n")
cat("$ content:", dollar, "\n")

  

Subsetting Vectors

Vectors are the simplest case. Since vector elements are individual values (not containers), [] is all you need:

By position (positive integers)

x <- c(10, 20, 30, 40, 50)
cat("x[1]:", x[1], "\n")                 # First element
cat("x[3]:", x[3], "\n")                 # Third element
cat("x[c(1,3,5)]:", x[c(1,3,5)], "\n")   # Multiple positions
cat("x[2:4]:", x[2:4], "\n")             # Range

  

By exclusion (negative integers)

x <- c(10, 20, 30, 40, 50)
cat("x[-1]:", x[-1], "\n")               # Everything except first
cat("x[-c(2,4)]:", x[-c(2,4)], "\n")     # Exclude positions 2 and 4
cat("x[-(1:3)]:", x[-(1:3)], "\n")       # Exclude first three

  

You cannot mix positive and negative indices: x[c(1, -2)] is an error.
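
The rule above can be checked directly. This is a minimal sketch; tryCatch() is used only so the script keeps running after the error, and the exact error message depends on your R version, so it is not printed here.

```r
x <- c(10, 20, 30, 40, 50)

# Mixing positive and negative indices raises an error
result <- tryCatch(
  x[c(1, -2)],
  error = function(e) "error"
)
cat("x[c(1, -2)] gives:", result, "\n")

# Zero may appear alongside either sign and is simply ignored
cat("x[c(0, 1)]:", x[c(0, 1)], "\n")
cat("x[c(0, -1)]:", x[c(0, -1)], "\n")
```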

By logical vector

x <- c(10, 20, 30, 40, 50)

# Logical vector should be the same length as x (a shorter one is recycled)
cat("x[c(T,F,T,F,T)]:", x[c(TRUE, FALSE, TRUE, FALSE, TRUE)], "\n")

# Usually generated by a condition
cat("x[x > 25]:", x[x > 25], "\n")
cat("x[x %% 20 == 0]:", x[x %% 20 == 0], "\n")  # Divisible by 20

  

By name

ages <- c(Alice = 25, Bob = 32, Carol = 28)
cat("ages['Alice']:", ages["Alice"], "\n")
cat("ages[c('Bob','Carol')]:", ages[c("Bob", "Carol")], "\n")

  

[[]] on vectors

For vectors, [[]] works but only for single elements — it's rarely needed:

x <- c(a = 10, b = 20, c = 30)

# print() is used here because cat() never shows names
print(x["a"])     # Named numeric (keeps the name)
print(x[["a"]])   # Just the value (drops the name)

  

The difference is subtle: x["a"] returns a named element, x[["a"]] returns the bare value. For vectors, this rarely matters.
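
One place where the choice can matter on vectors is strictness: [[]] insists on exactly one existing element, while [] quietly returns NA for a position that does not exist. The sketch below illustrates this; tryCatch() is there only to keep the script running, and the error text itself is not shown because it varies.

```r
x <- c(a = 10, b = 20, c = 30)

# Out-of-range position: [] returns NA, [[]] raises an error
out_single <- x[5]
out_double <- tryCatch(x[[5]], error = function(e) "error")
cat("x[5] is NA:", is.na(out_single), "\n")
cat("x[[5]] gives:", out_double, "\n")

# More than one index: [] returns several elements, [[]] raises an error
many_single <- x[c(1, 2)]
many_double <- tryCatch(x[[c(1, 2)]], error = function(e) "error")
cat("x[c(1, 2)]:", many_single, "\n")
cat("x[[c(1, 2)]] gives:", many_double, "\n")
```

That strictness is a reason some people prefer [[]] when a function expects exactly one value.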

Subsetting Lists

This is where [] vs [[]] really matters:

info <- list(
  name = "Alice",
  scores = c(88, 92, 75),
  active = TRUE
)

# [] returns a LIST (a sub-list)
sub_list <- info[1]
cat("info[1] is a:", class(sub_list), "\n")
str(sub_list)

# [[]] returns the ELEMENT (the value inside)
element <- info[[1]]
cat("\ninfo[[1]] is a:", class(element), "\n")
cat("Value:", element, "\n")

# $ is shorthand for [["name"]]
cat("\ninfo$name:", info$name, "\n")
cat("info[['name']]:", info[["name"]], "\n")
cat("Same?", identical(info$name, info[["name"]]), "\n")

  

Practical consequence: why it matters

info <- list(scores = c(88, 92, 75))

# This works: [[]] gives you the numeric vector
cat("Mean with [[]]:", mean(info[["scores"]]), "\n")
cat("Mean with $:", mean(info$scores), "\n")

# This does NOT work: [] gives you a list, so mean() returns NA with a warning
tryCatch(
  mean(info["scores"]),
  warning = function(w) cat("[] warning:", w$message, "\n")
)

  

Rule of thumb for lists: Use $ or [[]] to get at data. Use [] only when you want a smaller list.

Multiple elements from a list

info <- list(a = 1, b = 2, c = 3, d = 4, e = 5)

# [] can extract multiple elements (returns a list)
sub <- info[c("a", "c", "e")]
str(sub)

# [[]] can only extract ONE element
# info[[c("a", "c")]]  # This would try to drill into nested lists, not select multiple

  

Nested list access

nested <- list(
  level1 = list(
    level2 = list(
      value = "found it!"
    )
  )
)

# Chain [[ ]] to drill into nested lists
cat("Method 1:", nested[["level1"]][["level2"]][["value"]], "\n")
cat("Method 2:", nested$level1$level2$value, "\n")

# Or use a vector of indices
cat("Method 3:", nested[[c("level1", "level2", "value")]], "\n")

  

Subsetting Data Frames

Data frames are lists of columns, so both list-style and matrix-style subsetting work:
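
The "list of columns" claim is easy to verify with ordinary list tools. This is a small sketch on a made-up data frame; the printed column classes depend on your R version (character columns were converted to factors by default before R 4.0).

```r
df <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  age = c(25, 32, 28)
)

cat("is.list(df):", is.list(df), "\n")
cat("length(df):", length(df), "\n")   # Counts columns, not rows
cat("names(df):", names(df), "\n")

# List tools such as sapply() iterate over the columns
col_classes <- sapply(df, class)
print(col_classes)
```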

Column access (list-style)

df <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  age = c(25, 32, 28),
  score = c(92, 85, 78)
)

# $ returns the column vector
cat("df$age:", df$age, "\n")
cat("Type:", class(df$age), "\n\n")

# [["column"]] is the same as $
cat("df[['age']]:", df[["age"]], "\n\n")

# ["column"] returns a one-column data frame
sub <- df["age"]
cat("df['age'] type:", class(sub), "\n")
print(sub)

  

Row and column access (matrix-style)

df <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(25, 32, 28, 45),
  score = c(92, 85, 78, 95)
)

# df[rows, columns]
cat("df[1, 2]:", df[1, 2], "\n")                # Row 1, Column 2
cat("df[1, ]:\n"); print(df[1, ])               # Row 1, all columns
cat("\ndf[, 'score']:", df[, "score"], "\n")    # All rows, score column
cat("\ndf[1:2, c('name','score')]:\n")
print(df[1:2, c("name", "score")])

  

Filtering rows (most common operation)

df <- data.frame(
  name = c("Alice", "Bob", "Carol", "David", "Eve"),
  age = c(25, 32, 28, 45, 26),
  dept = c("Eng", "Sales", "Eng", "Sales", "Eng")
)

# Filter with a logical condition
engineers <- df[df$dept == "Eng", ]
cat("Engineers:\n"); print(engineers)

# Multiple conditions
senior_eng <- df[df$dept == "Eng" & df$age > 26, ]
cat("\nSenior engineers:\n"); print(senior_eng)

# Using which() (safer with NAs: it skips them instead of returning NA rows)
idx <- which(df$age >= 30)
cat("\nAge >= 30 (positions:", idx, "):\n")
print(df[idx, ])
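
To see why which() is the safer choice when a column contains missing values, compare the two approaches on a small made-up data frame with one NA age. This is only an illustration of the behaviour, not a recommendation for how to handle missing data in general.

```r
df <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  age = c(25, NA, 45)
)

# A logical index containing NA produces a row filled with NAs
with_logical <- df[df$age >= 30, ]
print(with_logical)

# which() returns only the positions that are TRUE, so the NA is skipped
with_which <- df[which(df$age >= 30), ]
print(with_which)
```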

  

The subset() function

df <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(25, 32, 28, 45),
  score = c(92, 85, 78, 95)
)

# subset() is a cleaner alternative for interactive use
result <- subset(df, age > 27, select = c(name, score))
cat("subset() result:\n")
print(result)

  

subset() is convenient for interactive use (no df$ needed), but avoid it inside functions — it uses non-standard evaluation that can cause subtle bugs.
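
Inside a function, one common alternative is to pass the column name as a string and use [[]] and [ directly. The helper below is a hypothetical sketch (filter_above and its arguments are made up for illustration), not a drop-in replacement for every use of subset().

```r
# Hypothetical helper: keep rows where `column` exceeds `threshold`
filter_above <- function(data, column, threshold) {
  keep <- which(data[[column]] > threshold)  # [[ ]] accepts a name held in a variable
  data[keep, , drop = FALSE]                 # Always returns a data frame
}

df <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(25, 32, 28, 45),
  score = c(92, 85, 78, 95)
)

result <- filter_above(df, "age", 27)
print(result)
```

Because the column is looked up by an ordinary string, nothing depends on how or where the function is called.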

Subsetting Matrices

Matrices use [row, col] notation:

m <- matrix(1:12, nrow = 3, dimnames = list(
  c("r1", "r2", "r3"),
  c("A", "B", "C", "D")
))
cat("Matrix:\n"); print(m)

# Single element
cat("\nm[2, 3]:", m[2, 3], "\n")

# Row/column
cat("Row 1:", m[1, ], "\n")
cat("Column 'B':", m[, "B"], "\n")

# Submatrix
cat("\nSubmatrix:\n"); print(m[1:2, c("A", "C")])

  

The drop problem

When you subset a matrix and the result has only one row or one column, R drops the matrix structure and returns a vector:

m <- matrix(1:12, nrow = 3)

# This returns a vector, not a 1-row matrix!
result <- m[1, ]
cat("m[1,] class:", class(result), "\n")
cat("m[1,]:", result, "\n")

# Use drop = FALSE to keep the matrix structure
result_keep <- m[1, , drop = FALSE]
cat("\nm[1,,drop=FALSE] class:", class(result_keep), "\n")
print(result_keep)

  

drop = FALSE is important in functions where you need to guarantee the result is always a matrix, regardless of how many rows/columns are selected.
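
As a concrete sketch of that guarantee, the hypothetical helper below (col_totals is a made-up name) passes its selection to colSums(), which needs a two-dimensional input. With drop = FALSE it works for any number of rows; without it, the single-row case fails.

```r
# Hypothetical helper: column totals for the selected rows
col_totals <- function(mat, rows) {
  selected <- mat[rows, , drop = FALSE]  # Always a matrix, even for one row
  colSums(selected)
}

m <- matrix(1:12, nrow = 3)

cat("Rows 1-2:", col_totals(m, 1:2), "\n")
cat("Row 1 only:", col_totals(m, 1), "\n")

# Without drop = FALSE, m[1, ] is a plain vector and colSums() raises an error
unsafe <- tryCatch(colSums(m[1, ]), error = function(e) "error")
cat("Without drop = FALSE:", unsafe, "\n")
```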

Subsetting with $ vs [[ ]] — When to Use Which

my_list <- list(first_name = "Alice", age = 30)

# $: convenient but limited
cat("$:", my_list$first_name, "\n")

# $ with partial matching (can be dangerous!)
cat("Partial match:", my_list$fir, "\n")  # Matches "first_name"!

# [[ ]]: programmatic, exact matching
cat("[[]]:", my_list[["first_name"]], "\n")

# [[ ]] works with variables; $ doesn't
col_name <- "age"
cat("Variable with [[]]:", my_list[[col_name]], "\n")
# my_list$col_name  # This looks for an element literally named "col_name"!

  

| Use $ when... | Use [[]] when... |
|---|---|
| Interactive work | Inside functions |
| You know the exact name | Name is in a variable |
| Quick exploration | Programmatic access |
| Column names are simple | Names have spaces or special chars |

Warning: $ does partial matching — my_list$f matches first_name. This can introduce subtle bugs. [[]] requires exact matches, making it safer for production code.
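
The difference is easy to demonstrate on a list. In the sketch below, the same abbreviated name succeeds with $ and returns NULL with [[]]; the exact argument of [[ is shown only to make the default explicit.

```r
my_list <- list(first_name = "Alice", age = 30)

# $ accepts a unique prefix; [[ ]] does not by default
dollar_result <- my_list$fir
bracket_result <- my_list[["fir"]]
cat("$fir:", dollar_result, "\n")
cat("[['fir']] is NULL:", is.null(bracket_result), "\n")

# [[ ]] can opt in to partial matching explicitly
opt_in <- my_list[["fir", exact = FALSE]]
cat("[['fir', exact = FALSE]]:", opt_in, "\n")
```

A NULL from [[]] is easy to test for, whereas a silent partial match can go unnoticed until a new element with a similar name is added.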

Replacement: Subsetting on the Left Side

Everything you can subset, you can also replace:

# Vector replacement
x <- 1:5
x[3] <- 99
cat("Replace one:", x, "\n")
x[x > 50] <- 0
cat("Conditional replace:", x, "\n")

# List replacement
info <- list(a = 1, b = 2, c = 3)
info$d <- 4          # Add new element
info[["a"]] <- 100   # Replace existing
info$b <- NULL       # Delete element
str(info)

# Data frame replacement
df <- data.frame(x = 1:3, y = 4:6)
df$z <- df$x + df$y       # Add column
df[df$x > 1, "y"] <- 0    # Conditional column replacement
print(df)

  

Practice Exercises

Exercise 1: Subsetting Drill

# Exercise: Given this list, extract the requested values
data <- list(
  city = "Springfield",
  population = 30000,
  temps = c(Mon = 72, Tue = 68, Wed = 75, Thu = 80, Fri = 77),
  mayor = list(name = "Jane Smith", term_start = 2022)
)

# Extract:
# 1. The city name (as a character, not a list)
# 2. Wednesday's temperature
# 3. The mayor's name
# 4. A sub-list containing only city and population
# 5. All temperatures above 73

# Write your code below:

  
# Solution
data <- list(
  city = "Springfield",
  population = 30000,
  temps = c(Mon = 72, Tue = 68, Wed = 75, Thu = 80, Fri = 77),
  mayor = list(name = "Jane Smith", term_start = 2022)
)

# 1. City name (use $ or [[]])
cat("1. City:", data$city, "\n")

# 2. Wednesday's temp (chain: list element -> named vector element)
cat("2. Wed temp:", data$temps["Wed"], "\n")

# 3. Mayor's name (nested list access)
cat("3. Mayor:", data$mayor$name, "\n")

# 4. Sub-list (use [] to keep as list)
sub <- data[c("city", "population")]
cat("4. Sub-list:\n"); str(sub)

# 5. Temps above 73 (filter named vector)
hot <- data$temps[data$temps > 73]
cat("5. Above 73:", hot, "\n")

  

Explanation: #1 uses $ to extract the element. #2 chains $temps then ["Wed"]. #3 chains into the nested list. #4 uses [] (not [[]]) because we want multiple elements. #5 uses logical subsetting on the vector.

Exercise 2: Data Frame Surgery

# Exercise: Using mtcars:
# 1. Extract the mpg column as a vector (not a data frame)
# 2. Get rows 5-10, columns mpg, hp, wt as a data frame
# 3. Find all cars with mpg > 25 AND hp < 100
# 4. Replace all hp values > 200 with 200 (cap them)
# 5. Add a column 'efficient' that's TRUE when mpg > 20

# Write your code below:

  
# Solution
df <- mtcars

# 1. Column as vector
mpg_vec <- df$mpg  # or df[["mpg"]]
cat("1. Type:", class(mpg_vec), "Length:", length(mpg_vec), "\n")

# 2. Rows 5-10, specific columns
sub_df <- df[5:10, c("mpg", "hp", "wt")]
cat("\n2. Subset:\n"); print(sub_df)

# 3. Filter with multiple conditions
efficient <- df[df$mpg > 25 & df$hp < 100, c("mpg", "hp", "cyl")]
cat("\n3. MPG>25 & HP<100:\n"); print(efficient)

# 4. Cap hp at 200
df$hp[df$hp > 200] <- 200
cat("\n4. Max HP after cap:", max(df$hp), "\n")

# 5. Add boolean column
df$efficient <- df$mpg > 20
cat("5. Efficient cars:", sum(df$efficient), "of", nrow(df), "\n")

  

Explanation: #1 uses $ for vector extraction. #2 uses [rows, cols] matrix-style. #3 combines conditions with &. #4 uses conditional subsetting on the left side of assignment. #5 creates a new column from a logical expression.

Exercise 3: The Drop Trap

# Exercise: This function returns the sum of a matrix column.
# With a one-row matrix, mat[, col] silently drops to a bare number.
# sum() still copes, but code that expects a matrix would not.
# Fix the function so the intermediate result is always a matrix.
sum_column <- function(mat, col) {
  column_data <- mat[, col]
  return(sum(column_data))
}

# Works fine:
m1 <- matrix(1:12, nrow = 3)
cat("3-row matrix, col 2:", sum_column(m1, 2), "\n")

# One-row matrix: the column is dropped to a scalar before sum() sees it
m2 <- matrix(1:4, nrow = 1)
cat("1-row matrix, col 2:", sum_column(m2, 2), "\n")

# What happened? Fix the function.
# Write your fix below:

  
# Solution: Use drop = FALSE
sum_column <- function(mat, col) {
  column_data <- mat[, col, drop = FALSE]  # Keep matrix structure
  return(sum(column_data))
}

# Now both work correctly:
m1 <- matrix(1:12, nrow = 3)
cat("3-row matrix, col 2:", sum_column(m1, 2), "\n")

m2 <- matrix(1:4, nrow = 1)
cat("1-row matrix, col 2:", sum_column(m2, 2), "\n")

# Why? Without drop=FALSE:
#   m2[, 2] returns a single number (dropped to scalar)
# With drop=FALSE: m2[, 2, drop=FALSE] returns a 1x1 matrix
# sum() works on both, but other functions might not

  

Explanation: When a matrix subset results in a single row or column, R drops the matrix to a vector. Inside functions, this can cause unexpected behavior. drop = FALSE prevents this, ensuring the result is always a matrix.

Summary

| Structure | [i] returns | [[i]] returns | $name returns |
|---|---|---|---|
| Vector | Subset vector | Single value (bare) | N/A |
| List | Sub-list | Element itself | Element itself |
| Data frame | Sub-data frame | Column vector | Column vector |
| Matrix | Submatrix | Single value | N/A |

The cheat sheet:

  • Need the data? Use $ or [[]]
  • Need a smaller container? Use []
  • Inside a function? Use [[]] (safer than $)
  • Column name in a variable? Must use [[var_name]]

FAQ

Why does R have three different subsetting operators?

Each serves a different purpose. [] preserves the container type (useful for subsetting). [[]] extracts the contents (useful for getting at data). $ is syntactic sugar for [["name"]], which makes it convenient for interactive use.

What does @ do?

@ accesses slots in S4 objects (R's formal OOP system). It plays the same role for S4 objects that $ plays for lists: object@slot_name. Unlike $, it does no partial matching, and the slot must be one the class defines. Most beginners won't encounter S4 objects.
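
A minimal sketch of @ in use, with a hypothetical Person class defined only for this illustration. slot() is the programmatic counterpart of @, in the same way that [[]] is for $.

```r
library(methods)  # Usually attached already; loaded explicitly to be safe

# Hypothetical S4 class, defined only for this example
setClass("Person", slots = c(name = "character", age = "numeric"))

p <- new("Person", name = "Alice", age = 30)

cat("p@name:", p@name, "\n")
cat("slot(p, 'age'):", slot(p, "age"), "\n")  # Slot name can be held in a variable

# Slots can be replaced with @ on the left side
p@age <- 31
cat("Updated age:", p@age, "\n")

# @ needs the full slot name: an abbreviation is an error, not a partial match
abbrev <- tryCatch(p@nam, error = function(e) "error")
cat("p@nam gives:", abbrev, "\n")
```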

Why does df[, 1] return a vector but df[, 1:2] returns a data frame?

The drop parameter — R drops the data frame structure when the result has one column. Use df[, 1, drop = FALSE] to always get a data frame. This default behavior is convenient for interactive work but can cause bugs in functions.
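
The four common cases can be compared side by side on a small made-up data frame:

```r
df <- data.frame(x = 1:3, y = 4:6)

one_col      <- df[, 1]                 # Matrix-style, one column: dropped to a vector
one_col_kept <- df[, 1, drop = FALSE]   # Matrix-style with drop = FALSE: data frame
list_style   <- df[1]                   # List-style: always a data frame
two_cols     <- df[, 1:2]               # More than one column: data frame

cat("df[, 1]:", class(one_col), "\n")
cat("df[, 1, drop = FALSE]:", class(one_col_kept), "\n")
cat("df[1]:", class(list_style), "\n")
cat("df[, 1:2]:", class(two_cols), "\n")
```

List-style df[1] is another way to keep the data frame structure without remembering drop = FALSE.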

Can I use negative indexing with names?

No. x[-"name"] doesn't work. Use x[names(x) != "name"] or x[setdiff(names(x), "name")] to exclude by name.
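
Both workarounds in action, as a short sketch on made-up data. The same setdiff() idea carries over to data frame columns.

```r
ages <- c(Alice = 25, Bob = 32, Carol = 28)

# Negating a name is an error because unary minus is not defined for strings
negated <- tryCatch(ages[-"Bob"], error = function(e) "error")
cat("ages[-'Bob'] gives:", negated, "\n")

# Exclude one name with a logical comparison
without_bob <- ages[names(ages) != "Bob"]
print(without_bob)

# Exclude several names with setdiff()
only_bob <- ages[setdiff(names(ages), c("Alice", "Carol"))]
print(only_bob)

# The same pattern drops a data frame column by name
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
print(df[setdiff(names(df), "b")])
```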

What's the difference between subset() and [?

subset() uses non-standard evaluation — you don't need df$ before column names. It's cleaner for interactive work but shouldn't be used inside functions because the column names are evaluated in a special way that can cause bugs.

What's Next?

Subsetting is a foundational skill that shows up in almost every R analysis. Related tutorials:

  1. R Type Coercion — what happens to types when you subset mixed structures
  2. R Attributes — metadata that subsetting can preserve or drop
  3. Data Wrangling with dplyr — modern, pipe-friendly alternatives to manual subsetting