R Subsetting: One Definitive Rule for [], [[]], $, and @ — No More Guessing
R has three subsetting operators: [] returns a subset of the same type, [[]] extracts a single element, and $ is shorthand for [[]] by name. That's the rule. This tutorial shows you exactly how each one works on every R data structure.
Subsetting in R confuses everyone at first. You have [], [[]], $, and they all seem to do similar things but return different results. The confusion ends here. By the end of this tutorial, you'll know exactly which operator to use and what it returns — for vectors, lists, data frames, and matrices.
The One Rule
Here it is:
| Operator | What it does | Returns |
|---|---|---|
| x[i] | Subsets: keeps the container | Same type as x |
| x[[i]] | Extracts: removes the container | The element itself |
| x$name | Extracts by name: shorthand for x[["name"]] | The element itself |
Think of it with a train analogy:
x[1] gives you the first train car (still a train — same type)
x[[1]] gives you the cargo inside the first car (the element itself)
x$name gives you the cargo inside the car labeled "name"
# The one rule in action
my_list <- list(numbers = 1:5, text = "hello", flag = TRUE)
# [] returns a list (same type)
sub <- my_list[1]
cat("[] type:", class(sub), "\n")
cat("[] content:\n"); str(sub)
# [[]] returns the element itself
elem <- my_list[[1]]
cat("\n[[]] type:", class(elem), "\n")
cat("[[]] content:", elem, "\n")
# $ same as [[]] by name
dollar <- my_list$numbers
cat("\n$ type:", class(dollar), "\n")
cat("$ content:", dollar, "\n")
Subsetting Vectors
Vectors are the simplest case. Since vector elements are individual values (not containers), [] is all you need:
By position (positive integers)
x <- c(10, 20, 30, 40, 50)
cat("x[1]:", x[1], "\n") # First element
cat("x[3]:", x[3], "\n") # Third element
cat("x[c(1,3,5)]:", x[c(1,3,5)], "\n") # Multiple positions
cat("x[2:4]:", x[2:4], "\n") # Range
By exclusion (negative integers)
x <- c(10, 20, 30, 40, 50)
cat("x[-1]:", x[-1], "\n") # Everything except first
cat("x[-c(2,4)]:", x[-c(2,4)], "\n") # Exclude positions 2 and 4
cat("x[-(1:3)]:", x[-(1:3)], "\n") # Exclude first three
You cannot mix positive and negative indices: x[c(1, -2)] is an error.
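A minimal sketch of that rule, wrapped in tryCatch() so the script keeps running after the error. The variable names are illustrative only; setdiff() is one possible way to express "these positions except that one".

```r
x <- c(10, 20, 30, 40, 50)

# Mixing positive and negative positions raises an error
mixed <- tryCatch(
  x[c(1, -2)],
  error = function(e) "error"
)
cat("x[c(1, -2)]:", mixed, "\n")

# Compute the positions first, then subset with positive indices only
wanted <- x[setdiff(1:4, 2)]   # positions 1, 3, 4
cat("x[setdiff(1:4, 2)]:", wanted, "\n")
```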
By logical vector
x <- c(10, 20, 30, 40, 50)
# Logical vector should be the same length as x (a shorter one is recycled)
cat("x[c(T,F,T,F,T)]:", x[c(TRUE, FALSE, TRUE, FALSE, TRUE)], "\n")
# Usually generated by a condition
cat("x[x > 25]:", x[x > 25], "\n")
cat("x[x %% 20 == 0]:", x[x %% 20 == 0], "\n") # Divisible by 20
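Two details of logical subsetting are easy to miss: a shorter logical vector is recycled, and an NA in the condition produces an NA in the result. The sketch below illustrates both, using a made-up vector y; which() is shown as one common way to skip the NAs.

```r
x <- c(10, 20, 30, 40, 50)

# A shorter logical index is recycled: TRUE, FALSE, TRUE, FALSE, TRUE
every_other <- x[c(TRUE, FALSE)]
cat("x[c(TRUE, FALSE)]:", every_other, "\n")

# An NA in the condition yields an NA in the result
y <- c(5, NA, 30, 40)
with_na <- y[y > 25]
cat("y[y > 25]:", with_na, "\n")

# which() returns only the positions where the condition is TRUE
without_na <- y[which(y > 25)]
cat("y[which(y > 25)]:", without_na, "\n")
```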
By name
ages <- c(Alice = 25, Bob = 32, Carol = 28)
cat("ages['Alice']:", ages["Alice"], "\n")
cat("ages[c('Bob','Carol')]:", ages[c("Bob", "Carol")], "\n")
[[]] on vectors
For vectors, [[]] works but only for single elements — it's rarely needed:
x <- c(a = 10, b = 20, c = 30)
cat("x['a']:", x["a"], "\n") # Named numeric (keeps the name)
cat("x[['a']]:", x[["a"]], "\n") # Just the value (drops the name)
The difference is subtle: x["a"] returns a named element, x[["a"]] returns the bare value. For vectors, this rarely matters.
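One case where the difference does show up is an out-of-range position. The following sketch, using the same x, compares the two operators; tryCatch() is only there to capture the error.

```r
x <- c(a = 10, b = 20, c = 30)

# [] keeps the name; [[]] drops it
named <- x["a"]
bare  <- x[["a"]]
cat("names(x['a']):", names(named), "\n")
cat("names(x[['a']]) is NULL:", is.null(names(bare)), "\n")

# Out-of-range positions: [] returns NA, [[]] raises an error
cat("x[5]:", x[5], "\n")
out_of_range <- tryCatch(x[[5]], error = function(e) "error")
cat("x[[5]]:", out_of_range, "\n")
```

Because [[]] fails loudly on a bad index, it can be the stricter choice when exactly one element is expected.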
Subsetting Lists
This is where [] vs [[]] really matters:
info <- list(
  name = "Alice",
  scores = c(88, 92, 75),
  active = TRUE
)
# [] returns a LIST (a sub-list)
sub_list <- info[1]
cat("info[1] is a:", class(sub_list), "\n")
str(sub_list)
# [[]] returns the ELEMENT (the value inside)
element <- info[[1]]
cat("\ninfo[[1]] is a:", class(element), "\n")
cat("Value:", element, "\n")
# $ is shorthand for [["name"]]
cat("\ninfo$name:", info$name, "\n")
cat("info[['name']]:", info[["name"]], "\n")
cat("Same?", identical(info$name, info[["name"]]), "\n")
Practical consequence: why it matters
info <- list(scores = c(88, 92, 75))
# This works: [[]] gives you the numeric vector
cat("Mean with [[]]:", mean(info[["scores"]]), "\n")
cat("Mean with $:", mean(info$scores), "\n")
# This does NOT work: [] gives you a list, so mean() warns and returns NA
tryCatch(
  mean(info["scores"]),
  warning = function(w) cat("[] warning:", w$message, "\n")
)
Rule of thumb for lists: Use $ or [[]] to get at data. Use [] only when you want a smaller list.
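The same rule applies when looping over a list of lists. As a sketch (the people list is invented for illustration), `[[` can be passed to sapply() like any other function to pull one field out of every element:

```r
people <- list(
  list(name = "Alice", scores = c(88, 92, 75)),
  list(name = "Bob",   scores = c(70, 81, 95))
)

# `[[` is an ordinary function, so it can be passed to sapply()
names_vec <- sapply(people, `[[`, "name")
cat("Names:", names_vec, "\n")

# Extract each score vector with [[]] and summarise it
avg <- sapply(people, function(p) mean(p[["scores"]]))
cat("Averages:", avg, "\n")
```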
Multiple elements from a list
info <- list(a = 1, b = 2, c = 3, d = 4, e = 5)
# [] can extract multiple elements (returns a list)
sub <- info[c("a", "c", "e")]
str(sub)
# [[]] can only extract ONE element
# info[[c("a", "c")]] # This would try to drill into nested lists, not select multiple
Nested list access
nested <- list(
  level1 = list(
    level2 = list(
      value = "found it!"
    )
  )
)
# Chain [[ ]] to drill into nested lists
cat("Method 1:", nested[["level1"]][["level2"]][["value"]], "\n")
cat("Method 2:", nested$level1$level2$value, "\n")
# Or use a vector of indices
cat("Method 3:", nested[[c("level1", "level2", "value")]], "\n")
Subsetting Data Frames
Data frames are lists of columns, so both list-style and matrix-style subsetting work:
Column access (list-style)
df <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  age = c(25, 32, 28),
  score = c(92, 85, 78)
)
# $ — returns the column vector
cat("df$age:", df$age, "\n")
cat("Type:", class(df$age), "\n\n")
# [["column"]] — same as $
cat("df[['age']]:", df[["age"]], "\n\n")
# ["column"] — returns a one-column data frame
sub <- df["age"]
cat("df['age'] type:", class(sub), "\n")
print(sub)
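Matrix-style subsetting uses the df[rows, cols] form. A short sketch with the same data frame follows; the variable names first_two and older are illustrative only.

```r
df <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  age = c(25, 32, 28),
  score = c(92, 85, 78)
)

# df[rows, cols]: leave a side blank to keep everything on that side
first_two <- df[1:2, ]
cat("Rows 1-2:\n"); print(first_two)

# Filter rows with a logical condition, pick columns by name
older <- df[df$age > 26, c("name", "score")]
cat("\nage > 26:\n"); print(older)

# A single cell
cat("\nBob's score:", df[2, "score"], "\n")
```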
df <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(25, 32, 28, 45),
  score = c(92, 85, 78, 95)
)
# subset() is a cleaner alternative for interactive use
result <- subset(df, age > 27, select = c(name, score))
cat("subset() result:\n")
print(result)
subset() is convenient for interactive use (no df$ needed), but avoid it inside functions — it uses non-standard evaluation that can cause subtle bugs.
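Inside a function, the same filter can be written with [[]] and [ so that column names arrive as ordinary strings. This is a minimal sketch; filter_above and its arguments are hypothetical names, not part of base R.

```r
df <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(25, 32, 28, 45),
  score = c(92, 85, 78, 95)
)

# Hypothetical helper: `col` and `keep` are column names passed as strings
filter_above <- function(data, col, threshold, keep) {
  rows <- data[[col]] > threshold      # standard evaluation, exact name match
  data[rows, keep, drop = FALSE]       # always returns a data frame
}

result <- filter_above(df, "age", 27, c("name", "score"))
print(result)
```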
Matrices use the same [row, column] form, with one trap: when the result has only one row or one column, R drops the matrix structure and returns a vector:
m <- matrix(1:12, nrow = 3)
# This returns a vector, not a 1-row matrix!
result <- m[1, ]
cat("m[1,] class:", class(result), "\n")
cat("m[1,]:", result, "\n")
# Use drop = FALSE to keep the matrix structure
result_keep <- m[1, , drop = FALSE]
cat("\nm[1,,drop=FALSE] class:", class(result_keep), "\n")
print(result_keep)
drop = FALSE is important in functions where you need to guarantee the result is always a matrix, regardless of how many rows/columns are selected.
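Checking dim() is a simple way to see when the structure is kept and when it is dropped. The sketch below uses the same 3 x 4 matrix m.

```r
m <- matrix(1:12, nrow = 3)

# m[row, col] picks a single cell
cat("m[2, 3]:", m[2, 3], "\n")

# Two or more rows and columns: the result stays a matrix
block <- m[1:2, 2:3]
cat("dim(m[1:2, 2:3]):", dim(block), "\n")

# One column: dimensions are dropped unless drop = FALSE
cat("dim(m[, 2]) is NULL:", is.null(dim(m[, 2])), "\n")
cat("dim(m[, 2, drop = FALSE]):", dim(m[, 2, drop = FALSE]), "\n")
```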
Subsetting with $ vs [[ ]] — When to Use Which
my_list <- list(first_name = "Alice", age = 30)
# $ — convenient but limited
cat("$:", my_list$first_name, "\n")
# $ with partial matching (can be dangerous!)
cat("Partial match:", my_list$fir, "\n") # Matches "first_name"!
# [[ ]] — programmatic, exact matching
cat("[[]]:", my_list[["first_name"]], "\n")
# [[ ]] works with variables — $ doesn't
col_name <- "age"
cat("Variable with [[]]:", my_list[[col_name]], "\n")
# my_list$col_name # This looks for an element literally named "col_name" and returns NULL!
| Use $ when... | Use [[]] when... |
|---|---|
| Interactive work | Inside functions |
| You know the exact name | Name is in a variable |
| Quick exploration | Programmatic access |
| Column names are simple | Names have spaces or special chars |
Warning: $ does partial matching, so my_list$f matches first_name (as long as no other name starts with f). This can introduce subtle bugs. [[]] requires exact matches by default, making it safer for production code.
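The sketch below shows how partial matching behaves when a prefix is unique versus ambiguous. It uses a different, made-up list with two names starting with "f", so the results differ from the my_list example above.

```r
flags <- list(first_name = "Alice", flag = TRUE)

# "fi" uniquely identifies first_name, so $ finds it
cat("flags$fi:", flags$fi, "\n")

# "f" is ambiguous (first_name, flag), so $ returns NULL
cat("flags$f is NULL:", is.null(flags$f), "\n")

# [[]] matches exactly by default and returns NULL for "fi"
cat("flags[['fi']] is NULL:", is.null(flags[["fi"]]), "\n")

# Partial matching with [[]] must be requested explicitly
cat("exact = FALSE:", flags[["fi", exact = FALSE]], "\n")
```

Adding a new element to a list can therefore silently change what an abbreviated $ lookup returns, which is one more reason to spell names out in full.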
Replacement: Subsetting on the Left Side
Everything you can subset, you can also replace:
# Vector replacement
x <- 1:5
x[3] <- 99
cat("Replace one:", x, "\n")
x[x > 50] <- 0
cat("Conditional replace:", x, "\n")
# List replacement
info <- list(a = 1, b = 2, c = 3)
info$d <- 4 # Add new element
info[["a"]] <- 100 # Replace existing
info$b <- NULL # Delete element
str(info)
# Data frame replacement
df <- data.frame(x = 1:3, y = 4:6)
df$z <- df$x + df$y # Add column
df[df$x > 1, "y"] <- 0 # Conditional column replacement
print(df)
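One corner of list replacement deserves a note: assigning NULL deletes an element, so storing an actual NULL needs a different form. A small sketch, assuming a fresh three-element list:

```r
info <- list(a = 1, b = 2, c = 3)

# Assigning NULL with [[]] or $ removes the element
info[["b"]] <- NULL
cat("After delete:", names(info), "\n")

# To keep the element and store NULL in it, assign list(NULL) with []
info["a"] <- list(NULL)
cat("Names kept:", names(info), "\n")
cat("info$a is NULL:", is.null(info$a), "\n")
cat("Length:", length(info), "\n")
```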
Practice Exercises
Exercise 1: Subsetting Drill
# Exercise: Given this list, extract the requested values
data <- list(
  city = "Springfield",
  population = 30000,
  temps = c(Mon = 72, Tue = 68, Wed = 75, Thu = 80, Fri = 77),
  mayor = list(name = "Jane Smith", term_start = 2022)
)
# Extract:
# 1. The city name (as a character, not a list)
# 2. Wednesday's temperature
# 3. The mayor's name
# 4. A sub-list containing only city and population
# 5. All temperatures above 73
# Write your code below:
Solution
# Solution
data <- list(
  city = "Springfield",
  population = 30000,
  temps = c(Mon = 72, Tue = 68, Wed = 75, Thu = 80, Fri = 77),
  mayor = list(name = "Jane Smith", term_start = 2022)
)
# 1. City name (use $ or [[]])
cat("1. City:", data$city, "\n")
# 2. Wednesday's temp (chain: list element -> named vector element)
cat("2. Wed temp:", data$temps["Wed"], "\n")
# 3. Mayor's name (nested list access)
cat("3. Mayor:", data$mayor$name, "\n")
# 4. Sub-list (use [] to keep as list)
sub <- data[c("city", "population")]
cat("4. Sub-list:\n"); str(sub)
# 5. Temps above 73 (filter named vector)
hot <- data$temps[data$temps > 73]
cat("5. Above 73:", hot, "\n")
Explanation: #1 uses $ to extract the element. #2 chains $temps then ["Wed"]. #3 chains into the nested list. #4 uses [] (not [[]]) because we want multiple elements. #5 uses logical subsetting on the vector.
Exercise 2: Data Frame Surgery
# Exercise: Using mtcars:
# 1. Extract the mpg column as a vector (not a data frame)
# 2. Get rows 5-10, columns mpg, hp, wt as a data frame
# 3. Find all cars with mpg > 25 AND hp < 100
# 4. Replace all hp values > 200 with 200 (cap them)
# 5. Add a column 'efficient' that's TRUE when mpg > 20
# Write your code below:
Solution
# Solution
df <- mtcars
# 1. Column as vector
mpg_vec <- df$mpg # or df[["mpg"]]
cat("1. Type:", class(mpg_vec), "Length:", length(mpg_vec), "\n")
# 2. Rows 5-10, specific columns
sub_df <- df[5:10, c("mpg", "hp", "wt")]
cat("\n2. Subset:\n"); print(sub_df)
# 3. Filter with multiple conditions
efficient <- df[df$mpg > 25 & df$hp < 100, c("mpg", "hp", "cyl")]
cat("\n3. MPG>25 & HP<100:\n"); print(efficient)
# 4. Cap hp at 200
df$hp[df$hp > 200] <- 200
cat("\n4. Max HP after cap:", max(df$hp), "\n")
# 5. Add boolean column
df$efficient <- df$mpg > 20
cat("5. Efficient cars:", sum(df$efficient), "of", nrow(df), "\n")
Explanation: #1 uses $ for vector extraction. #2 uses [rows, cols] matrix-style. #3 combines conditions with &. #4 uses conditional subsetting on the left side of assignment. #5 creates a new column from a logical expression.
Exercise 3: The Drop Trap
# Exercise: This function returns the sum of a matrix column. The sum is
# correct, but the intermediate column_data silently loses its matrix
# structure, down to a single bare number for a one-row matrix. Make it robust.
sum_column <- function(mat, col) {
  column_data <- mat[, col]
  return(sum(column_data))
}
# 3-row matrix: column_data is a plain vector of length 3
m1 <- matrix(1:12, nrow = 3)
cat("3-row matrix, col 2:", sum_column(m1, 2), "\n")
# 1-row matrix: column_data is a single number with no dimensions
m2 <- matrix(1:4, nrow = 1)
cat("1-row matrix, col 2:", sum_column(m2, 2), "\n")
# What happened to the matrix structure? Fix the function so column_data is always a matrix.
# Write your fix below:
Solution
# Solution: Use drop = FALSE
sum_column <- function(mat, col) {
  column_data <- mat[, col, drop = FALSE] # Keep matrix structure
  return(sum(column_data))
}
# Now both work correctly:
m1 <- matrix(1:12, nrow = 3)
cat("3-row matrix, col 2:", sum_column(m1, 2), "\n")
m2 <- matrix(1:4, nrow = 1)
cat("1-row matrix, col 2:", sum_column(m2, 2), "\n")
# Why? Without drop=FALSE:
# m2[, 2] returns a single number (dropped to scalar)
# With drop=FALSE: m2[, 2, drop=FALSE] returns a 1x1 matrix
# sum() works on both, but other functions might not
Explanation: When a matrix subset results in a single row or column, R drops the matrix to a vector. Inside functions, this can cause unexpected behavior. drop = FALSE prevents this, ensuring the result is always a matrix.
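sum() happens to tolerate a dropped result, but functions that require two dimensions do not. As an illustration (both helper names are hypothetical), colMeans() fails as soon as a single column is selected without drop = FALSE:

```r
# Hypothetical helper: breaks when `cols` selects a single column
col_means_fragile <- function(mat, cols) {
  colMeans(mat[, cols])
}

# Same helper with drop = FALSE
col_means_safe <- function(mat, cols) {
  colMeans(mat[, cols, drop = FALSE])
}

m <- matrix(1:12, nrow = 3)

two_cols <- col_means_fragile(m, 1:2)
cat("Two columns:", two_cols, "\n")

# With one column, mat[, cols] is a plain vector and colMeans() errors
fragile <- tryCatch(col_means_fragile(m, 2), error = function(e) "error")
cat("One column, fragile:", fragile, "\n")

safe <- col_means_safe(m, 2)
cat("One column, safe:", safe, "\n")
```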
Summary
| Structure | [i] returns | [[i]] returns | $name returns |
|---|---|---|---|
| Vector | Subset vector | Single value (bare) | N/A |
| List | Sub-list | Element itself | Element itself |
| Data frame | Sub-data frame | Column vector | Column vector |
| Matrix | Submatrix | Single value | N/A |
The cheat sheet:
Need the data? Use $ or [[]]
Need a smaller container? Use []
Inside a function? Use [[]] (safer than $)
Column name in a variable? Must use [[var_name]]
FAQ
Why does R have three different subsetting operators?
Each serves a different purpose. [] preserves the container type (useful for subsetting). [[]] extracts the contents (useful for getting at data). $ is syntactic sugar for [["name"]], convenient for interactive use; unlike [[]], it allows partial name matching on lists.
What does @ do?
@ accesses slots in S4 objects (R's formal OOP system). The syntax mirrors $: object@slot_name. Unlike $, the slot name must match exactly and the slot must be defined in the class. Most beginners won't encounter S4 objects.
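A minimal sketch of slot access follows. The Person class is invented purely for this illustration; slot() is the function form from the methods package, useful when the slot name is held in a variable.

```r
library(methods)

# A hypothetical S4 class, defined only for this illustration
setClass("Person", representation(name = "character", age = "numeric"))

p <- new("Person", name = "Alice", age = 30)

# @ reads a slot
cat("p@name:", p@name, "\n")

# slot() is the programmatic form, for slot names held in a variable
slot_name <- "age"
cat("slot(p, slot_name):", slot(p, slot_name), "\n")

# @ on the left side replaces a slot value
p@age <- 31
cat("Updated age:", p@age, "\n")
```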
Why does df[, 1] return a vector but df[, 1:2] returns a data frame?
The drop parameter — R drops the data frame structure when the result has one column. Use df[, 1, drop = FALSE] to always get a data frame. This default behavior is convenient for interactive work but can cause bugs in functions.
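The three forms can be compared side by side with class(). This sketch uses a small made-up data frame:

```r
df <- data.frame(x = 1:3, y = 4:6)

# One column selected matrix-style: result is a plain vector
cat("class(df[, 1]):", class(df[, 1]), "\n")

# drop = FALSE keeps the data frame
cat("class(df[, 1, drop = FALSE]):", class(df[, 1, drop = FALSE]), "\n")

# List-style [] never drops
cat("class(df[1]):", class(df[1]), "\n")
```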
Can I use negative indexing with names?
No. x[-"name"] doesn't work. Use x[names(x) != "name"] or x[setdiff(names(x), "name")] to exclude by name.
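Both workarounds in action, as a short sketch on a named vector (the ages vector is illustrative):

```r
ages <- c(Alice = 25, Bob = 32, Carol = 28)

# Keep every element whose name is not "Bob"
no_bob <- ages[names(ages) != "Bob"]
cat("Without Bob:", names(no_bob), "\n")

# setdiff() handles several names at once
only_alice <- ages[setdiff(names(ages), c("Bob", "Carol"))]
cat("Only Alice:", names(only_alice), "\n")

# Negating a string is an error
neg <- tryCatch(ages[-"Bob"], error = function(e) "error")
cat("ages[-'Bob']:", neg, "\n")
```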
What's the difference between subset() and [?
subset() uses non-standard evaluation — you don't need df$ before column names. It's cleaner for interactive work but shouldn't be used inside functions because the column names are evaluated in a special way that can cause bugs.
What's Next?
Subsetting is a foundational skill used in nearly every R workflow. Related tutorials:
R Type Coercion — what happens to types when you subset mixed structures
R Attributes — metadata that subsetting can preserve or drop
Data Wrangling with dplyr — modern, pipe-friendly alternatives to manual subsetting