Subsetting (pulling specific pieces out of vectors, lists, and data frames with [], [[]], and $) is one of the most-used skills in R and one of the trickiest to master. These 10 exercises take you from basic vector indexing to nested list extraction, each with starter code and a full worked solution.
Run every block, predict the output before you peek at the answer, and you'll have the three operators locked in by the end.
How does [] work on vectors?
R gives you three subsetting operators, and each returns something different. The fastest way to build intuition is to run code and check your predictions. Let's start with vectors, the simplest structure, and [], the operator you'll use most.
Problem 1: Extract by position and by name
Given a named vector of exam scores, extract the 2nd and 4th elements by position, then extract the same two elements by name.
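The original solution code isn't shown, so here is one possible sketch. The subject names and values in scores are assumptions, chosen so that positions 2 and 4 are science and history and so that scores > 85 matches the logical vector described in Problem 2.

```r
# Hypothetical exam scores (names and values are assumptions)
scores <- c(math = 90, science = 88, english = 72, history = 95, art = 80)

# By position: 2nd and 4th elements
by_pos <- scores[c(2, 4)]
by_pos   # science = 88, history = 95

# By name: the same two elements, robust to reordering
by_name <- scores[c("science", "history")]
identical(by_pos, by_name)   # TRUE
```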
Explanation: Both approaches return the same named numeric vector. Position-based indexing (c(2, 4)) is concise when you know the order, but name-based indexing (c("science", "history")) is safer because it still works even if someone reorders the vector later.
Problem 2: Logical and negative indexing
From the same scores vector, extract all scores above 85 using a logical condition. Then exclude the 3rd element using negative indexing.
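A sketch of both techniques, reusing the same hypothetical scores vector (values are assumptions, picked so that scores > 85 evaluates to TRUE, TRUE, FALSE, TRUE, FALSE):

```r
scores <- c(math = 90, science = 88, english = 72, history = 95, art = 80)

# Logical indexing: keep only positions where the condition is TRUE
high <- scores[scores > 85]
high       # math = 90, science = 88, history = 95

# Negative indexing: everything except the 3rd element
no_third <- scores[-3]
names(no_third)   # "math" "science" "history" "art"
```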
Explanation: Logical subsetting is powerful because you don't need to know where the elements are, just what they look like. R evaluates scores > 85 into a logical vector (TRUE, TRUE, FALSE, TRUE, FALSE) and keeps only the TRUE positions. Negative indexing is the flip side: instead of "give me these", you say "give me everything except these."
Try it: Create a vector of 5 city names and extract the first and last elements using positive indexing and length().
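One way to do it; the city names are arbitrary placeholders. Computing the last position with length() avoids hard-coding 5.

```r
ex_cities <- c("Paris", "Tokyo", "Lima", "Cairo", "Oslo")   # hypothetical cities

# First position and the computed last position
first_last <- ex_cities[c(1, length(ex_cities))]
first_last   # "Paris" "Oslo"
```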
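Explanation: length(ex_cities) returns 5, so ex_cities[c(1, length(ex_cities))] evaluates to ex_cities[c(1, 5)] and grabs the first and last elements. Because the last position is computed with length() instead of hard-coded, the pattern works regardless of vector length.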
How does [] behave differently on lists?
Here's where most R learners hit their first wall. When you use [] on a list, you don't get the contents; you get a smaller list containing those elements.
Think of a list as a train with numbered cars. [] gives you a train car (still a train), while [[]] opens the car door and hands you what's inside.
Problem 3: [] on a list returns a sub-list
Create a list and use [] to extract the first two elements. What does class() return?
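A possible solution sketch. The student list below is hypothetical (the original isn't shown); it is built so that its 2nd element is a numeric grades vector, matching how Problem 4 and Problem 6 refer to it.

```r
# Hypothetical student record
student <- list(name = "Ana", grades = c(88, 92, 79), enrolled = TRUE)

first_two <- student[1:2]
class(first_two)    # "list": [] on a list returns a list
length(first_two)   # 2
names(first_two)    # "name" "grades"
```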
Explanation: Even though you only asked for two elements, the result is still a list. That's the key rule: [] on a list always returns a list. This is called preserving subsetting because the output structure matches the input structure.
Problem 4: What's the actual difference between [] and [[]]?
Compare student[2] and student[[2]]. What type does each return? Why does mean(student[2]) fail while mean(student[[2]]) works?
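A sketch of the comparison, using the same hypothetical student list. suppressWarnings() is there only to keep the script output quiet; interactively you would see mean()'s warning.

```r
student <- list(name = "Ana", grades = c(88, 92, 79), enrolled = TRUE)

class(student[2])     # "list": a one-element sub-list
class(student[[2]])   # "numeric": the vector stored inside

# mean() on a list warns "argument is not numeric or logical" and returns NA
bad  <- suppressWarnings(mean(student[2]))
good <- mean(student[[2]])
bad    # NA
good   # 86.33333
```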
Explanation: student[2] gives you a list with one element (a train car still on the track). student[[2]] gives you the numeric vector inside (the cargo pulled out of the car). mean() can't average a list, so mean(student[2]) returns NA with the warning "argument is not numeric or logical: returning NA". But mean(student[[2]]) works because it receives a plain numeric vector.
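Passing a single-bracket list result to mean(), sum(), or max() won't give you a number: mean() warns "argument is not numeric or logical" and returns NA, while sum() and max() stop with an "invalid 'type' (list) of argument" error. Use [[ or $ when you need the value for computation.
Try it: Create a list with 3 named elements and use [] to extract a sub-list of elements 1 and 3. Verify the result is a list with is.list().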
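A possible answer; the element names city, population, and country and their values are placeholders.

```r
ex_info <- list(city = "Lyon", population = 500000, country = "France")   # hypothetical values

# Single brackets with two positions: still a list
ex_sub <- ex_info[c(1, 3)]
is.list(ex_sub)   # TRUE
names(ex_sub)     # "city" "country"
```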
Explanation: ex_info[c(1, 3)] returns a list with elements city and country. The is.list() check confirms the structure is preserved.
When should you use [[]] to extract elements?
Use [[]] whenever you need the value itself for computation, not a container holding the value. [[]] works by name or by position, but it can only extract one element at a time. Its superpower over $ is programmatic access: you can store a name in a variable and pass it to [[]].
Problem 5: Extract and compute with [[]]
Given a configuration list, extract the port value using [[]], by name and by position. Add 1 to prove it's a plain number, not a list.
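A sketch assuming a hypothetical config list where port happens to be the 2nd element. tryCatch() captures the error that single brackets produce, so the script keeps running.

```r
config <- list(host = "localhost", port = 8080, debug = FALSE)   # hypothetical config

port_val <- config[["port"]]   # by name
port_pos <- config[[2]]        # by position (port is 2nd in this list)
port_val + 1                   # 8081
identical(port_val, port_pos)  # TRUE

# Single brackets keep the list wrapper, so arithmetic fails
res <- tryCatch(config["port"] + 1, error = function(e) conditionMessage(e))
res   # "non-numeric argument to binary operator"
```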
Explanation: Both approaches return the naked number 8080, not a list containing 8080. That's why port_val + 1 works. If you had used config["port"] (single bracket), adding 1 would throw an error because you'd be trying to add 1 to a list.
Problem 6: [[]] on a data frame column
Using mtcars, extract the mpg column with [[]] by name and by position. Compute the mean.
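A short sketch with the built-in mtcars data, where mpg is the first column:

```r
mpg_by_name <- mtcars[["mpg"]]
mpg_by_pos  <- mtcars[[1]]              # mpg is column 1
identical(mpg_by_name, mpg_by_pos)      # TRUE
mean(mpg_by_name)                       # 20.09062
```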
Explanation: A data frame is just a list of equal-length vectors. So mtcars[["mpg"]] extracts the mpg vector the same way student[["grades"]] extracted the grades vector from a list earlier. mtcars[[1]] gives you the first column as a vector, same idea, just by position.
If you're passing a column to mean(), sum(), or max(), you almost certainly want [[]] or $. If you're building a smaller data frame, use [].
Try it: Given col_name <- "hp", extract that column from mtcars using [[col_name]] (programmatic access) and compute its median.
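One possible answer, plus a line showing what happens if you try the same thing with $:

```r
col_name <- "hp"
hp_vals <- mtcars[[col_name]]   # the variable's value, "hp", is used as the name
median(hp_vals)                 # 123

# $ does not evaluate the variable; it looks for a column literally named col_name
is.null(mtcars$col_name)        # TRUE
```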
Explanation: [[col_name]] evaluates the variable col_name to get the string "hp", then extracts that column. This is why [[]] is preferred in functions and loops: $ can't accept variables, so mtcars$col_name looks for a column literally named col_name and returns NULL.
How does $ simplify named access?
The $ operator is shorthand for [["name"]]. It's the most readable option for interactive use: mtcars$mpg is quicker to type and easier to scan than mtcars[["mpg"]].
But it has two limitations: it only works with literal names (not variables), and it does partial matching (which can silently return the wrong element).
Problem 7: $ on a data frame
Use $ to extract the cyl column from mtcars and count how many cars have each cylinder count using table().
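A one-line sketch; table() labels its counts by the distinct cylinder values (4, 6, 8).

```r
cyl_counts <- table(mtcars$cyl)
cyl_counts   # 4 -> 11 cars, 6 -> 7 cars, 8 -> 14 cars
```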
Explanation: mtcars$cyl extracts the cyl vector, and table() counts how many times each value appears. 14 cars have 8 cylinders, 11 have 4, and 7 have 6. For interactive exploration, $ is the fastest way to grab a column.
Problem 8: The partial matching trap
Create a list and observe how $ partial-matches names. What does person$f return? What about person$a? Why is this dangerous?
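A sketch of the trap, using a hypothetical person list. With default options, R prints no warning when $ partial-matches on a list; [[ ]] matches exactly by default.

```r
person <- list(first_name = "Grace", age = 36)   # hypothetical record

person$f        # "Grace": partial match to first_name, no warning
person$a        # 36: partial match to age
person[["f"]]   # NULL: [[ ]] requires an exact name by default

# Add a second name starting with "f" and the shortcut silently breaks
person$favorite_color <- "green"
person$f        # NULL: "f" is now ambiguous
```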
Explanation: person$f matches first_name because "f" is an unambiguous prefix: only one element name starts with "f". Similarly, person$a matches age. The real danger is that partial matching happens silently, with no warning and no error. If someone later adds a favorite_color element, person$f becomes ambiguous and returns NULL instead, and your code breaks for no obvious reason.
Partial matching is silent: person$f returns first_name without any warning. In scripts and functions, prefer [[]] for safety. You can also add options(warnPartialMatchDollar = TRUE) to your .Rprofile so that $ warns whenever it partial-matches.
Try it: Use $ to extract the Species column from iris and count how many unique species there are.
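A compact answer using the built-in iris data:

```r
n_species <- length(unique(iris$Species))
n_species   # 3
```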
Explanation: iris$Species extracts the Species factor, unique() returns the three distinct levels, and length() counts them.
How do you combine [], [[]], and $ on data frames?
Data frames respond to all three operators, but the return types differ. Understanding this is the final piece of the puzzle, and it's where most bugs hide.
Here's the rule of thumb:
- df["col"] → a 1-column data frame (structure preserved)
- df[["col"]] or df$col → a vector (element extracted)
- df[rows, cols] → a sub-data-frame (2D subsetting)
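To see these return types side by side, here is a short sketch on mtcars. It also shows one caveat of the [rows, cols] form: selecting a single column that way simplifies to a vector unless you pass drop = FALSE.

```r
one_col_df  <- mtcars["mpg"]                  # 1-column data frame
one_col_vec <- mtcars[["mpg"]]                # numeric vector
also_vec    <- mtcars$mpg                     # numeric vector
simplified  <- mtcars[, "mpg"]                # single column via [rows, cols]: dropped to a vector
kept        <- mtcars[, "mpg", drop = FALSE]  # drop = FALSE keeps the data frame

sapply(list(one_col_df, one_col_vec, also_vec, simplified, kept), is.data.frame)
# TRUE FALSE FALSE FALSE TRUE
```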
Problem 9: Three ways to slice mtcars
From mtcars, perform three operations:
(a) Extract a data frame containing just mpg and hp using [].
(b) Extract the mpg column as a vector using [[]].
(c) Extract rows where cyl == 6 and columns mpg, hp, wt using [rows, cols].
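A possible solution for all three parts; the comments show what class() and dim() report for each result.

```r
# (a) [] with a vector of names keeps the data frame structure
part_a <- mtcars[c("mpg", "hp")]
class(part_a)   # "data.frame"

# (b) [[]] pulls the column out as a plain vector
part_b <- mtcars[["mpg"]]
class(part_b)   # "numeric"

# (c) [rows, cols]: logical row filter plus a vector of column names
part_c <- mtcars[mtcars$cyl == 6, c("mpg", "hp", "wt")]
dim(part_c)     # 7 3
```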
Explanation: Part (a) returns a data frame because [] preserves structure. Part (b) returns a numeric vector because [[]] extracts the element. Part (c) combines row filtering with column selection: mtcars$cyl == 6 creates a logical vector for rows, and c("mpg", "hp", "wt") selects columns. This [rows, cols] syntax is the workhorse of data frame subsetting.
Problem 10: Nested list extraction
Given a nested list of team data, extract team_b's second score in a single expression. Then do it again using $ notation.
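The original team data isn't shown, so this sketch assumes a hypothetical records list in which each team is itself a list holding a numeric scores vector.

```r
# Hypothetical nested team data
records <- list(
  team_a = list(coach = "Lee",  scores = c(78, 85, 90)),
  team_b = list(coach = "Diaz", scores = c(82, 91, 77))
)

# Bracket chain: list -> inner list -> scores vector -> 2nd element
s1 <- records[["team_b"]][["scores"]][2]

# $ for the named steps, [ ] for the final position
s2 <- records$team_b$scores[2]

s1                  # 91
identical(s1, s2)   # TRUE
```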
Explanation: Read left to right: records[["team_b"]] extracts the team_b list, [["scores"]] extracts the scores vector from that list, and [2] grabs the second element. The $ version reads more naturally. Notice the switch from $ (named access into lists) to [] (position access into a vector) at the final step; that's because scores is a vector, not a list.
Try it: Extract the value in the 3rd row and 2nd column of mtcars as a single number using [row, col] notation.
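One possible answer, with two extra lines confirming which row and column the indices point to:

```r
val <- mtcars[3, 2]
val                    # 4
rownames(mtcars)[3]    # "Datsun 710"
names(mtcars)[2]       # "cyl"
```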
Explanation: Row 3 is the Datsun 710, and column 2 is cyl. The result is 4 (a 4-cylinder car). When you provide a single row index and a single column index, [] returns a single value rather than a data frame.
Practice Exercises
These capstone exercises combine multiple subsetting operators. Each one requires chaining techniques from earlier problems. Try to solve each before revealing the answer.
Exercise 1: Filter-and-extract pipeline
Given a survey data frame, build a multi-step pipeline: (a) extract the score column as a vector with [[]], (b) create a logical vector for scores above 80, (c) use that logical vector to subset the data frame rows with [], (d) extract the id column from the filtered result with $.
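The survey data frame isn't shown, so this sketch invents a small one with id and score columns (values are hypothetical). Each step is labelled with the part of the exercise it answers.

```r
# Hypothetical survey data
survey <- data.frame(
  id    = c(101, 102, 103, 104, 105),
  score = c(72, 88, 95, 64, 81)
)

score_vec <- survey[["score"]]   # (a) vector via [[ ]]
is_high   <- score_vec > 80      # (b) logical vector
high_rows <- survey[is_high, ]   # (c) row subset with [rows, ]
high_ids  <- high_rows$id        # (d) column via $
high_ids   # 102 103 105
```

The compact one-liner from the explanation, survey[survey$score > 80, ]$id, returns the same ids.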
Explanation: This pipeline uses all three operators: [[]] to extract a vector for computation, [] for 2D row subsetting with a logical condition, and $ for quick column access. In practice, you'd often write this more compactly as survey[survey$score > 80, ]$id, but breaking it into steps makes the logic visible.
Exercise 2: Navigate a nested list
Given a nested department list, write two expressions: (a) extract the 2nd and 3rd staff members of the eng department using [[]] and [], (b) build a named character vector of all department heads using sapply().
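A sketch assuming a hypothetical depts list in which each department holds a head name and a staff character vector (the original structure isn't shown).

```r
# Hypothetical department list
depts <- list(
  eng   = list(head = "Kim",   staff = c("Ana", "Ben", "Cho", "Dev")),
  sales = list(head = "Lopez", staff = c("Eve", "Fay")),
  ops   = list(head = "Nagy",  staff = c("Gus", "Hal", "Ivy"))
)

# (a) [[ ]] drills into eng and its staff vector; [ ] picks positions 2 and 3
eng_mid <- depts[["eng"]][["staff"]][2:3]
eng_mid   # "Ben" "Cho"

# (b) sapply() pulls "head" from each department and simplifies to a named character vector
heads <- sapply(depts, function(d) d[["head"]])
heads     # eng = "Kim", sales = "Lopez", ops = "Nagy"
```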
Explanation: Part (a) chains [[]] to drill into the nested structure, then switches to [] for multi-element selection from the character vector. Part (b) uses sapply() to iterate over each department and extract the "head" element; sapply() simplifies the result into a named character vector because each extraction returns a single string.
Complete Example
Let's put everything together in a realistic scenario. You have a data frame of cars and want to answer: which high-mileage cars (mpg > 25) have 4 cylinders, and what's their average horsepower?
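The original example code isn't shown; this is one way to write it that matches the description that follows. $ supplies the filter columns, [rows, cols] does the filtering and column selection, and [[]] extracts hp for the mean.

```r
# $ for quick column checks inside the logical row filter
efficient <- mtcars[mtcars$mpg > 25 & mtcars$cyl == 4, c("mpg", "cyl", "hp")]
print(efficient)    # six rows, Fiat 128 through Lotus Europa

# [[]] pulls hp out as a plain numeric vector for computation
avg_hp <- mean(efficient[["hp"]])
avg_hp              # 75.5
```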
This example used all three operators naturally: $ for quick column checks in the filter condition, [] with a logical condition for row filtering and column selection, and [[]] to extract a vector for computing the mean. Six 4-cylinder cars exceed 25 mpg, with an average horsepower of about 76, confirming that fuel-efficient 4-cylinder cars tend to have modest power.
Summary
Here's a side-by-side comparison of the three subsetting operators:
| Feature | [] | [[]] | $ |
|---|---|---|---|
| Works on | Vectors, lists, data frames | Vectors, lists, data frames | Lists, data frames |
| Returns | Same structure as input | The element itself | The element itself |
| Multiple elements? | Yes | No (one at a time) | No (one at a time) |
| Programmatic names? | Yes (x[var]) | Yes (x[[var]]) | No (literal only) |
| Partial matching? | No | No (exact by default) | Yes (silent on lists) |
| Best use case | Subsetting, filtering | Extracting for computation | Interactive access |
Key takeaways:
- [] preserves structure: subset a list, get a list; subset a data frame, get a data frame.
- [[]] extracts the raw value, which is essential when you need to compute with the result.
- $ is convenient shorthand for [["name"]], but watch out for partial matching.
- A data frame is a list of vectors, so [[]] and $ work on data frames the same way they work on lists.
- For 2D subsetting, df[rows, cols] is the workhorse: combine logical row filters with column name vectors.