R Lists: When Data Frames Aren't Flexible Enough (Complete Guide)
An R list is a container that can hold anything — numbers, text, vectors of different lengths, data frames, other lists, even model outputs. When you need to store related but differently-shaped data together, lists are the answer.
Vectors require all elements to be the same type. Data frames require all columns to be the same length. Lists have neither constraint. They are R's most flexible data structure, and they turn up constantly: most model-fitting functions such as lm() return a list, parsed JSON and API responses usually arrive as nested lists, and whenever you need to group heterogeneous objects together, a list is the natural choice.
Introduction
A list is an ordered collection where each element can be any type and any size:
A single number
A character vector of length 100
A data frame with 1,000 rows
Another list (nesting)
A function
A model object
Think of a list as a filing cabinet — each drawer can hold a completely different kind of document. Drawer 1 might hold a single page, drawer 2 a thick folder, and drawer 3 another filing cabinet.
# A list that holds different types and sizes
person <- list(
  name = "Alice",
  age = 30,
  scores = c(88, 92, 75, 95),
  address = data.frame(
    street = "123 Main St",
    city = "Springfield",
    state = "IL"
  ),
  employed = TRUE
)
# See the structure
str(person)
One person, five pieces of data, all different shapes. A vector or data frame can't hold this — but a list can.
Creating Lists
The list() function
# Named list (most common — always name your elements)
config <- list(
  database = "production",
  port = 5432,
  tables = c("users", "orders", "products"),
  verbose = FALSE
)
str(config)
# Unnamed list (works but harder to use)
unnamed <- list(42, "hello", TRUE)
cat("Unnamed list elements:", "\n")
str(unnamed)
Best practice: Always name your list elements. my_list$name is much clearer than my_list[[1]].
From existing objects
# Combine existing objects into a list
x <- 1:10
y <- c("a", "b", "c")
z <- data.frame(id = 1:3, value = c(10, 20, 30))
combined <- list(numbers = x, letters = y, data = z)
cat("Elements:", length(combined), "\n")
cat("Names:", names(combined), "\n")
str(combined)
Empty list (to fill later)
# Start empty, add elements in a loop
results <- list()
for (i in 1:3) {
  results[[paste0("run_", i)]] <- rnorm(5, mean = i * 10)
}
str(results)
This pattern — creating an empty list and filling it in a loop — is common when running multiple analyses or simulations.
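When the number of iterations is known up front, the same pattern can also be written with a preallocated list. The sketch below assumes three runs and reuses the run_ naming from above; vector("list", n) creates a list of n NULL placeholders, and the set.seed() call is an addition so the random draws are repeatable.

```r
# Sketch: preallocate a list of known length, then fill it by position
set.seed(42)                        # assumption: fixed seed for repeatable draws
n_runs <- 3
results <- vector("list", n_runs)   # a list of 3 NULL placeholders
names(results) <- paste0("run_", seq_len(n_runs))

for (i in seq_len(n_runs)) {
  results[[i]] <- rnorm(5, mean = i * 10)
}

str(results)
```

For a handful of iterations either style is fine; preallocating mainly helps readability and avoids growing the list one element at a time in long loops.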
Accessing List Elements: [] vs [[]] vs $
This is the #1 source of confusion with lists. R has three ways to access list elements, and they do not all return the same thing: $ and [[ ]] return the element itself, while [ ] returns a list.
fruits <- list(
  first = "apple",
  second = c("banana", "blueberry"),
  third = "cherry"
)
# $ — access by name (returns the element itself)
cat("$ operator:", fruits$first, "\n")
cat("Type:", class(fruits$first), "\n\n")
# [[ ]] — access by name or position (returns the element itself)
cat("[[ ]] by name:", fruits[["second"]], "\n")
cat("[[ ]] by position:", fruits[[2]], "\n")
cat("Type:", class(fruits[[2]]), "\n\n")
# [ ] — subset (returns a smaller LIST, not the element)
sub <- fruits[1]
cat("[ ] returns a LIST:\n")
str(sub)
cat("Type:", class(sub), "\n")
Here's the critical difference:
| Syntax | Returns | Analogy |
| --- | --- | --- |
| list$name | The element itself | Take the item OUT of the box |
| list[["name"]] | The element itself | Take the item OUT of the box |
| list["name"] | A list containing that element | Give me the box with the item still in it |
| list[c(1,3)] | A sublist with elements 1 and 3 | Give me boxes 1 and 3 |
The train car analogy: [1] gives you a train car (still a train). [[1]] gives you the contents inside the train car.
# Practical demonstration of the difference
my_list <- list(nums = c(10, 20, 30))
# This works — [[]] gives you the vector
cat("Sum with [[]]:", sum(my_list[["nums"]]), "\n")
# This fails — [] gives you a list, and sum() doesn't work on lists
# sum(my_list["nums"]) # Would error!
cat("my_list['nums'] is a", class(my_list["nums"]), "— can't sum a list\n")
Rule of thumb: Use $ or [[]] to get at the actual data. Use [] only when you want a sublist.
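One practical difference between $ and [[ ]] shows up when the element name is stored in a variable. The following is a minimal sketch; the key variable and the label element are made up for illustration.

```r
# Sketch: look up an element whose name is stored in a variable
my_list <- list(nums = c(10, 20, 30), label = "demo")
key <- "nums"                     # hypothetical variable holding the name

by_brackets <- my_list[[key]]     # evaluates key, then looks up "nums"
by_dollar   <- my_list$key        # looks for an element literally named "key"

cat("[[key]] gives:", by_brackets, "\n")
cat("$key is NULL?", is.null(by_dollar), "\n")
```

So inside loops and functions, where names usually arrive as strings, [[ ]] is the form to reach for.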
Modifying Lists
Add elements
config <- list(host = "localhost", port = 8080)
cat("Before:", length(config), "elements\n")
# Add by name
config$database <- "mydb"
config[["timeout"]] <- 30
cat("After:", length(config), "elements\n")
str(config)
Remove elements
config <- list(host = "localhost", port = 8080, debug = TRUE, temp = "delete_me")
cat("Before:", names(config), "\n")
# Set to NULL to remove
config$temp <- NULL
config[["debug"]] <- NULL
cat("After:", names(config), "\n")
str(config)
Setting an element to NULL removes it entirely, so the list shrinks. This is different from atomic vectors, where assigning NULL to an element is an error.
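If you actually want to keep a slot and store NULL in it, single brackets with list(NULL) do that. The sketch below reuses the config example with fewer elements; len_kept and len_removed are helper names introduced here.

```r
config <- list(host = "localhost", port = 8080, temp = "delete_me")

# Single brackets with list(NULL): the slot stays and now holds NULL
config["temp"] <- list(NULL)
len_kept <- length(config)
cat("Length after storing NULL:", len_kept, "\n")
cat("Names:", names(config), "\n")

# Double brackets (or $) with NULL: the slot is deleted
config[["temp"]] <- NULL
len_removed <- length(config)
cat("Length after removing:", len_removed, "\n")
```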
Nested Lists
Lists can contain other lists, creating hierarchical structures. This is how JSON data, API responses, and complex configurations are represented in R:
# A company with departments
company <- list(
  name = "Acme Corp",
  founded = 2010,
  departments = list(
    engineering = list(
      head = "Alice",
      size = 25,
      projects = c("Backend", "Frontend", "DevOps")
    ),
    marketing = list(
      head = "Bob",
      size = 12,
      projects = c("Brand", "Digital", "Events")
    )
  )
)
# Access nested elements by chaining $ or [[]]
cat("Company:", company$name, "\n")
cat("Engineering head:", company$departments$engineering$head, "\n")
cat("Marketing size:", company$departments$marketing$size, "\n")
cat("Eng projects:", company$departments$engineering$projects, "\n")
Each $ digs one level deeper. You can mix $ and [[]]:
# These all do the same thing
company <- list(departments = list(eng = list(head = "Alice")))
cat("Method 1:", company$departments$eng$head, "\n")
cat("Method 2:", company[["departments"]][["eng"]][["head"]], "\n")
cat("Method 3:", company$departments[["eng"]]$head, "\n")
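Chaining works well for one value at a time. To pull the same field out of every sub-list, one option is to apply a small function across the level you care about. The sketch below uses a trimmed-down departments list (an illustrative copy, not the full company object).

```r
# Sketch: extract one field from each sub-list of a nested list
departments <- list(
  engineering = list(head = "Alice", size = 25),
  marketing   = list(head = "Bob",   size = 12)
)

heads <- sapply(departments, function(d) d$head)
sizes <- sapply(departments, function(d) d$size)

print(heads)
cat("Total headcount:", sum(sizes), "\n")
```

The result keeps the department names, so heads["marketing"] still works afterwards.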
Real-World Use Case: Model Results
The most common place you'll encounter lists is in statistical model output. Most model-fitting functions in base R, including lm() and glm(), return a list with a class attribute attached:
# Fit a linear regression model
model <- lm(mpg ~ wt + hp, data = mtcars)
# The model object is a list with class "lm"
cat("Class:", class(model), "\n")
cat("Is list?", is.list(model), "\n")
cat("Elements:", length(model), "\n")
cat("Names:", paste(names(model), collapse = ", "), "\n")
# Access specific parts of the model
model <- lm(mpg ~ wt + hp, data = mtcars)
# Coefficients
cat("Intercept:", coef(model)[1], "\n")
cat("Weight effect:", round(coef(model)["wt"], 3), "\n")
cat("HP effect:", round(coef(model)["hp"], 3), "\n")
# R-squared from summary
s <- summary(model)
cat("R-squared:", round(s$r.squared, 4), "\n")
cat("Adj R-squared:", round(s$adj.r.squared, 4), "\n")
# Residuals
cat("First 5 residuals:", round(model$residuals[1:5], 2), "\n")
This is why understanding lists is essential — you can't effectively use model results without knowing how to navigate list structures.
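When you meet an unfamiliar model object, a one-level str() call is a reasonable first step before reaching for individual elements. The sketch below uses the same lm() fit; the variable same is a helper name introduced here.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

# One-level overview: element names and types, without the full contents
str(model, max.level = 1, give.attr = FALSE)

# summary() returns another list with its own elements
s <- summary(model)
cat("Summary elements:", paste(names(s), collapse = ", "), "\n")

# The coef() accessor and direct $ access return the same numbers
same <- isTRUE(all.equal(coef(model), model$coefficients))
cat("coef() matches $coefficients:", same, "\n")
```

Accessor functions such as coef() and residuals() are generally the safer habit, since they keep working for model classes that store their pieces under different names.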
Iterating Over Lists: lapply() and sapply()
Loops work on lists, but R provides more elegant alternatives:
# A list of numeric vectors
data_sets <- list(
  set_a = c(23, 45, 12, 67),
  set_b = c(89, 34, 56, 78, 90),
  set_c = c(11, 22, 33)
)
# lapply — apply a function to each element, returns a LIST
means <- lapply(data_sets, mean)
cat("lapply result (list):\n")
str(means)
# sapply — same but simplifies to a vector when possible
means_vec <- sapply(data_sets, mean)
cat("\nsapply result (vector):", means_vec, "\n")
# Custom function with sapply
sapply(data_sets, function(x) {
  c(mean = round(mean(x), 1), max = max(x), n = length(x))
})
| Function | Returns | Use when |
| --- | --- | --- |
| lapply() | Always a list | You want consistent output |
| sapply() | Vector/matrix if possible | You want simplified output |
| vapply() | Type-safe vector | You want guaranteed output type |
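lapply() is the most predictable: it always returns a list. sapply() is convenient, but its return type depends on the results (list, vector, or matrix), so edge cases such as empty input can surprise you. vapply() behaves like sapply() but checks every result against a template you supply.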
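Since vapply() appears in the table but not in the examples, here is a small sketch of it next to sapply(), including the empty-input edge case. The data_sets list is repeated so the block runs on its own; s_out and v_out are names introduced here.

```r
data_sets <- list(
  set_a = c(23, 45, 12, 67),
  set_b = c(89, 34, 56, 78, 90),
  set_c = c(11, 22, 33)
)

# vapply: declare that each result must be a single number
means <- vapply(data_sets, mean, numeric(1))
print(means)

# Edge case: an empty input
empty <- list()
s_out <- sapply(empty, length)               # falls back to an empty list
v_out <- vapply(empty, length, integer(1))   # stays an integer vector
cat("sapply on empty input:", class(s_out), "\n")
cat("vapply on empty input:", class(v_out), "\n")
```

If a result does not match the template, for example a function that returns two numbers when numeric(1) was declared, vapply() stops with an error instead of silently changing shape.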
Converting Between Lists and Other Types
# List to vector (cleanest when every element is a single value)
my_list <- list(a = 1, b = 2, c = 3)
my_vec <- unlist(my_list)
cat("Unlisted:", my_vec, "\n")
cat("Type:", class(my_vec), "\n")
cat("Names preserved:", names(my_vec), "\n")
# Data frame to list (each column becomes a list element)
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
as_list <- as.list(df)
str(as_list)
# List to data frame (each element becomes a column)
my_list <- list(name = c("A", "B", "C"), value = c(10, 20, 30))
as_df <- as.data.frame(my_list)
print(as_df)
unlist() flattens a list into a single atomic vector, whatever the lengths of its elements. Be careful: if the list contains mixed types, everything is coerced to the most flexible type present (usually character).
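Both behaviours are easy to check on toy data. The lists below are invented for illustration.

```r
# Elements longer than one value are spread out; names get numeric suffixes
uneven <- list(a = 1:2, b = 3)
flat <- unlist(uneven)
print(flat)
cat("Names:", names(flat), "\n")

# Mixed types are coerced to a common type, here character
mixed <- list(id = 1, label = "x", ok = TRUE)
flat_mixed <- unlist(mixed)
print(flat_mixed)
cat("Type:", class(flat_mixed), "\n")
```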
Useful List Functions
my_list <- list(a = 1:5, b = "hello", c = TRUE, d = list(x = 10))
# Basic info
cat("Length:", length(my_list), "\n")
cat("Names:", names(my_list), "\n")
cat("Is list?", is.list(my_list), "\n")
# Check if a name exists
cat("Has 'a'?", "a" %in% names(my_list), "\n")
cat("Has 'z'?", "z" %in% names(my_list), "\n")
# Rename elements
names(my_list)[2] <- "greeting"
cat("New names:", names(my_list), "\n")
Practice Exercises
Exercise 1: Build a Student Record
# Exercise: Create a list representing a student with:
# - name (character)
# - id (integer)
# - courses: a named numeric vector of grades (Math=92, English=85, Science=90)
# - graduated (logical)
#
# Then: print the student's name, their Science grade,
# and their GPA (mean of all grades)
# Write your code below:
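One possible solution is sketched below. The exercise does not specify a name or an id, so "Jordan", 1001L, and the graduated value are placeholders.

```r
# One possible solution; name, id, and graduated are placeholder values
student <- list(
  name = "Jordan",
  id = 1001L,
  courses = c(Math = 92, English = 85, Science = 90),
  graduated = FALSE
)

cat("Name:", student$name, "\n")
cat("Science grade:", student$courses["Science"], "\n")
cat("GPA:", mean(student$courses), "\n")
```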
Explanation: student$courses gives you the named vector, then ["Science"] extracts by name. mean() computes the average of all grades.
Exercise 2: Extract Model Information
# Exercise: Fit a linear model predicting mpg from wt in mtcars
# Then extract and print:
# 1. The R-squared value
# 2. The p-value of the wt coefficient
# 3. The predicted MPG for a 3000-lb car (wt = 3.0)
# Hint: Use summary() and coef(), explore with names() and str()
# Write your code below:
# Solution
model <- lm(mpg ~ wt, data = mtcars)
s <- summary(model)
# 1. R-squared
cat("R-squared:", round(s$r.squared, 4), "\n")
# 2. P-value of wt coefficient
pval <- s$coefficients["wt", "Pr(>|t|)"]
cat("P-value for wt:", format(pval, scientific = TRUE), "\n")
# 3. Prediction for 3000-lb car
predicted <- predict(model, newdata = data.frame(wt = 3.0))
cat("Predicted MPG at 3000 lbs:", round(predicted, 1), "\n")
Explanation: summary(model) returns a list with r.squared and a coefficients matrix. The coefficients matrix has one row per term (the intercept and wt) and four columns (Estimate, Std. Error, t value, Pr(>|t|)). predict() uses the model to estimate mpg for new values of wt.
Exercise 3: Process Multiple Datasets
# Exercise: Given a list of temperature readings from 3 cities:
city_temps <- list(
  NYC = c(32, 28, 35, 40, 38, 42, 45),
  LA = c(65, 68, 70, 72, 75, 73, 71),
  Chicago = c(25, 22, 30, 28, 35, 32, 38)
)
# Use sapply to create a summary with mean, min, max, and range for each city
# Write your code below:
# Solution
city_temps <- list(
  NYC = c(32, 28, 35, 40, 38, 42, 45),
  LA = c(65, 68, 70, 72, 75, 73, 71),
  Chicago = c(25, 22, 30, 28, 35, 32, 38)
)
result <- sapply(city_temps, function(temps) {
  c(mean = round(mean(temps), 1),
    min = min(temps),
    max = max(temps),
    range = max(temps) - min(temps))
})
print(result)
Explanation: sapply() applies the anonymous function to each city's temperatures. Since each function call returns a named vector of the same length, sapply() combines them into a matrix: columns are cities, rows are statistics.
Summary
| Operation | Code | Returns |
| --- | --- | --- |
| Create | list(a = 1, b = "x") | Named list |
| Access element | list$name or list[["name"]] | The element itself |
| Subset | list["name"] or list[1:2] | A smaller list |
| Add element | list$new <- value | Modified list |
| Remove element | list$old <- NULL | Shorter list |
| Length | length(list) | Integer |
| Names | names(list) | Character vector |
| Apply function | lapply(list, func) | List of results |
| Simplify apply | sapply(list, func) | Vector/matrix |
| Flatten | unlist(list) | Vector |
When to use lists vs other structures:
Vector — same type, same purpose (a column of numbers)
Data frame — tabular data (rows x columns of equal length)
List — mixed types, different sizes, hierarchical data, model outputs
FAQ
When should I use a list instead of a data frame?
Use a data frame for tabular data — rows and columns where each column has the same number of rows. Use a list when your data is irregularly shaped: elements of different lengths, nested structures, model outputs, or a collection of data frames.
What's the difference between [] and [[]]?
[] returns a sublist (a smaller list). [[]] returns the actual element. If my_list is a train, my_list[1] gives you the first train car (still a train), while my_list[[1]] gives you the cargo inside the first car.
Can I have a list of data frames?
Yes, and this is very common. For example, split(mtcars, mtcars$cyl) returns a list of three data frames (one per cylinder group). The purrr package provides a consistent family of map functions for working with lists in general, including lists of data frames.
How do I convert a list to a data frame?
If each list element is a vector of the same length: as.data.frame(my_list). For lists of data frames: do.call(rbind, my_list) or dplyr::bind_rows(my_list).
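The split-then-recombine round trip mentioned in the last two answers can be sketched in base R as follows. The summary columns (n, mean_mpg) are chosen here for illustration.

```r
# Split mtcars into one data frame per cylinder count
by_cyl <- split(mtcars, mtcars$cyl)
cat("Pieces:", length(by_cyl), "with names", names(by_cyl), "\n")

# Summarise each piece, then stack the small data frames back together
summaries <- lapply(by_cyl, function(d) {
  data.frame(cyl = d$cyl[1], n = nrow(d), mean_mpg = round(mean(d$mpg), 1))
})
combined <- do.call(rbind, summaries)
print(combined)
```

do.call(rbind, ...) passes every element of the list to rbind() as a separate argument, which is why it works for a list of any length.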
Are data frames actually lists?
Yes! A data frame is technically a list where every element (column) is a vector of the same length. That's why df$column works — it's the same $ operator you use on lists.
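You can confirm this from the console. The sketch below uses a small throwaway data frame; note that the class reported for the y column depends on your R version (character from R 4.0 onward, factor by default before that).

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

cat("Is list?", is.list(df), "\n")
cat("Storage type:", typeof(df), "\n")
cat("length() counts columns:", length(df), "\n")

# List tools work column by column
col_classes <- sapply(df, class)
print(col_classes)

# [[ ]] and $ return the same column
cat("Same column?", identical(df[["x"]], df$x), "\n")
```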
What's Next?
With vectors, data frames, and lists covered, you have all of R's core data structures. Next:
R Control Flow — if/else, for loops, while loops
Writing R Functions — encapsulate your logic for reuse
R Special Values — NA, NULL, NaN, Inf in depth
Each tutorial builds on the structures you've learned here.