r-statistics.co by Selva Prabhakaran


R Lists: When Data Frames Aren't Flexible Enough (Complete Guide)

An R list is a container that can hold anything — numbers, text, vectors of different lengths, data frames, other lists, even model outputs. When you need to store related but differently-shaped data together, lists are the answer.

Vectors require all elements to be the same type. Data frames require all columns to be the same length. Lists have no such constraints. They're R's most flexible data structure, and they're everywhere: model-fitting functions such as lm() return list-based objects, parsed JSON and API responses typically arrive as nested lists, and any time you need to group heterogeneous objects together, you use a list.

Introduction

A list is an ordered collection where each element can be any type and any size:

  • A single number
  • A character vector of length 100
  • A data frame with 1,000 rows
  • Another list (nesting)
  • A function
  • A model object

Think of a list as a filing cabinet — each drawer can hold a completely different kind of document. Drawer 1 might hold a single page, drawer 2 a thick folder, and drawer 3 another filing cabinet.

    # A list that holds different types and sizes
    person <- list(
      name = "Alice",
      age = 30,
      scores = c(88, 92, 75, 95),
      address = data.frame(
        street = "123 Main St",
        city = "Springfield",
        state = "IL"
      ),
      employed = TRUE
    )

    # See the structure
    str(person)

  

One person, five pieces of data, all different shapes. An atomic vector can't hold this, and a data frame would force every field into equal-length columns, but a list can.
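To make the contrast with vectors concrete, here is a minimal sketch that puts the same three values into c() and into list(). The variable names as_vector and as_list are made up for illustration.

```r
# c() builds an atomic vector, so mixed inputs are coerced to one type
as_vector <- c(30, "Alice", TRUE)
class(as_vector)        # every value is now a character string

# list() keeps each element's original type
as_list <- list(30, "Alice", TRUE)
sapply(as_list, class)  # one class per element
```

The vector silently turns the number and the logical into text, while the list leaves each value as it was.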

Creating Lists

The list() function

    # Named list (most common; always name your elements)
    config <- list(
      database = "production",
      port = 5432,
      tables = c("users", "orders", "products"),
      verbose = FALSE
    )
    str(config)

    # Unnamed list (works but harder to use)
    unnamed <- list(42, "hello", TRUE)
    cat("Unnamed list elements:", "\n")
    str(unnamed)

  

Best practice: Always name your list elements. my_list$name is much clearer than my_list[[1]].
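If you inherit an unnamed list, you can still add names afterwards. The sketch below shows two base R ways to do it; the names "answer", "greeting", and "flag" are invented for the example.

```r
unnamed <- list(42, "hello", TRUE)

# Attach names after the fact by assigning to names()
names(unnamed) <- c("answer", "greeting", "flag")
unnamed$greeting

# Or build a named copy in one step with setNames()
named <- setNames(list(42, "hello", TRUE), c("answer", "greeting", "flag"))
identical(named, unnamed)
```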

From existing objects

    # Combine existing objects into a list
    x <- 1:10
    y <- c("a", "b", "c")
    z <- data.frame(id = 1:3, value = c(10, 20, 30))

    combined <- list(numbers = x, letters = y, data = z)
    cat("Elements:", length(combined), "\n")
    cat("Names:", names(combined), "\n")
    str(combined)

  

Empty list (to fill later)

    # Start empty, add elements in a loop
    results <- list()
    for (i in 1:3) {
      results[[paste0("run_", i)]] <- rnorm(5, mean = i * 10)
    }
    str(results)

  

This pattern — creating an empty list and filling it in a loop — is common when running multiple analyses or simulations.
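When the number of runs is known in advance, one variation is to create the list at its final size and fill the slots by position. This is a sketch under that assumption; n_runs is a hypothetical name, and set.seed() is there only to make the random draws repeatable.

```r
n_runs <- 3
set.seed(1)  # only so the random draws are reproducible

# vector("list", n) creates a list of n NULL placeholders
results <- vector("list", n_runs)
names(results) <- paste0("run_", seq_len(n_runs))

# Fill each slot by position
for (i in seq_len(n_runs)) {
  results[[i]] <- rnorm(5, mean = i * 10)
}
str(results)
```

Using seq_len(n_runs) instead of 1:n_runs also avoids a surprise when n_runs is 0, because the loop then runs zero times.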

Accessing List Elements: [] vs [[]] vs $

This is the #1 source of confusion with lists. R has three ways to access list elements, and they return different things:

    fruits <- list(
      first = "apple",
      second = c("banana", "blueberry"),
      third = "cherry"
    )

    # $ : access by name (returns the element itself)
    cat("$ operator:", fruits$first, "\n")
    cat("Type:", class(fruits$first), "\n\n")

    # [[ ]] : access by name or position (returns the element itself)
    cat("[[ ]] by name:", fruits[["second"]], "\n")
    cat("[[ ]] by position:", fruits[[2]], "\n")
    cat("Type:", class(fruits[[2]]), "\n\n")

    # [ ] : subset (returns a smaller LIST, not the element)
    sub <- fruits[1]
    cat("[ ] returns a LIST:\n")
    str(sub)
    cat("Type:", class(sub), "\n")

  

Here's the critical difference:

| Syntax | Returns | Analogy |
|---|---|---|
| list$name | The element itself | Take the item OUT of the box |
| list[["name"]] | The element itself | Take the item OUT of the box |
| list["name"] | A list containing that element | Give me the box with the item still in it |
| list[c(1,3)] | A sublist with elements 1 and 3 | Give me boxes 1 and 3 |

The train car analogy: [1] gives you a train car (still a train). [[1]] gives you the contents inside the train car.

    # Practical demonstration of the difference
    my_list <- list(nums = c(10, 20, 30))

    # This works: [[ ]] gives you the vector
    cat("Sum with [[]]:", sum(my_list[["nums"]]), "\n")

    # This fails: [ ] gives you a list, and sum() doesn't work on lists
    # sum(my_list["nums"])  # Would error!
    cat("my_list['nums'] is a", class(my_list["nums"]), "so sum() can't add it up\n")

  

Rule of thumb: Use $ or [[]] to get at the actual data. Use [] only when you want a sublist.
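Although $ and [[]] both return the element itself, they differ in a few details. The sketch below illustrates three of them on a copy of the fruits list; the variable key and the name "fourth" are made up for the example.

```r
fruits <- list(first = "apple", second = c("banana", "blueberry"), third = "cherry")

# The name is stored in a variable: [[ ]] evaluates it, $ does not
key <- "second"
fruits[[key]]        # the character vector stored under "second"
fruits$key           # NULL, because $ looks for an element literally named "key"

# A missing name returns NULL rather than raising an error
is.null(fruits[["fourth"]])

# $ allows partial name matching, [[ ]] matches exactly by default
fruits$thi           # partially matches "third"
fruits[["thi"]]      # NULL
```

In functions and loops, where the element name usually lives in a variable, [[]] is therefore the safer choice.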

Modifying Lists

Change existing elements

    config <- list(host = "localhost", port = 8080, debug = TRUE)
    cat("Before:", config$port, "\n")

    config$port <- 5432
    config[["debug"]] <- FALSE
    cat("After port:", config$port, "\n")
    cat("After debug:", config$debug, "\n")
    str(config)

  

Add new elements

    config <- list(host = "localhost", port = 8080)
    cat("Before:", length(config), "elements\n")

    # Add by name
    config$database <- "mydb"
    config[["timeout"]] <- 30
    cat("After:", length(config), "elements\n")
    str(config)

  

Remove elements

    config <- list(host = "localhost", port = 8080, debug = TRUE, temp = "delete_me")
    cat("Before:", names(config), "\n")

    # Set to NULL to remove
    config$temp <- NULL
    config[["debug"]] <- NULL
    cat("After:", names(config), "\n")
    str(config)

  

Setting an element to NULL removes it entirely — the list shrinks. This is different from vectors, where you can't set elements to NULL.
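Two related cases come up often: removing several elements at once, and keeping an element whose value really is NULL. Here is a minimal sketch of both, reusing a config list like the one in this section.

```r
config <- list(host = "localhost", port = 8080, debug = TRUE, temp = "delete_me")

# Remove several elements at once by assigning NULL to a [ ] subset
config[c("debug", "temp")] <- NULL
names(config)

# To keep a slot whose value is NULL, assign list(NULL) with single brackets
config["port"] <- list(NULL)
names(config)           # "port" is still present
is.null(config$port)
```

The second form works because single brackets replace a sublist with another sublist, so the NULL arrives wrapped in a list instead of acting as a delete instruction.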

Nested Lists

Lists can contain other lists, creating hierarchical structures. This is how JSON data, API responses, and complex configurations are represented in R:

    # A company with departments
    company <- list(
      name = "Acme Corp",
      founded = 2010,
      departments = list(
        engineering = list(
          head = "Alice",
          size = 25,
          projects = c("Backend", "Frontend", "DevOps")
        ),
        marketing = list(
          head = "Bob",
          size = 12,
          projects = c("Brand", "Digital", "Events")
        )
      )
    )

    # Access nested elements by chaining $ or [[]]
    cat("Company:", company$name, "\n")
    cat("Engineering head:", company$departments$engineering$head, "\n")
    cat("Marketing size:", company$departments$marketing$size, "\n")
    cat("Eng projects:", company$departments$engineering$projects, "\n")

  

Each $ digs one level deeper. You can mix $ and [[]]:

    # These all do the same thing
    company <- list(departments = list(eng = list(head = "Alice")))

    cat("Method 1:", company$departments$eng$head, "\n")
    cat("Method 2:", company[["departments"]][["eng"]][["head"]], "\n")
    cat("Method 3:", company$departments[["eng"]]$head, "\n")
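The same chaining works for assignment, and [[]] can also take a whole path at once. The sketch below is illustrative only: the size field and the sales department are made up and do not appear in the list above.

```r
company <- list(departments = list(eng = list(head = "Alice", size = 25)))

# Assigning through a chain changes just that nested value
company$departments$eng$size <- 30

# [[ ]] accepts a character vector and walks the path one level per entry
company[[c("departments", "eng", "head")]]

# Chaining $ through a missing level returns NULL instead of failing
is.null(company$departments$sales$head)
```

The silent NULL is convenient for optional fields, but it can also hide a typo in a name, so it is worth checking with is.null() when a value must exist.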

  

Real-World Use Case: Model Results

The most common place you'll encounter lists is in statistical model output. Most model-fitting functions, including lm() and glm(), return a list with a class attribute attached:

    # Fit a linear regression model
    model <- lm(mpg ~ wt + hp, data = mtcars)

    # The model object is a list with class "lm"
    cat("Class:", class(model), "\n")
    cat("Is a list?", is.list(model), "\n")
    cat("Elements:", length(model), "\n")
    cat("Names:", paste(names(model), collapse = ", "), "\n")

  
    # Access specific parts of the model
    model <- lm(mpg ~ wt + hp, data = mtcars)

    # Coefficients
    cat("Intercept:", coef(model)[1], "\n")
    cat("Weight effect:", round(coef(model)["wt"], 3), "\n")
    cat("HP effect:", round(coef(model)["hp"], 3), "\n")

    # R-squared from summary
    s <- summary(model)
    cat("R-squared:", round(s$r.squared, 4), "\n")
    cat("Adj R-squared:", round(s$adj.r.squared, 4), "\n")

    # Residuals
    cat("First 5 residuals:", round(model$residuals[1:5], 2), "\n")

  

This is why understanding lists is essential — you can't effectively use model results without knowing how to navigate list structures.
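When you meet an unfamiliar model object, a quick way to find your way around is to look at its storage type, its class, and its top-level element names. This is a small exploratory sketch using only base R functions on the same lm() fit.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

typeof(model)            # the underlying storage type
class(model)             # the S3 class layered on top

# Limit str() to the top level to get a readable overview
str(model, max.level = 1, give.attr = FALSE)

# summary() returns its own list with different elements
s <- summary(model)
names(s)
```

The names printed at the end are the ones you can reach with s$..., which is how r.squared and adj.r.squared were found.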

Iterating Over Lists: lapply() and sapply()

Loops work on lists, but R provides more elegant alternatives:

    # A list of numeric vectors
    data_sets <- list(
      set_a = c(23, 45, 12, 67),
      set_b = c(89, 34, 56, 78, 90),
      set_c = c(11, 22, 33)
    )

    # lapply: apply a function to each element, returns a LIST
    means <- lapply(data_sets, mean)
    cat("lapply result (list):\n")
    str(means)

    # sapply: same, but simplifies to a vector when possible
    means_vec <- sapply(data_sets, mean)
    cat("\nsapply result (vector):", means_vec, "\n")

    # Custom function with sapply
    sapply(data_sets, function(x) {
      c(mean = round(mean(x), 1), max = max(x), n = length(x))
    })

  
| Function | Returns | Use when |
|---|---|---|
| lapply() | Always a list | You want consistent output |
| sapply() | Vector/matrix if possible | You want simplified output |
| vapply() | Type-safe vector | You want guaranteed output type |

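lapply() is the most predictable: it always returns a list. sapply() is convenient, but its return type depends on the results: it gives a vector or matrix when every result has the same length, and falls back to a list otherwise (including for empty input). vapply() stops with an error if a result doesn't match the type and length you declared.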
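A short sketch of vapply() on the same kind of data shows how the declared output type works, along with one edge case where sapply() and vapply() disagree. The empty list is a made-up input used only to trigger that case.

```r
data_sets <- list(
  set_a = c(23, 45, 12, 67),
  set_b = c(89, 34, 56, 78, 90),
  set_c = c(11, 22, 33)
)

# FUN.VALUE declares the expected shape: one number per element
means <- vapply(data_sets, mean, FUN.VALUE = numeric(1))
means

# With empty input, sapply() falls back to a list while vapply() keeps the type
empty <- list()
class(sapply(empty, mean))
class(vapply(empty, mean, FUN.VALUE = numeric(1)))
```

For interactive exploration sapply() is usually fine; inside functions that other code depends on, vapply() tends to be the safer default.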

Converting Between Lists and Other Types

    # List to vector (cleanest when every element is a single value)
    my_list <- list(a = 1, b = 2, c = 3)
    my_vec <- unlist(my_list)
    cat("Unlisted:", my_vec, "\n")
    cat("Type:", class(my_vec), "\n")
    cat("Names preserved:", names(my_vec), "\n")

  
    # Data frame to list (each column becomes a list element)
    df <- data.frame(x = 1:3, y = c("a", "b", "c"))
    as_list <- as.list(df)
    str(as_list)

    # List to data frame (each element becomes a column)
    my_list <- list(name = c("A", "B", "C"), value = c(10, 20, 30))
    as_df <- as.data.frame(my_list)
    print(as_df)

  

unlist() flattens a list into a vector. Be careful — if the list contains mixed types, everything gets coerced to the most flexible type (usually character).
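The coercion is easy to see on a tiny example. The sketch below also shows what happens to names when an element holds more than one value; the lists are made up for illustration.

```r
# Mixed types: the number and the logical are converted to character
mixed <- unlist(list(a = 1, b = "x", c = TRUE))
class(mixed)

# Elements longer than one value are flattened and their names get suffixes
longer <- unlist(list(a = 1:2, b = 3))
names(longer)
```

If you need to keep the original types, leave the data in a list and use lapply() or vapply() on it instead of flattening.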

Useful List Functions

    my_list <- list(a = 1:5, b = "hello", c = TRUE, d = list(x = 10))

    # Basic info
    cat("Length:", length(my_list), "\n")
    cat("Names:", names(my_list), "\n")
    cat("Is list?", is.list(my_list), "\n")

    # Check if a name exists
    cat("Has 'a'?", "a" %in% names(my_list), "\n")
    cat("Has 'z'?", "z" %in% names(my_list), "\n")

    # Rename elements
    names(my_list)[2] <- "greeting"
    cat("New names:", names(my_list), "\n")
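A few other base R helpers may be worth knowing alongside these. The sketch below shows modifyList(), Filter(), and lengths(); the defaults and overrides lists are hypothetical examples, not part of the tutorial's data.

```r
defaults  <- list(host = "localhost", port = 8080, debug = FALSE)
overrides <- list(port = 5432, debug = TRUE)

# modifyList() (utils package, attached by default) updates matching names
settings <- modifyList(defaults, overrides)
str(settings)

# Filter() keeps the elements for which the predicate returns TRUE
my_list <- list(a = 1:5, b = "hello", c = TRUE, d = list(x = 10))
names(Filter(is.numeric, my_list))

# lengths() reports the length of every element at once
lengths(my_list)
```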

  

Practice Exercises

Exercise 1: Build a Student Record

    # Exercise: Create a list representing a student with:
    #   - name (character)
    #   - id (integer)
    #   - courses: a named numeric vector of grades (Math=92, English=85, Science=90)
    #   - graduated (logical)
    #
    # Then: print the student's name, their Science grade,
    # and their GPA (mean of all grades)

    # Write your code below:

  
    # Solution
    student <- list(
      name = "Alex Johnson",
      id = 12345L,
      courses = c(Math = 92, English = 85, Science = 90),
      graduated = FALSE
    )

    cat("Name:", student$name, "\n")
    cat("Science grade:", student$courses["Science"], "\n")
    cat("GPA:", round(mean(student$courses), 1), "\n")

  

Explanation: student$courses gives you the named vector, then ["Science"] extracts by name. mean() computes the average of all grades.

Exercise 2: Extract Model Information

    # Exercise: Fit a linear model predicting mpg from wt in mtcars
    # Then extract and print:
    #   1. The R-squared value
    #   2. The p-value of the wt coefficient
    #   3. The predicted MPG for a 3000-lb car (wt = 3.0)
    # Hint: Use summary() and coef(), explore with names() and str()

    # Write your code below:

  
    # Solution
    model <- lm(mpg ~ wt, data = mtcars)
    s <- summary(model)

    # 1. R-squared
    cat("R-squared:", round(s$r.squared, 4), "\n")

    # 2. P-value of wt coefficient
    pval <- s$coefficients["wt", "Pr(>|t|)"]
    cat("P-value for wt:", format(pval, scientific = TRUE), "\n")

    # 3. Prediction for 3000-lb car
    predicted <- predict(model, newdata = data.frame(wt = 3.0))
    cat("Predicted MPG at 3000 lbs:", round(predicted, 1), "\n")

  

Explanation: summary(model) returns a list with r.squared and a coefficients matrix. The coefficients matrix has rows (variables) and columns (Estimate, Std. Error, t value, p-value). predict() uses the model to estimate new values.

Exercise 3: Process Multiple Datasets

    # Exercise: Given a list of temperature readings from 3 cities:
    city_temps <- list(
      NYC = c(32, 28, 35, 40, 38, 42, 45),
      LA = c(65, 68, 70, 72, 75, 73, 71),
      Chicago = c(25, 22, 30, 28, 35, 32, 38)
    )

    # Use sapply to create a summary with mean, min, and max for each city

    # Write your code below:

  
    # Solution
    city_temps <- list(
      NYC = c(32, 28, 35, 40, 38, 42, 45),
      LA = c(65, 68, 70, 72, 75, 73, 71),
      Chicago = c(25, 22, 30, 28, 35, 32, 38)
    )

    # range is an extra statistic beyond the three the exercise asks for
    result <- sapply(city_temps, function(temps) {
      c(mean = round(mean(temps), 1),
        min = min(temps),
        max = max(temps),
        range = max(temps) - min(temps))
    })
    print(result)

  

Explanation: sapply() applies the anonymous function to each city's temperatures. Since each function call returns a named vector of the same length, sapply() combines them into a matrix — columns are cities, rows are statistics.
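Because the result is a matrix with row and column names, individual values can be pulled out by name. The sketch below uses a trimmed-down version of the summary with three statistics to show the layout and two common follow-up steps.

```r
city_temps <- list(
  NYC = c(32, 28, 35, 40, 38, 42, 45),
  LA = c(65, 68, 70, 72, 75, 73, 71),
  Chicago = c(25, 22, 30, 28, 35, 32, 38)
)
result <- sapply(city_temps, function(temps) {
  c(mean = round(mean(temps), 1), min = min(temps), max = max(temps))
})

dim(result)             # statistics in rows, cities in columns
result["max", "LA"]     # index by row name and column name
t(result)               # transpose to get one row per city
```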

Summary

| Operation | Code | Returns |
|---|---|---|
| Create | list(a = 1, b = "x") | Named list |
| Access element | list$name or list[["name"]] | The element itself |
| Subset | list["name"] or list[1:2] | A smaller list |
| Add element | list$new <- value | Modified list |
| Remove element | list$old <- NULL | Shorter list |
| Length | length(list) | Integer |
| Names | names(list) | Character vector |
| Apply function | lapply(list, func) | List of results |
| Simplify apply | sapply(list, func) | Vector/matrix |
| Flatten | unlist(list) | Vector |

When to use lists vs other structures:

  • Vector — same type, same purpose (a column of numbers)
  • Data frame — tabular data (rows x columns of equal length)
  • List — mixed types, different sizes, hierarchical data, model outputs

FAQ

When should I use a list instead of a data frame?

Use a data frame for tabular data — rows and columns where each column has the same number of rows. Use a list when your data is irregularly shaped: elements of different lengths, nested structures, model outputs, or a collection of data frames.

What's the difference between [] and [[]]?

[] returns a sublist (a smaller list). [[]] returns the actual element. If my_list is a train, my_list[1] gives you the first train car (still a train), while my_list[[1]] gives you the cargo inside the first car.

Can I have a list of data frames?

Yes, and this is very common. For example, split(mtcars, mtcars$cyl) returns a list of three data frames (one per cylinder group). lapply() handles such lists in base R, and the purrr package offers a consistent set of map functions for iterating over lists in general.
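A minimal base R sketch of that split() call, followed by two per-group computations; by_cyl is a name chosen for the example.

```r
# One data frame per distinct value of cyl
by_cyl <- split(mtcars, mtcars$cyl)
names(by_cyl)

# Each element is an ordinary data frame, so nrow() works on it
sapply(by_cyl, nrow)

# Average mpg within each group
sapply(by_cyl, function(df) round(mean(df$mpg), 1))
```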

How do I convert a list to a data frame?

If each list element is a vector of the same length: as.data.frame(my_list). For lists of data frames: do.call(rbind, my_list) or dplyr::bind_rows(my_list).
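Both base R conversions can be sketched in a few lines. The cols list is made up for the example, and the list of data frames is built with split() so that every piece has the same columns.

```r
# Equal-length vectors become columns
cols <- list(name = c("A", "B", "C"), value = c(10, 20, 30))
as.data.frame(cols)

# A list of data frames with the same columns is stacked row-wise
pieces <- split(mtcars, mtcars$cyl)
stacked <- do.call(rbind, pieces)
dim(stacked)
```

do.call(rbind, pieces) is equivalent to calling rbind() with every list element as a separate argument, which is why it works for any number of data frames.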

Are data frames actually lists?

Yes! A data frame is technically a list where every element (column) is a vector of the same length. That's why df$column works — it's the same $ operator you use on lists.
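You can check this on any data frame. The sketch below uses the built-in mtcars data and only base R functions; col_means is a name chosen for the example.

```r
is.list(mtcars)                 # a data frame passes the list check
typeof(mtcars)                  # and is stored as a list
length(mtcars) == ncol(mtcars)  # list length equals the number of columns

# List tools therefore work column by column
col_means <- sapply(mtcars[c("mpg", "wt", "hp")], mean)
round(col_means, 2)
```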

What's Next?

With vectors, data frames, and lists covered, you have the core data structures you'll use most often in R. Next:

  1. R Control Flow — if/else, for loops, while loops
  2. Writing R Functions — encapsulate your logic for reuse
  3. R Special Values — NA, NULL, NaN, Inf in depth

Each tutorial builds on the structures you've learned here.