R Data Frames: Every Operation You'll Need, With 10 Real Examples
A data frame is R's equivalent of a spreadsheet — a table where each column is a vector and each row is an observation. Most real-world R work involves data frames, and mastering them unlocks everything from data cleaning to statistical modeling.
This tutorial covers every essential data frame operation with 10 real-world examples.
What Is a Data Frame?
A data frame is a list of vectors of equal length, arranged as columns. Each column can have a different type (numeric, character, logical), but all values within a column must be the same type.
Think of it as a spreadsheet: columns are variables, rows are records.
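As a minimal sketch of that definition, the block below builds a small data frame by hand. The `employees` data and its column names are made up for illustration; they are not from the tutorial's original examples.

```r
# Hypothetical example data: three columns of equal length, each with its own type
employees <- data.frame(
  name   = c("Ana", "Ben", "Cho"),   # character
  age    = c(34, 28, 45),            # numeric
  remote = c(TRUE, FALSE, TRUE),     # logical
  stringsAsFactors = FALSE           # already the default since R 4.0
)

# Each column keeps its own type
sapply(employees, class)

# Underneath, a data frame is a list with one element per column
is.list(employees)
length(employees)   # number of columns
nrow(employees)     # number of rows
```

Trying to build a data frame from vectors of unequal length (for example 3 names and 2 ages) stops with an error, which is the "equal length" rule in action.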
Example 1: Explore a Built-In Dataset
R ships with dozens of built-in datasets. The mtcars dataset has specs for 32 cars, taken from the 1974 Motor Trend magazine (1973-74 models).
The str() function is your best friend. It shows every column's type, the first few values, and the dimensions — all in one call.
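A short sketch of a first look at mtcars, using only base R inspection functions. Which functions to call first is a matter of taste; this is one reasonable order, not necessarily the one the original example used.

```r
# mtcars is available in every R session; no loading step is needed
head(mtcars, 3)      # first three rows
dim(mtcars)          # number of rows, then number of columns
names(mtcars)        # column names

# Type, first values, and dimensions in one call
str(mtcars)

# Five-number summary plus the mean for a single column
summary(mtcars$mpg)
```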
Example 2: Access Columns
There are three ways to access a column. The $ syntax is the most readable and common.
Use [[ ]] when the column name is stored in a variable:
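The sketch below shows three common base R forms for pulling out a column. The tutorial's own snippet is not preserved here, so treating `[ ]` with a name as the third form is an assumption. The variable names are illustrative.

```r
# 1. $ returns the column as a plain vector
mpg_dollar <- mtcars$mpg

# 2. [[ ]] also returns a vector, and accepts a string
mpg_double <- mtcars[["mpg"]]

# 3. [ ] with a column name returns a one-column data frame, not a vector
mpg_single <- mtcars["mpg"]

# [[ ]] works when the column name is stored in a variable
col_name  <- "hp"
hp_values <- mtcars[[col_name]]

# mtcars$col_name would look for a column literally called "col_name"
is.null(mtcars$col_name)
```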
Example 3: Access Rows and Subsets
Use [row, column] syntax. Leave a side blank to get all rows or all columns.
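A few illustrative uses of the `[row, column]` form on mtcars. The variable names are placeholders chosen for this sketch.

```r
# One row, all columns (column side left blank)
first_row <- mtcars[1, ]

# Rows 1 to 3, two named columns
first_three <- mtcars[1:3, c("mpg", "cyl")]

# All rows, one column; drop = FALSE keeps the result a data frame
mpg_only <- mtcars[, "mpg", drop = FALSE]

# A single cell: row 2, column "hp"
one_cell <- mtcars[2, "hp"]

# Rows can also be selected by row name
by_name <- mtcars["Mazda RX4", ]
```

Without `drop = FALSE`, selecting a single column this way returns a plain vector rather than a data frame.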
Example 4: Filter Rows by Condition
Filtering is one of the most common operations on data frames. Put a logical condition in the row position of [ ] to keep only the rows where it is TRUE.
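A minimal sketch of condition-based filtering on mtcars. The thresholds (25 mpg, 100 hp) are arbitrary values picked for illustration.

```r
# Keep rows where mpg is above 25; the blank column side keeps all columns
efficient <- mtcars[mtcars$mpg > 25, ]

# Combine conditions with & (and) or | (or)
fast_four <- mtcars[mtcars$cyl == 4 & mtcars$hp > 100, ]

# subset() expresses the same filter without repeating the data frame name,
# and select = picks columns at the same time
same_rows <- subset(mtcars, cyl == 4 & hp > 100, select = c(mpg, cyl, hp))

nrow(efficient)
rownames(fast_four)
```

If a column contains missing values, a condition evaluates to NA for those rows and `[ ]` returns them as all-NA rows; wrapping the condition in `which()` or using `subset()` avoids that.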
Example 5: Add and Remove Columns
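One way to add and remove columns in base R, sketched on a copy of three mtcars columns. The derived column names (`kpl`, `power_to_weight`, `id`) are hypothetical and exist only for this illustration.

```r
# Work on a copy so the built-in dataset stays untouched
cars_df <- mtcars[, c("mpg", "wt", "hp")]

# Add a column by assigning to a new name (miles per gallon to km per litre)
cars_df$kpl <- cars_df$mpg * 0.425144

# Add a column computed from two existing columns
cars_df[["power_to_weight"]] <- cars_df$hp / cars_df$wt

# cbind() can also attach a new column
cars_df <- cbind(cars_df, id = seq_len(nrow(cars_df)))

# Remove a column by assigning NULL
cars_df$id <- NULL

# Remove columns by name with a logical mask
cars_df <- cars_df[, !(names(cars_df) %in% "kpl"), drop = FALSE]

names(cars_df)
```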
Example 6: Add and Remove Rows
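A small sketch of adding and removing rows, assuming a made-up `team` data frame. `rbind()` requires the new row to have the same column names as the existing data frame.

```r
# Hypothetical starting data
team <- data.frame(
  name = c("Ana", "Ben"),
  age  = c(34, 28),
  stringsAsFactors = FALSE
)

# Add a row: build a one-row data frame with matching columns, then rbind()
new_row <- data.frame(name = "Cho", age = 45, stringsAsFactors = FALSE)
team <- rbind(team, new_row)

# Remove a row by position with a negative index
team <- team[-1, ]

# Remove rows by condition: keep everyone except Ben
team <- team[team$name != "Ben", ]

# Row names keep their old numbers after removal; reset them if needed
rownames(team) <- NULL
team
```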
Example 7: Sort Data
The order() function returns the row positions in sorted order. The minus sign reverses the sort for numeric columns.
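The description above can be sketched as follows. The data frame names are placeholders; the last example uses `decreasing = TRUE` because a minus sign does not work on character values.

```r
# Ascending by mpg: order() gives row positions, [ , ] rearranges the rows
by_mpg <- mtcars[order(mtcars$mpg), ]

# Descending: negate the numeric column
by_mpg_desc <- mtcars[order(-mtcars$mpg), ]

# Sort by cyl ascending, then by mpg descending within each cyl group
by_cyl_mpg <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]

# For character data, use decreasing = TRUE instead of a minus sign
by_species_desc <- iris[order(as.character(iris$Species), decreasing = TRUE), ]

head(by_mpg_desc[, c("mpg", "cyl")], 3)
```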
Example 8: Create Summary Statistics
The aggregate() function splits the data by groups and applies a function to each group. The formula mpg ~ cyl means "mpg grouped by cyl."
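A brief sketch of the formula interface described above, plus two variations that follow the same pattern. The result names (`avg_mpg` and so on) are illustrative.

```r
# Mean mpg for each number of cylinders
avg_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
avg_mpg

# Summarise several columns at once by wrapping them in cbind()
avg_multi <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)

# Group by two variables: cylinders and transmission type (am)
by_cyl_am <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)

# For a quick per-group vector instead of a data frame, tapply() also works
tapply(mtcars$mpg, mtcars$cyl, mean)
```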
Example 9: Merge Two Data Frames
Merge types at a glance:
| Argument | Join Type | Keeps |
|---|---|---|
| (default) | Inner | Only matching rows |
| all.x = TRUE | Left | All rows from first table |
| all.y = TRUE | Right | All rows from second table |
| all = TRUE | Full | All rows from both tables |
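The four join types in the table can be sketched with two small made-up tables. `employees` and `salaries` are hypothetical data, chosen so that ids 2 and 3 match while ids 1 and 4 appear in only one table.

```r
# Hypothetical tables sharing an "id" key
employees <- data.frame(
  id   = c(1, 2, 3),
  name = c("Ana", "Ben", "Cho"),
  stringsAsFactors = FALSE
)
salaries <- data.frame(
  id     = c(2, 3, 4),
  salary = c(50000, 62000, 58000)
)

inner_join <- merge(employees, salaries, by = "id")                # ids 2, 3
left_join  <- merge(employees, salaries, by = "id", all.x = TRUE)  # ids 1, 2, 3
right_join <- merge(employees, salaries, by = "id", all.y = TRUE)  # ids 2, 3, 4
full_join  <- merge(employees, salaries, by = "id", all = TRUE)    # ids 1, 2, 3, 4

# Unmatched rows are filled with NA
left_join
```

When the key columns have different names in the two tables, `by.x` and `by.y` can be used in place of `by`.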
Example 10: Work with the Iris Dataset
The iris dataset has measurements for 150 flowers across 3 species. Let's combine everything we've learned.
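One possible way to chain the earlier operations on iris: inspect, add a column, filter, sort, and aggregate. The `Petal.Area` column is a hypothetical rectangle approximation invented for this sketch, not a standard measurement, and the 5 cm threshold is arbitrary.

```r
# Inspect
str(iris)

# Add a column on a copy (length times width, a rough area proxy)
flowers <- iris
flowers$Petal.Area <- flowers$Petal.Length * flowers$Petal.Width

# Filter: flowers with petals longer than 5 cm
big <- flowers[flowers$Petal.Length > 5, ]

# Sort: largest petal area first
big <- big[order(-big$Petal.Area), ]
head(big[, c("Species", "Petal.Length", "Petal.Area")], 5)

# Aggregate: mean petal length per species
species_means <- aggregate(Petal.Length ~ Species, data = flowers, FUN = mean)
species_means

# Count the filtered rows per species
table(big$Species)
```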
Quick Reference: Essential Data Frame Functions
| Function | Purpose | Example |
|---|---|---|
| data.frame() | Create a data frame | data.frame(x = 1:3, y = c("a","b","c")) |
| head() / tail() | First/last rows | head(df, 10) |
| str() | Structure overview | str(df) |
| summary() | Column statistics | summary(df) |
| dim() | Rows x columns | dim(df) |
| nrow() / ncol() | Row/column count | nrow(df) |
| names() | Column names | names(df) |
| subset() | Filter rows/columns | subset(df, x > 5) |
| order() | Sort rows | df[order(df$x), ] |
| merge() | Join two data frames | merge(df1, df2, by = "id") |
| aggregate() | Group summaries | aggregate(y ~ x, data = df, FUN = mean) |
| rbind() / cbind() | Add rows/columns | rbind(df1, df2) |
| sapply() | Apply function to columns | sapply(df, mean) |
Practice Exercises
Exercise 1: Build and Inspect
Create a data frame with 5 products (name, price, quantity). Print its structure and summary.
Solution:

    products <- data.frame(
      name = c("Laptop", "Mouse", "Keyboard", "Monitor", "Headset"),
      price = c(999, 29, 79, 349, 89),
      quantity = c(10, 50, 30, 15, 25),
      stringsAsFactors = FALSE
    )
    str(products)
    summary(products)

Exercise 2: Filter and Aggregate
Using mtcars, find the average horsepower of cars with more than 20 mpg.
Solution:

    efficient <- mtcars[mtcars$mpg > 20, ]
    cat("Number of efficient cars:", nrow(efficient), "\n")
    cat("Average horsepower:", round(mean(efficient$hp), 1), "\n")

Exercise 3: Sort and Rank
Sort the iris dataset by Petal.Length in descending order and show the top 5 flowers.
Solution:

    sorted_iris <- iris[order(-iris$Petal.Length), ]
    head(sorted_iris[, c("Species", "Petal.Length", "Petal.Width")], 5)

Exercise 4: Merge Challenge
Create two data frames — one with student names and majors, another with student names and GPAs. Merge them and find the highest GPA per major.
Solution:

    students <- data.frame(
      name = c("Alice", "Bob", "Carol", "Dave", "Eve"),
      major = c("CS", "Math", "CS", "Math", "CS"),
      stringsAsFactors = FALSE
    )
    grades <- data.frame(
      name = c("Alice", "Bob", "Carol", "Dave", "Eve"),
      gpa = c(3.8, 3.5, 3.9, 3.7, 3.6),
      stringsAsFactors = FALSE
    )
    combined <- merge(students, grades, by = "name")
    aggregate(gpa ~ major, data = combined, FUN = max)

FAQ
What is the difference between a data frame and a matrix?
A data frame can have columns of different types (numeric, character, logical). A matrix must have all values of the same type. Use data frames for real-world datasets. Use matrices for mathematical operations.
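A quick sketch of the difference: the same mixed values stored as a matrix and as a data frame. The object names are illustrative.

```r
# A matrix holds one type, so the numbers are coerced to character
m <- cbind(id = 1:2, label = c("a", "b"))
typeof(m)

# A data frame keeps each column's own type
df <- data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE)
sapply(df, class)
```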
What is a tibble?
A tibble is a modern version of the data frame from the tidyverse. It prints more neatly, never converts strings to factors, and never changes column names. Create one with tibble::tibble() or convert with tibble::as_tibble().
How do I handle large data frames that are slow?
For data frames with millions of rows, consider the data.table package, which is much faster for grouping, filtering, and joining. Alternatively, use dplyr from the tidyverse for a clean syntax with good performance.
How do I save a data frame to a CSV file?
Use write.csv(df, "filename.csv", row.names = FALSE). Setting row.names = FALSE prevents R from writing the row names as an extra first column. To read the file back, use read.csv("filename.csv").
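A minimal round-trip sketch. It writes to a temporary file via `tempfile()` so nothing is left in the working directory; the `scores` data is made up for illustration.

```r
# Hypothetical data to save
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# Write to a temporary CSV file without the row-name column
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)

# Read it back and compare with the original
scores_back <- read.csv(path)
all.equal(scores, scores_back)

# Clean up the temporary file
unlink(path)
```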
How do I convert a matrix to a data frame?
Use as.data.frame(my_matrix). Column names will be V1, V2, etc. unless the matrix had column names.
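A short sketch of both cases, using a small made-up matrix:

```r
# Matrix without column names: the data frame gets V1, V2, V3
m <- matrix(1:6, nrow = 2)
df_unnamed <- as.data.frame(m)
names(df_unnamed)

# Matrix with column names: the names carry over
colnames(m) <- c("a", "b", "c")
df_named <- as.data.frame(m)
names(df_named)
```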
Conclusion
Data frames are where most R work happens. You now know how to create them, inspect them with str() and summary(), access rows and columns, filter with conditions, add and remove columns and rows, sort, aggregate by groups, and merge multiple tables. These operations cover the bulk of everyday data manipulation.
For more powerful data wrangling, explore dplyr. It provides a cleaner syntax for the same operations you learned here, plus tools like group_by() and mutate().