R Data Frames Exercises: 15 Practice Questions (Beginner to Advanced, Solved Step-by-Step)
Fifteen focused exercises that take you from creating a data frame from scratch to filtering, grouping, merging and reshaping. Every problem runs in the browser with an expandable worked solution. No downloads, no setup, just code and check.
Data frames are the workhorse of R. Almost every analysis you will ever write consumes or produces one. These exercises use only base R so they work in any R session, including the one embedded in this page. Once you are fluent here, the dplyr version of the same operations will feel obvious.
Setup
Every exercise uses the built-in mtcars and iris datasets, plus one small data frame we construct in the first exercise. The code blocks share state, so what you create in Exercise 1 is available in Exercise 15.
Section 1, Creating and inspecting data frames
Exercise 1. Build a data frame from vectors
Create a data frame students with three columns, name (character), age (integer), score (double), and five rows of your own made-up data. Confirm the types with str().
Solution
In R 4.0 and later, stringsAsFactors = FALSE is the default; the argument is only needed for compatibility with older code.
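A minimal sketch of one possible solution. The names, ages and scores are invented for illustration; any five rows of your own will do.

```r
# Made-up example data; the values are purely illustrative.
students <- data.frame(
  name  = c("Ana", "Ben", "Chloe", "Dev", "Eli"),
  age   = c(19L, 21L, 20L, 22L, 19L),        # the L suffix makes these integers
  score = c(88.5, 92.0, 79.5, 85.0, 90.5),   # plain decimals are doubles
  stringsAsFactors = FALSE                   # redundant in R >= 4.0, harmless otherwise
)

# str() lists each column with its type: chr, int and num.
str(students)
```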
Exercise 2. Inspect a data frame
Using mtcars, report: its dimensions, the column names, the class of every column, and a compact summary with summary().
Solution
sapply(df, class) is the cleanest way to see the type of every column at once. For big data frames where that is too noisy, use str(df) instead.
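One way to answer all four parts, sketched with base R functions only:

```r
dim(mtcars)             # number of rows, then number of columns
names(mtcars)           # column names
sapply(mtcars, class)   # class of every column, as a named character vector
summary(mtcars)         # min, quartiles, mean and max for each column

# Keep the per-column classes around to check them programmatically.
col_classes <- sapply(mtcars, class)
```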
Exercise 3. Access one column, four ways
From students, extract the score column in four ways: students$score, students[["score"]], students[, "score"], and students[, 3]. Confirm all four return the same numeric vector.
Solution
All four return an atomic vector. Note that students["score"] (without the comma) returns a one-column data frame, not a vector. The rule is surprising but consistent: single-bracket indexing without a comma treats the data frame as a list of columns.
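A sketch of the comparison. The students data frame is rebuilt here with made-up values so the block runs on its own; the variable names by_dollar, by_double, by_name and by_position are just illustrative.

```r
# Illustrative data, recreated so this block is self-contained.
students <- data.frame(
  name  = c("Ana", "Ben", "Chloe", "Dev", "Eli"),
  age   = c(19L, 21L, 20L, 22L, 19L),
  score = c(88.5, 92.0, 79.5, 85.0, 90.5)
)

by_dollar   <- students$score
by_double   <- students[["score"]]
by_name     <- students[, "score"]
by_position <- students[, 3]

# All four are the same plain numeric vector.
identical(by_dollar, by_double)
identical(by_dollar, by_name)
identical(by_dollar, by_position)

# Without the comma you get a one-column data frame instead.
one_col <- students["score"]
is.data.frame(one_col)
```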
Section 2, Subsetting rows and columns
Exercise 4. Select specific columns
From mtcars, return a new data frame with only mpg, cyl and hp.
Solution
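When you use character names instead of integer positions, the code keeps working even if columns are reordered or new columns are inserted. A renamed column still breaks it, but with a clear "undefined columns selected" error rather than silently returning the wrong column.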
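A minimal sketch of selecting by name; cars_small is just an illustrative variable name.

```r
# Select three columns by name; all 32 rows are kept.
cars_small <- mtcars[, c("mpg", "cyl", "hp")]
head(cars_small)
```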
Exercise 5. Drop specific columns
From mtcars, return a data frame with everything except qsec and vs.
Solution
Both idioms are common. The first is safer inside a pipeline; the second is clearer for ad-hoc exploration.
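The two idioms themselves are not reproduced in this text, so the pair below is an assumption about what the note refers to: logical indexing on names(), and subset() with a negative select.

```r
# Idiom 1: keep every column whose name is NOT in the drop list.
drop_cols <- c("qsec", "vs")
dropped_a <- mtcars[, !(names(mtcars) %in% drop_cols)]

# Idiom 2: subset() with a negative select; concise for interactive work.
dropped_b <- subset(mtcars, select = -c(qsec, vs))

names(dropped_a)
names(dropped_b)
```

The first form works with column names held in a variable, which is why it tends to be the safer choice inside functions.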
Exercise 6. Filter rows with one condition
Return the rows of mtcars where mpg > 25.
Solution
Note the trailing comma. Without it, mtcars[mtcars$mpg > 25] tries to index columns, not rows. Here the logical vector has 32 elements but the data frame has only 11 columns, so R stops with an "undefined columns selected" error.
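A sketch of the row filter; high_mpg is an illustrative name.

```r
# The logical vector goes before the comma, so it selects rows.
high_mpg <- mtcars[mtcars$mpg > 25, ]
high_mpg
```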
Exercise 7. Filter with multiple conditions
Return the rows of mtcars where mpg > 20 AND cyl == 4, keeping only the columns mpg, cyl, hp.
Solution
Combine the row filter with column selection in a single [ , ] call. Use & for element-wise AND, never &&.
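One way to write the combined filter and selection; efficient_4cyl is an illustrative name.

```r
# Rows: both conditions must hold (element-wise &). Columns: picked by name.
efficient_4cyl <- mtcars[mtcars$mpg > 20 & mtcars$cyl == 4,
                         c("mpg", "cyl", "hp")]
efficient_4cyl
```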
Section 3, Adding and transforming columns
Exercise 8. Add a computed column
Add a column kpl to a copy of mtcars that converts mpg (US miles per US gallon) to kilometres per litre. The conversion is kpl = mpg * 0.425.
Solution
Assigning to mt$kpl creates the column if it does not exist and overwrites it if it does.
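A sketch of the copy-then-add pattern, using the conversion factor given in the exercise:

```r
# Work on a copy so the built-in mtcars stays untouched.
mt <- mtcars

# Vectorised arithmetic: every row is converted in one step.
mt$kpl <- mt$mpg * 0.425

head(mt[, c("mpg", "kpl")])
```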
Exercise 9. Conditional column with ifelse
Add a column efficiency to mt with value "high" when mpg > 25, "medium" when mpg is between 15 and 25 inclusive, and "low" otherwise.
Solution
ifelse() is vectorised: it evaluates the condition element by element and picks the matching value for each row. Nest calls to build more than two branches, or switch to dplyr::case_when() for many branches.
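A sketch of the nested form. mt is recreated from mtcars here so the block runs on its own.

```r
mt <- mtcars

# The outer test catches "high"; the inner one splits the rest.
# mpg >= 15 makes the 15 to 25 range inclusive at both ends.
mt$efficiency <- ifelse(mt$mpg > 25, "high",
                 ifelse(mt$mpg >= 15, "medium", "low"))

table(mt$efficiency)
```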
Exercise 10. Rename a column
Rename the column wt to weight_1000lbs in mt.
Solution
You assign into names(mt) at the position where the current name matches. This leaves every other column name untouched.
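The assignment described above can be sketched as follows, again starting from a fresh copy of mtcars:

```r
mt <- mtcars

# The logical mask is TRUE only where the current name is "wt".
names(mt)[names(mt) == "wt"] <- "weight_1000lbs"

names(mt)
```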
Section 4, Aggregation, merging and reshaping
Exercise 11. Group and summarise with aggregate
Use aggregate() on mtcars to compute the mean mpg for each number of cylinders.
Solution
The formula syntax mpg ~ cyl reads as "mpg by cyl". You can group by multiple variables with mpg ~ cyl + gear.
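A minimal sketch using the formula interface; mpg_by_cyl is an illustrative name.

```r
# One row per distinct cyl value, with the mean of mpg in each group.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg_by_cyl
```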
Exercise 12. Count rows per group
Count how many cars in mtcars have each combination of cyl and gear.
Solution
table() returns a contingency table. Wrapping in as.data.frame() gives you a tidy long-format version with one row per combination.
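A sketch of both forms. Naming the arguments to table() (cyl =, gear =) is a choice made here so the resulting columns get readable names.

```r
# Two-way contingency table: cylinders in rows, gears in columns.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)
counts

# Long format: one row per combination, with the count in Freq.
# Combinations that never occur appear with Freq = 0.
counts_long <- as.data.frame(counts)
counts_long
```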
Exercise 13. Merge two data frames
Create a small data frame cyl_info with columns cyl (4, 6, 8) and fuel_type ("economy", "mid", "performance"). Merge it onto mtcars so every car gets a fuel_type label.
Solution
merge() is base R's SQL-like join. By default it does an inner join on the common column(s). Use all.x = TRUE for a left join, all.y = TRUE for right, and all = TRUE for a full outer join.
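A sketch of the lookup-table join. Passing by = "cyl" explicitly is optional here, since cyl is the only shared column, but it makes the key obvious.

```r
# Lookup table: one row per cylinder count.
cyl_info <- data.frame(
  cyl       = c(4, 6, 8),
  fuel_type = c("economy", "mid", "performance")
)

# Inner join on cyl. Note that merge() discards the original row names
# (the car models) and returns rows ordered by the key.
cars_labelled <- merge(mtcars, cyl_info, by = "cyl")

head(cars_labelled[, c("cyl", "mpg", "fuel_type")])
```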
Exercise 14. Wide to long with stack
Create a small wide data frame of quarterly sales and reshape it to long format with one row per (quarter, sales) combination.
Solution
stack() is base R's minimal reshape tool. For anything bigger than this, reach for tidyr::pivot_longer(), but stack() is fine for toy examples and quick work.
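A sketch with invented sales figures. stack() names its output columns values and ind, so the last step renames them.

```r
# Made-up wide data: one column per quarter, one row per store.
sales_wide <- data.frame(
  Q1 = c(100, 120),
  Q2 = c(110, 130),
  Q3 = c(115, 125),
  Q4 = c(140, 150)
)

# stack() puts all values in one column and the source column name in another.
sales_long <- stack(sales_wide)
names(sales_long) <- c("sales", "quarter")

sales_long
```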
Exercise 15. Sort by multiple columns
Sort mtcars by cyl ascending and then by mpg descending within each cylinder group. Return the first 10 rows.
Solution
order() accepts multiple keys and returns the row indices. A minus sign in front of a numeric column reverses the sort direction for that column only.
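The two-key sort can be sketched as follows; sorted_cars and top10 are illustrative names.

```r
# First key: cyl ascending. Second key: mpg descending (note the minus sign).
sorted_cars <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]

top10 <- head(sorted_cars, 10)
top10[, c("cyl", "mpg")]
```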
Summary
- Build data frames with data.frame() and inspect with dim(), str(), summary(), head().
- Subset with df[row_condition, column_selector]. Always include the comma.
- Add columns by assigning into df$new_col; vectorised arithmetic applies automatically.
- Conditional columns: ifelse() for binary, nested ifelse() or dplyr::case_when() for more branches.
- Aggregation: aggregate(y ~ g, data, FUN). Counts: table(). Joins: merge().
- Sorting: df[order(...), ]; a minus sign reverses a numeric column.