dplyr filter() & select() Exercises: 12 Practice Problems
Twelve hands-on exercises to master dplyr::filter() and select() — from simple condition filtering to helper-function selection like starts_with(), where(), and matches(). Every exercise has a runnable solution you can execute in your browser.
Introduction
Reading about filter() and select() is easy. Using them fluently under real conditions — with compound logic, missing values, and messy column names — takes practice. These 12 exercises give you that practice.
The problems start simple and build up. The first four cover basic row filtering and column picking on the mtcars and starwars datasets. The middle four combine conditions, handle NA values, and use selection helpers. The last four mix filter() and select() in pipelines that resemble real analysis work.
Each exercise states the task, gives you a starter block, and hides the solution behind a click-to-reveal. Try to write your own answer first, run it, compare with the solution, then read the explanation. If you are new to these verbs, review the parent tutorial on dplyr filter() and select() before starting.
All code runs in a shared R session on this page — variables you create in one block are available in the next. Use distinct names like my_result in your exercise code so you do not overwrite tutorial variables from earlier blocks.
Quick Reference
Here is a one-screen cheat sheet before you start. Skim it, then begin Exercise 1.
| Task | Function | Example |
|---|---|---|
| Keep rows matching a condition | `filter()` | `filter(df, mpg > 20)` |
| Combine conditions with AND | `,` or `&` | `filter(df, mpg > 20, cyl == 4)` |
| Combine conditions with OR | `\|` | `filter(df, cyl == 4 \| cyl == 6)` |
| Match any of a set | `%in%` | `filter(df, cyl %in% c(4, 6))` |
| Handle missing values | `is.na()` | `filter(df, !is.na(mpg))` |
| Keep columns by name | `select()` | `select(df, mpg, cyl)` |
| Drop columns | `-` | `select(df, -cyl)` |
| Range of columns | `:` | `select(df, mpg:hp)` |
| Names starting with prefix | `starts_with()` | `select(df, starts_with("d"))` |
| Columns by type | `where()` | `select(df, where(is.numeric))` |
| Regex match | `matches()` | `select(df, matches("^d"))` |
Load dplyr and the built-in datasets. Because all blocks on this page share one R session, later blocks can call filter() and select() without reloading.

Both datasets are loaded. mtcars has 11 numeric columns across 32 cars. starwars has 14 columns (including list-columns) across 87 characters, with several NA values, which makes it well suited to practising real-world filtering.
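The setup step described here might look like the sketch below. It assumes dplyr is already installed; mtcars ships with base R and starwars ships with dplyr, so no extra downloads are needed.

```r
library(dplyr)

# mtcars comes from the base "datasets" package; starwars comes with dplyr.
dim(mtcars)      # rows, then columns
dim(starwars)

# A compact overview of column names and types, list-columns included.
glimpse(starwars)
```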
Easy (1-4): Basic Filtering and Selection
Start here if you are new to dplyr. These four exercises use one verb at a time.
Exercise 1: Filter by a single condition
Find all cars in mtcars with more than 25 miles per gallon. Store the result in my_fuel_efficient.
Click to reveal solution
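One possible solution, sketched under the assumption that dplyr is installed. The name my_fuel_efficient comes from the exercise statement.

```r
library(dplyr)

# Keep only the rows where mpg is strictly greater than 25.
my_fuel_efficient <- filter(mtcars, mpg > 25)
my_fuel_efficient
```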
Explanation: filter() takes a data frame as its first argument and a logical condition as its second. It returns only the rows where the condition is TRUE. Six cars in mtcars have mpg > 25, and all six happen to be 4-cylinder models.
Exercise 2: Select columns by name
From mtcars, keep only the mpg, hp, and wt columns. Save to my_cols.
Click to reveal solution
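A minimal sketch of one way to solve this, assuming dplyr is installed:

```r
library(dplyr)

# List the columns in the order you want them returned.
my_cols <- select(mtcars, mpg, hp, wt)
head(my_cols)
```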
Explanation: select() picks columns by unquoted name. Column order in the result follows the order you list them — here mpg, hp, wt — not their original order in mtcars.
Exercise 3: Drop columns with the minus sign
From mtcars, remove the vs, am, and carb columns. Save to my_kept. Use a single select() call with the minus sign.
Click to reveal solution
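One way to write this, as a sketch that assumes dplyr is installed:

```r
library(dplyr)

# A leading minus drops a column; everything else is kept in place.
my_kept <- select(mtcars, -vs, -am, -carb)
names(my_kept)
```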
Explanation: Prefixing a column with - drops it. The other eight columns stay in their original order. You could also write select(mtcars, -c(vs, am, carb)) with a vector — both work identically.
Exercise 4: Filter on equality
From starwars, keep only the characters whose species is exactly "Droid". Save to my_droids.
Click to reveal solution
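A possible solution, assuming dplyr is installed (starwars is bundled with it):

```r
library(dplyr)

# == compares each species value with the string "Droid".
# Rows where species is NA are dropped, because NA == "Droid" is NA.
my_droids <- filter(starwars, species == "Droid")
my_droids
```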
Explanation: Use == (double equals) to test equality. Inside a function call, a single = names an argument instead of comparing values, so filter(starwars, species = "Droid") stops with an error. Six droids appear. Note that BB8 has NA for height, which is fine because we filtered on species, not height.
Medium (5-8): Compound Conditions and Helpers
These exercises combine logical operators, handle missing values, and use select() helpers.
Exercise 5: AND conditions
From mtcars, keep cars with mpg > 20 AND cyl == 4. Save to my_efficient_fours. Show the result with select(mpg, cyl, hp).
Click to reveal solution
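One possible solution is sketched below. It assumes dplyr is installed and uses the native pipe, which needs R 4.1 or later.

```r
library(dplyr)

# Comma-separated conditions are combined with AND.
my_efficient_fours <- mtcars |>
  filter(mpg > 20, cyl == 4)

# Show only the three columns the exercise asks for.
my_efficient_fours |>
  select(mpg, cyl, hp)
```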
Explanation: Separating conditions with a comma inside filter() means AND — every condition must be TRUE for the row to be kept. filter(df, a, b) is equivalent to filter(df, a & b). The comma form reads more naturally.
Exercise 6: OR conditions with %in%
From mtcars, keep cars with 4 OR 6 cylinders. Save to my_small_engines. Use %in% rather than |. Count how many rows remain.
Click to reveal solution
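A sketch of one solution, assuming dplyr is installed:

```r
library(dplyr)

# %in% is TRUE when cyl equals any value in the vector on the right.
my_small_engines <- filter(mtcars, cyl %in% c(4, 6))
nrow(my_small_engines)
```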
Explanation: %in% tests whether each value appears in a vector on the right. It is cleaner than cyl == 4 | cyl == 6 and scales nicely to many values. 18 of 32 cars have 4 or 6 cylinders; the other 14 have 8 cylinders.
Exercise 7: Filter out missing values
From starwars, keep only characters with a known height and mass. Save to my_measured. Then count how many rows remain.
Click to reveal solution
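One way to approach this, sketched under the assumption that dplyr is installed:

```r
library(dplyr)

# Both conditions must hold, so a row needs a known height AND a known mass.
my_measured <- filter(starwars, !is.na(height), !is.na(mass))
nrow(my_measured)
```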
Explanation: is.na(x) returns TRUE for each missing value. Prefixing it with ! negates it, so filter() keeps the rows where the value is NOT missing. Without this step, the 28 rows with NA in either column would carry into downstream summaries, where mean() and sum() return NA unless you pass na.rm = TRUE.

filter(starwars, mass == NA) returns zero rows because NA == anything is NA, not TRUE. Always use is.na() or !is.na() to test for missing values.

Exercise 8: Select columns with starts_with()
From starwars, keep only the columns whose names start with "s". Save to my_s_cols. How many columns do you get?
Click to reveal solution
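A possible solution, assuming dplyr is installed:

```r
library(dplyr)

# starts_with() matches on the column name prefix and ignores case by default.
my_s_cols <- select(starwars, starts_with("s"))
names(my_s_cols)
ncol(my_s_cols)
```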
Explanation: starts_with("s") matches any column whose name begins with s (case-insensitive by default). In the 14-column version of starwars, four columns match: skin_color, sex, species, starships. The related helpers ends_with(), contains(), and matches() (regex) work the same way.

**Use where() to select by type, not name.** select(starwars, where(is.numeric)) keeps only numeric columns regardless of their names. This is invaluable when column names are unpredictable but types are known.

Hard (9-12): Combined Pipelines
The final four exercises chain filter() and select() together, the way you would in a real analysis.
Exercise 9: Filter then select in a pipeline
From mtcars, find all 8-cylinder cars and show only mpg, hp, wt — in that order. Save to my_v8s. Use the pipe to chain filter() and select().
Click to reveal solution
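One possible pipeline, as a sketch. It assumes dplyr is installed and R 4.1 or later for the native pipe.

```r
library(dplyr)

my_v8s <- mtcars |>
  filter(cyl == 8) |>        # rows: 8-cylinder cars only
  select(mpg, hp, wt)        # columns: in the requested order

my_v8s
```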
Explanation: The pipe |> passes the result of one step into the next. mtcars goes into filter(cyl == 8), which feeds into select(mpg, hp, wt). Order matters: filtering first is usually faster because select() then operates on fewer rows.
Exercise 10: Complex filter with OR and AND
From mtcars, find cars that are (4-cylinder AND mpg > 30) OR (8-cylinder AND hp > 200). Save to my_extremes. Show mpg, cyl, hp.
Click to reveal solution
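A sketch of one solution, assuming dplyr is installed and R 4.1 or later for the native pipe:

```r
library(dplyr)

my_extremes <- mtcars |>
  filter(
    (cyl == 4 & mpg > 30) |    # very efficient 4-cylinder cars
      (cyl == 8 & hp > 200)    # very powerful 8-cylinder cars
  ) |>
  select(mpg, cyl, hp)

my_extremes
```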
Explanation: Parentheses control evaluation order. Without them, R reads a & b | c & d as (a & b) | (c & d) by default — which happens to be what we want here. Using parentheses anyway documents your intent and avoids mistakes when the logic is more complex.
**filter() and select() are among the most-used dplyr verbs because most analysis questions reduce to "which rows, which columns?"** Once you can compose them fluently with the pipe, the rest of the tidyverse (mutate(), summarise(), group_by()) layers on top of these two foundations.

Exercise 11: Select with where() and a range
From mtcars, keep the columns mpg through hp (a range), then drop any of those that are NOT numeric. Save to my_numeric_range. How many columns remain?
Click to reveal solution
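One way to write this, sketched under the assumption that dplyr 1.0.0 or later is installed (that release is where where() became available through dplyr):

```r
library(dplyr)

# mpg:hp is a positional range; & keeps only columns that also satisfy where().
my_numeric_range <- select(mtcars, mpg:hp & where(is.numeric))
names(my_numeric_range)
ncol(my_numeric_range)
```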
Explanation: mpg:hp selects the four columns from mpg to hp in their original position. The & operator intersects two selectors — only columns matching BOTH criteria are kept. Because every column in mtcars is already numeric, all four survive. On a mixed-type data frame, the non-numeric columns would drop out.
Exercise 12: Full analysis pipeline
Build a complete pipeline on starwars: keep only humans with known height, select name, height, mass, homeworld, and keep only those taller than 180cm. Save to my_tall_humans.
Click to reveal solution
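A possible pipeline, as a sketch. It assumes dplyr is installed and R 4.1 or later for the native pipe.

```r
library(dplyr)

my_tall_humans <- starwars |>
  filter(
    species == "Human",   # humans only
    !is.na(height),       # height must be known
    height > 180          # taller than 180 cm
  ) |>
  select(name, height, mass, homeworld)

my_tall_humans
```

The !is.na(height) condition is kept because the exercise asks for it, although height > 180 alone would also drop rows with a missing height.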
Explanation: Three filter conditions stack cleanly: species match, height not missing, height above threshold. Then select() picks the four reporting columns. Reading the pipeline top-to-bottom tells the story: take starwars, keep tall humans with known height, show these four columns. This is the dplyr pattern you will reach for constantly.
**The result can still contain NA values in mass and homeworld.** We only filtered on height. If you need non-missing values in every reporting column, extend the filter: filter(!is.na(height), !is.na(mass), !is.na(homeworld)).

Summary
| Concept | Key pattern |
|---|---|
| Filter one condition | `filter(df, col > x)` |
| AND conditions | `filter(df, cond1, cond2)` |
| OR conditions | `filter(df, col %in% c(a, b))` |
| Missing values | `filter(df, !is.na(col))` |
| Keep columns | `select(df, a, b, c)` |
| Drop columns | `select(df, -a, -b)` |
| By prefix | `select(df, starts_with("x"))` |
| By type | `select(df, where(is.numeric))` |
| Intersect selectors | `select(df, a:c & where(is.numeric))` |
| Chain in pipeline | `df \|> filter(...) \|> select(...)` |
If you solved all 12 without peeking, you are ready for mutate(), arrange(), and summarise() — the next core dplyr verbs. If you struggled on a few, re-read the parent tutorial and try the failed exercises again tomorrow. Spaced practice beats cramming.
FAQ
**Q: What is the difference between filter() and select()?** filter() works on rows — it keeps rows that satisfy a condition. select() works on columns — it picks or drops columns by name, position, or helper function. You almost always use them together.
**Q: Why do I get an error when I use = inside filter()?** Inside a function call, a single = names an argument; it does not test equality. filter(mtcars, mpg = 20) therefore passes a named argument instead of a condition, and dplyr stops with an error. Write filter(mtcars, mpg == 20) instead.
**Q: How do I filter rows where a column IS missing?** Use filter(df, is.na(col)). The negation !is.na(col) keeps non-missing values. Never write col == NA, because the comparison always evaluates to NA, never TRUE.
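To make the NA behaviour concrete, here is a small sketch on starwars. The names rows_with_eq and missing_mass are hypothetical, chosen only for this illustration.

```r
library(dplyr)

# Comparing with NA yields NA, and filter() drops rows whose condition is NA.
rows_with_eq <- filter(starwars, mass == NA)
nrow(rows_with_eq)    # 0

# is.na() returns TRUE or FALSE, so the rows with a missing mass are kept.
missing_mass <- filter(starwars, is.na(mass))
nrow(missing_mass)
```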
**Q: Can I use select() to reorder columns?** Yes. select(df, c, a, b) returns the columns in the order you list them. For partial reordering, combine with everything(): select(df, important_col, everything()) puts important_col first and keeps the rest.
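As a short illustration of that reordering pattern, the sketch below moves hp to the front of mtcars. The name reordered is hypothetical.

```r
library(dplyr)

# hp goes first; everything() then adds the remaining columns in their original order.
reordered <- select(mtcars, hp, everything())
names(reordered)
```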
**Q: Is filter() before select() the same as select() before filter()?** Yes, as long as the filter only uses columns that survive the select(): you get the same rows and columns either way. Practically, filter() first is usually faster because subsequent operations run on fewer rows. The one case where the order changes the outcome is filtering on a column you have already dropped with select(), which fails.
What's Next?
- dplyr filter() and select() Tutorial (the parent tutorial, which covers every pattern in depth with explanations)
- dplyr Exercises: 15 Practice Problems (broader dplyr practice covering mutate, summarise, group_by, joins, and across)
- data.table vs dplyr (a comparison of the two dominant R data-manipulation frameworks)