tidyr Reshaping Exercises: 10 pivot_longer & pivot_wider Problems
These 10 runnable tidyr exercises take you from basic pivot_longer() and pivot_wider() calls to advanced tricks like names_sep, names_pattern, and the .value sentinel, with a click-to-reveal solution for every problem.
How do you reshape wide data into long format with pivot_longer()?
Most datasets arrive in wide shape, one row per subject, one column per measurement, but ggplot2 and dplyr expect long shape: one row per observation. pivot_longer() is the bridge. Let's load tidyr and reshape the built-in relig_income dataset on US religious groups and their income brackets, so you can see the shape of a pivot call before writing one yourself. Every exercise below runs in the same R session, so variables carry over.
relig_income started as 18 rows × 11 columns; after pivoting it is 180 rows × 3 columns, every religion × income combination is now its own row. The three arguments did all the work: cols = -religion picks every column except religion, names_to names the new key column, and values_to names the new value column. This is the canonical wide-to-long move you'll reach for 90% of the time.
relig_income, billboard, fish_encounters, and household all ship with the tidyr package, so you can run every exercise in this page without touching the file system.Try it: Reshape relig_income but name the new columns bracket and n instead. Save to ex1_long and check the row count.
Click to reveal solution
Explanation: The names_to and values_to arguments are just labels, rename them freely. 18 religions × 10 brackets = 180 rows.
Try it: The billboard dataset has one row per song and weekly rank columns wk1, wk2, ..., wk76. Reshape it so there are two new columns, week and rank. Save to ex2_long.
Click to reveal solution
Explanation: wk1:wk76 selects the contiguous range of week columns. The three id columns (artist, track, date.entered) are preserved as-is; the 76 week columns collapse into two, giving 24,092 rows total.
How do you turn long data back into wide format with pivot_wider()?
pivot_wider() is the mirror image of pivot_longer(). It takes a key column and a value column and spreads them back across new columns, perfect for presenting results or for undoing a reshape you did earlier in a pipeline. The two arguments that matter most are names_from (which column's values become the new column headers) and values_from (which column fills the cells).
The 180-row long frame collapses back to 18 × 11, the exact shape of the original relig_income. That is the invariant you want for any round-trip reshape: pivot_wider(pivot_longer(x)) should return x (up to column ordering). When it doesn't, you have duplicate keys hiding in your data.
pivot_wider(pivot_longer(x)) changes the row count or introduces NAs you didn't have before, your (id, key) pairs are not unique, inspect with janitor::get_dupes() or a quick count().Try it: The fish_encounters dataset tracks which fish were seen at which river monitoring station. Pivot it so that each station becomes its own column, with the seen value in the cells. Save to ex3_wide.
Click to reveal solution
Explanation: Fish 4844 wasn't seen at the last four stations, so those cells are NA, pivot_wider() produces NA wherever a (fish, station) pair is missing from the input.
Try it: Repeat the pivot but replace missing encounters with 0 instead of NA. Save to ex4_wide.
Click to reveal solution
Explanation: values_fill = 0 replaces every cell that would otherwise be NA. Pass a named list, e.g. values_fill = list(seen = 0), when you have multiple value columns that need different defaults.
How do you pick which columns to pivot with tidyselect helpers?
Listing columns by name is fine for small datasets, but the moment your column list grows, tidyselect helpers make the pivot call shorter and safer. Inside cols =, you can use starts_with(), ends_with(), matches() (regex), and where(is.numeric), the same helpers you already use in dplyr::select(). Let's see it on iris, where the four measurement columns are all numeric and Species is a factor.
The call pivoted all four numeric columns in one stroke without naming any of them. iris went from 150 × 5 to 600 × 3, exactly 150 × 4 value rows plus the untouched Species id column. Same result as cols = Sepal.Length:Petal.Width, but this version survives a column rename.
where(is.numeric) is the fastest way to pivot every numeric column. It is also a lifesaver in messy survey data where new numeric questions keep being added, you pivot once and forget.Try it: Pivot iris so that only the Petal.Length and Petal.Width columns become rows (leave the sepal columns as-is). Save to ex5_long.
Click to reveal solution
Explanation: starts_with("Petal") matches exactly Petal.Length and Petal.Width. The sepal columns stay put as id columns, so the result has 5 columns (2 sepals + Species + 2 new) and 300 rows.
How do you split compound column names with names_sep and names_pattern?
Real datasets love to cram multiple variables into one header, sales_2023_q1, mpg_city_2022, new_sp_m014. pivot_longer() can split those headers during the pivot instead of leaving you with a messy string you have to separate later. Pass a vector to names_to (one name per piece) and tell tidyr how to cut the header with either names_sep (a delimiter) or names_pattern (a regex with capture groups).
names_prefix = "sales_" strips the common prefix first, then names_sep = "_" splits the remainder into exactly the two pieces named in names_to. Four columns become two (year, quarter) plus one sales value column, no messy post-pivot string surgery required.
names_sep and names_pattern do the same job; the delimiter decides which. Use names_sep whenever a clean character (or position) separates the fields. Use names_pattern when the pieces run together, for example wk12 (letters + digits with no gap), where you need a regex capture group to pull the number out.Try it: Reshape billboard so the week number is stored as an integer in a column called week. Use names_pattern to strip the wk prefix, and names_transform to cast the result to integer. Save to ex6_long.
Click to reveal solution
Explanation: (\\d+) captures only the digits after wk, so week comes out as "1", "2", .... names_transform then casts the string to integer. values_drop_na = TRUE discards weeks where a song wasn't on the chart, shrinking the result from ~24,000 rows to about 5,300.
Try it: Given the tribble below with mpg_<road>_<year> columns, reshape it so there is one row per (make, road, year) with an mpg value column. Save to ex7_long.
Click to reveal solution
Explanation: names_sep = "_" cuts each header into three pieces, mpg, the road type, and the year, matching the three names in names_to. The metric column is constant here; in a real dataset it would hold multiple metrics like "mpg" and "kWh".
How do you reshape multiple value columns at once using .value?
What if a single row holds two values that should become two columns after the pivot? The built-in household dataset is the classic example: each row is a family, and the columns dob_child1, dob_child2, name_child1, name_child2 encode both a child identifier and a measurement type (dob or name). Stacking everything into one value column would mix dates and names, which tidyr refuses to do. The answer is the special .value sentinel, it tells pivot_longer() that this piece of the header should become the output column name.
.value sits in the names_to slot where dob/name lived in the original headers, and "child" takes the slot where child1/child2 lived. The result has two separate typed columns, dob as a date and name as a character, instead of one messy column. Family 2's child2 row has NA in both because that family only had one child in the source data.
.value slot in names_to. The other slot becomes the new key column. If you try to use .value twice, or forget it entirely when you have multiple value types, the pivot fails with a type-mismatch error.Try it: Given the tribble below, reshape it so there is one row per (student, year) with two value columns math and read. Save to ex8_long.
Click to reveal solution
Explanation: After names_prefix trims "score_", each remaining header looks like math_2023 or read_2024. names_sep = "_" then hands the first piece to .value (which becomes the column name) and the second piece to year. Two headers per year collapse into two columns per row.
Practice Exercises
The two capstone problems below combine everything above. They need more than one pivot call each, treat them like mini workflows and sketch the target shape on paper before you start typing.
Exercise 9: Round-trip, long, summarise, wide again
Starting from relig_income, compute each income bracket's share of its religion's total. Your result should have one row per religion and one column per bracket (so the same shape as relig_income), but with proportions instead of raw counts. Save the final wide tibble to ex9_result.
Click to reveal solution
Explanation: Pivoting long first lets group_by(religion) |> mutate(share = n / sum(n)) divide each count by its religion total. Pivoting back wide returns the original 18 × 11 shape, but the cells now hold proportions that sum to 1 within each row. This long-compute-wide pattern is the reason pivot_longer() and pivot_wider() are always taught together.
Exercise 10: Clean a messy wide dataset with compound headers and missing values
The tribble below holds quarterly store results. Column names encode three things, quarter, metric, currency, and a few cells are NA. Reshape it into a tidy frame with columns store, quarter, sales, returns, then drop any row where both sales and returns are missing. Save to ex10_tidy.
Click to reveal solution
Explanation: names_to = c("quarter", ".value", "currency") splits each header into three pieces: "q1"/"q2" become the quarter key, "sales"/"returns" become the output column names (thanks to .value), and "usd" goes into currency. Dropping the constant currency column and filtering out the all-NA rows yields a clean 4-row tidy frame.
Complete Example
To wrap up, let's string the techniques together into a mini workflow on billboard: find the three tracks that stayed on the charts the longest, then display their weekly ranks in wide form. This is the kind of long-compute-wide loop you'll run every week in a real analysis job.
One pivot_longer() tidies the raw chart data, a dplyr pipeline finds the top-3 most persistent tracks, and a final pivot_wider() reshapes just those three tracks back to a wide weekly view, all from the same input frame. names_glue = "wk{week}" rebuilds the original wk1, wk2, ... column naming so the final table matches the publication convention of the source.
Summary
| Problem type | Function | Key argument(s) |
|---|---|---|
| Wide → long, simple range | pivot_longer() |
cols, names_to, values_to |
| Long → wide (inverse) | pivot_wider() |
names_from, values_from |
| Fill missing cells on widen | pivot_wider() |
values_fill |
| Pick columns programmatically | pivot_longer() |
cols = where(is.numeric) / starts_with() |
| Split compound headers on a delimiter | pivot_longer() |
names_sep, names_prefix |
| Split compound headers with regex | pivot_longer() |
names_pattern = "wk(\\d+)" |
| Multiple value columns at once | pivot_longer() |
.value in names_to |
pivot_wider(pivot_longer(x)) doesn't reproduce x, your data has duplicate keys, fix them before modelling, not after.References
- tidyr, pivot_longer() reference. Link
- tidyr, pivot_wider() reference. Link
- tidyr, Pivoting vignette. Link
- Wickham, H. & Grolemund, G., R for Data Science (2e), Chapter 5: Data tidying. Link
- CRAN tidyr, Pivoting vignette. Link
Continue Learning
- pivot_longer() and pivot_wider(): Reshape Data in R Without Losing Your Mind, the parent tutorial walks through every argument you practised here, with extra worked examples.
- dplyr group_by() & summarise() Exercises, grouping and summarising exercises that pair naturally with reshaping.
- dplyr Exercises, a broader dplyr practice set covering filter, select, mutate, and summarise.