tidyr Reshaping Exercises: 10 pivot_longer & pivot_wider Problems
Ten hands-on exercises drill pivot_longer() and pivot_wider() from basic column stacking to advanced multi-value reshaping — run every solution in your browser.
Introduction
Reading about reshaping is one thing. Doing it under time pressure with unfamiliar column names is another. These ten problems close that gap. Each one targets a specific pivot_longer() or pivot_wider() skill that trips up real analysts.
You will start with a simple wide-to-long conversion, then work through round-trip reshaping, column-name prefixes, missing-value fills, compound column names, aggregation during pivoting, regex-based name parsing, and multi-value pivots. By Exercise 10 you will combine several techniques on a single messy table.
If pivot_longer() and pivot_wider() are new to you, read the parent pivot_longer & pivot_wider tutorial first. Otherwise, run the setup block and begin. All code here runs in one shared R session. Use distinct variable names like ans1, ans2 in your own attempts so you do not overwrite the tutorial datasets.
Setup: The Datasets We Will Use
Before the exercises, we build three small tables. They are tiny on purpose so you can eyeball every row and verify your answers by hand. Run this block once. Every exercise after this assumes these objects exist.
The students table is wide: one row per student, one column per subject. The quarterly table is long: one row per region-quarter combination. The weather table has prefixed columns (temp_jan, rain_jan) encoding both the measurement type and the month.
Look at the column names in weather. Each encodes two things: the measurement type (temp or rain) and the month (jan or jul). Exercises 4 and 6 exploit this structure.
students, quarterly, and weather. If you reset the R session, run Setup again before continuing.Warm-Up: Basic Reshaping (Exercises 1-3)
These three exercises each use one pivot call. If you get them right, you understand the core mechanics. Expected output shapes are given so you can verify.
Exercise 1: Stack subject columns into long format
Use pivot_longer() to reshape students so each row holds one student-subject-score combination. Name the new columns subject and score. Save to ans1. Expected: 9 rows, 3 columns.
Click to reveal solution
Explanation: pivot_longer() takes the three subject columns and stacks them into two new columns. The column names become values in subject. The cell values become values in score. The student column repeats for each subject, giving 3 students times 3 subjects = 9 rows.
Exercise 2: Spread a long table to wide format
Use pivot_wider() to reshape quarterly so each region gets one row and each quarter becomes its own column. The cell values should come from revenue. Save to ans2. Expected: 2 rows, 3 columns (region, Q1, Q2).
Click to reveal solution
Explanation: pivot_wider() reads each unique value in quarter ("Q1", "Q2") and creates a column for it. The values in revenue fill the cells. The region column stays as the row identifier automatically because it is not named in names_from or values_from.
Exercise 3: Round-trip reshaping
Pivot students to long format (same as Exercise 1), then immediately pivot back to wide. Save the long version to ans3_long and the recovered wide version to ans3_wide. Verify that ans3_wide matches the original students table. Expected: 3 rows, 4 columns in ans3_wide.
Click to reveal solution
Explanation: pivot_longer() and pivot_wider() are inverses. Going long then wide recovers the original shape. The column order may differ from the original, but the data is identical. Round-trip reshaping is a useful sanity check when you are unsure whether a pivot lost information.
Core Challenges: Arguments and Edge Cases (Exercises 4-7)
Warm-up done. These four exercises use specific arguments that handle real-world messiness: column prefixes, missing value fills, compound column names, and duplicate-key aggregation.
Exercise 4: Strip a column prefix during pivot_longer
The weather table has columns like temp_jan, temp_jul, rain_jan, rain_jul. Pivot only the temperature columns (temp_jan, temp_jul) to long format. Use names_prefix to remove the "temp_" prefix so the month column contains just "jan" and "jul". Save to ans4. Expected: 4 rows.
Click to reveal solution
Explanation: names_prefix = "temp_" strips the leading text from each column name before placing it in month. Without it, the month column would contain "temp_jan" instead of "jan". The rain columns stay untouched because they were not included in cols. Use starts_with() to select columns by prefix cleanly.
Exercise 5: Fill missing combinations with a default value
The orders_long table below has sales by product and store, but not every product sold in every store. Pivot it wider so each store becomes a column. Fill missing combinations with 0 instead of NA. Save to ans5. Expected: 3 rows, 4 columns, no NA values.
Click to reveal solution
Explanation: Cherry was only sold in the East store. Without values_fill, the West column for Cherry would be NA. Setting values_fill = list(units = 0) replaces those missing cells with zero. This is essential when you need a complete matrix for plotting or modeling and zeros are meaningful.
Exercise 6: Split compound column names with names_sep
All four measurement columns in weather encode two pieces of information: the measurement type and the month, separated by an underscore. Pivot all four columns to long format using names_sep = "_" to split them into a measure column and a month column. Save to ans6. Expected: 4 rows, 4 columns (city, month, temp, rain).
Wait — that shape requires two steps. First pivot long to get measure and month, then pivot wide on measure to get temp and rain as separate columns. Save the final result to ans6.
Click to reveal solution
Explanation: names_sep = "_" splits each column name at the underscore. "temp_jan" becomes measure = "temp" and month = "jan". The intermediate long table has 8 rows with columns city, measure, month, and value. The second pivot spreads measure back into separate temp and rain columns. This two-step reshape is the standard pattern for compound column names.
names_sep is simpler. If the structure is irregular, reach for names_pattern (see Exercise 8).Exercise 7: Aggregate duplicates during pivot_wider
The survey table below has duplicate entries: the same person answered the same question twice. Pivot it wider so each question becomes a column. Use values_fn = list(answer = mean) to average duplicate answers. Save to ans7. Expected: 2 rows, 3 columns, no warnings.
Click to reveal solution
Explanation: Ali answered Q1 twice (4 and 6). Without values_fn, pivot_wider() would produce a list-column and warn you. Setting values_fn = list(answer = mean) averages the duplicates: (4 + 6) / 2 = 5. Bo answered Q2 twice (3 and 5), averaged to 4. Use values_fn whenever your data has duplicate key combinations and you want a scalar, not a list.
Advanced Problems: Real-World Reshaping (Exercises 8-10)
The last three exercises push further. You will parse structured column names with regex, pivot multiple value columns at once, and combine several techniques on a single messy table.
Exercise 8: Extract structured names with names_pattern
The clinic table below has columns like bp_sys_baseline and bp_dia_week4. Each column name encodes three pieces: a prefix (bp), a measure (sys or dia), and a timepoint (baseline or week4). The prefix is constant and useless. Use pivot_longer() with names_pattern to extract measure and timepoint while discarding the prefix. Save to ans8. Expected: 8 rows, 4 columns (patient, measure, timepoint, value).
Click to reveal solution
Explanation: names_pattern takes a regex with one capture group per entry in names_to. The pattern "bp_(\\w+)_(\\w+)" matches bp_, then captures everything up to the next underscore as measure, then captures the rest as timepoint. The bp_ prefix is consumed but not captured, so it disappears from the result. This is more powerful than names_sep because you control exactly which parts to keep.
names_to has two entries, your regex needs exactly two capture groups (...). A mismatch throws an error that does not always explain the cause clearly.Exercise 9: Pivot wider with multiple value columns
The results_long table below stores both a score and a grade for each student-subject combination. Pivot it wider so each subject becomes two columns: math_score, math_grade, sci_score, sci_grade. Save to ans9. Expected: 2 rows, 5 columns.
Click to reveal solution
Explanation: When you pass multiple columns to values_from, pivot_wider() creates one set of new columns per value column. The naming pattern is {value}_{name} by default. You get score_math, score_sci, grade_math, grade_sci. To reverse the naming to math_score, pass names_glue = "{subject}_{.value}".
Exercise 10: Reshape a messy table with combined techniques
The messy table below came from a spreadsheet. Column names mix a year and a metric (sales_2024, sales_2025, cost_2024, cost_2025). Your goal: produce a tidy table with columns region, year, sales, and cost. Use pivot_longer() with names_sep to split the columns, then pivot_wider() to separate sales and cost. Save to ans10. Expected: 4 rows, 4 columns.
Click to reveal solution
Explanation: The first pivot splits "sales_2024" into metric = "sales" and year = "2024", producing an 8-row intermediate table with columns region, metric, year, and value. The second pivot spreads metric back into separate sales and cost columns. This long-then-wide pattern is the universal fix for spreadsheet tables that encode two variables in the column header.
Common Mistakes and How to Fix Them
Mistake 1: Including the ID column in cols
Accidentally including the identifier column in cols drops it from the output and merges all rows together.
Bad:
Good:
Always exclude identifier columns with -col_name or name the value columns explicitly with c(col1, col2).
Mistake 2: Duplicate keys create list-columns in pivot_wider
If the combination of identifier + name is not unique, pivot_wider() produces list-columns instead of scalars and prints a warning.
Fix: either deduplicate before pivoting with distinct() or slice_max(), or use values_fn to aggregate (like Exercise 7).
Mistake 3: Numeric values become character after pivot_longer
When you pivot columns of mixed types (numeric and character) together, R coerces everything to character. Check that all columns in cols share the same type.
The score value 90 became character "90". Fix: pivot only columns of the same type, or use values_transform = list(val = as.numeric) when you know all values should be numeric.
Summary
Here is a quick reference for the key arguments you practised across all ten exercises.
| Argument | Function | What it does | Exercise |
|---|---|---|---|
cols |
pivot_longer() |
Which columns to stack | 1, 3, 4, 6, 8, 10 |
names_to |
pivot_longer() |
Name of the new key column | 1, 3, 4, 6, 8, 10 |
values_to |
pivot_longer() |
Name of the new value column | 1, 3, 4, 8, 10 |
names_prefix |
pivot_longer() |
Strip a prefix from column names | 4 |
names_sep |
pivot_longer() |
Split column names at a delimiter | 6, 10 |
names_pattern |
pivot_longer() |
Extract parts of column names via regex | 8 |
names_from |
pivot_wider() |
Column whose values become new column names | 2, 3, 6, 7, 9, 10 |
values_from |
pivot_wider() |
Column(s) whose values fill the new columns | 2, 3, 6, 7, 9, 10 |
values_fill |
pivot_wider() |
Default value for missing combinations | 5 |
values_fn |
pivot_wider() |
Aggregation function for duplicate keys | 7 |
FAQ
When should I use pivot_longer versus pivot_wider?
Use pivot_longer() when your column names contain data (like years, subjects, or measurement types) and you want them as values in a single column. Use pivot_wider() when a single column contains variable names and you want each one as a separate column. Most raw datasets need pivot_longer() first.
How do I pivot only some columns?
Use tidy-select helpers inside cols: starts_with("temp"), ends_with("_score"), matches("^Q\\d"), or just name them c(col1, col2). To exclude columns, use -col_name or !col_name.
What replaced gather() and spread()?
pivot_longer() replaced gather() and pivot_wider() replaced spread(). The tidyr team introduced the pivot functions in 2019 because the argument names (key, value) in gather/spread were confusing. The pivot versions have clearer names (names_to, values_from) and support advanced features like names_sep and values_fn.
Can I pivot multiple value columns at once?
Yes. Pass a vector to values_from in pivot_wider(), like values_from = c(score, grade). Each value column generates its own set of new columns. See Exercise 9 for a worked example.
References
- Wickham, H., Vaughan, D., & Girlich, M. — tidyr: Tidy Messy Data. tidyr pivot vignette. Link
- Wickham, H. & Grolemund, G. — R for Data Science, 2nd Edition. Chapter 5: Data Tidying. Link
- tidyr documentation — pivot_longer() reference. Link
- tidyr documentation — pivot_wider() reference. Link
- Stanford Data Challenge Lab — Advanced Pivoting. Link
- Data Carpentry — R for Social Scientists: Data Wrangling with tidyr. Link
- Arnold, J. — R for Data Science Exercise Solutions: Tidy Data. Link
What's Next?
Now that you can reshape data confidently, explore these related tutorials:
- Missing Values in R: Detect, Count, Remove & Impute NA — cleaned data often needs NA handling next
- dplyr Exercises (15 problems) — practise filtering, grouping, and joining after reshaping
- pivot_longer & pivot_wider Tutorial — review the parent tutorial if any exercise stumped you