pivot_longer() and pivot_wider(): Reshape Data in R Without Losing Your Mind
pivot_longer() stacks several columns into a single name-value pair, moving data from wide to long. pivot_wider() does the opposite, spreading one column's values across new columns. They are inverses of each other.
Reshaping data is one of those R skills that feels hard until you see it once. Then it clicks forever. Most real datasets arrive in a shape that does not match what you need for plotting or modeling. This post teaches the two tidyr functions that handle every reshape job you will meet.
Introduction
The core problem is simple. Humans like wide tables because they are compact and easy to read. Software likes long tables because each row is a single observation. You will spend a real fraction of your analysis time moving between the two shapes.
The tidyr package ships pivot_longer() and pivot_wider() as the modern tools for this job. They replaced the older gather() and spread() in 2019 with clearer argument names and better handling of edge cases. If you still see gather() in old tutorials, update the syntax; the tidyverse team has retired it.
You do not need to install anything to follow along. Every code block on this page runs directly in your browser. Click Run on the first block to load the libraries, then work through the rest top to bottom. Variables carry over between blocks, just like a notebook.
What is the difference between wide and long format?
Wide and long are two shapes that can hold the exact same information. The shape you pick depends on what you want to do next. Here is the same student-test-score dataset shown both ways.

Figure 1: Wide and long are two shapes for the same data.
Wide format puts one row per subject, with measurements spread across columns. Long format puts one row per measurement, with a key column saying what was measured. Let's build a small wide table and see it live.
Three students, three subjects, nine numbers. The wide table has 3 rows and 4 columns. The same nine numbers in long format would need 9 rows and only 3 columns: student, subject, score. Same data, different shape.
How does pivot_longer() turn wide data into long?
pivot_longer() needs you to answer three questions: which columns should be stacked, what should the new name column be called, and what should the new value column be called. That is the whole function.

Figure 2: pivot_longer() needs three answers: which columns, a name column, a value column.
Let's stack the three subject columns into one subject column and one score column.
The table went from 3 rows to 9, exactly one row per student-subject pair. The column headers math, english, and science became values inside the new subject column. The cell numbers became values inside the new score column. The student column was untouched because we did not list it in cols.
Selecting cols with helpers
Listing every column by name gets tiresome fast. You can use tidyselect helpers inside cols, just like inside select().
-student is the cleanest idiom for "stack everything except the id column". You can also use starts_with("score_"), ends_with("_2024"), matches("^q\\d+$"), or where(is.numeric) to pick columns by pattern or type.
How does pivot_wider() turn long data into wide?
pivot_wider() is the inverse. It takes one column full of names and spreads those names across new columns, filling each new column with values from a second column. Two arguments do the work.

Figure 3: pivot_wider() needs to know the identifier, the name source, and the value source.
Let's take the long_scores we just built and spread it back out.
Same shape as the original wide_scores. pivot_wider() read the distinct values in subject ("math", "english", "science") and created one column per value. Then it read score and dropped each number into the correct cell.
You did not need to say which columns identify each row. pivot_wider() assumes any column not mentioned in names_from or values_from is an identifier. Here that means student. If you want to be explicit, pass id_cols = student.
How do you pivot multiple value columns at once?
Real column names often pack two pieces of information together, like avg_math_2023 or billboard_wk1. pivot_longer() can split those names into two columns in one step with names_sep or names_pattern.
Imagine a quarterly sales file. Each column name is region_Q<n>.
Passing a character vector to names_to tells pivot_longer() to split each old column name into that many parts. names_sep = "_" does the split. You went from 4 value columns to 2 tidy key columns (region, quarter) plus one value column (sales), all in a single call.
If the separator is not a fixed character, use names_pattern with a regex. For something like sales2023q1, you would write names_pattern = "sales(\\d{4})q(\\d)".
How do you control the pivot with names_prefix, names_sep, and values_fill?
A few extra arguments clean up messy real-world pivots. Here are the three you will reach for the most.
**names_prefix** strips a common prefix from column names during pivot_longer(). **values_fill** plugs holes during pivot_wider(). **names_sep** glues parts together during pivot_wider() when you have multiple name sources.
Store B has no February row in the long input, so pivot_wider() would return NA by default. values_fill = 0 replaces every missing cell with zero, which is what you want for count data. Use values_fill = list(sales = 0, visits = 0) when you have several value columns and want different fills per column.
Common Mistakes and How to Fix Them
Mistake 1: Forgetting to quote names_to and values_to
Wrong:
Why it is wrong: names_to and values_to create brand-new columns. The new names do not exist yet, so R cannot evaluate subject as a symbol. They must be passed as strings.
Correct:
Mistake 2: Duplicate keys silently create list-columns
Wrong:
Why it is wrong: Two rows have id = 1, key = "a", so pivot_wider() does not know whether cell [1, "a"] should be 10 or 20. It keeps both in a list.
Correct: Decide how to combine duplicates first.
Mistake 3: Pivoting columns with mixed types
Wrong: Stacking numeric and character columns into one value column.
Why it is wrong: A single column can hold only one type. pivot_longer() coerces everything to the broadest type, which is character if any column is text. You lose numeric operations on the score.
Correct: Pivot numeric columns and character columns separately, or keep the identifier columns out of the pivot.
Mistake 4: Using cols = everything() and losing id columns
Wrong:
Why it is wrong: everything() includes the identifier column. You cannot tell which score came from which student anymore.
Correct: Always exclude the id columns.
Practice Exercises
Exercise 1: Stack year columns
The billboard dataset (built into tidyr) holds weekly chart ranks. Pivot the wk1 through wk76 columns into one week column and one rank column, then drop missing ranks.
Click to reveal solution
Explanation: starts_with("wk") picks all 76 week columns in one go. values_drop_na = TRUE strips rows where the song had dropped off the chart (most rows, actually).
Exercise 2: Spread a summary table
Start from the long_scores you made earlier. Compute the average score per subject across all students, then pivot to one row per subject-with-average into a single-row wide table where each subject is a column.
Click to reveal solution
Explanation: Summarise first to collapse each subject to one row, then pivot to spread the subject names across columns. The result is a one-row tibble where each column is a subject's class average.
Exercise 3: Split packed column names
Build this wide tibble and pivot it so that the region and year pieces of each column name become their own columns, alongside a temp value column.
Click to reveal solution
Explanation: A two-element names_to plus names_sep = "_" tells pivot_longer() to split each old name into two parts at the underscore. One call turned four packed columns into three clean ones.
Exercise 4: Round trip
Take wide_scores, pivot to long, round every score down to the nearest 10, pivot back to wide, and confirm you get the rounded version of the original.
Click to reveal solution
Explanation: The long shape makes the rounding operation trivial; one mutate() hits every score. Without the pivot, you would have to repeat the rounding for each of the three subject columns. This go-long-then-go-wide pattern is the most common reason you will reach for these functions.
Complete Example
Here is a realistic end-to-end flow. A school has test scores in wide format, you want per-student averages and per-subject averages, and a wide-format report at the end.
The pivot_longer step did the real work. Once the data was long, group_by() |> summarise() became a one-liner. A left_join() glued the averages back onto the original wide table, giving a report-ready output with one row per student.
Summary
| Function | Direction | Key arguments | Result |
|---|---|---|---|
| pivot_longer() | Wide to long | cols, names_to, values_to | More rows, fewer columns |
| pivot_wider() | Long to wide | names_from, values_from, id_cols | Fewer rows, more columns |
| names_sep | Split old names | character or regex | Multiple new name columns |
| values_fill | Plug holes | value or named list | No NA cells after widening |
| values_drop_na | Prune empties | TRUE/FALSE | Drop NA rows after longer |
- Long format is what ggplot2 and group_by() expect; go long first when in doubt.
names_toandvalues_tomust be quoted strings because they name new columns.colsaccepts every tidyselect helper you know fromselect().- Duplicate keys during
pivot_wider()create list-columns; aggregate first.
FAQ
When should I use pivot_longer() instead of pivot_wider()? Use pivot_longer() before any group-wise calculation, plot, or model fit. Most tidyverse tools need one-row-per-observation. Use pivot_wider() only at the end, when you need a human-readable report or a matrix for linear algebra.
What replaced gather() and spread()? pivot_longer() replaced gather() and pivot_wider() replaced spread() in tidyr 1.0 (2019). The new functions have clearer argument names and handle multi-column names that the old functions could not.
Can I pivot multiple value columns at once? Yes. Pass values_from = c(col1, col2) to pivot_wider() to create paired output columns like col1_X, col2_X, and so on. For pivot_longer(), use a .value placeholder in names_to together with names_sep to split column names into a stub and a suffix.
Why did my values column turn into a list-column? Your long data had duplicate keys: two or more rows share the same names_from value and the same id columns. pivot_wider() packs all candidates into a list for you to inspect. Aggregate with group_by() |> summarise() or pass values_fn = sum to pivot_wider() to collapse them first.
References
- tidyr documentation - pivot_longer() reference. Link
- tidyr documentation - pivot_wider() reference. Link
- tidyr Pivoting vignette. Link
- Wickham, H., Cetinkaya-Rundel, M., Grolemund, G. - R for Data Science, 2nd Edition. Chapter 6: Data Tidying. Link
- Wickham, H. - "Tidy Data" - Journal of Statistical Software (2014). Link
- tidyr 1.0.0 release notes. Link
What's Next?
- Tidy Data in R - understand the three rules of tidy data that make reshaping meaningful.
- dplyr group_by() and summarise() - the next step after going long: collapse long data into summaries.
- R Joins with dplyr - join the reshaped output back onto lookup tables with inner_join() and left_join().