dplyr mutate() and rename(): Create & Modify Columns (8 Examples)
mutate() adds new columns or modifies existing ones while keeping all other columns. rename() changes column names without touching data. Combined with across() and case_when(), they cover most column transformations you will need.
filter() and select() reduce your data — fewer rows, fewer columns. mutate() expands it — new columns computed from existing ones. It's the verb you reach for whenever you need a derived variable.
mutate(): Add New Columns
Each new column is computed with vectorized expressions over whole columns, so it ends up with one value per row. All existing columns are preserved.
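As a minimal sketch of this behaviour, the example below adds one derived column to the built-in `mtcars` data. The column name `kpl` (kilometres per litre) and the conversion are illustrative choices, not taken from the original article.

```r
library(dplyr)

# mtcars ships with R. kpl is a hypothetical derived column:
# 1 mile per US gallon is about 0.425144 km per litre.
cars <- mtcars |>
  mutate(kpl = mpg * 0.425144)

ncol(mtcars)  # 11 original columns
ncol(cars)    # 12: the originals plus kpl
```

The input data frame is not modified. `mutate()` returns a new data frame, so assign the result if you want to keep it.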
**Explanation:** `mutate` creates `height_m` first, then uses it to compute `bmi`, then uses `bmi` in `case_when`. All in a single mutate call because columns are built left to right.
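A sketch of the kind of call this explanation describes is shown here. The data, the `height_cm` and `weight_kg` column names, and the BMI cut-offs are assumptions made for illustration; only `height_m`, `bmi`, and the use of `case_when()` come from the text above.

```r
library(dplyr)

# Hypothetical data: heights in cm, weights in kg
people <- tibble(
  name      = c("Ana", "Ben", "Cy"),
  height_cm = c(160, 175, 180),
  weight_kg = c(45, 70, 100)
)

result <- people |>
  mutate(
    height_m = height_cm / 100,         # created first
    bmi      = weight_kg / height_m^2,  # uses height_m from the line above
    category = case_when(               # uses bmi from the line above
      bmi < 18.5 ~ "underweight",
      bmi < 25   ~ "normal",
      bmi < 30   ~ "overweight",
      TRUE       ~ "obese"
    )
  )

result$category  # "underweight" "normal" "obese"
```

`case_when()` checks its conditions in order and uses the first one that matches, which is why the thresholds only need an upper bound each.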
Exercise 2: Clean Column Names
Convert all iris column names to lowercase snake_case (e.g., Sepal.Length → sepal_length).
library(dplyr)
# Use rename_with with a custom function
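One possible solution is sketched below. The helper name `to_snake` is hypothetical, and this version assumes that replacing dots with underscores and lowercasing is enough, which holds for the iris column names.

```r
library(dplyr)

# Hypothetical helper: replace literal dots with underscores, then lowercase
to_snake <- function(nm) tolower(gsub(".", "_", nm, fixed = TRUE))

iris_clean <- iris |>
  rename_with(to_snake)

names(iris_clean)
# "sepal_length" "sepal_width" "petal_length" "petal_width" "species"
```

`fixed = TRUE` matters here: without it, `"."` is a regular expression that matches every character.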
**Explanation:** A named list of functions (`list(mean = ..., sd = ...)`) creates multiple output columns per input. `.names = "{.col}_{.fn}"` produces names like `Sepal.Length_mean`, `Sepal.Length_sd`.
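The code this explanation refers to is not shown, so here is a minimal sketch of the pattern. It assumes the functions are applied inside `summarise()` and uses two iris columns chosen for illustration.

```r
library(dplyr)

iris_stats <- iris |>
  summarise(
    across(
      c(Sepal.Length, Sepal.Width),
      list(mean = mean, sd = sd),   # names of the list become {.fn}
      .names = "{.col}_{.fn}"       # e.g. Sepal.Length_mean
    )
  )

names(iris_stats)
# "Sepal.Length_mean" "Sepal.Length_sd" "Sepal.Width_mean" "Sepal.Width_sd"
```

The same `across()` call also works inside `mutate()`, where each summary value is recycled to every row.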
Summary
| Function | Purpose | Example |
| --- | --- | --- |
| `mutate(new = expr)` | Add/modify column | `mutate(bmi = wt / ht^2)` |
| `transmute(new = expr)` | Create + drop others | `transmute(bmi = wt / ht^2)` |
| `case_when(cond ~ val)` | Conditional values | `case_when(x > 0 ~ "pos")` |
| `across(cols, fn)` | Apply to many columns | `across(where(is.numeric), round)` |
| `rename(new = old)` | Rename by name | `rename(weight = wt)` |
| `rename_with(fn)` | Rename by function | `rename_with(tolower)` |
FAQ
What's the difference between mutate() and transmute()?
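mutate() keeps all existing columns plus new ones. transmute() keeps only the columns you explicitly create (plus any grouping variables). Use mutate() in most cases. Since dplyr 1.1.0, transmute() is superseded by mutate(.keep = "none").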
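The difference is easiest to see by comparing column names. This sketch uses `mtcars`; the `wt_kg` column is a hypothetical example (`wt` is recorded in thousands of pounds).

```r
library(dplyr)

# wt is in 1000 lbs; 1000 lbs is about 453.592 kg
kept    <- mtcars |> mutate(wt_kg = wt * 453.592)
dropped <- mtcars |> transmute(wt_kg = wt * 453.592)

# Same result as transmute(), using mutate()'s .keep argument
dropped2 <- mtcars |> mutate(wt_kg = wt * 453.592, .keep = "none")

ncol(kept)       # 12
names(dropped)   # "wt_kg"
names(dropped2)  # "wt_kg"
```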
Can mutate() reference columns created in the same call?
Yes. Columns are created left to right: mutate(x = a + b, y = x * 2) works because x exists by the time y is evaluated.
How do I mutate conditionally — different logic per group?
Use group_by() before mutate(): df |> group_by(region) |> mutate(pct = sales / sum(sales)). Each group's sum(sales) is computed independently.
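A small runnable version of that pipeline, with made-up data, might look like this. The `sales_df` table and its values are hypothetical.

```r
library(dplyr)

# Hypothetical sales data
sales_df <- tibble(
  region = c("east", "east", "west", "west"),
  sales  = c(100, 300, 50, 150)
)

shares <- sales_df |>
  group_by(region) |>
  mutate(pct = sales / sum(sales)) |>  # sum(sales) is per region
  ungroup()                            # drop grouping for later steps

shares$pct  # 0.25 0.75 0.25 0.75
```

Calling `ungroup()` afterwards is a common habit so that later verbs do not silently run per group.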
What replaced mutate_at, mutate_if, mutate_all?
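across() replaced all three in dplyr 1.0; the older functions still work but are superseded. mutate_if(is.numeric, round) becomes mutate(across(where(is.numeric), round)). mutate_at(vars(x, y), log) becomes mutate(across(c(x, y), log)). mutate_all(f) becomes mutate(across(everything(), f)).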
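The `across()` forms can be tried directly on iris. This is a sketch with columns picked for illustration, not an exhaustive migration guide.

```r
library(dplyr)

# Replacement for mutate_if(is.numeric, round)
rounded <- iris |>
  mutate(across(where(is.numeric), round))

# Replacement for mutate_at(vars(Sepal.Length, Sepal.Width), log)
logged <- iris |>
  mutate(across(c(Sepal.Length, Sepal.Width), log))

# Replacement for mutate_all(as.character)
as_text <- iris |>
  mutate(across(everything(), as.character))
```

In each case the columns not selected by `across()` pass through unchanged, so `Species` stays a factor in the first two results.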