dplyr mutate() in R: Create and Transform Columns
The mutate() function in dplyr adds new columns to a data frame or replaces existing ones. You write expressions that reference other columns, and mutate evaluates them and appends the result, keeping every original row.
mutate(df, kpl = mpg * 0.425)                              # new column from formula
mutate(df, mpg = round(mpg, 1))                            # replace existing column
mutate(df, hp_class = if_else(hp > 150, "high", "low"))    # conditional
mutate(df, across(where(is.numeric), scale))               # apply to many cols
mutate(df, rank = row_number(), .by = cyl)                 # within groups
mutate(df, hp_z = (hp - mean(hp)) / sd(hp))                # z-score
transmute(df, name, kpl = mpg * 0.425)                     # mutate + drop other cols
Need explanation? Read on for examples and pitfalls.
What mutate() does in one sentence
mutate() is a column adder. You give it a data frame plus one or more expressions of the form new_col = formula, and it returns the same data frame with those columns added (or replaced if the name already exists). Every row stays; only the column structure changes.
Unlike base R df$new_col <- ..., mutate works inside a pipeline, supports computing several columns in one call, and lets you reference columns you just created in the same expression chain.
Syntax
mutate() takes a data frame plus name-value column expressions. Use if_else(), case_when(), and across() for conditional and bulk transforms. Use .by to compute per group without leaving the data grouped.
The full signature is:
mutate(.data, ..., .by = NULL, .keep = "all", .before = NULL, .after = NULL)
.data is the data frame. The ... argument takes one or more name = expression pairs. .by groups for the duration of the call. .keep controls which existing columns to retain. .before and .after position new columns relative to existing ones.
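The effect of the optional arguments is easiest to see on a small data frame. The sketch below assumes dplyr is installed and uses the built-in mtcars data; the object names cars, kept, and placed are made up for illustration.

```r
library(dplyr)

cars <- head(mtcars[, c("mpg", "cyl", "hp")], 3)

# .keep = "used" retains only the columns the expressions read, plus the new ones
kept <- mutate(cars, kpl = mpg * 0.425, .keep = "used")
names(kept)

# .before places the new column ahead of an existing one instead of at the end
placed <- mutate(cars, kpl = mpg * 0.425, .before = cyl)
names(placed)
```

Here kept should contain only mpg and kpl, while placed keeps all columns with kpl sitting between mpg and cyl.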
mutate(df, a = mpg * 2, b = a + 1) works: the second expression sees a from the first. This is how you build chained transforms in one mutate call without intermediate variables.
Seven common patterns
1. Create a new column from a formula
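A minimal sketch of this pattern, assuming dplyr is installed and using the built-in mtcars data. The names cars and cars_kpl are illustrative, and 0.425 is the rounded miles-per-gallon to kilometres-per-litre factor used at the top of this article.

```r
library(dplyr)

cars <- head(mtcars[, c("mpg", "cyl", "hp")], 3)

# Add kpl as a new column computed from mpg; all existing columns and rows stay
cars_kpl <- mutate(cars, kpl = mpg * 0.425)
cars_kpl
```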
2. Replace an existing column
If the column name already exists, the new value overwrites it.
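As a small illustration (assuming dplyr and the built-in mtcars data; cars and rounded are made-up names), rounding a column in place might look like this:

```r
library(dplyr)

cars <- head(mtcars[, c("mpg", "wt")], 3)

# wt already exists, so mutate() overwrites it in the returned data frame
rounded <- mutate(cars, wt = round(wt, 1))

# The column set is unchanged; only the values of wt differ
names(rounded)
```

Note that cars itself is not modified. The change only persists if you assign the result, as done with rounded here.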
3. Conditional values with if_else()
if_else() is the type-strict dplyr alternative to base R ifelse(). The TRUE and FALSE branches must return the same type.
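A short sketch of a two-way conditional column, assuming dplyr and the built-in mtcars data. The 150 horsepower cutoff and the labels are arbitrary choices for illustration.

```r
library(dplyr)

cars <- head(mtcars[, c("mpg", "hp")], 5)

# Both branches are character, so if_else() accepts them
classed <- mutate(cars, hp_class = if_else(hp > 150, "high", "low"))
classed
```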
4. Multi-way branching with case_when()
case_when() evaluates conditions in order and returns the value of the first match. The trailing TRUE ~ default is the catch-all.
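The ordering rule can be sketched as follows, assuming dplyr and the built-in mtcars data. The thresholds and tier names are hypothetical.

```r
library(dplyr)

cars <- head(mtcars[, c("mpg", "hp")], 5)

# Conditions are checked top to bottom; the first one that is TRUE wins
tiered <- mutate(
  cars,
  hp_tier = case_when(
    hp > 150 ~ "high",
    hp > 100 ~ "medium",
    TRUE     ~ "low"    # catch-all for rows not matched above
  )
)
tiered
```

Because hp > 150 is tested first, a car with 175 hp is labelled "high" even though it also satisfies hp > 100.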
5. Apply a function to many columns with across()
across() paired with where() lets you transform many columns in one stroke. This is the modern replacement for mutate_at(), mutate_if(), and mutate_all().
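A sketch of a bulk transform, assuming dplyr and the built-in mtcars data. The helper z_score is hypothetical and stands in for scale(): it is used here because scale() returns a one-column matrix, whereas a hand-written function keeps plain numeric vectors.

```r
library(dplyr)

cars <- head(mtcars[, c("mpg", "hp", "wt")], 5)

# Hypothetical helper: z-score that returns a plain numeric vector
z_score <- function(x) (x - mean(x)) / sd(x)

# Apply z_score to every numeric column in one call
standardised <- mutate(cars, across(where(is.numeric), z_score))
standardised
```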
6. Compute relative to a group with .by
.by (available from dplyr 1.1.0) is preferred over group_by() when the grouping is only needed for this single mutate call. The result is ungrouped automatically.
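A sketch of a per-group computation, assuming dplyr 1.1.0 or later and the built-in mtcars data. The column name hp_rank and the choice of min_rank(desc(hp)) are illustrative.

```r
library(dplyr)

cars <- mtcars[, c("mpg", "cyl", "hp")]

# Rank cars by descending horsepower within each cylinder count.
# .by groups only for the duration of this call.
ranked <- mutate(cars, hp_rank = min_rank(desc(hp)), .by = cyl)

# The result carries no grouping, so later verbs operate on the whole table
is_grouped_df(ranked)
```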
7. Drop other columns with transmute()
transmute() is mutate() plus select(): it returns only the columns you name in the call. Since dplyr 1.1.0 it is marked as superseded by mutate(.keep = "none"), but it still works.
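A minimal sketch, assuming dplyr and the built-in mtcars data. Since mtcars stores car names as row names, the example first builds a small data frame (cars, an illustrative name) with an explicit name column.

```r
library(dplyr)

cars <- data.frame(name = rownames(mtcars), mpg = mtcars$mpg, hp = mtcars$hp)

# Only the columns named in the call survive: name and kpl
slim <- transmute(cars, name, kpl = mpg * 0.425)
names(slim)
```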
mutate() and across() together replace most uses of the legacy _at/_if/_all variants. If you see old code with mutate_if(df, is.numeric, scale), the modern equivalent is mutate(df, across(where(is.numeric), scale)). Same result, more composable, fewer functions to remember.
mutate() vs base R column assignment
Base R uses <- for column assignment; mutate() uses = inside the function call. That is the surface difference. The deeper difference is composability: mutate slots into pipelines and supports many columns in one call.
| Task | dplyr | Base R |
|---|---|---|
| Add one column | mutate(df, y = a * 2) | df$y <- df$a * 2 |
| Replace column | mutate(df, a = round(a, 1)) | df$a <- round(df$a, 1) |
| Add multiple | mutate(df, y = a*2, z = b/3) | two assignments |
| Conditional | mutate(df, y = if_else(a>0,"P","N")) | df$y <- ifelse(df$a>0,"P","N") |
| Apply to many | mutate(df, across(where(is.numeric), scale)) | df[nums] <- lapply(df[nums], scale) |
When to use which:
- Use mutate() inside any dplyr pipeline.
- Use base R <- for one-off scripts or single-column updates without other tidyverse code.
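To make the comparison concrete, the sketch below builds the same two columns both ways on a made-up data frame (df, via_dplyr, and via_base are illustrative names) and checks that the results agree. It assumes dplyr is installed.

```r
library(dplyr)

df <- data.frame(a = c(-1.26, 0.5, 2.04), b = c(3, 6, 9))

# dplyr: two new columns in one call
via_dplyr <- mutate(df, y = a * 2, z = b / 3)

# Base R: one assignment per column, on a copy so df stays untouched
via_base <- df
via_base$y <- via_base$a * 2
via_base$z <- via_base$b / 3

# Compare the new columns produced by each approach
identical(via_dplyr$y, via_base$y)
identical(via_dplyr$z, via_base$z)
```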
Common pitfalls
Pitfall 1: column references vs string literals. mutate(df, y = "a") creates column y filled with the string "a", not a copy of column a. Use bare names to reference columns: mutate(df, y = a).
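The difference is easy to check on a tiny made-up data frame (a sketch assuming dplyr; quoted and bare are illustrative names):

```r
library(dplyr)

df <- data.frame(a = c(10, 20, 30))

quoted <- mutate(df, y = "a")  # y is the string "a", recycled to every row
bare   <- mutate(df, y = a)    # y is a copy of column a

class(quoted$y)
class(bare$y)
```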
Pitfall 2: forgetting .by and getting wrong totals. mutate(df, pct = hp / sum(hp)) divides each hp by the total of all hp. To get per-cylinder percentages, add .by = cyl. Without grouping, sum(hp) is computed once over the whole column.
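A sketch of the two behaviours side by side, assuming dplyr 1.1.0 or later (for .by) and the built-in mtcars data. The names overall and per_cyl are illustrative.

```r
library(dplyr)

cars <- mtcars[, c("cyl", "hp")]

# Share of total horsepower across the whole data set
overall <- mutate(cars, pct = hp / sum(hp))

# Share of horsepower within each cylinder group
per_cyl <- mutate(cars, pct = hp / sum(hp), .by = cyl)

sum(overall$pct)                        # shares add up to 1 over all rows
tapply(per_cyl$pct, per_cyl$cyl, sum)   # shares add up to 1 within each group
```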
if_else() causes errors where ifelse() silently coerces. if_else(cond, "yes", 0) errors because the TRUE branch returns character and the FALSE branch returns numeric. Base R ifelse() would coerce; dplyr's if_else() refuses. This strictness catches bugs but surprises beginners. To fix it, make both branches the same type, for example if_else(cond, "yes", "0"), or fall back to ifelse(). The missing argument does not relax the check; it only sets the value used where the condition is NA.
Pitfall 3: mutate() keeps all columns; transmute() drops them. If you want only the new columns and a few keepers, use transmute(). Mixing them up is a frequent source of "where did all my columns go" or "why are these columns still here" surprises.
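The if_else() type strictness described above can be reproduced in a few lines. This is a sketch assuming dplyr is installed; the error is captured with tryCatch() so the script keeps running, and the string "type error" is just a placeholder.

```r
library(dplyr)

cond <- c(TRUE, FALSE)

# Base R coerces the numeric 0 to the string "0" without complaint
loose <- ifelse(cond, "yes", 0)

# if_else() refuses to combine character and numeric branches
strict <- tryCatch(if_else(cond, "yes", 0), error = function(e) "type error")

# Giving both branches the same type resolves it
fixed <- if_else(cond, "yes", "0")

loose
strict
fixed
```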
Try it yourself
Try it: Add a new column mpg_per_cyl = mpg / cyl to mtcars. Save the result to ex_mtcars2 and print the first 3 rows of mpg, cyl, and the new column.
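One possible solution, sketched under the assumption that dplyr is installed:

```r
library(dplyr)

# Add the ratio column; every original column and row is kept
ex_mtcars2 <- mutate(mtcars, mpg_per_cyl = mpg / cyl)

# Show only the three columns the exercise asks for, first 3 rows
head(select(ex_mtcars2, mpg, cyl, mpg_per_cyl), 3)
```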
Explanation: mutate() takes name = expression pairs. The expression mpg / cyl is vectorized: it is evaluated on the whole columns at once, so each row gets its own mpg / cyl value, and the result is added as a new column.
Related dplyr functions
After mastering mutate(), look at:
- transmute(): like mutate() but drops unlisted columns
- relocate(): move columns to a specific position
- if_else(), case_when(), case_match(): vectorized conditional value generators
- across(): apply a function to multiple columns inside mutate() or summarise()
- rowwise() plus mutate(): row-by-row computation when you cannot vectorize
mutate() paired with lag(), lead(), cumsum(), cummean(), and row_number() covers most window-function needs in dplyr. For heavy time-series work, also check the slider package.
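As a rough illustration of these window functions, the sketch below uses a made-up sales table (all names and values are invented) and assumes dplyr is installed.

```r
library(dplyr)

sales <- data.frame(day = 1:5, units = c(3, 5, 2, 8, 4))  # made-up data

windowed <- mutate(
  sales,
  prev_units  = lag(units),          # value from the previous row, NA for the first
  change      = units - prev_units,  # day-over-day difference
  running_sum = cumsum(units)        # cumulative total so far
)
windowed
```

The change column reads prev_units, which was created one line earlier in the same call.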
FAQ
How do I add multiple columns at once with mutate?
List them comma-separated: mutate(df, x = a + b, y = a * b, z = a / b). Each new column is visible to the next expression, so you can build chains in one call.
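A small sketch of that chaining on invented data (df, total, and share are illustrative names), assuming dplyr is installed:

```r
library(dplyr)

df <- data.frame(a = c(2, 4), b = c(1, 2))

# total is created first, then share reads it within the same call
out <- mutate(df, total = a + b, share = a / total)
out
```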
What is the difference between mutate and transmute in dplyr?
mutate() keeps every original column and adds new ones. transmute() returns ONLY the columns you name in the call, dropping the rest. Use transmute() when the next step only needs the derived columns.
How do I create a conditional column in dplyr?
For two outcomes use if_else(condition, true_value, false_value). For three or more outcomes use case_when() with one condition per branch and a trailing TRUE ~ default. Both work inside mutate().
Can I use mutate to update columns conditionally?
Yes. mutate(df, x = if_else(x < 0, 0, x)) replaces negative values with 0. Or with case_when for richer logic: mutate(df, x = case_when(x < 0 ~ 0, x > 100 ~ 100, TRUE ~ x)).
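The clamping variant can be tried on a three-row example (a sketch with made-up values, assuming dplyr is installed):

```r
library(dplyr)

df <- data.frame(x = c(-5, 20, 150))

# Values below 0 become 0, values above 100 become 100, the rest pass through
clamped <- mutate(df, x = case_when(x < 0 ~ 0, x > 100 ~ 100, TRUE ~ x))
clamped$x
```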
How do I add a column based on row position?
Use row_number(): mutate(df, rownum = row_number()). To rank within groups: mutate(df, rank = row_number(), .by = group). To record each row's original position, do this BEFORE any filter or arrange.
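The sketch below shows why the order of operations matters, assuming dplyr and the built-in mtcars data. The names cars, indexed, and powerful, and the 200 horsepower cutoff, are illustrative.

```r
library(dplyr)

cars <- data.frame(name = rownames(mtcars), hp = mtcars$hp)

# Record the original position first, then filter
indexed  <- mutate(cars, rownum = row_number())
powerful <- filter(indexed, hp > 200)

# rownum still points back to each car's row in the unfiltered data
identical(cars$name[powerful$rownum], powerful$name)
```

Had row_number() been called after filter(), rownum would simply count 1, 2, 3, and so on over the surviving rows.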