tidyr replace_na() in R: Replace NA With a Value
The replace_na() function in tidyr replaces NA values in a vector or data frame with a specified value. For a vector, pass one replacement; for a data frame, pass a named list with column-specific replacements.
replace_na(x, 0)  # vector: NA -> 0
replace_na(df, list(x = 0, y = "missing"))  # data frame: per-col
mutate(df, x = replace_na(x, 0))  # inside mutate
mutate(df, across(where(is.numeric), ~ replace_na(., 0)))  # all numerics
coalesce(x, 0)  # alternative: first non-NA
ifelse(is.na(x), 0, x)  # base R alternative
df |> mutate(x = if_else(is.na(x), 0, x))  # dplyr if_else
Need explanation? Read on for examples and pitfalls.
What replace_na() does in one sentence
replace_na() swaps NA values in a vector or data frame for a value you provide. It is type-strict (replacement must match the column type) and pipeline-friendly.
For more flexible NA handling (e.g., conditional replacement, multi-column fallbacks, mean imputation), use coalesce(), case_when(), or mutate() with is.na() instead.
Syntax
For vectors: replace_na(vec, value). For data frames: replace_na(df, list(col1 = val1, col2 = val2)), where the named list gives one replacement per column.
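As a minimal sketch of both call forms, the snippet below uses small made-up objects (vec and df are hypothetical, not from any real dataset) to show the vector signature next to the data frame signature.

```r
library(tidyr)

# Hypothetical example data for illustration only
vec <- c(1, NA, 3)
df <- data.frame(x = c(1, NA, 3), y = c("a", NA, "c"))

# Vector form: a single replacement value
vec_filled <- replace_na(vec, 0)

# Data frame form: a named list, one entry per column to fill
df_filled <- replace_na(df, list(x = 0, y = "missing"))

print(vec_filled)
print(df_filled)
```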
replace_na(df, list(x = "zero")) errors if x is numeric. tidyr is strict here; the replacement type must be compatible with the column type.
Five common patterns
1. Replace NA in a vector
The simplest case. Single replacement value for a single vector.
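A short sketch of the vector case, assuming two illustrative vectors (scores and labels are invented names). The replacement has the same type as the vector in each call.

```r
library(tidyr)

scores <- c(10, NA, 30, NA)      # double vector (hypothetical data)
labels <- c("low", NA, "high")   # character vector (hypothetical data)

scores_filled <- replace_na(scores, 0)
labels_filled <- replace_na(labels, "unknown")

print(scores_filled)
print(labels_filled)
```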
2. Replace NA per-column in a data frame
Pass a named list. Columns NOT in the list keep their NAs.
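The sketch below illustrates that behavior with a hypothetical three-column data frame: x and y appear in the list and get filled, while z is left out and keeps its NAs.

```r
library(tidyr)

# Hypothetical example data
df <- data.frame(
  x = c(1, NA, 3),
  y = c("a", NA, "c"),
  z = c(NA, 2.5, NA)
)

# Only x and y are listed, so z is not modified
filled <- replace_na(df, list(x = 0, y = "missing"))

print(filled)
```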
3. Replace NA across many columns at once
across(where(is.numeric), ~ replace_na(., 0)) applies the replacement to every numeric column. Non-numeric columns (name) are untouched.
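Here is one way this could look on a small made-up tibble (the columns name, height, and count are assumptions for illustration). The double and integer columns are filled, and the character column name keeps its NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data
df <- tibble(
  name = c("a", NA, "c"),
  height = c(1.5, NA, 1.8),
  count = c(2L, NA, 4L)
)

# Fill every numeric column with 0; non-numeric columns are skipped
filled <- df |>
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

print(filled)
```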
4. coalesce: first non-NA across columns
coalesce() returns the FIRST non-NA value across the listed vectors. Useful for fallback chains: try column A, then B, then a default.
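A fallback chain might be sketched like this, assuming a hypothetical contacts table with mobile and landline columns: take mobile if present, else landline, else a constant default.

```r
library(dplyr)

# Hypothetical example data
contacts <- tibble(
  mobile = c("555-0101", NA, NA),
  landline = c("555-0201", "555-0202", NA)
)

# Try mobile first, then landline, then the default "none"
contacts <- contacts |>
  mutate(phone = coalesce(mobile, landline, "none"))

print(contacts)
```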
5. Conditional replacement with case_when
case_when() lets you replace NAs based on OTHER columns or conditions. More flexible than replace_na().
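As an illustration, the sketch below fills a missing shipping cost with a value that depends on region. The orders table and the fallback costs are invented for the example.

```r
library(dplyr)

# Hypothetical example data
orders <- tibble(
  region = c("EU", "US", "EU", "US"),
  shipping = c(5, NA, NA, 8)
)

# The replacement depends on another column; non-missing values pass through
orders <- orders |>
  mutate(shipping = case_when(
    is.na(shipping) & region == "EU" ~ 4,
    is.na(shipping) & region == "US" ~ 7,
    TRUE ~ shipping
  ))

print(orders)
```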
replace_na() is for CONSTANT replacement. For dynamic or conditional replacement, use case_when() or mutate(if_else(is.na(x), ...)) instead. replace_na() fits a narrow but very common case: "fill NA with a specific value per column".
replace_na() vs coalesce() vs ifelse(is.na(), ...)
| Approach | Best for | Notes |
|---|---|---|
| replace_na(x, val) | Constant value | Simple, type-strict |
| coalesce(x, y, z) | Fallback chain across cols | First non-NA wins |
| ifelse(is.na(x), val, x) | One-off in base R | Works without packages |
| dplyr::if_else(is.na(x), val, x) | One-off in dplyr | Type-strict like replace_na |
| case_when() | Conditional replacement | Flexible, multi-condition |
When to use which:
- Use replace_na() for clean per-column constants.
- Use coalesce() when the replacement comes from another column.
- Use case_when() when replacement depends on conditions.
Common pitfalls
Pitfall 1: type mismatch errors. replace_na(df, list(x = 0)) errors if x is character. Cast first: mutate(x = as.numeric(x)) or use a string replacement: list(x = "0").
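The sketch below walks through this pitfall on a hypothetical character column. It assumes a recent tidyr version, where the mismatched call raises an error; tryCatch() captures the message so the script keeps running.

```r
library(tidyr)

# Hypothetical example data: x is character, not numeric
df <- data.frame(x = c("1", NA, "3"))

# Numeric replacement on a character column: expected to fail in recent tidyr
attempt <- tryCatch(
  replace_na(df, list(x = 0)),
  error = function(e) conditionMessage(e)
)
print(attempt)

# Option A: use a string replacement, x stays character
fixed_chr <- replace_na(df, list(x = "0"))

# Option B: cast to numeric first, then use a numeric replacement
df_num <- df
df_num$x <- as.numeric(df_num$x)
fixed_num <- replace_na(df_num, list(x = 0))
```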
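Pitfall 2: replacing with mean / median naively. replace_na(df, list(x = mean(df$x, na.rm = TRUE))) works but is brittle: the mean is computed outside the pipeline, it ignores any grouping, and in recent tidyr versions the call errors when x is an integer column and the mean is fractional. For mean imputation: mutate(x = ifelse(is.na(x), mean(x, na.rm = TRUE), x)). Better: use the mice package for principled imputation.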
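A minimal sketch of the mutate() form, using an invented single-column tibble. Because the mean is computed inside mutate(), the same code would also respect group_by() if grouping were added.

```r
library(dplyr)

# Hypothetical example data
df <- tibble(x = c(2, NA, 4, NA, 6))

# Replace each NA with the mean of the observed values
imputed <- df |>
  mutate(x = ifelse(is.na(x), mean(x, na.rm = TRUE), x))

print(imputed)
```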
Try it yourself
Try it: Replace NAs in airquality$Ozone with the mean Ozone value. Save the modified vector to ex_filled.
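Explanation: Compute the mean (excluding NAs) first, then replace_na(Ozone, mean_ozone) fills NA with that mean. Ozone is stored as an integer column, so convert it with as.numeric() before filling; recent tidyr versions refuse to cast a fractional mean to integer. The result has zero NAs. For real analysis, mean imputation is a starting point; consider mice::mice() for proper missing-data treatment.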
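One possible solution is sketched below. The intermediate names ozone and mean_ozone are just illustrative; only ex_filled is required by the exercise.

```r
library(tidyr)

# airquality ships with base R; Ozone is an integer column, so convert to double
ozone <- as.numeric(airquality$Ozone)

# Mean of the observed values only
mean_ozone <- mean(ozone, na.rm = TRUE)

# Fill the NAs with that mean
ex_filled <- replace_na(ozone, mean_ozone)

# Count remaining NAs
print(sum(is.na(ex_filled)))
```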
Related tidyr / dplyr functions
After mastering replace_na, look at:
- coalesce(): pick first non-NA across multiple columns
- fill(): forward/backward-fill NAs
- complete(): fill in missing combinations
- case_when(), if_else(): conditional replacement
- drop_na(): remove rows with NA (alternative to replacement)
- mice::mice(): principled multiple imputation
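To give a feel for two of these, here is a brief sketch of fill() and drop_na() on a hypothetical readings table. Both come from tidyr.

```r
library(tidyr)

# Hypothetical example data
readings <- data.frame(day = 1:5, value = c(10, NA, NA, 40, NA))

# Carry the last observed value forward
filled_down <- fill(readings, value, .direction = "down")

# Or drop the rows where value is missing
complete_rows <- drop_na(readings, value)

print(filled_down)
print(complete_rows)
```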
For time series, zoo::na.locf() (last observation carried forward) and the imputeTS package offer more specialized NA handling.
FAQ
How do I replace NA with 0 in R?
For a vector: replace_na(x, 0). For a data frame column: mutate(x = replace_na(x, 0)). For all numeric columns: mutate(across(where(is.numeric), ~ replace_na(., 0))).
What is the difference between replace_na and coalesce in R?
replace_na(x, value) replaces NA with a CONSTANT. coalesce(x, y, z) takes the FIRST non-NA across multiple inputs. Use replace_na for fixed defaults; coalesce for fallback chains.
How do I replace NA values per column in a data frame?
Pass a named list to replace_na: replace_na(df, list(x = 0, name = "unknown", date = as.Date("1970-01-01"))). Each column gets its own replacement value.
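The call above could be tried on a small made-up data frame like the one below, which mixes numeric, character, and Date columns (the data itself is hypothetical).

```r
library(tidyr)

# Hypothetical example data with three column types
df <- data.frame(
  x = c(1, NA),
  name = c("a", NA),
  date = as.Date(c("2024-01-01", NA))
)

# Each replacement matches the type of its column
filled <- replace_na(
  df,
  list(x = 0, name = "unknown", date = as.Date("1970-01-01"))
)

print(filled)
```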
Can I impute the mean for NA values with replace_na?
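Yes: mean_x <- mean(df$x, na.rm = TRUE); replace_na(df$x, mean_x). If x is an integer column, convert it with as.numeric() first, since a fractional mean cannot be cast to integer. But this is naive. For real analysis with missing data, use mice for multiple imputation; mean replacement understates variability.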
What replaces dplyr's deprecated replace_na?
Nothing needs replacing: replace_na() is from tidyr and is not deprecated. dplyr does not export it, so load tidyr (or the tidyverse) to use it. dplyr's own NA helpers are coalesce() and na_if().