tidyr drop_na() in R: Remove Rows With NA Values
The drop_na() function in tidyr removes rows with any NA values. Specify columns to limit which NAs trigger row removal. It is the pipeline-friendly alternative to base R na.omit() and complete.cases().
```r
library(tidyr)
library(dplyr)  # for filter()

drop_na(df)                          # remove rows with ANY NA
drop_na(df, x)                       # remove rows where x is NA
drop_na(df, x, y)                    # remove rows where x OR y is NA
drop_na(df, where(is.numeric))       # NA in any numeric column
drop_na(df, starts_with("score"))    # NA in score-prefixed columns
filter(df, complete.cases(df))       # same rows, using base complete.cases()
filter(df, !is.na(x))                # one column, explicit condition
```
Need explanation? Read on for examples and pitfalls.
What drop_na() does in one sentence
drop_na() returns a data frame with rows removed wherever specified columns contain NA. Without arguments, it drops rows with NA in ANY column; with column names, it drops rows where those specific columns have NA.
This is one of the most-used data cleaning steps. Use it whenever an analysis cannot tolerate missing values and you have decided that complete-case analysis is acceptable.
Syntax
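The signature is drop_na(data, ...). The ... argument selects the columns to check for NA using tidyselect syntax (bare column names or helpers such as starts_with()); leave it empty to check every column.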
drop_na() without column arguments is the strictest form: it removes a row if ANY column has NA. For a narrower rule ("drop only when these columns are NA"), name the columns: drop_na(df, x, y) ignores NAs in other columns.
Five common patterns
1. Drop rows with any NA
The most aggressive form. Equivalent to base R na.omit(df) or df[complete.cases(df), ].
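A minimal sketch of pattern 1. The tibble `df` and its columns `x` and `y` are hypothetical example data, not taken from the original:

```r
library(tidyr)

# Hypothetical example data: NAs scattered across two columns
df <- tibble::tibble(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, "d")
)

# Drop rows with NA in ANY column: rows 2 and 3 go, rows 1 and 4 stay
all_complete <- drop_na(df)
all_complete
```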
2. Drop rows where one column is NA
drop_na(df, x) drops only rows with NA in x; rows with NA in other columns, such as y, are KEPT. Useful when only specific columns are required for the next step.
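A sketch of pattern 2 on the same kind of hypothetical two-column tibble; only an NA in `x` triggers removal:

```r
library(tidyr)

# Hypothetical example data
df <- tibble::tibble(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, "d")
)

# Row 2 (NA in x) is dropped; row 3 (NA in y only) is kept
x_complete <- drop_na(df, x)
x_complete
```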
3. Drop based on multiple columns
drop_na(df, x, y) removes rows where x OR y is NA, so a row survives only if both x and y are present.
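A sketch of pattern 3, assuming a hypothetical three-column tibble where `z` has an NA that should not trigger removal:

```r
library(tidyr)

# Hypothetical data: the NA in z is outside the checked columns
df <- tibble::tibble(
  x = c(1, NA, 3, 4, 5),
  y = c(10, 20, NA, 40, 50),
  z = c(NA, "b", "c", "d", "e")
)

# A row survives only if BOTH x and y are non-NA; z is ignored
xy_complete <- drop_na(df, x, y)
xy_complete
```

Row 1 stays even though `z` is NA there, because `z` was not named.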
4. Drop based on column predicate
Tidyselect helpers like where(), starts_with(), matches() work inside drop_na for flexible column selection.
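A sketch of pattern 4 using a hypothetical tibble with two score columns. dplyr is loaded as well so that where() is certainly in scope; selecting by type and by name prefix picks the same two columns here:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two numeric score columns, two character columns
df <- tibble::tibble(
  id      = c("a", "b", "c", "d"),
  score_1 = c(90, NA, 75, 80),
  score_2 = c(85, 70, 60, NA),
  note    = c(NA, "late", NA, "ok")
)

# NA in any numeric column (score_1 or score_2) triggers removal
by_type <- drop_na(df, where(is.numeric))

# Same columns, selected by name prefix instead of type
by_prefix <- drop_na(df, starts_with("score"))

by_type
```

NAs in `note` survive in both results because `note` is neither numeric nor score-prefixed.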
5. Drop in a pipeline
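drop_na() integrates cleanly into dplyr pipelines: it takes a data frame as its first argument, returns a data frame, and can target specific columns mid-pipeline. na.omit() accepts a piped data frame too, but it always checks every column.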
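A sketch of pattern 5 on the built-in airquality dataset. It uses the native pipe `|>`, which assumes R 4.1 or later:

```r
library(dplyr)
library(tidyr)

# Drop rows missing Ozone or Solar.R, then summarise by month
monthly <- airquality |>
  drop_na(Ozone, Solar.R) |>
  group_by(Month) |>
  summarise(n = n(), mean_ozone = mean(Ozone))

monthly
```

Because the NAs are removed first, mean() needs no `na.rm = TRUE`.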
drop_na() vs na.omit() vs complete.cases()
| Function | Source | Pipeline-friendly | Column-specific |
|---|---|---|---|
| drop_na() | tidyr | Yes | Yes |
| na.omit() | base R | Awkward | No (all columns only) |
| complete.cases() | base R | Used inside filter() | Yes (specify columns in subset) |
When to use which:
- Use drop_na() in dplyr/tidyverse pipelines.
- Use na.omit() for one-line scripts on the whole data frame.
- Use complete.cases(df[, c("x", "y")]) inside filter() for very specific column subsets.
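A sketch comparing the three approaches on the built-in airquality dataset. Here na.omit() happens to keep the same rows only because Ozone and Solar.R are the only columns of airquality with NAs:

```r
library(dplyr)
library(tidyr)

# Three ways to keep only rows complete in Ozone and Solar.R
via_drop_na <- drop_na(airquality, Ozone, Solar.R)
via_filter  <- filter(airquality, complete.cases(Ozone, Solar.R))
via_omit    <- na.omit(airquality)  # checks ALL columns

c(nrow(via_drop_na), nrow(via_filter), nrow(via_omit))
```

na.omit() also attaches an "na.action" attribute recording which rows were removed, which the other two approaches do not add.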
Common pitfalls
Pitfall 1: dropping more rows than expected. drop_na() without arguments removes rows with NA in ANY column. With many columns, this can drop most of your data. Always check nrow(df) before and after.
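A quick before/after check on the built-in airquality dataset shows how much a no-argument drop_na() costs compared with a targeted one:

```r
library(tidyr)

# Row counts before and after two different drop rules
before     <- nrow(airquality)
after_any  <- nrow(drop_na(airquality))           # NA in any column
after_solr <- nrow(drop_na(airquality, Solar.R))  # NA in Solar.R only

cat("before:", before, "| any NA:", after_any, "| Solar.R NA:", after_solr, "\n")
```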
Pitfall 2: hiding the missingness pattern. Silently dropping rows with NA can make patterns invisible. Before dropping, summarize: df |> summarise(across(everything(), ~ sum(is.na(.x)))) shows how many NAs each column has.
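A sketch of that summary on the built-in airquality dataset (native pipe, R 4.1 or later):

```r
library(dplyr)

# Count NAs per column before deciding to drop anything
na_counts <- airquality |>
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Only Ozone and Solar.R contain NAs, so a drop rule targeting Ozone removes far more rows than one targeting Solar.R.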
Pitfall 3: dropping instead of imputing. Consider imputation (for example, with the mice package) when missingness is systematic.
Try it yourself
Try it: From airquality (built-in dataset), drop rows where Ozone is NA. Save to ex_clean and report row count.
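One possible solution, using only tidyr and the built-in airquality dataset:

```r
library(tidyr)

# Keep rows where Ozone is present; NAs in other columns are ignored
ex_clean <- drop_na(airquality, Ozone)
nrow(ex_clean)
# 116
```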
Explanation: drop_na(airquality, Ozone) keeps rows where Ozone is non-NA. Other columns may still have NAs (e.g., Solar.R) but those rows are kept. Result: 116 rows of 153 original.
Related tidyr / dplyr functions
After mastering drop_na, look at:
- replace_na(): replace NA with a specific value
- fill(): forward/backward-fill missing values
- coalesce(): take first non-NA across columns
- complete(): fill in implicit missing combinations
- naniar::miss_var_summary(): NA summary by column
- mice::mice(): multiple imputation for missing data
For "carry forward last observation" patterns common in time series, tidyr::fill() is the right tool, not drop_na().
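A sketch contrasting the two on a hypothetical series of daily readings (the `readings` tibble is illustrative only):

```r
library(tidyr)

# Hypothetical daily readings with gaps
readings <- tibble::tibble(
  day   = 1:5,
  value = c(10, NA, NA, 13, NA)
)

# drop_na() removes the gap days entirely
dropped <- drop_na(readings, value)

# fill() keeps every day and carries the last observed value forward
filled <- fill(readings, value, .direction = "down")

filled
```

drop_na() leaves only days 1 and 4, while fill() keeps all five days with values 10, 10, 10, 13, 13.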
FAQ
How do I remove rows with NA in R?
Use tidyr::drop_na(df) to remove rows with NA in ANY column. Use drop_na(df, x) to remove rows where only column x is NA. Base R alternatives: na.omit(df) (whole data frame) or df[complete.cases(df), ].
What is the difference between drop_na and na.omit?
drop_na() is from tidyr and works inside pipelines. It accepts column arguments to limit which NAs trigger row removal. na.omit() is base R, always drops rows with NA in any column, and is harder to chain in pipelines.
How do I drop NA rows for specific columns only?
Pass column names to drop_na: drop_na(df, x, y) removes rows where x OR y is NA. NAs in other columns are kept.
Should I drop NA rows or impute them?
Drop when few values are missing and they are missing completely at random. Impute when missingness is systematic and you cannot afford to lose rows. For complex missingness, multiple imputation (for example, via the mice package) is a widely used, principled approach.
Can I use drop_na with tidyselect helpers?
Yes. drop_na(df, where(is.numeric)) drops rows where any numeric column is NA. drop_na(df, starts_with("score")) checks score-prefixed columns. All standard tidyselect helpers work.