tidyr drop_na() in R: Remove Rows With NA Values

The drop_na() function in tidyr removes rows with any NA values. Specify columns to limit which NAs trigger row removal. It is the pipeline-friendly alternative to base R na.omit() and complete.cases().

By Selva Prabhakaran · Published May 12, 2026 · Last updated May 12, 2026

⚡ Quick Answer

drop_na(df)                              # remove rows with ANY NA
drop_na(df, x)                           # remove rows where x is NA
drop_na(df, x, y)                        # remove rows where x OR y is NA
drop_na(df, where(is.numeric))           # NA in any numeric column
drop_na(df, starts_with("score"))        # NA in score-prefixed cols
filter(df, complete.cases(df))           # same rows via dplyr + base R
filter(df, !is.na(x))                    # one column, base-style condition

Need explanation? Read on for examples and pitfalls.

What drop_na() does in one sentence

drop_na() returns a data frame with rows removed wherever specified columns contain NA. Without arguments, it drops rows with NA in ANY column; with column names, it drops rows where those specific columns have NA.

This is one of the most-used data cleaning steps. Use it whenever an analysis cannot tolerate missing values and you have decided that complete-case analysis is acceptable.

Syntax

The signature is drop_na(data, ...). The ... argument is a tidyselect specification of the columns to check for NA. If it is left empty, every column is checked.

Build a small data frame with NAs

library(tidyr)
library(dplyr)
library(tibble)

df <- tibble::tibble(
  id = 1:5,
  x = c(1, NA, 3, 4, NA),
  y = c("a", "b", NA, "d", "e")
)

df |> drop_na()
#> # A tibble: 2 x 3
#>      id     x y
#>   <int> <dbl> <chr>
#> 1     1     1 a
#> 2     4     4 d

Tip

drop_na() without arguments is strict: it removes a row if ANY column has NA. For a more permissive drop ("only when these columns are NA"), name the columns. drop_na(df, x, y) checks only x and y and ignores NAs in every other column.
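The difference is easiest to see when a column outside the named set contains NA. The sketch below uses a hypothetical `survey` tibble (the column names are made up for illustration and are not part of the examples above), where `notes` is optional and often missing.

```r
library(tidyr)
library(tibble)

# Hypothetical data: `age` and `income` are required, `notes` is optional
survey <- tibble(
  age    = c(34, NA, 51, 28),
  income = c(52000, 61000, NA, 45000),
  notes  = c(NA, "late reply", NA, "ok")
)

strict     <- drop_na(survey)               # an NA in any column drops the row
permissive <- drop_na(survey, age, income)  # NAs in `notes` are ignored

nrow(strict)      # 1 row: only the last row is complete in every column
nrow(permissive)  # 2 rows: the first row survives despite its missing note
```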

Five common patterns

1. Drop rows with any NA

Strict: keep only fully complete rows

df |> drop_na()

The strictest form. It keeps the same rows as base R na.omit(df) or df[complete.cases(df), ].
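The equivalence can be checked directly. This sketch rebuilds the example `df` so it runs on its own. Because na.omit() also attaches an na.action attribute recording which rows it dropped, the comparison looks at the surviving `id` values rather than testing the objects for identity.

```r
library(tidyr)
library(tibble)

df <- tibble(
  id = 1:5,
  x  = c(1, NA, 3, 4, NA),
  y  = c("a", "b", NA, "d", "e")
)

via_tidyr  <- drop_na(df)
via_omit   <- na.omit(df)
via_subset <- df[complete.cases(df), ]

# All three approaches keep the same rows
via_tidyr$id   # 1 4
via_omit$id    # 1 4
via_subset$id  # 1 4
```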

2. Drop rows where one column is NA

Only require x to be non-NA

df |> drop_na(x)
#> # A tibble: 3 x 3
#>      id     x y
#>   <int> <dbl> <chr>
#> 1     1     1 a
#> 2     3     3 NA
#> 3     4     4 d

NAs in column y are KEPT; only rows with NA in x are dropped. Useful when only specific columns are required for the next step.

3. Drop based on multiple columns

Both x and y must be non-NA

df |> drop_na(x, y)
#> # A tibble: 2 x 3
#>      id     x y
#>   <int> <dbl> <chr>
#> 1     1     1 a
#> 2     4     4 d

drop_na(x, y) removes rows where x OR y is NA.
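Put another way, a row survives only when every named column is present. As a rough cross-check (a sketch, not a statement about how tidyr implements it), the same rows come out of an explicit dplyr::filter() that requires both columns to be non-NA.

```r
library(tidyr)
library(dplyr)
library(tibble)

df <- tibble(
  id = 1:5,
  x  = c(1, NA, 3, 4, NA),
  y  = c("a", "b", NA, "d", "e")
)

dropped  <- drop_na(df, x, y)
filtered <- filter(df, !is.na(x), !is.na(y))  # keep rows where both are present

dropped$id   # 1 4
filtered$id  # 1 4
```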

4. Drop based on column predicate

Drop rows with NA in any numeric column

df |> drop_na(where(is.numeric))

Tidyselect helpers such as where(), starts_with(), and matches() work inside drop_na(), so columns can be selected by type or by name pattern instead of being listed one by one.
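A short sketch of selecting by name prefix and by type. The `exams` tibble and its column names are hypothetical, chosen so that a character column with NAs sits next to the numeric score columns.

```r
library(tidyr)
library(tibble)

# Hypothetical exam data; column names are illustrative only
exams <- tibble(
  student    = c("ana", "ben", "cy", "dee"),
  score_math = c(90, NA, 75, 88),
  score_read = c(85, 70, NA, 92),
  comment    = c(NA, "absent", NA, NA)
)

by_prefix <- drop_na(exams, starts_with("score"))  # check score_* columns only
by_type   <- drop_na(exams, where(is.numeric))     # check numeric columns only

by_prefix$student  # "ana" "dee"
by_type$student    # "ana" "dee"
```

Both calls ignore the NAs in `comment`, because that column is neither score-prefixed nor numeric.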

5. Drop in a pipeline

Drop NAs as part of an analysis chain

result <- mtcars |>
  dplyr::mutate(mpg = ifelse(mpg > 30, NA, mpg)) |>
  drop_na(mpg) |>
  dplyr::summarise(avg = mean(mpg))

result
#>        avg
#> 1 18.42143

drop_na() integrates cleanly into dplyr pipelines and can target specific columns. na.omit() always checks every column and attaches an na.action attribute to its result.

Key Insight

Dropping rows with NA is one tool among many for missing data. It is appropriate when missing data is a small fraction of rows AND the data are missing completely at random (MCAR). For systematic missingness, dropping introduces BIAS. Consider imputation (mean, median, model-based) or analysis methods that handle NA directly (some regression libraries) instead.
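For contrast with dropping, here is a minimal sketch of the simplest imputation mentioned above: replacing missing values with the median of the observed ones, using tidyr::replace_na(). It is an illustration only. Filling every gap with a single value understates variability, so it is not a substitute for model-based or multiple imputation.

```r
library(tidyr)
library(tibble)

df <- tibble(
  id = 1:5,
  x  = c(1, NA, 3, 4, NA)
)

# Median of the observed values 1, 3, 4
x_median <- median(df$x, na.rm = TRUE)

# Replace NA in x with that median; all five rows are kept
imputed <- replace_na(df, list(x = x_median))

imputed$x      # 1 3 3 4 3
nrow(imputed)  # 5
```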

drop_na() vs na.omit() vs complete.cases()

Function	Source	Pipeline-friendly	Column-specific
`drop_na()`	tidyr	Yes	Yes
`na.omit()`	base R	Awkward	No (all columns only)
`complete.cases()`	base R	Used inside filter()	Yes (specify columns in subset)

When to use which:

Use drop_na() in dplyr/tidyverse pipelines.
Use na.omit() for one-line scripts on the whole data frame.
Use complete.cases(df[, c("x", "y")]) inside filter() for very specific column subsets.
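The three options side by side, as a sketch. The extra column `z` is hypothetical and added only so that the whole-frame and column-specific approaches give different answers.

```r
library(tidyr)
library(dplyr)
library(tibble)

df <- tibble(
  id = 1:5,
  x  = c(1, NA, 3, 4, NA),
  y  = c("a", "b", NA, "d", "e"),
  z  = c(NA, 1, 2, 5, 3)   # hypothetical extra column with one NA
)

whole_frame <- na.omit(df)                                   # checks x, y and z
by_cases    <- filter(df, complete.cases(df[, c("x", "y")])) # checks x and y
by_drop_na  <- drop_na(df, x, y)                             # checks x and y

whole_frame$id  # 4
by_cases$id     # 1 4
by_drop_na$id   # 1 4
```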

Common pitfalls

Pitfall 1: dropping more rows than expected. drop_na() without arguments removes rows with NA in ANY column. With many columns, this can drop most of your data. Always check nrow(df) before and after.
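A quick before-and-after check on the built-in airquality dataset shows how the row count depends on which columns are checked. The variable names are just for illustration.

```r
library(tidyr)

n_before <- nrow(airquality)                  # all rows
n_any    <- nrow(drop_na(airquality))         # complete in every column
n_ozone  <- nrow(drop_na(airquality, Ozone))  # complete in Ozone only

n_before  # 153
n_any     # 111
n_ozone   # 116
```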

Pitfall 2: hiding the missingness pattern. Silently dropping rows with NA can make patterns invisible. Before dropping, summarize: summarise(across(everything(), ~ sum(is.na(.)))) shows how many NAs per column.
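Applied to the built-in airquality dataset, that summary looks like the sketch below. It uses `.x` inside the lambda, which is equivalent to `.` here.

```r
library(dplyr)

# One row, one column per variable, each holding that variable's NA count
na_counts <- airquality |>
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts$Ozone    # 37
na_counts$Solar.R  # 7
na_counts$Wind     # 0
```

Here the missingness is concentrated in two columns, which is worth knowing before deciding whether to drop on all columns or only on the ones the analysis needs.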

Warning

Dropping NA introduces bias when missingness is NOT random. If older respondents skip an income question, dropping their rows leaves a younger sample with different characteristics. Consider methods like multiple imputation (mice package) when missingness is systematic.

Try it yourself

Try it: From airquality (built-in dataset), drop rows where Ozone is NA. Save to ex_clean and report row count.

Your turn: drop NA Ozone

# Try it: keep only complete Ozone rows
ex_clean <- # your code here
nrow(ex_clean)
#> Expected: 116 (out of 153)

Solution

ex_clean <- tidyr::drop_na(airquality, Ozone)
nrow(ex_clean)
#> [1] 116

Explanation: drop_na(airquality, Ozone) keeps rows where Ozone is non-NA. Other columns may still have NAs (e.g., Solar.R) but those rows are kept. Result: 116 rows of 153 original.

Related tidyr / dplyr functions

After mastering drop_na(), look at:

replace_na(): replace NA with a specific value
fill(): forward/backward-fill missing values
coalesce(): take first non-NA across columns
complete(): fill in implicit missing combinations
naniar::miss_var_summary(): NA summary by column
mice::mice(): multiple imputation for missing data

For "carry forward last observation" patterns common in time series, tidyr::fill() is the right tool, not drop_na().
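A small sketch of the contrast, using a hypothetical `readings` tibble. fill() keeps every row and carries the last observed value forward, while drop_na() would discard the gaps.

```r
library(tidyr)
library(tibble)

# Hypothetical daily readings with gaps
readings <- tibble(
  day   = 1:5,
  level = c(10, NA, NA, 12, NA)
)

carried <- fill(readings, level, .direction = "down")  # carry last value forward

carried$level                   # 10 10 10 12 12
nrow(carried)                   # 5
nrow(drop_na(readings, level))  # 2
```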

FAQ

How do I remove rows with NA in R?

Use tidyr::drop_na(df) to remove rows with NA in ANY column. Use drop_na(df, x) to remove only the rows where column x is NA. Base R alternatives: na.omit(df) (whole data frame) or df[complete.cases(df), ].

What is the difference between drop_na and na.omit?

drop_na() is from tidyr and works inside pipelines. It accepts column arguments to limit which NAs trigger row removal. na.omit() is base R, always drops rows with NA in any column, and is harder to chain in pipelines.

How do I drop NA rows for specific columns only?

Pass column names to drop_na: drop_na(df, x, y) removes rows where x OR y is NA. NAs in other columns are kept.

Should I drop NA rows or impute them?

Drop when missingness is small and completely random. Impute when missingness is systematic or you cannot afford to lose rows. For complex missingness, multiple imputation via the mice package is the gold standard.

Can I use drop_na with tidyselect helpers?

Yes. drop_na(df, where(is.numeric)) drops rows where any numeric column is NA. drop_na(df, starts_with("score")) checks score-prefixed columns. All standard tidyselect helpers work.
