tidyr fill() in R: Forward-Fill Missing Values
The fill() function in tidyr fills NA values in a column with the most recent non-NA value above (or below). It is the "last observation carried forward" (LOCF) operation, common in time-series data.
df |> fill(col)                          # forward-fill (default: down)
df |> fill(col, .direction = "up")       # backward-fill
df |> fill(col, .direction = "downup")   # both directions
df |> fill(c(col1, col2))                # multiple columns
df |> group_by(g) |> fill(col)           # per-group fill
Need explanation? Read on for examples and pitfalls.
What fill() does in one sentence
fill(data, ..., .direction = "down") replaces each NA in the selected columns with the nearest non-NA value above it (or below it, with .direction = "up"). The default direction is "down" (top to bottom).
The classic use case: time-series with sparse observations where each NA should inherit the previous value.
Syntax
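fill(data, ..., .direction = c("down", "up", "downup", "updown")). The ... argument selects the columns to fill, using tidyselect syntax. "downup" fills down first and then up; "updown" fills up first and then down.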
Five common patterns
1. Forward fill (default)
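A minimal sketch of the default behavior. The data frame `df` and its columns `t` and `value` are made up for illustration.

```r
library(tidyr)

# Hypothetical sensor readings with gaps
df <- data.frame(
  t     = 1:5,
  value = c(10, NA, NA, 20, NA)
)

# .direction = "down" is the default, so it can be omitted
filled <- df |> fill(value)
filled$value
# 10 10 10 20 20
```

Each NA takes the value from the closest non-NA row above it.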
2. Backward fill
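The same idea in the other direction, again on an invented data frame. Note that a trailing NA has nothing below it, so it stays NA.

```r
library(tidyr)

# Hypothetical data: values are recorded at the END of each interval
df <- data.frame(
  t     = 1:5,
  value = c(NA, 10, NA, 20, NA)
)

filled <- df |> fill(value, .direction = "up")
filled$value
# 10 10 20 20 NA
```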
3. Both directions
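A small illustration of how "downup" and "updown" differ, using a made-up column. The two only disagree for NAs that sit between two different known values.

```r
library(tidyr)

# Hypothetical column with leading, interior, and trailing NAs
df <- data.frame(value = c(NA, 1, NA, 2, NA))

# Fill down first, then fill what is left (leading NAs) upward
down_up <- df |> fill(value, .direction = "downup")
down_up$value
# 1 1 1 2 2

# Fill up first, then fill what is left (trailing NAs) downward
up_down <- df |> fill(value, .direction = "updown")
up_down$value
# 1 1 2 2 2
```

Pick the order according to which neighbor an interior NA should inherit from.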
4. Multiple columns
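A sketch of filling several columns in one call. The column names `a` and `b` are placeholders; any tidyselect expression should work in their place.

```r
library(tidyr)

# Hypothetical data with gaps in two columns of different types
df <- data.frame(
  id = 1:3,
  a  = c(1, NA, 3),
  b  = c("x", NA, NA)
)

# Each selected column is filled independently of the others
filled <- df |> fill(c(a, b))
filled$a
# 1 1 3
filled$b
# "x" "x" "x"
```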
5. Per-group
Crucial for time-series with multiple subjects: fill should reset at each user boundary.
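A minimal per-group sketch, assuming a hypothetical `user` column that marks the subject. The first row of user "b" stays NA because there is no earlier value within that group.

```r
library(tidyr)
library(dplyr)

# Hypothetical per-user state log
df <- data.frame(
  user  = c("a", "a", "a", "b", "b"),
  state = c("on", NA, NA, NA, "off")
)

filled <- df |>
  group_by(user) |>
  fill(state) |>   # fills within each user only
  ungroup()

filled$state
# "on" "on" "on" NA "off"
```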
fill() is the tidyverse name for LOCF (last observation carried forward). Common in time-series, sensor data, and "spreadsheet category headers" that span multiple rows. For per-subject time-series, ALWAYS group_by before fill.
fill() vs replace_na() vs coalesce()
Four NA-handling functions from tidyr and dplyr.
| Function | Behavior | Best for |
|---|---|---|
| tidyr::fill() | Carry forward / backward | Time-series LOCF |
| tidyr::replace_na() | Replace NA with constant | Default value |
| dplyr::coalesce() | First non-NA across columns | Multi-source fallback |
| dplyr::na_if() | Replace specific value with NA | Sentinel cleanup |
When to use which:
- fill for sequential / time-series fill.
- replace_na for "if NA then X" (constant).
- coalesce for multi-source fallback.
- na_if for sentinel values.
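The four functions side by side, as a rough sketch on invented vectors `x` and `y`. The sentinel value -99 is only an example.

```r
library(tidyr)
library(dplyr)

# Hypothetical inputs
x  <- c(1, NA, 3, NA)
y  <- c(9, 8, NA, NA)
df <- data.frame(x = x)

via_fill     <- fill(df, x)$x            # carry forward:            1 1 3 3
via_replace  <- replace_na(x, 0)         # constant replacement:     1 0 3 0
via_coalesce <- coalesce(x, y)           # first non-NA of x, y:     1 8 3 NA
via_na_if    <- na_if(c(1, -99, 3), -99) # sentinel turned into NA:  1 NA 3
```

Note that coalesce() still returns NA where every input is NA, and na_if() goes the opposite way from the other three: it creates NAs rather than removing them.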
A practical workflow
The "carry forward state" pattern is fill's main use.
Per user, in chronological order, NA states inherit the previous known state. Without fill, state changes appear as NA between events.
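One way this could look, assuming an event log with hypothetical columns `user`, `time`, and `state`. Sorting before the fill matters, because fill() works on row order, not on the time column.

```r
library(tidyr)
library(dplyr)

# Hypothetical event log, deliberately out of order
events <- data.frame(
  user  = c("u1", "u2", "u1", "u1", "u2", "u2"),
  time  = c(1, 1, 2, 3, 2, 3),
  state = c("active", "trial", NA, "paused", NA, NA)
)

states <- events |>
  arrange(user, time) |>   # chronological order within each user
  group_by(user) |>
  fill(state) |>           # carry the last known state forward
  ungroup()

states$state
# "active" "active" "paused" "trial" "trial" "trial"
```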
For monthly reporting where the category column is only on the first row of each group:
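A sketch with an invented report table, where `category` appears only on the first row of each block:

```r
library(tidyr)

# Hypothetical report as it might arrive from a spreadsheet export
report <- data.frame(
  category = c("Fruit", NA, NA, "Dairy", NA),
  item     = c("apple", "pear", "plum", "milk", "cheese"),
  sales    = c(10, 7, 4, 12, 9)
)

tidy_report <- report |> fill(category)
tidy_report$category
# "Fruit" "Fruit" "Fruit" "Dairy" "Dairy"
```

If the blank cells arrive as empty strings rather than NA, they would need converting first (for example with dplyr::na_if(category, "")), since fill() only acts on NA.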
Common pitfalls
Pitfall 1: fill across group boundaries. Without group_by, fill carries forward ACROSS groups. Always group_by before fill on grouped data.
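To make the leak concrete, here is a small comparison on made-up data. In the ungrouped result, user "b" silently inherits "on" from user "a".

```r
library(tidyr)
library(dplyr)

# Hypothetical data: user "b" starts with an unknown state
df <- data.frame(
  user  = c("a", "a", "b", "b"),
  state = c("on", NA, NA, "off")
)

# Ungrouped: the value from user "a" leaks into user "b"
leaky <- df |> fill(state)
leaky$state
# "on" "on" "on" "off"

# Grouped: the fill restarts at the user boundary
safe <- df |> group_by(user) |> fill(state) |> ungroup()
safe$state
# "on" "on" NA "off"
```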
Pitfall 2: leading NAs. With the default direction "down", a leading NA has no value above it and stays NA. Use .direction = "downup" to fill leading NAs from the first non-NA value below.
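A quick illustration on an invented column that starts with two NAs:

```r
library(tidyr)

# Hypothetical column whose first known value is in row 3
df <- data.frame(value = c(NA, NA, 5, NA))

down_only <- df |> fill(value)
down_only$value
# NA NA 5 5

down_then_up <- df |> fill(value, .direction = "downup")
down_then_up$value
# 5 5 5 5
```

Whether back-filling the leading rows is appropriate depends on the data; it asserts that the value was already in effect before it was first observed.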
fill() does NOT verify whether filling makes semantic sense. Carrying forward a stale value may be wrong if the data is "truly missing" (not "same as previous"). Validate with domain knowledge.
Try it yourself
Try it: Forward-fill the name column in a sparse dataset, grouped by user. Save to ex_filled.
Solution
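One possible solution. The exercise does not specify the dataset, so `ex_sparse` and its columns `user`, `visit`, and `name` are assumptions made up for this sketch.

```r
library(tidyr)
library(dplyr)

# Hypothetical sparse dataset: name is recorded only on a user's first row
ex_sparse <- data.frame(
  user  = c(1, 1, 1, 2, 2),
  visit = c(1, 2, 3, 1, 2),
  name  = c("Ana", NA, NA, "Ben", NA)
)

ex_filled <- ex_sparse |>
  group_by(user) |>
  fill(name) |>
  ungroup()

ex_filled$name
# "Ana" "Ana" "Ana" "Ben" "Ben"
```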
Explanation: group_by ensures fill resets at each user boundary; name is carried forward within each user.
Related tidyr / dplyr functions
After mastering fill, look at:
- tidyr::replace_na(): replace NA with constant
- dplyr::coalesce(): multi-source NA fill
- tidyr::complete(): fill missing row combinations
- dplyr::lag()/lead(): row-shift comparisons
- dplyr::case_when(): conditional fill logic
For Excel-style "merged cell" data, fill is the standard import-cleanup tool.
FAQ
What does fill do in tidyr?
fill(data, col) replaces NA values in col with the most recent non-NA value above (default direction "down"). Used for last-observation-carried-forward (LOCF) imputation.
How do I fill NAs from below in tidyr?
Pass .direction = "up": fill(df, col, .direction = "up"). Each NA inherits the next non-NA value.
Should I fill before or after group_by?
After. group_by(g) |> fill(col) fills WITHIN each group. Without grouping, fill crosses boundaries.
What is the difference between fill and replace_na?
fill carries forward (or backward) the most recent non-NA value. replace_na uses a CONSTANT value. Different semantics for missing data.
Does fill modify the data in place?
No. It returns a new data frame. Always assign the result.
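A short demonstration on a throwaway data frame:

```r
library(tidyr)

df <- data.frame(x = c(1, NA))

fill(df, x)                  # result is computed but not saved
still_missing <- anyNA(df$x) # TRUE: df itself is unchanged

df <- fill(df, x)            # assign to keep the filled version
now_missing <- anyNA(df$x)   # FALSE
```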