janitor remove_constant() in R: Drop Single-Value Columns
The remove_constant() function in janitor drops columns of a data frame where every row holds the same value. It clears zero-variance predictors before modeling, prunes leftover flag columns after a filter, and pairs with remove_empty() for a complete import cleanup.
    remove_constant(df)                                # drop all single-value cols
    remove_constant(df, na.rm = TRUE)                  # ignore NA in the check
    remove_constant(df, quiet = FALSE)                 # print removed column names
    remove_constant(df, na.rm = TRUE, quiet = FALSE)   # combine ignore NA and print
    df |> janitor::remove_constant()                   # pipe-friendly
    df |> remove_empty() |> remove_constant()          # full import cleanup chain
    sapply(df, \(x) length(unique(x))) == 1            # manual constancy check
    remove_constant(df) |> ncol()                      # confirm shape after pruning
Need explanation? Read on for examples and pitfalls.
What remove_constant() does in one sentence
remove_constant() deletes any column whose values are all identical, returning a data frame with the same row count but fewer columns. It is the column-side companion to remove_empty(), targeting columns that carry zero information rather than columns that are blank.
The function pays off in three situations. After reading configuration-shaped data from a wide spreadsheet, you often see columns like survey_year = 2024 repeated on every row. After a filter() step, columns that defined the filter (status == "active") collapse to one value. Before fitting a linear model or running PCA, zero-variance predictors trigger numerical errors. One call handles all three.
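The post-filter case can be sketched in a few lines. The `accounts` data frame and its values are hypothetical, and base R `subset()` stands in for `filter()` so the example needs only janitor.

```r
library(janitor)

# Hypothetical data: status varies before the filter.
accounts <- data.frame(
  id     = 1:5,
  status = c("active", "closed", "active", "active", "closed"),
  spend  = c(40, 15, 22, 61, 9)
)

# After keeping only active rows, status holds a single value.
active <- subset(accounts, status == "active")

# remove_constant() prunes the collapsed filter column.
active_clean <- remove_constant(active)
names(active_clean)  # "id" "spend"
```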
Syntax
remove_constant() takes a data frame plus na.rm and quiet arguments. Both flags default to safe values for exploratory work.
The signature is short:
remove_constant(dat, na.rm = FALSE, quiet = TRUE)
Only dat is required. With defaults, NAs count as a distinct value and removed columns are not announced. Flip na.rm to ignore missing values, and flip quiet to print which columns left.
filter(df, region == "EU") makes region a constant in the result, but it stays in the data frame and clutters downstream summaries. A single remove_constant() after the filter prunes those obvious leftovers without naming them by hand.
Six everyday use cases
1. Drop columns that are entirely the same value
The default call removes any all-same column in one pass. It does not modify the row count.
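A minimal sketch of the default call, assuming janitor is installed. The `signup_log` data frame and its values are hypothetical, chosen to match the column names used in this example.

```r
library(janitor)

# Hypothetical data: id and signups vary; status, region, archived do not.
signup_log <- data.frame(
  id       = 1:4,
  signups  = c(12, 7, 30, 18),
  status   = "active",
  region   = "EU",
  archived = FALSE
)

pruned <- remove_constant(signup_log)

names(pruned)  # "id" "signups"
nrow(pruned)   # 4, same as the input
```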
Three columns vanish: status, region, and archived each held a single value. id and signups survive because they vary.
2. Print which columns were dropped
Set quiet = FALSE to see a one-line report. Useful inside scripts where silent column loss is confusing.
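A short sketch of the reporting mode, using a hypothetical `orders` data frame. The exact wording of the report may differ between janitor versions, so no sample output is shown.

```r
library(janitor)

# Hypothetical data with two constant columns.
orders <- data.frame(
  order_id = 1:3,
  currency = "EUR",
  channel  = "web"
)

# quiet = FALSE prints a short report naming the removed columns;
# the returned data frame is the same as with quiet = TRUE.
kept <- remove_constant(orders, quiet = FALSE)
names(kept)  # "order_id"
```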
3. Treat NA as a distinct value (default)
Mixed-NA columns survive the default check because NA counts as its own value. This matters when imported data has occasional missing cells.
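The default NA handling can be explored with a small sketch. The `na_demo` data frame is hypothetical; its column names follow the ones used in this example.

```r
library(janitor)

na_demo <- data.frame(
  varies     = 1:3,
  always_one = c(1, 1, 1),
  one_or_na  = c(1, NA, 1),
  all_na     = c(NA, NA, NA)
)

# Distinct values per column, with NA counted as a value:
# varies 3, always_one 1, one_or_na 2, all_na 1
distinct_counts <- sapply(na_demo, \(x) length(unique(x)))
distinct_counts

# Default call (na.rm = FALSE): inspect which columns remain.
default_kept <- names(remove_constant(na_demo))
default_kept
```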
always_one is dropped (one value, no NAs). one_or_na survives because NA and 1 count as two distinct values. all_na is dropped as well: every value is NA, so the check sees a single distinct value.
4. Ignore NA in the check
na.rm = TRUE strips NA before measuring constancy. Columns with one non-NA value plus NAs get dropped.
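A sketch of the NA-tolerant mode on a hypothetical `na_demo` data frame, with column names matching the ones used in this example.

```r
library(janitor)

na_demo <- data.frame(
  varies     = 1:3,
  always_one = c(1, 1, 1),
  one_or_na  = c(1, NA, 1),
  all_na     = c(NA, NA, NA)
)

# NAs are stripped before distinct values are counted, so columns with
# at most one non-NA value are treated as constant.
tolerant <- remove_constant(na_demo, na.rm = TRUE)
names(tolerant)  # "varies"
```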
With na.rm = TRUE, one_or_na becomes constant (only 1 remains), and all_na becomes constant (no values remain, which counts as zero distinct). Use this mode when NAs are noise, not signal.
5. Chain with remove_empty() for a full import cleanup
The pair handles both blank and constant pollution in one expression. This is the import-time recipe for spreadsheet exports.
Order matters slightly: remove_empty() first prunes all-NA columns so remove_constant() does not waste a pass on them.
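The chain can be sketched on a hypothetical spreadsheet export. The `raw` data frame is invented for illustration, and `which` is passed explicitly to remove_empty() to state that both rows and columns are pruned.

```r
library(janitor)

# Hypothetical export: a blank trailing row, a blank padding column,
# and a source column that repeats on every real row.
raw <- data.frame(
  id      = c(101, 102, NA),
  score   = c(3.5, 4.0, NA),
  source  = c("export", "export", NA),
  padding = c(NA, NA, NA)
)

clean <- raw |>
  remove_empty(which = c("rows", "cols")) |>
  remove_constant()

names(clean)  # "id" "score"
nrow(clean)   # 2
```

In this sketch the order does real work: the blank row puts an NA into source, so remove_constant() on its own, with the default na.rm = FALSE, would keep that column.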
6. Strip zero-variance predictors before modeling
lm(), glm(), and PCA cope poorly with columns that have no variance. A pre-fit cleanup avoids cryptic errors deeper in the pipeline.
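A minimal pre-fit sketch, assuming a hypothetical `train` data frame in which constant_flag is 1 on every row. The numbers are invented for illustration.

```r
library(janitor)

# Hypothetical training data; constant_flag never changes.
train <- data.frame(
  y             = c(2.1, 3.9, 6.2, 7.8, 10.1),
  x             = c(1, 2, 3, 4, 5),
  constant_flag = 1
)

# On the raw data, constant_flag is collinear with the intercept,
# so its coefficient comes back as NA.
raw_coef <- coef(lm(y ~ ., data = train))
raw_coef

# Prune first, then fit: every remaining coefficient is estimable.
train_clean <- remove_constant(train)
fit <- lm(y ~ ., data = train_clean)
coef(fit)
```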
Without remove_constant(), the same lm() call would return an NA coefficient for a numeric constant_flag, or fail outright in model.matrix() with a contrasts error if constant_flag were a factor or character column.
Treat remove_constant() as a standard preprocessing step, not an optional cleanup.
remove_constant() vs select() vs remove_empty()
Each function targets a different kind of column problem. Picking the right one avoids over-pruning or leaving dirt behind.
| Function | Drops based on | Best for |
|---|---|---|
| remove_constant(df) | every value identical | post-filter cleanup, zero-variance pruning |
| remove_empty(df, which = "cols") | every value is NA | spreadsheet imports with blank padding |
| select(df, -col) | exact column name | you already know what to drop |
| dplyr::select(df, where(...)) | a predicate on the column | type or pattern-based filtering |
| caret::nearZeroVar(df) | low variance, not just constant | model preprocessing where 95% same counts |
remove_constant() is the most conservative of these: only fully constant columns go. Use nearZeroVar() when nearly constant columns, dominated by a single value, are also a problem.
Common pitfalls
na.rm = FALSE is the default, so a column of mostly the same value mixed with NAs survives. Many import pipelines hit this trap: a column you expect to be constant carries one NA and remains. If you want NA-tolerant pruning, set na.rm = TRUE explicitly.
A second trap is shape assumptions downstream. Code that hard-codes column indices (df[, 5]) breaks after remove_constant() shifts positions. Address columns by name to stay safe.
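The index-shift trap can be sketched with a hypothetical `report` data frame, in which position 2 means something different before and after pruning.

```r
library(janitor)

# Hypothetical data: year is constant and sits in position 2.
report <- data.frame(
  id     = 1:3,
  year   = 2024,
  amount = c(10, 20, 30)
)
pruned <- remove_constant(report)

# Position 2 was year before pruning and is amount after it.
names(report)[2]  # "year"
names(pruned)[2]  # "amount"

# Name-based access is unaffected by the shift.
pruned[["amount"]]
```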
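A third pitfall is grouped data frames. remove_constant() checks each column across the whole data frame, not within groups, and it drops a constant grouping column like any other. Check group_vars() after the cleanup and call group_by() again if the grouping is not what you expect.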
Try it yourself
Try it: Build a data frame with two varying columns and one column where every row holds "yes". Use remove_constant() to drop the constant, then confirm the result has 2 columns.
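One possible solution, sketched with invented values. The column names ex_id, ex_score, and ex_flag are assumptions taken from the explanation of this exercise.

```r
library(janitor)

# Two varying columns and one constant column.
ex_df <- data.frame(
  ex_id    = 1:5,
  ex_score = c(88, 92, 75, 60, 81),
  ex_flag  = "yes"
)

ex_result <- remove_constant(ex_df)

ncol(ex_result)   # 2
names(ex_result)  # "ex_id" "ex_score"
```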
Explanation: ex_flag repeats "yes" in every row, so remove_constant() drops it. The varying columns ex_id and ex_score are preserved.
Related janitor functions
- [remove_empty()](janitor-remove_empty-in-R.html) drops fully NA rows and columns.
- [get_dupes()](janitor-get_dupes-in-R.html) surfaces duplicate rows before pruning.
- [clean_names()](janitor-clean_names-in-R.html) standardizes column names at import.
- [compare_df_cols()](janitor-compare_df_cols-in-R.html) compares column structures across two data frames.
- For deeper background see the janitor package overview and the official tidyverse reference.
FAQ
Does remove_constant() work on character, factor, and logical columns the same way?
Yes. The constancy check uses length(unique(x)) on each column, which treats every type identically. A factor with one level present in the data is constant. A logical column of all FALSE is constant. A character column of repeated "A" is constant. The function does not look at factor level definitions, only the values actually present in the rows.
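A small sketch of the type-agnostic behavior, using a hypothetical `typed` data frame. The factor column declares a second level, "hi", that never appears in the rows.

```r
library(janitor)

typed <- data.frame(
  id  = 1:3,
  chr = c("A", "A", "A"),
  fct = factor(c("lo", "lo", "lo"), levels = c("lo", "hi")),
  lgl = c(FALSE, FALSE, FALSE)
)

# Character, factor, and logical constants are all dropped; the unused
# factor level "hi" does not rescue fct.
typed_kept <- remove_constant(typed)
names(typed_kept)  # "id"
```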
Does a column of all NAs survive the default call?
No. With na.rm = FALSE, NA counts as a value, so a column of c(NA, NA, NA) has one distinct value and is dropped as constant. What survives the default call is a column that mixes NAs with one other value. If blank columns are the only target, use remove_empty(df, which = "cols"), which strips all-NA columns and leaves other constant columns alone.
Can remove_constant() be used before training a model to avoid zero-variance errors?
Yes, and this is one of its most common uses. A constant predictor is collinear with the intercept, so lm() and glm() cannot estimate its coefficient, and PCA with scaling cannot standardize a zero-variance column. Running remove_constant(df) before model.matrix() or any ~ . formula avoids the trap. When near-constant predictors are also a problem, caret::nearZeroVar() is the more aggressive alternative.
Does remove_constant() preserve tibble and data.table classes?
For tibbles, yes: the return is a tibble. For data.tables, the function coerces to a data.frame on return, which loses data.table semantics. If you need data.table preserved, copy the relevant columns explicitly with dt[, ..keep_cols] after computing which columns to keep.
Is there an option to keep certain constant columns?
No built-in argument exists. The workaround is to set the columns aside before the call and rejoin them after: keep <- df["must_keep"]; pruned <- remove_constant(df[setdiff(names(df), "must_keep")]); dplyr::bind_cols(keep, pruned). Single-bracket indexing without a comma keeps keep as a data frame, so the column name survives the rejoin; note that must_keep ends up as the first column. For a more flexible rule, write the constancy check by hand: df[, sapply(df, \(x) length(unique(x)) > 1) | names(df) %in% "must_keep", drop = FALSE].