janitor clean_names() in R: Standardize Messy Column Names
The clean_names() function in janitor rewrites every column name in a data frame to clean snake_case in one call. It strips special characters, collapses whitespace, lowercases letters, and guarantees syntactically valid names. That makes it ideal for taming output from Excel, SAS, or messy CSV files.
clean_names(df)                                # snake_case (default)
df |> janitor::clean_names()                   # pipe-friendly
clean_names(df, case = "small_camel")          # lowerCamelCase
clean_names(df, case = "screaming_snake")      # ALL_CAPS_SNAKE
clean_names(df, replace = c("%" = "pct"))      # custom replacements
clean_names(df, ascii = FALSE)                 # keep non-ASCII characters
make_clean_names(c("Var 1", "Var 2"))          # clean a character vector
Need more detail? Read on for examples and common pitfalls.
What clean_names() does in one sentence
clean_names() rewrites every column name in a data frame so the result is lowercase snake_case and safe to use in R code. It removes punctuation, replaces spaces and dots with underscores, transliterates non-ASCII characters, deduplicates collisions, and returns the same data with new names.
This solves the recurring pain of working with files where column names contain spaces, percent signs, parentheses, or unicode characters. After clean_names(), you can write df$some_col without backticks.
Syntax
clean_names() takes a data frame plus optional case and replace arguments. All work happens on the names; the data values are untouched.
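The signature, showing the most commonly used arguments. clean_names() forwards everything after dat to make_clean_names():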
clean_names(dat, case = "snake", replace = c(...), ascii = TRUE,
use_make_names = TRUE, parsing_option = 1, ...)
Only dat is required. The defaults produce snake_case names from arbitrary input, which is the right answer for most data-import workflows.
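clean_names() only touches the names, so it should match running make_clean_names() on names(df) and assigning the result back. The quick check below uses the built-in iris data and assumes janitor is installed. It is a sketch, not a test suite.

```r
library(janitor)

cleaned <- clean_names(iris)

# Only the labels change: "Sepal.Length" becomes "sepal_length", and so on
names(cleaned)

# Same result as cleaning the name vector directly
identical(names(cleaned), make_clean_names(names(iris)))

# Putting the old names back recovers the original data frame exactly
identical(setNames(cleaned, names(iris)), iris)
```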
Run clean_names() right after every import call. Make read_csv("file.csv") |> clean_names() a reflex. Standardizing names in the first step prevents silent bugs later, when filter() or select() expect a column whose original spelling in production data has a stray space or different capitalization.
Six common patterns
1. Snake_case the entire data frame
Spaces become underscores. Parentheses are dropped. % auto-expands to percent. The result is lowercase snake_case across the board.
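Here is a minimal sketch of the default behavior. It assumes janitor is installed, and the column names are invented for illustration. check.names = FALSE keeps the messy names exactly as typed, much as readr::read_csv() does on import.

```r
library(janitor)

# Messy names kept verbatim thanks to check.names = FALSE
raw <- data.frame(
  `Sales Region`  = c("East", "West"),
  `Revenue (USD)` = c(1200, 950),
  `Growth %`      = c(4.5, -1.2),
  check.names = FALSE
)

sales <- clean_names(raw)
names(sales)
# "sales_region"  "revenue_usd"  "growth_percent"
```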
2. Switch to camelCase or another case style
Valid case values include "snake" (default), "small_camel", "big_camel", "screaming_snake", "all_caps", "lower_upper", "upper_lower", "title", and "sentence". Pick one and stick with it across a project for consistency.
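The sketch below shows three case styles applied to the same (hypothetical) two-column data frame, assuming janitor is installed:

```r
library(janitor)

shop <- data.frame(`Sales Region` = "East", `Unit Price` = 2.5, check.names = FALSE)

names(clean_names(shop, case = "small_camel"))      # "salesRegion"  "unitPrice"
names(clean_names(shop, case = "big_camel"))        # "SalesRegion"  "UnitPrice"
names(clean_names(shop, case = "screaming_snake"))  # "SALES_REGION" "UNIT_PRICE"
```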
3. Custom token replacements
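replace is a named character vector: the names are regular-expression patterns and the values are their replacements. Patterns run BEFORE case conversion. Use this to preserve meaning that the default would otherwise drop, for example turning $ into usd or % into pct. Because the names are regexes, escape special characters: write "\\$", not "$", which matches the end of every name. Supplying replace also overrides janitor's default replacements (such as % to percent), so include any defaults you still want.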
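A small sketch of a custom replace vector, assuming janitor is installed. The price and margin columns are hypothetical. Note the escaped dollar sign.

```r
library(janitor)

prices <- data.frame(`Price $` = 10, `Margin %` = 0.25, check.names = FALSE)

# Names of `replace` are regexes: "\\$" matches a literal dollar sign.
# This vector replaces janitor's defaults rather than adding to them.
prices <- clean_names(prices, replace = c("\\$" = "usd", "%" = "pct"))
names(prices)
# "price_usd"  "margin_pct"
```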
4. Clean a character vector with make_clean_names()
make_clean_names() runs the same cleanup logic on a plain character vector. Use it when you need clean names for column assignment, list elements, or factor levels outside the context of a data frame.
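The sketch below cleans a plain character vector and then reuses the result as list element names. It assumes janitor is installed; the labels and values are made up.

```r
library(janitor)

labels <- c("Var 1", "Var 2", "Total Sales")
make_clean_names(labels)
# "var_1"  "var_2"  "total_sales"

# Reuse the cleaned strings as list element names
results <- setNames(list(10, 20, 30), make_clean_names(labels))
results$total_sales
```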
5. Handle non-ASCII characters
ascii = TRUE (default) transliterates accents to plain ASCII letters. ascii = FALSE preserves the original characters. ASCII is safer for code and SQL exports; non-ASCII can break legacy systems but reads better in localized reports.
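A sketch contrasting the two settings, assuming janitor is installed and the script is saved as UTF-8. The ascii = FALSE output is not shown because it can depend on the session locale.

```r
library(janitor)

accented <- c("Café", "Año Fiscal")

# Default: accents are transliterated to plain ASCII
make_clean_names(accented)
# "cafe"  "ano_fiscal"

# Keep the original characters (exact result may vary by locale)
make_clean_names(accented, ascii = FALSE)
```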
6. Deduplicate colliding names automatically
When two original names normalize to the same string, janitor appends _2, _3, and so on. No silent overwrites; every column survives with a unique label.
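A minimal sketch, assuming janitor is installed. Three spellings that all normalize to the same string end up with numeric suffixes.

```r
library(janitor)

dupes <- data.frame(1, 2, 3)
names(dupes) <- c("REPEAT VALUE", "repeat value", "Repeat_Value")

names(clean_names(dupes))
# "repeat_value"  "repeat_value_2"  "repeat_value_3"
```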
clean_names() discards the original names but never changes the data. The original column names are replaced, not preserved as attributes. If you need the raw names later for a report header, copy them with original_names <- names(df) BEFORE calling clean_names(). The data values themselves are never touched.
clean_names() vs rename_with() vs make.names()
Three tools sit in this space; pick by how much control you need.
| Task | clean_names() | rename_with() | make.names() |
|---|---|---|---|
| Bulk snake_case messy names | clean_names(df) | rename_with(df, ~ tolower(gsub("[^A-Za-z0-9]+", "_", .))) | setNames(df, make.names(names(df))) |
| Enforce camelCase | clean_names(df, case = "small_camel") | custom lambda | not supported |
| Drop accents | clean_names(df, ascii = TRUE) | needs stringi::stri_trans_general() | no |
| Guarantee unique names | always | manual | with unique = TRUE |
| Naming convention | janitor's opinionated defaults | your own | base R minimum |
When to use which:
- Use clean_names() for any data-import workflow. One line, opinionated defaults, no thinking required.
- Use rename_with() when you want a specific custom transform across many columns.
- Use make.names() when you only need syntactic legality and want zero extra dependencies.
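To make the comparison concrete, the sketch below runs all three approaches on the same hypothetical names. It assumes janitor and dplyr are installed (janitor depends on dplyr). The regex in the rename_with() call is the one from the table, and it leaves trailing underscores that clean_names() trims.

```r
library(janitor)

raw <- data.frame(
  `Sales Region` = "East", `Revenue (USD)` = 1200, `Growth %` = 4.5,
  check.names = FALSE
)

names(clean_names(raw))
# "sales_region"  "revenue_usd"  "growth_percent"

names(dplyr::rename_with(raw, ~ tolower(gsub("[^A-Za-z0-9]+", "_", .x))))
# "sales_region"  "revenue_usd_"  "growth_"

names(setNames(raw, make.names(names(raw))))
# "Sales.Region"  "Revenue..USD."  "Growth.."
```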
Coming from Python? The closest pandas idiom is df.columns = df.columns.str.replace(r"\W+", "_", regex=True).str.lower(). No single pandas function matches janitor's full feature set, but the pyjanitor package ports clean_names() for those who want the same behavior.
Common pitfalls
Pitfall 1: forgetting to assign the result. clean_names(df) does NOT modify df in place. Capture the result with df <- clean_names(df), or assign the output of the pipeline that calls it.
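A minimal sketch of the difference, assuming janitor is installed. The one-column data frame is hypothetical.

```r
library(janitor)

df <- data.frame(`Order ID` = 1:3, check.names = FALSE)

clean_names(df)   # prints a cleaned copy...
names(df)         # ...but df still has "Order ID"

df <- clean_names(df)
names(df)         # "order_id"
```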
Pitfall 2: order matters with replace. The replace step runs before case conversion, so your replacement text is case-converted along with the rest of the name. replace = c("%" = "PCT") still yields pct under the default snake case. To keep an exact token like PCT, also pass case = "none"; otherwise janitor lowercases it.
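The sketch below shows the case-folding on a single hypothetical name, assuming janitor is installed. Choosing an uppercase style is one way to get the capitals back, though it affects the whole name.

```r
library(janitor)

# The replacement "PCT" is lowercased by the default snake case
make_clean_names("Margin %", replace = c("%" = "PCT"))
# "margin_pct"

# An all-caps case style uppercases every token, not just the replacement
make_clean_names("Margin %", replace = c("%" = "PCT"), case = "screaming_snake")
# "MARGIN_PCT"
```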
Renaming breaks code written against the old names. If existing code refers to df$"Revenue ($)", calling clean_names() rewrites that column to revenue. select() or filter() calls on the old name then fail with column-not-found errors, and df$"Revenue ($)" returns NULL. Rename consistently at import time, then commit to the cleaned names throughout the analysis.
Pitfall 3: empty names become x. A column with a blank name ("") becomes x. If multiple blank columns exist, you get x, x_2, x_3. Inspect the result after cleaning to confirm the names are meaningful, especially when reading spreadsheets with blank header cells.
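A quick sketch of the blank-name behavior, assuming janitor is installed. The header vector imitates a spreadsheet with two empty header cells.

```r
library(janitor)

# Two blank headers, as often produced by empty spreadsheet header cells
make_clean_names(c("", "", "Name"))
# "x"  "x_2"  "name"
```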
Try it yourself
Try it: Take airquality and use clean_names() with case = "screaming_snake" to get ALL-CAPS names. Save the result to ex_aq and print the new names.
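One possible solution, assuming janitor is installed (any equivalent call works):

```r
library(janitor)

ex_aq <- clean_names(airquality, case = "screaming_snake")
names(ex_aq)
# "OZONE"  "SOLAR_R"  "WIND"  "TEMP"  "MONTH"  "DAY"
```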
Explanation: Passing case = "screaming_snake" tells janitor to produce uppercase tokens joined by underscores. The data values are unchanged; only the column labels are rewritten.
Related janitor functions
After mastering clean_names(), look at:
- make_clean_names(): the same cleanup logic for a character vector instead of a data frame
- remove_empty(): drop empty rows or columns, often used right after clean_names()
- tabyl(): tidy frequency tables, frequently the next step after data import
- get_dupes(): find duplicated rows by one or more columns
- dplyr::rename_with(): roll your own transform when janitor's defaults do not fit
For a fuller tour of the janitor package, see the janitor package guide. The package's official reference site is sfirke.github.io/janitor.
FAQ
What does janitor clean_names() do?
clean_names() rewrites the column names of a data frame to a clean, consistent style (snake_case by default). It removes punctuation, replaces spaces with underscores, transliterates accented characters, and guarantees every name is unique. The data values are not modified; only the column labels change, which makes the function safe to drop into any import pipeline.
How do I use clean_names() with read_csv()?
Pipe it directly: readr::read_csv("file.csv") |> janitor::clean_names(). This pattern standardizes column names the moment the data lands in R, so every downstream step sees consistent identifiers. Many R analysts treat this one-line idiom as their default import recipe and never reference the raw column names afterward.
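The same idiom is shown below as a self-contained sketch. To avoid needing a real file or readr, it swaps in base read.csv() on inline text, with check.names = FALSE so the header stays as written, as read_csv() would leave it. It assumes janitor is installed and R 4.1 or later for the native pipe. The order columns are hypothetical.

```r
library(janitor)

# Inline CSV text stands in for "file.csv"
csv_text <- "Order ID,Ship Date,Unit Price\n1,2024-01-05,9.99\n2,2024-01-06,4.50"

orders <- read.csv(text = csv_text, check.names = FALSE) |> clean_names()
names(orders)
# "order_id"  "ship_date"  "unit_price"
```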
Can clean_names() produce camelCase instead of snake_case?
Yes. Pass case = "small_camel" for firstName, case = "big_camel" for FirstName, or another supported style such as "screaming_snake" or "all_caps". The case argument is passed on to snakecase::to_any_case(), whose help page lists every option. Pick one style and apply it consistently across your project; mixing case styles between scripts is a common source of subtle bugs.
Why does clean_names() rename two columns to the same base name?
When two original names normalize to the same string (for example "Total $" and "Total ($)" both reduce to total), janitor appends a numeric suffix: the first stays total and the second becomes total_2. This guarantees uniqueness without losing data, but always inspect the names after cleaning to confirm the suffixes match your intent.
Does clean_names() work with tibbles and data.table?
Yes. clean_names() operates on any object that has a names() method and preserves the input class. Tibbles stay tibbles, data.tables stay data.tables, plain data frames stay plain data frames. The cleaning logic is identical across types, which makes the function safe in mixed pipelines that move between tidyverse and data.table workflows.