janitor make_clean_names() in R: Clean Column Name Vectors

The make_clean_names() function in janitor takes any character vector and returns a clean snake_case version, ideal for naming list elements, factor levels, or columns built outside a data frame. It is the vector-level companion to clean_names().

⚡ Quick Answer
make_clean_names(c("First Name", "Age"))             # default snake_case
make_clean_names(x, case = "small_camel")            # lowerCamelCase output
make_clean_names(x, case = "screaming_snake")        # SCREAMING_SNAKE output
make_clean_names(x, replace = c("%" = "pct"))        # custom replacements (override the defaults)
make_clean_names(x, allow_dupes = TRUE)              # keep duplicates as-is
setNames(my_list, make_clean_names(names(my_list)))  # rename list elements
make_clean_names(levels(my_factor))                  # cleaned factor labels

Need explanation? Read on for examples and pitfalls.

📊 Is make_clean_names() the right tool?
  • Clean a plain character vector: make_clean_names(x)
  • Clean every name in a data frame: clean_names(df)
  • Need syntactic names without janitor's opinions: make.names(x, unique = TRUE)
  • Rename list elements after cleaning: setNames(lst, make_clean_names(names(lst)))
  • Clean factor levels: levels(f) <- make_clean_names(levels(f))
  • Preserve duplicates without _2, _3 suffixes: make_clean_names(x, allow_dupes = TRUE)
  • Enforce camelCase or all-caps output: make_clean_names(x, case = "small_camel") or case = "screaming_snake"

What make_clean_names() does in one sentence

The make_clean_names function takes a character vector and returns a snake_case, syntactically safe version of every element. It removes punctuation, replaces spaces and dots with underscores, spells out % and # as percent and number, transliterates accented characters to ASCII, and appends numeric suffixes to any collisions so the output vector is unique by default.

The function operates on plain vectors, which makes it the right tool when names live outside a data frame: as keys in a list, levels of a factor, sheet labels from Excel, or strings you intend to assign to columns later.
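To make the rules above concrete, here is a rough base-R approximation. rough_clean() is a hypothetical helper, not part of janitor: it mimics only the main steps (expanding %, collapsing runs of non-alphanumeric characters into underscores, lowercasing, numbering repeats with _2, _3) and skips accent transliteration and camelCase splitting. Treat it as an illustration of the behaviour, not as janitor's implementation.

```r
# Hypothetical base-R approximation of make_clean_names(); NOT janitor's code.
# It omits accent transliteration and camelCase splitting.
rough_clean <- function(x) {
  x <- gsub("%", "_percent_", x, fixed = TRUE)  # janitor's default expands % to "percent"
  x <- gsub("[^[:alnum:]]+", "_", x)             # runs of spaces, dots, punctuation -> "_"
  x <- gsub("^_+|_+$", "", x)                    # trim leading/trailing underscores
  x <- tolower(x)
  # Number repeats janitor-style: the 2nd copy gets "_2", the 3rd "_3", ...
  copy <- ave(seq_along(x), x, FUN = seq_along)
  ifelse(copy > 1, paste0(x, "_", copy), x)
}

rough_clean(c("Total Sales ($)", "Q1 Growth %", "Net.Margin", "Net.Margin"))
# "total_sales" "q1_growth_percent" "net_margin" "net_margin_2"
```

The real function handles many more edge cases (existing names that already end in _2, leading digits, non-Latin scripts), which is the main reason to prefer it over a hand-rolled regex.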

Syntax

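make_clean_names() accepts a character vector plus the same options as clean_names(). In fact clean_names() passes its extra arguments straight through to make_clean_names(), so only the leading argument differs: a character vector here, a data frame there.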

Basic vector cleanup with janitor:
```r
library(janitor)

raw <- c("First Name", "Age (yrs)", "Weight %", "X-1")
make_clean_names(raw)
#> [1] "first_name" "age_yrs" "weight_percent" "x_1"
```

  

The full signature:

```r
make_clean_names(string, case = "snake", replace = c(...), ascii = TRUE,
                 use_make_names = TRUE, allow_dupes = FALSE,
                 sep_in = "\\.", transliterations = "Latin-ASCII",
                 parsing_option = 1, numerals = "asis", ...)
```

Only string is required. The defaults are the same ones clean_names() uses, so both functions return identical names for the same input and options.
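A quick way to confirm this is to run both functions on the same labels and compare. The sketch assumes janitor is installed; check.names = FALSE keeps data.frame() from mangling the raw labels with make.names() before clean_names() ever sees them.

```r
library(janitor)

raw <- c("First Name", "Age (yrs)", "Weight %", "X-1")

# check.names = FALSE keeps the raw labels exactly as written
df <- data.frame(1, 2, 3, 4, check.names = FALSE)
names(df) <- raw

# Default options: data-frame level vs vector level
identical(names(clean_names(df)), make_clean_names(raw))
#> [1] TRUE

# Options are forwarded too, so non-default settings also agree
identical(
  names(clean_names(df, case = "small_camel")),
  make_clean_names(raw, case = "small_camel")
)
#> [1] TRUE
```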

Tip
Reach for make_clean_names() whenever names are NOT yet attached to a data frame. Use it on character vectors, list keys, factor levels, or sheet labels. Once names are part of a data frame, clean_names(df) is shorter and conveys intent more clearly to readers of your code.

Five common patterns

1. Clean a raw character vector

Clean a messy vector of labels:
```r
library(janitor)

labels <- c("Total Sales ($)", "Q1 Growth %", "Net.Margin", "Café")
make_clean_names(labels)
#> [1] "total_sales" "q1_growth_percent" "net_margin" "cafe"
```

  

Spaces and dots become underscores. Currency symbols drop out. The percent sign expands to the word percent. Accented characters transliterate to plain ASCII letters. The vector returns in the same order it came in, ready for any downstream use such as setting column names or building a lookup table where labels must be syntactically legal R identifiers.

2. Apply across list element names

Rename list elements with setNames:
```r
library(janitor)

results <- list(`Run 1` = 0.92, `Run 2` = 0.88, `Final Test` = 0.95)
results <- setNames(results, make_clean_names(names(results)))
names(results)
#> [1] "run_1" "run_2" "final_test"
```

  

This pattern works for any named object: lists, named numeric vectors, or environments converted via as.list. The data values stay put; only the labels change. It is the cleanest way to standardize keys before serializing to JSON, writing to a database, or merging result sets where one set arrived from Excel and another from a tidy table. Inconsistent keys quietly produce missing joins, and one defensive call avoids the entire class of bug.
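To see why cleaning keys matters before a merge, here is a small sketch with two hypothetical score vectors: one keyed by Excel-style labels, one by tidy labels. The raw keys share nothing; after one make_clean_names() call they line up.

```r
library(janitor)

# Hypothetical results from two sources that spell the same keys differently
from_excel <- c(`Run 1` = 0.92, `Final Test` = 0.95)
from_tidy  <- c(run_1 = 0.90, final_test = 0.97)

intersect(names(from_excel), names(from_tidy))  # character(0): no key matches

names(from_excel) <- make_clean_names(names(from_excel))
common <- intersect(names(from_excel), names(from_tidy))  # "run_1" "final_test"

# Values stay attached to their (now cleaned) labels
rbind(excel = from_excel[common], tidy = from_tidy[common])
```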

3. Clean factor levels in place

Standardize factor levels:
```r
library(janitor)

f <- factor(c("Group A", "Group B", "Group A", "Group C"))
levels(f) <- make_clean_names(levels(f))
levels(f)
#> [1] "group_a" "group_b" "group_c"
```

  

Assigning to levels() rewrites the labels without changing which level each observation maps to. Level order and counts are preserved. This avoids the trap of converting with as.character() and re-factoring, which silently drops levels with zero observations in the current slice and re-sorts the remaining levels alphabetically.
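The difference is easy to demonstrate. In this sketch "Group C" is a declared level with no observations, and the levels carry a deliberate non-alphabetical order. Note the second hazard of the round trip: because the values repeat, it also needs allow_dupes = TRUE.

```r
library(janitor)

# "Group C" is declared but has no observations in this slice
f <- factor(c("Group A", "Group B", "Group A"),
            levels = c("Group C", "Group A", "Group B"))

# Safe: rename the levels in place
f_safe <- f
levels(f_safe) <- make_clean_names(levels(f_safe))
levels(f_safe)
#> [1] "group_c" "group_a" "group_b"

# Risky: round-trip through character. allow_dupes = TRUE is needed here,
# otherwise the repeated "Group A" value would come back as "group_a_2".
f_risky <- factor(make_clean_names(as.character(f), allow_dupes = TRUE))
levels(f_risky)
#> [1] "group_a" "group_b"
```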

4. Build column names before assignment

Generate column names for a new data frame:
```r
library(janitor)

metrics <- c("MAE (train)", "MAE (test)", "R^2 (train)", "R^2 (test)")
df <- data.frame(matrix(runif(4), nrow = 1))
names(df) <- make_clean_names(metrics)
names(df)
#> [1] "mae_train" "mae_test" "r_2_train" "r_2_test"
```

  

When you build a data frame from a matrix or from a call such as do.call(rbind, ...), you often hold the desired column labels in a separate vector. Run them through the cleaner before assignment so the result is immediately syntactic and never needs backticks. This is the standard approach when you generate labels programmatically, for example by concatenating a metric name with a split label, and want consistent output regardless of how the source strings were spaced.
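A sketch of that workflow, using hypothetical per-fold MAE values: do.call(rbind, ...) stacks the rows into an unnamed matrix, and the labels are generated with paste0(), including an accidental double space that cleaning absorbs.

```r
library(janitor)

# Hypothetical per-fold results: each element is c(train MAE, test MAE)
folds <- list(c(0.41, 0.47), c(0.39, 0.52), c(0.40, 0.49))
res <- as.data.frame(do.call(rbind, folds))  # 3 rows; default names V1, V2

# Programmatic labels; note the stray double space in the second one
raw_labels <- paste0("MAE ", c("(train)", " (test)"))
raw_labels
#> [1] "MAE (train)" "MAE  (test)"

names(res) <- make_clean_names(raw_labels)
names(res)
#> [1] "mae_train" "mae_test"
```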

5. Allow duplicates when sources may overlap

Keep duplicates without numeric suffixes:
```r
library(janitor)

ids <- c("Site A", "Site B", "Site A", "Site C")
make_clean_names(ids)
#> [1] "site_a" "site_b" "site_a_2" "site_c"
make_clean_names(ids, allow_dupes = TRUE)
#> [1] "site_a" "site_b" "site_a" "site_c"
```

  

By default a numeric suffix such as _2 or _3 is appended to enforce uniqueness, which is the right answer for column names. Set allow_dupes = TRUE when collisions are meaningful, such as a long-form factor where the same group appears across rows, or a merge key that must keep one consistent value so a downstream join lands on the correct records.
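Here is a sketch of the merge-key case with hypothetical visits and lookup tables. Cleaning the key values with allow_dupes = TRUE spells every repeated site identically, so each row finds its partner; with the default, the second "Site A" becomes site_a_2 and falls out of the inner join.

```r
library(janitor)

# Hypothetical long-form data: the same site legitimately repeats
visits <- data.frame(site  = c("Site A", "Site B", "Site A", "Site C"),
                     count = c(3, 5, 2, 7), stringsAsFactors = FALSE)
lookup <- data.frame(site   = c("site_a", "site_b", "site_c"),
                     region = c("North", "South", "East"), stringsAsFactors = FALSE)

clean_keys   <- make_clean_names(visits$site, allow_dupes = TRUE)
default_keys <- make_clean_names(visits$site)  # third value becomes "site_a_2"

nrow(merge(transform(visits, site = clean_keys), lookup, by = "site"))    # 4 rows
nrow(merge(transform(visits, site = default_keys), lookup, by = "site"))  # 3 rows
```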

Key Insight
make_clean_names() is clean_names() with the data frame layer removed. Treat them as the same engine exposed at two levels: one operates on df, the other on names(df). Whichever shape your data is in, the cleanup rules are identical.

make_clean_names() vs clean_names() vs make.names()

Three tools cover the same space; choose by what you are holding.

| Feature | make_clean_names() | clean_names() | make.names() |
|---|---|---|---|
| Input type | character vector | data frame | character vector |
| Returns | cleaned character vector | data frame with new names | minimally legal names |
| snake_case by default | yes | yes | no (legal only) |
| Transliterates accents | yes (ascii = TRUE) | yes | no |
| Custom replacements | yes (replace = ...) | yes | no |
| Guarantees uniqueness | yes, unless allow_dupes = TRUE | yes, unless allow_dupes = TRUE | only with unique = TRUE |
| Dependency | janitor | janitor | base R |

When to use which:

  • make_clean_names(): any character vector you will assign as names later.
  • clean_names(): names already attached to a data frame; a one-liner that returns the same shape.
  • make.names(): the base R alternative when you cannot add janitor and only need legal syntactic names.
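The difference in output style is easiest to see side by side. This sketch runs base R's make.names() and janitor's make_clean_names() on the same labels, including one repeated label.

```r
library(janitor)

raw <- c("First Name", "Age (yrs)", "Weight %", "X-1", "First Name")

# Base R: each invalid character becomes a dot; unique = TRUE appends .1, .2, ...
make.names(raw, unique = TRUE)
# "First.Name" "Age..yrs." "Weight.." "X.1" "First.Name.1"

# janitor: snake_case, % spelled out, repeats numbered _2, _3, ...
make_clean_names(raw)
# "first_name" "age_yrs" "weight_percent" "x_1" "first_name_2"
```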
Note
Coming from Python pandas? The closest standalone equivalent is re.sub(r"\W+", "_", s).lower() applied with a list comprehension. The pyjanitor package ports clean_names() but does not expose a direct make_clean_names() alias; use list comprehensions for vector-level work.

Common pitfalls

Pitfall 1: not assigning the result. The cleaning function returns a new vector and leaves the input unchanged. Capture the output into x_clean, or assign directly into the names slot of a data frame. Functional purity is a feature, not a bug, because it lets you preview a cleanup before committing to it.

Pitfall 2: dedup suffixes surprise downstream code. If two source labels clean to the same name, the second becomes total_2, and any select, join, or formula that refers to total silently picks up only the first column. Because the default output is always unique, comparing lengths after cleaning will not reveal the problem; check beforehand with anyDuplicated(make_clean_names(x, allow_dupes = TRUE)), and clean once at import so downstream code never has to know about the original spellings.
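A collision check can run before any names are committed. find_name_collisions() below is a hypothetical helper, not a janitor function: it cleans with allow_dupes = TRUE so nothing gets suffixed, then groups the raw labels that land on the same cleaned name.

```r
library(janitor)

# Hypothetical helper: which raw labels would collide after cleaning?
find_name_collisions <- function(x) {
  cleaned <- make_clean_names(x, allow_dupes = TRUE)  # no _2/_3 suffixes
  in_group <- cleaned %in% cleaned[duplicated(cleaned)]
  split(x[in_group], cleaned[in_group])               # raw labels grouped by clean name
}

hits <- find_name_collisions(c("Total", "TOTAL", "Net Margin", "Region"))
hits
#> $total
#> [1] "Total" "TOTAL"
```

An empty list means the labels are safe to clean with the default suffixing.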

Warning
Renaming factor levels with as.character() can drop empty levels. Always assign to levels(f) directly, as in pattern 3. Round-tripping through as.character() then factor() discards levels that have zero observations, which breaks reproducibility when your downstream code expects a fixed level set.

Pitfall 3: case conversion runs after custom replacements. If you map the percent sign to the uppercase token PCT and want that exact spelling, pass case = "none". The default snake case lowercases everything after substitution, so the uppercase replacement appears as pct in the output.
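The ordering is visible in a quick check. The exact spelling under case = "none" may vary with the janitor and snakecase versions, so this sketch only claims that the capitals survive. It also shows a related surprise: passing replace substitutes the whole default vector rather than adding to it, so the built-in expansion of # to number no longer applies in that call.

```r
library(janitor)

# Default case = "snake": replacement first, then lowercasing
make_clean_names("Growth %", replace = c("%" = "PCT"))
#> [1] "growth_pct"

# case = "none" skips case conversion, so "PCT" keeps its capitals
make_clean_names("Growth %", replace = c("%" = "PCT"), case = "none")

# A custom `replace` vector overrides the defaults entirely
make_clean_names("Count #")                            # "#" expanded to "number"
make_clean_names("Count #", replace = c("%" = "PCT"))  # "#" simply dropped
```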

Try it yourself

Try it: Take the vector messy_cols <- c("Order ID", "Customer.Name", "Sales (USD)", "Sales (USD)"). Clean it with make_clean_names() and assign the result as the names of a 1-row data frame ex_df <- data.frame(1, 2, 3, 4). Print names(ex_df).

Your turn: clean and assign
```r
# Try it: clean a vector and use it as column names
library(janitor)
messy_cols <- c("Order ID", "Customer.Name", "Sales (USD)", "Sales (USD)")
ex_df <- data.frame(1, 2, 3, 4)
names(ex_df) <- # your code here
names(ex_df)
#> Expected: c("order_id", "customer_name", "sales_usd", "sales_usd_2")
```

  
Solution:
```r
library(janitor)
messy_cols <- c("Order ID", "Customer.Name", "Sales (USD)", "Sales (USD)")
ex_df <- data.frame(1, 2, 3, 4)
names(ex_df) <- make_clean_names(messy_cols)
names(ex_df)
#> [1] "order_id" "customer_name" "sales_usd" "sales_usd_2"
```

  

Explanation: make_clean_names() returns a cleaned vector that you assign with names(df) <-. The duplicate "Sales (USD)" becomes sales_usd and sales_usd_2, preserving both columns with unique labels.

After mastering make_clean_names(), look at:

  • clean_names(): the same engine applied to a full data frame in one call
  • remove_empty(): drop empty rows or columns, often the next step after import
  • row_to_names(): promote a data row to be the new column names, then clean them
  • get_dupes(): inspect rows duplicated by one or more columns
  • setNames(): base R helper to attach a name vector to any named object

For a fuller tour of the janitor package, see the janitor package guide. The official documentation lives at sfirke.github.io/janitor.

FAQ

What is the difference between make_clean_names() and clean_names()?

make_clean_names() operates on a character vector and returns a character vector. clean_names() operates on a data frame and returns a data frame with the column names rewritten. The cleaning rules and option set are identical; only the input and output types differ.

How do I use make_clean_names() with readxl or readr?

For data frames, prefer clean_names() in the pipeline: read_csv("file.csv") |> clean_names(). Reach for make_clean_names() on sheet or range labels read separately, for example readxl::excel_sheets("workbook.xlsx") |> make_clean_names() to build keys before iterating sheets with purrr::map().
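A sketch of the sheet-iteration pattern, using the example workbook that ships with readxl (readxl_example("datasets.xlsx")) so it runs without a file of your own. It uses base lapply() in place of purrr::map(); the two are interchangeable here.

```r
library(janitor)
library(readxl)

# readxl ships example workbooks; datasets.xlsx contains several sheets
path   <- readxl_example("datasets.xlsx")
sheets <- excel_sheets(path)

# Read every sheet and clean its column names
sheet_data <- lapply(sheets, function(s) clean_names(read_excel(path, sheet = s)))

# Cleaned sheet labels become safe list keys
names(sheet_data) <- make_clean_names(sheets)
names(sheet_data)
```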

Does make_clean_names() handle non-English or Unicode characters?

Yes. By default, ascii = TRUE transliterates accented Latin characters (é to e, ñ to n) using the Latin-ASCII rule. Pass ascii = FALSE to preserve the originals. The transliterations argument accepts other ICU rules for non-Latin scripts.

Why does make_clean_names() produce _2 and _3 suffixes?

When two inputs clean to the same result (for example "Total" and "TOTAL" both become total), the function appends a numeric suffix to keep each output unique. Set allow_dupes = TRUE to skip suffixes when collisions are intentional, such as a long-form key column.

Can I chain make_clean_names() with the pipe?

Yes. It accepts a vector as the first argument: names(df) |> make_clean_names() |> head(). This previews the cleaned output before you commit. Wrap the result in setNames() or assign back to names(df) once the output looks right.
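One way to review before committing is a side-by-side mapping of old and new names. This is a small sketch on a hypothetical data frame; stringsAsFactors = FALSE only matters on R versions before 4.0.

```r
library(janitor)

df <- data.frame(1, 2, 3, check.names = FALSE)
names(df) <- c("Order ID", "Customer.Name", "Sales (USD)")

# Old and new names side by side; nothing is changed yet
preview <- data.frame(
  old = names(df),
  new = make_clean_names(names(df)),
  stringsAsFactors = FALSE
)
preview

# Commit once the mapping looks right
names(df) <- preview$new
```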