stringr str_to_sentence() in R: Capitalize the First Letter
stringr str_to_sentence() converts a character vector to sentence case by capitalizing only the first letter of the first word and lowercasing every other character. It is vectorised, NA safe, and Unicode aware, which makes it the right tool for cleaning ALL-CAPS user input, error logs, and plot captions in R.
str_to_sentence(x)                                # default English locale
str_to_sentence(c("HELLO WORLD", "good DAY"))     # vector input
str_to_sentence("WAR AND PEACE")                  # downcases the rest
str_to_sentence(NA_character_)                    # NA, not "NA" (NA-safe)
str_to_sentence("hello world. goodbye world.")    # only first letter, not each sentence
df |> mutate(comment = str_to_sentence(comment))  # sentence-case a column
str_to_sentence(str_replace_all(x, "_", " "))     # snake_case to sentence case label
str_to_sentence("", locale = "en")                # empty string passes through
Need explanation? Read on for examples and pitfalls.
What str_to_sentence() does in one sentence
str_to_sentence(string, locale = "en") returns a copy of the input with the first character upper cased and every other character lower cased. It works element-wise on a vector, propagates NA as NA, treats the whole element as one unit (not one sentence at a time), and uses Unicode-aware case mapping from the stringi package so results are identical across Windows, macOS, and Linux.
Reach for str_to_sentence() when you have shouty, mixed-case, or all-lowercase text that should read like a normal sentence: forum comments scraped in caps, error log lines, plot captions assembled from raw column names, or free-text categorical responses.
The output length matches the input, every variation collapses to a clean sentence form, NA stays NA, and the empty string passes through unchanged.
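As a minimal sketch of the behaviour described above (the input vector is made up for illustration), the following call runs a few casing variants, a missing value, and an empty string through the function:

```r
library(stringr)

# Hypothetical sample input: mixed casing, a missing value, and an empty string
raw <- c("HELLO WORLD", "good DAY", "hello world", NA, "")

clean <- str_to_sentence(raw)
print(clean)
# "Hello world" "Good day" "Hello world" NA ""

# One output element per input element
length(clean) == length(raw)
```

The NA in the result is a true missing value, not the string "NA", so downstream is.na() checks keep working.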
Syntax
str_to_sentence() takes two arguments and returns a character vector the same length as the input. Locale defaults to "en", which is sufficient for ASCII and most Western European text.
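A short sketch of the call signature, using two invented book titles as input. It inspects the argument names and checks that spelling out the default locale changes nothing:

```r
library(stringr)

# Argument names as defined by the installed stringr version
names(formals(str_to_sentence))

x <- c("WAR AND PEACE", "the old man and the sea")

default_call  <- str_to_sentence(x)                 # locale left at its default
explicit_call <- str_to_sentence(x, locale = "en")  # locale spelled out

identical(default_call, explicit_call)
```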
Four common str_to_sentence() scenarios
Sentence case shows up wherever raw text meets human eyes. Each scenario below starts from a realistic, slightly ugly input and applies str_to_sentence() to produce something safe to display.
Clean ALL-CAPS user comments for display
Loud user input becomes readable without losing the original message. Forum scrapers, survey free-text fields, and chat logs are full of users SHOUTING. Sentence case is friendlier without flattening the message to lowercase.
The first letter is capitalized, the rest of the message is lower cased, and punctuation is left alone.
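A small illustration of this scenario, assuming a hypothetical vector of scraped comments (the strings are invented, and each holds a single sentence):

```r
library(stringr)

# Invented user comments in assorted casing
comments <- c(
  "I LOVE THIS PRODUCT!",
  "GREAT PRODUCT, WOULD BUY AGAIN",
  "worst SERVICE ever"
)

readable <- str_to_sentence(comments)
readable
# "I love this product!" "Great product, would buy again" "Worst service ever"
```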
Format error messages into readable prose
Heterogeneous log levels canonicalise to one display style. Error and log strings often arrive in lower case from one library and upper case from another. Pipe them through str_to_sentence() before showing them in a UI or dashboard.
The severity prefix becomes a clean leading word, and the rest of each line reads like a sentence.
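The log lines below are invented for illustration; the sketch only shows how mixed-case severity prefixes end up in one display style:

```r
library(stringr)

# Hypothetical log lines from libraries with different casing habits
log_lines <- c(
  "ERROR: FILE NOT FOUND",
  "warning: Disk Space LOW",
  "Info: connection RESTORED"
)

display_lines <- str_to_sentence(log_lines)
display_lines
# "Error: file not found" "Warning: disk space low" "Info: connection restored"
```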
Sentence-case plot captions and axis labels from snake_case columns
Column names become publication-ready captions in two function calls. ggplot2 axis titles default to the raw column name. Combine str_replace_all() with str_to_sentence() to convert miles_per_gallon into Miles per gallon for any number of columns.
The underscores become spaces first, then sentence case capitalises only the first word, which matches scientific style conventions for figure captions.
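A minimal sketch of the two-step conversion, using made-up column names. The resulting labels could then be passed to a plotting function, which is left out here to keep the example dependency-free:

```r
library(stringr)

# Hypothetical snake_case column names
cols <- c("miles_per_gallon", "engine_displacement", "gross_horsepower")

# Step 1: underscores to spaces; step 2: sentence case
labels <- str_to_sentence(str_replace_all(cols, "_", " "))

# Keep the original names alongside the labels for lookups
names(labels) <- cols
labels[["miles_per_gallon"]]
# "Miles per gallon"
```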
Normalise headlines scraped from mixed-case sources
One canonical casing kills duplicate rows hiding behind capitalization. When you stack data from several APIs, headlines arrive in title case, sentence case, and ALL CAPS. Pick one canonical form for storage and dedup.
All three variants collapse to one canonical sentence-cased string, ready for a deduplication step.
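The headline below is invented; the sketch shows three casings of the same text collapsing to a single value that unique() can deduplicate:

```r
library(stringr)

# The same hypothetical headline as delivered by three different sources
headlines <- c(
  "Markets Rally On Rate Cut",
  "Markets rally on rate cut",
  "MARKETS RALLY ON RATE CUT"
)

canonical <- str_to_sentence(headlines)
unique(canonical)
# "Markets rally on rate cut"
```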
str_to_sentence() vs str_to_title() vs str_to_upper() vs toupper()
Four functions look similar; only one is right for any given task. Pick the function whose default behaviour matches the form you want, not the function that needs the fewest extra steps.
| Function | Output for "hello world" | Output for "HELLO WORLD" | Use when |
|---|---|---|---|
| str_to_sentence(x) | "Hello world" | "Hello world" | First letter only, rest lower case |
| str_to_title(x) | "Hello World" | "Hello World" | Every word capitalised |
| str_to_upper(x) | "HELLO WORLD" | "HELLO WORLD" | Whole string upper case |
| toupper(x) | "HELLO WORLD" | "HELLO WORLD" | Base R equivalent, no locale arg |
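The comparison can be reproduced with a few lines of R. This sketch simply runs the four functions on the two inputs and collects the results in a data frame (the column names are arbitrary):

```r
library(stringr)

x <- c("hello world", "HELLO WORLD")

comparison <- data.frame(
  input    = x,
  sentence = str_to_sentence(x),
  title    = str_to_title(x),
  upper    = str_to_upper(x),
  base     = toupper(x)
)
comparison
```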
Base R provides toupper() and tolower() but no sentence-case helper. Before stringr you had to write paste0(toupper(substr(x, 1, 1)), tolower(substr(x, 2, nchar(x)))). str_to_sentence() is the one-liner version.
Common pitfalls
Three behaviours surprise new users. Each one has a simple workaround, but you need to know it exists before you can apply it.
Acronyms get downgraded to lower case
str_to_sentence() lower cases everything after the first character, even ALL-CAPS substrings. "NASA found water" becomes "Nasa found water". If acronyms matter, apply a post-process step.
For pipelines with many acronyms, either loop over a vector of corrections or maintain a named lookup and pass it to str_replace_all() so that every acronym is restored in one call.
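One way to sketch the named-lookup approach is shown below. The lookup itself is hypothetical: its names are regex patterns (made case-insensitive with the inline `(?i)` flag) and its values are the casing to restore.

```r
library(stringr)

headline <- "NASA and ESA share data"
sentence <- str_to_sentence(headline)
sentence
# "Nasa and esa share data"

# Hypothetical lookup: pattern = replacement, matched on whole words only
acronyms <- c(
  "(?i)\\bnasa\\b" = "NASA",
  "(?i)\\besa\\b"  = "ESA"
)

restored <- str_replace_all(sentence, acronyms)
restored
# "NASA and ESA share data"
```

The word boundaries keep the patterns from firing inside longer words; the lookup still has to be maintained by hand for each data set.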
Multiple sentences in one string get only one capital letter
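Later sentences inside a single element can be left without a capital: "hello world. goodbye world." comes back with only its first letter capitalized. How other inputs behave (ALL-CAPS text, or sentences ending in ! or ?) depends on how the underlying stringi sentence break iterator segments the string, so check the output on your own data. To capitalize each sentence reliably, split first.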
The regex lookbehind keeps the period attached to each sentence so the join is faithful.
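A minimal sketch of the split-then-join approach, assuming sentences end in a period followed by whitespace (the paragraph is invented, and other terminators would need a broader pattern):

```r
library(stringr)

paragraph <- "hello world. goodbye world. see you soon."

# Split after each period; the lookbehind keeps the period with its sentence
pieces <- str_split(paragraph, "(?<=\\.)\\s+")[[1]]

# Sentence-case each piece, then join the pieces with single spaces
fixed <- paste(str_to_sentence(pieces), collapse = " ")
fixed
# "Hello world. Goodbye world. See you soon."
```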
Locale arguments do not unlock special casing for English
The locale arg controls Unicode casing rules, not grammar or acronym handling. It matters for Turkish dotted-i, German sharp-s, and similar scripts. For English data, setting locale = "en" is the same as omitting it; do not expect locale to fix acronym or grammar problems.
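To see what the locale argument does change, the sketch below compares English and Turkish casing of a lower case i with str_to_upper(), then passes the same argument to str_to_sentence(). The word "istanbul" is just an illustrative input; print both results to compare them on your own system.

```r
library(stringr)

# Lower case i upper-cases differently under English and Turkish rules
upper_en <- str_to_upper("i", locale = "en")  # "I"
upper_tr <- str_to_upper("i", locale = "tr")  # capital I with a dot above (U+0130)

# str_to_sentence() accepts the same argument
str_to_sentence("istanbul", locale = "en")
str_to_sentence("istanbul", locale = "tr")

# For English text, spelling out locale = "en" matches the default
identical(
  str_to_sentence("HELLO WORLD"),
  str_to_sentence("HELLO WORLD", locale = "en")
)
```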
Turkish i casing is the classic gotcha. Under locale = "tr", lower case i upper cases to İ (I with a dot above), not plain I. stringr defaults to locale = "en" regardless of the system locale, so English text is only affected if locale = "tr" is passed explicitly.
Try it yourself
Try it: Convert c("the QUICK brown fox", "JUMPS over the lazy DOG") to sentence case, then store the result in ex_sentence.
Explanation: str_to_sentence() is vectorised, so a single call handles both elements. Each element is treated independently: the first character of each becomes upper case, every other character becomes lower case.
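One possible solution to the exercise, written as a short sketch:

```r
library(stringr)

ex_input <- c("the QUICK brown fox", "JUMPS over the lazy DOG")

# A single vectorised call handles both elements
ex_sentence <- str_to_sentence(ex_input)
ex_sentence
# "The quick brown fox" "Jumps over the lazy dog"
```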
Related stringr functions
stringr ships a small family of case-conversion helpers, all built on the same stringi engine. Pick the one whose default output matches your target form.
- str_to_title() capitalizes the first letter of every word; use for names, headlines, and book titles.
- str_to_lower() lowercases the entire string; use before fuzzy matching or deduplication.
- str_to_upper() uppercases the entire string; use for codes, tickers, and SQL keywords.
- str_trim() removes leading and trailing whitespace; chain before any case-conversion to avoid stray spaces.
- str_replace_all() runs regex-based substitutions; combine with case helpers to protect acronyms or split on custom boundaries.
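As a brief sketch of chaining these helpers, the example below trims stray whitespace before converting case (the messy strings are invented):

```r
library(stringr)

# Hypothetical input with leading and trailing whitespace
messy <- c("  HELLO WORLD  ", "\tgood DAY\n")

# Trim first, then convert to sentence case
tidy <- str_to_sentence(str_trim(messy))
tidy
# "Hello world" "Good day"
```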
For the full reference, see the stringr case conversion docs on tidyverse.org.
FAQ
What is the difference between str_to_sentence and str_to_title in R?
str_to_sentence() capitalizes only the first letter of the first word, while str_to_title() capitalizes the first letter of every word. For "hello world", str_to_sentence() returns "Hello world" and str_to_title() returns "Hello World". Pick str_to_sentence() for prose-style display and str_to_title() for headlines, names, and figure titles.
How do I capitalize each sentence in a paragraph in R?
str_to_sentence() treats each vector element as one sentence, so a paragraph with three periods gets only one capital letter. Split the paragraph on sentence boundaries first with str_split(x, "(?<=\\.)\\s+"), apply str_to_sentence() to the resulting vector, then paste it back together with paste(..., collapse = " "). The lookbehind keeps the period attached to each piece.
Does str_to_sentence handle NA values safely?
Yes. NA character values pass through unchanged, returning NA in the same position. Empty strings also pass through as empty strings. This matters when you sentence-case a column inside mutate() because you do not need a na.rm step or an ifelse() guard.
Is str_to_sentence available in base R?
No. Base R provides toupper() and tolower() but no sentence-case helper. Without stringr you would write paste0(toupper(substr(x, 1, 1)), tolower(substr(x, 2, nchar(x)))), which is verbose and, because paste0() converts missing values to text, not NA safe without an extra guard. stringr's str_to_sentence() is the vectorised, NA safe one-liner.
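For completeness, here is a base R sketch of that fallback wrapped in a helper. The function name to_sentence_base is hypothetical, and the NA guard is an addition that the bare one-liner lacks:

```r
# Hypothetical helper; base R only, no packages required
to_sentence_base <- function(x) {
  out <- paste0(
    toupper(substr(x, 1, 1)),         # first character, upper case
    tolower(substr(x, 2, nchar(x)))   # remainder, lower case
  )
  out[is.na(x)] <- NA_character_      # restore NA, which paste0() turns into text
  out
}

result <- to_sentence_base(c("hello WORLD", "GOOD day", NA, ""))
result
# "Hello world" "Good day" NA ""
```

Unlike str_to_sentence(), this sketch has no locale argument and simply upper-cases the first character, whatever it is.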
Why does str_to_sentence lowercase my acronyms?
The function applies a blanket lower case to every character after the first, so "NASA found water" becomes "Nasa found water". There is no built-in protection. The standard fix is a follow-up str_replace() or str_replace_all() with a named lookup of acronyms to restore the original casing.