gsub() in R: Replace All Pattern Matches

The gsub() function in base R replaces ALL regex matches in a character vector with a replacement. sub() replaces only the FIRST match per string. Both are vectorized.

By Selva Prabhakaran · Published May 12, 2026 · Last updated May 12, 2026

⚡ Quick Answer

gsub("apple", "orange", x)                      # replace all
sub("apple", "orange", x)                       # replace first only
gsub("\\d+", "#", x)                            # each run of digits
gsub("\\s+", " ", x)                            # collapse whitespace
gsub("(\\d+)-(\\d+)", "\\2-\\1", x)             # backreference swap
gsub("apple", "orange", x, fixed = TRUE)        # literal
gsub("apple", "orange", x, ignore.case = TRUE)  # case-insensitive

Need explanation? Read on for examples and pitfalls.

What gsub() does in one sentence

gsub(pattern, replacement, x) finds every regex match of pattern in each string of x and replaces it with replacement. sub() does the same but stops after the first match per string.

These are the standard base R functions for text cleaning: removing punctuation, normalizing whitespace, swapping codes, masking sensitive values.

Syntax

gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

R: Replace all 'apple' with 'orange'

x <- c("apple pie with apple", "banana", "apple cake")
gsub("apple", "orange", x)
#> [1] "orange pie with orange" "banana" "orange cake"
sub("apple", "orange", x)   # first only
#> [1] "orange pie with apple" "banana" "orange cake"

Tip

The pattern is a regex by default; pass fixed = TRUE for literal replacement. Without fixed = TRUE, characters like . * + ( are special. gsub(".", "x", "abc") returns "xxx" because . matches every character. gsub(".", "x", "abc", fixed = TRUE) returns "abc" because the input contains no literal period.
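A minimal sketch of the three usual ways to replace a literal period. The file names are made-up sample data, not from any real project.

```r
# Illustrative file names (hypothetical sample data)
files <- c("report.v1.csv", "notes.txt")

# 1. Regex with an escaped dot
escaped <- gsub("\\.", "_", files)

# 2. Regex with the dot inside a character class, where it is literal
in_class <- gsub("[.]", "_", files)

# 3. Literal matching: the pattern is not interpreted as a regex at all
literal <- gsub(".", "_", files, fixed = TRUE)

print(literal)
#> [1] "report_v1_csv" "notes_txt"

# All three approaches agree
identical(escaped, in_class) && identical(escaped, literal)
#> [1] TRUE
```

fixed = TRUE is usually the simplest choice when no part of the pattern needs regex behaviour.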

Five common patterns

1. Remove punctuation

R: Strip punctuation characters

x <- c("Hello, World!", "How are you?")
gsub("[[:punct:]]", "", x)
#> [1] "Hello World" "How are you"

[[:punct:]] is a POSIX class for punctuation.

2. Collapse whitespace

R: Multiple spaces into one

x <- "  hello    world  "
gsub("\\s+", " ", x)
#> [1] " hello world "
trimws(gsub("\\s+", " ", x))
#> [1] "hello world"

\\s+ matches one or more whitespace characters (spaces, tabs, newlines); trimws() removes leading and trailing whitespace.

3. Replace digits with placeholder

R: Mask numbers

x <- c("user123", "item456", "abc")
gsub("\\d+", "#", x)
#> [1] "user#" "item#" "abc"

4. Backreferences (capture and swap)

R: Reorder date components

dates <- c("2024-01-15", "2025-03-20")
gsub("(\\d{4})-(\\d{2})-(\\d{2})", "\\3/\\2/\\1", dates)
#> [1] "15/01/2024" "20/03/2025"

\\1, \\2, \\3 reference the 1st, 2nd, 3rd capture groups in the pattern.
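Two further uses of capture groups, as a sketch on made-up inputs. The first keeps only part of the match; the second relies on the \\U case-conversion escape that base R documents for perl = TRUE replacements.

```r
# Keep the last four digits of an account number and mask the rest
# (the number is hypothetical sample data)
account <- "1234567890"
masked <- sub("^\\d+(\\d{4})$", "****\\1", account)
masked
#> [1] "****7890"

# Capitalize the first letter of each word.
# \\U uppercases the text that follows it in the replacement (perl = TRUE only).
x <- c("hello world", "base r rocks")
title <- gsub("\\b([a-z])", "\\U\\1", x, perl = TRUE)
title
#> [1] "Hello World"  "Base R Rocks"
```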

5. Remove instead of replace

R: Empty string deletes the match

x <- c("apple-pie-123", "banana-456")
gsub("-\\d+$", "", x)
#> [1] "apple-pie" "banana"

Replacing with "" is the standard "delete pattern" idiom.

Key Insight

gsub() treats the pattern as a REGEX by default; if you want literal text, use fixed = TRUE. Forgetting this is a very common source of bugs. gsub(".", "x", "abc") replaces EVERY character because . matches any character in a regex. Always ask: is my pattern a regex or literal text?

gsub() vs sub() vs str_replace_all() vs chartr()

Five "find and replace" functions in R, with different scope.

Function	Replaces	Regex	Best for
gsub()	All matches	Yes (default)	Standard regex replace
sub()	First match per string	Yes (default)	"Replace first occurrence"
stringr::str_replace_all()	All matches	Yes	Tidyverse pipelines
stringr::str_replace()	First match per string	Yes	Tidyverse, single replace
chartr()	Char-by-char map	No	Translate single chars

When to use which:

gsub for default base R replacement.
sub when you only want the first hit (useful for "trim leading X" patterns).
str_replace_all for tidyverse code.
chartr for fast 1-to-1 character mapping (no regex needed).
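A short base R sketch of the difference in scope, using made-up inputs. Only sub(), gsub(), and chartr() are shown so the example runs without extra packages.

```r
# sub() vs gsub(): first hit vs every hit
path <- "a-b-c"
first_only <- sub("-", " ", path)
all_hits   <- gsub("-", " ", path)
c(first_only, all_hits)
#> [1] "a b-c" "a b c"

# sub() for a "trim leading X" pattern: strip leading zeros
ids <- c("007", "0420", "100")
trimmed <- sub("^0+", "", ids)
trimmed
#> [1] "7"   "420" "100"

# chartr() maps characters one-to-one, position by position (no regex)
complement <- chartr("ATGC", "TACG", "ATGC")
complement
#> [1] "TACG"
```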

A practical text-cleaning workflow

Most text cleaning is a chain of gsub calls. A typical pipeline:

Lowercase: tolower(x)
Strip punctuation: gsub("[[:punct:]]", "", x)
Collapse whitespace: gsub("\\s+", " ", x)
Trim: trimws(x)
Standardize specific tokens: gsub("usa|united states", "US", x, ignore.case = TRUE)

Build the chain incrementally and inspect intermediate results. The biggest source of bugs is a regex that matches more (or less) than you intended.
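The steps above can be sketched as one function. The name clean_text and the sample strings are hypothetical. One assumption beyond the original list: the token step adds \\b word boundaries (with perl = TRUE) so that "usa" inside a word such as "usage" is left alone.

```r
# Hypothetical helper wrapping the five steps in order
clean_text <- function(x) {
  x <- tolower(x)                     # 1. lowercase
  x <- gsub("[[:punct:]]", "", x)     # 2. strip punctuation
  x <- gsub("\\s+", " ", x)           # 3. collapse whitespace runs
  x <- trimws(x)                      # 4. trim the ends
  # 5. standardize tokens; \\b keeps "usage" from matching "usa"
  gsub("\\b(usa|united states)\\b", "US", x,
       ignore.case = TRUE, perl = TRUE)
}

raw <- c("  Made in the U.S.A.!  ", "United   States, usage: HIGH")
clean_text(raw)
#> [1] "made in the US" "US usage high"
```

To inspect intermediate results, run the body line by line on a small sample before wrapping it in a function.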

Common pitfalls

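Pitfall 1: backslash escaping. The regex \d must be written "\\d" in an R string. gsub("\d+", ...) does not even parse: R stops with an "unrecognized escape" error. Always double the backslash (\\d), or in R 4.0.0 and later use a raw string such as r"(\d+)".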
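A small sketch comparing the doubled backslash with a raw string on made-up inputs. Raw strings assume R 4.0.0 or later.

```r
x <- c("order 66", "room 101")

# Classic form: the R parser turns "\\d" into the two characters \d
double_escaped <- gsub("\\d+", "#", x)

# Raw string (R >= 4.0.0): backslashes are taken literally
raw_string <- gsub(r"(\d+)", "#", x)

double_escaped
#> [1] "order #" "room #"
identical(double_escaped, raw_string)
#> [1] TRUE

# The string "\\d" really holds two characters: a backslash and d
nchar("\\d")
#> [1] 2
```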

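Pitfall 2: greedy vs lazy quantifiers. gsub("<.*>", "", "<a>text<b>") returns "" because .* is greedy and matches from the first < to the last >. For a lazy match use <.*?>, which works with both the default engine and perl = TRUE, or make the pattern more precise, e.g. <[^>]*>.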
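The three variants side by side, as a sketch on the same input. The negated character class is an alternative that does not depend on lazy matching at all.

```r
html <- "<a>text<b>"

# Greedy: .* runs from the first "<" to the last ">"
greedy <- gsub("<.*>", "", html)

# Lazy: .*? stops at the nearest ">"
lazy <- gsub("<.*?>", "", html)

# Negated class: "anything except >" cannot run past the closing bracket
negated <- gsub("<[^>]*>", "", html)

c(greedy = greedy, lazy = lazy, negated = negated)
#>  greedy    lazy negated 
#>      ""  "text"  "text" 
```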

Warning

Replacing with backreferences requires a CAPTURE group in the pattern. gsub("\\d+", "(\\1)", x) does NOT work because there are no parentheses around \\d+. Wrap in (...) to capture: gsub("(\\d+)", "(\\1)", x).
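A sketch of the working form on made-up inputs: each group in parentheses becomes available in the replacement as \\1, \\2, and so on.

```r
x <- c("3 apples and 12 pears", "no digits")

# One capture group: wrap every number in parentheses
wrapped <- gsub("(\\d+)", "(\\1)", x)
wrapped
#> [1] "(3) apples and (12) pears" "no digits"

# Two capture groups: swap the two sides of a key=value pair
swapped <- gsub("(\\w+)=(\\w+)", "\\2=\\1", "key=value")
swapped
#> [1] "value=key"
```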

Performance and Unicode notes

For most everyday inputs, gsub() is fast enough that performance is not a concern: the matching runs in compiled C and copes with vectors of millions of short strings. Two situations where performance does matter are very long strings (megabyte-class text) and patterns that invite heavy backtracking, such as alternation inside a quantifier, e.g. (a|a)*. For megabyte-class text, stringi::stri_replace_all_regex() is often faster and is Unicode-aware. For pathological patterns, simplify the pattern first; perl = TRUE switches to the PCRE engine, which behaves differently and may or may not help. Unicode handling in base gsub() depends on the locale and the encoding of the strings; use stringi or stringr for consistent UTF-8 behaviour across platforms.
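A rough way to compare the three base R engines on your own data. The vector size and contents are arbitrary assumptions, and timings vary by machine and R version, so none are shown. The R help page for gsub() suggests that perl = TRUE is generally faster than the default engine and fixed = TRUE faster still; treat that as a starting point and measure.

```r
# Illustrative data: 100,000 short strings (size chosen arbitrarily)
x <- rep(c("a.b.c", "no dots here", "x.y"), length.out = 1e5)

# The same literal replacement expressed three ways
t_default <- system.time(r_default <- gsub("\\.", "-", x))
t_perl    <- system.time(r_perl    <- gsub("\\.", "-", x, perl = TRUE))
t_fixed   <- system.time(r_fixed   <- gsub(".", "-", x, fixed = TRUE))

# The results are the same; only the elapsed time differs
identical(r_default, r_perl) && identical(r_default, r_fixed)

# Elapsed seconds per variant
c(default = t_default[["elapsed"]],
  perl    = t_perl[["elapsed"]],
  fixed   = t_fixed[["elapsed"]])
```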

Try it yourself

Try it: Clean these phone numbers by stripping all non-digit characters. Save to ex_phones.

R: Your turn: digits-only

phones <- c("(555) 123-4567", "555.234.5678", "+1 555 345 6789")
ex_phones <- # your code here
ex_phones
#> Expected: c("5551234567", "5552345678", "15553456789")

R: Solution

ex_phones <- gsub("\\D", "", phones)
ex_phones
#> [1] "5551234567" "5552345678" "15553456789"

Explanation: \\D matches any NON-digit character. Replacing with "" removes them. Result is digits-only.

Related replace functions

After mastering gsub, look at:

sub(): first-match variant
stringr::str_replace_all(): tidyverse equivalent
stringr::str_replace(): first-match tidyverse
chartr(): character-by-character translation
regexec() + regmatches(): extract capture groups for inspection
tools::toTitleCase(): capitalize words

For replacing many specific patterns at once, stringr::str_replace_all(x, named_vector) is cleaner than chaining gsubs.

FAQ

What is the difference between gsub and sub in R?

gsub replaces ALL matches in each string. sub replaces only the FIRST match per string. They share the same arguments.

How do I replace multiple patterns at once with gsub?

Chain calls: gsub("p2", "r2", gsub("p1", "r1", x)). Or use stringr::str_replace_all(x, c("p1" = "r1", "p2" = "r2")) for a cleaner named-vector approach.
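For a base R alternative to nested calls, a loop over a named vector works. This is a minimal sketch: replace_many is a hypothetical helper, not a base R function, and the city codes are made-up data. Replacements are applied in order, so an earlier replacement can be changed again by a later pattern.

```r
# Hypothetical helper: apply each pattern = replacement pair in turn.
# Extra arguments (fixed, ignore.case, perl) are passed on to gsub().
replace_many <- function(x, replacements, ...) {
  for (pattern in names(replacements)) {
    x <- gsub(pattern, replacements[[pattern]], x, ...)
  }
  x
}

x <- c("NY to LA", "LA to SF")
codes <- c("NY" = "New York", "LA" = "Los Angeles", "SF" = "San Francisco")

replace_many(x, codes, fixed = TRUE)
#> [1] "New York to Los Angeles"      "Los Angeles to San Francisco"
```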

How do I do a literal replace with gsub?

Pass fixed = TRUE: gsub(".", "x", x, fixed = TRUE) replaces literal periods, not "any character".

What is a backreference in gsub?

\\1, \\2, etc. in the replacement refer to capture groups (parenthesized parts of the pattern). They let you reorder or reuse matched parts in the output.

How do I make gsub case-insensitive?

Pass ignore.case = TRUE. Or use the inline regex flag with perl = TRUE: gsub("(?i)apple", "fruit", x, perl = TRUE).
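Both forms on the same made-up input, as a quick check that they agree. Note that the match is not limited to whole words, so "pineapple" is changed too.

```r
x <- c("Apple pie", "APPLE tart", "pineapple")

# Argument form
by_arg <- gsub("apple", "fruit", x, ignore.case = TRUE)

# Inline flag form (needs perl = TRUE)
by_flag <- gsub("(?i)apple", "fruit", x, perl = TRUE)

by_arg
#> [1] "fruit pie"  "fruit tart" "pinefruit"
identical(by_arg, by_flag)
#> [1] TRUE
```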
