R String Manipulation Exercises: 10 stringr Practice Problems Solved
Ten focused string exercises using the stringr package: detect, extract, replace, split, pad, change case, and match with regex. Every problem runs in the browser with a worked solution you can reveal. Use these to build fluency with the functions you will reach for in every data-cleaning job.
Every real dataset has messy strings: trailing whitespace, inconsistent case, dates embedded in filenames, phone numbers with parentheses. The stringr package gives you a small, consistent set of verbs that handle all of this. These exercises cover the ten you will use most often.
Setup
Section 1: Trim, pad, and case
Exercise 1. Trim whitespace and fix case
From names_raw, produce a vector where leading and trailing whitespace is removed and each name is converted to title case (first letter of each word capitalised).
Solution
str_trim() removes whitespace on both sides; str_squish() also collapses internal runs of whitespace to single spaces. str_to_title() uppercases the first character of each word.
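A minimal sketch of one possible solution. The contents of names_raw are not shown in the text, so the three values below are hypothetical stand-ins chosen only to show messy whitespace and case.

```r
library(stringr)

# Hypothetical input: the real names_raw is not given, so these values are assumed.
names_raw <- c("  ada LOVELACE ", "ken THOMPSON  ", " ALAN turing")

# Trim both ends, then capitalise the first letter of each word
# (str_to_title() also lowercases the remaining letters).
names_clean <- str_to_title(str_trim(names_raw))
names_clean
# "Ada Lovelace" "Ken Thompson" "Alan Turing"

# str_squish() additionally collapses internal runs of whitespace.
str_squish("  ada    lovelace ")
# "ada lovelace"
```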
Exercise 2. Pad to fixed width
Given ids <- c("7", "42", "309", "1024"), pad each to width 5 with leading zeros so they become "00007", "00042", "00309", "01024".
Solution
str_pad() is the cleanest way to build fixed-width identifiers. The side argument can be "left", "right", or "both".
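One way to write this, using the ids vector from the exercise statement (the name ids_padded is introduced here for illustration):

```r
library(stringr)

ids <- c("7", "42", "309", "1024")

# Pad on the left with "0" until every string is 5 characters wide.
ids_padded <- str_pad(ids, width = 5, side = "left", pad = "0")
ids_padded
# "00007" "00042" "00309" "01024"
```

Strings that are already at least as wide as width are returned unchanged, so str_pad() never truncates.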
Section 2: Detect and count
Exercise 3. Detect a substring
Using names_clean from Exercise 1, return a logical vector that is TRUE for names containing the letter "a" (case-insensitive).
Solution
str_detect() returns a logical vector the same length as its input. Use regex(..., ignore_case = TRUE) when you want to ignore case inside the pattern itself.
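A possible solution, sketched with hypothetical values for names_clean, since the original data is not shown in the text:

```r
library(stringr)

# Assumed values for names_clean; the real vector comes from Exercise 1.
names_clean <- c("Ada Lovelace", "Ken Thompson", "Alan Turing")

# regex(..., ignore_case = TRUE) makes "a" match both "a" and "A".
has_a <- str_detect(names_clean, regex("a", ignore_case = TRUE))
has_a
# TRUE FALSE TRUE
```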
Exercise 4. Count occurrences
Count how many times the letter "e" appears in each element of names_clean.
Solution
str_count() returns an integer vector, one count per input string.
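A short sketch, again with assumed values for names_clean. It counts lowercase "e" only, matching the exercise as stated:

```r
library(stringr)

# Assumed values for names_clean; the real vector comes from Exercise 1.
names_clean <- c("Ada Lovelace", "Ken Thompson", "Alan Turing")

# One integer per input string: the number of non-overlapping matches of "e".
e_counts <- str_count(names_clean, "e")
e_counts
# 2 1 0
```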
Section 3: Extract
Exercise 5. Extract the first word
Return the first word (the given name) from each element of names_clean.
Solution
str_extract() returns the first match of the pattern, or NA if there is none. word(x, 1) is a shortcut that does not require regex.
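Both approaches can be sketched as follows. The names_clean values are assumed, and the pattern "^\\S+" (a run of non-whitespace at the start of the string) is one reasonable choice rather than the only one:

```r
library(stringr)

# Assumed values for names_clean; the real vector comes from Exercise 1.
names_clean <- c("Ada Lovelace", "Ken Thompson", "Alan Turing")

# Regex version: everything up to the first whitespace character.
first_regex <- str_extract(names_clean, "^\\S+")

# word() version: the first space-separated word, no regex needed.
first_word <- word(names_clean, 1)

first_regex
# "Ada" "Ken" "Alan"
```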
Exercise 6. Extract all numbers from a string
Given txt <- "Year 2024, month 03, day 15 — score 42.7", extract every number (including the decimal) as a character vector.
Solution
str_extract_all() returns a list because each string can have any number of matches. Index into the list with [[1]] for a single-string input.
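A minimal sketch, assuming a number is a run of digits optionally followed by a dot and more digits. The name numbers is introduced here for illustration:

```r
library(stringr)

# "\u2014" is the dash character that appears in the exercise string.
txt <- "Year 2024, month 03, day 15 \u2014 score 42.7"

# \\d+ matches the integer part; (?:\\.\\d+)? optionally matches a decimal part.
numbers <- str_extract_all(txt, "\\d+(?:\\.\\d+)?")[[1]]
numbers
# "2024" "03" "15" "42.7"
```

The result is a character vector, so "03" keeps its leading zero; wrap it in as.numeric() if you need actual numbers.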
Section 4: Replace and split
Exercise 7. Replace the first match
In "2024-03-15", replace the first - with a space, keeping the rest intact.
Solution
str_replace() replaces only the first match; str_replace_all() replaces every match.
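The difference between the two functions is easiest to see side by side (the variable names here are illustrative):

```r
library(stringr)

date_str <- "2024-03-15"

# Only the first "-" is replaced.
first_only <- str_replace(date_str, "-", " ")
first_only
# "2024 03-15"

# Every "-" is replaced.
all_matches <- str_replace_all(date_str, "-", " ")
all_matches
# "2024 03 15"
```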
Exercise 8. Split and take
Given paths <- c("data/raw/file1.csv", "data/clean/file2.csv", "output/file3.csv"), extract just the file name (the part after the last /).
Solution
str_split() returns a list because each string can split into a different number of pieces. For known-structured paths, basename() is simpler and faster.
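One possible solution splits on "/" and keeps the last piece of each result; the base R basename() call is shown alongside for comparison. The helper names parts and file_names are introduced for illustration:

```r
library(stringr)

paths <- c("data/raw/file1.csv", "data/clean/file2.csv", "output/file3.csv")

# str_split() gives a list with one character vector per path.
parts <- str_split(paths, "/")

# Take the last element of each vector; vapply() guarantees a character result.
file_names <- vapply(parts, function(p) p[length(p)], character(1))
file_names
# "file1.csv" "file2.csv" "file3.csv"

# Base R shortcut for file paths.
basename(paths)
# "file1.csv" "file2.csv" "file3.csv"
```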
Section 5: Light regex
Exercise 9. Extract an email address
Given text <- "Contact: ada@example.org for help, or bob@work.dev", extract both email addresses.
Solution
The pattern says: one or more allowed local-part characters, then @, then one or more allowed domain characters, then a dot, then two or more letters for the TLD. This is a pragmatic email regex, not a complete RFC-compliant one (which is notoriously complex).
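A pattern matching that description could look like the sketch below. The exact character classes are an assumption, since the original pattern is not reproduced in the text:

```r
library(stringr)

text <- "Contact: ada@example.org for help, or bob@work.dev"

# Assumed pattern: local part, "@", domain, ".", then a TLD of 2+ letters.
email_pattern <- "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"

emails <- str_extract_all(text, email_pattern)[[1]]
emails
# "ada@example.org" "bob@work.dev"
```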
Exercise 10. Validate with a pattern
Write a function is_valid_phone(x) that returns TRUE for strings of the form xxx-xxx-xxxx where each x is a digit, and FALSE otherwise. Test with c("555-123-4567", "5551234567", "abc-def-ghij", "555-12-4567").
Solution
The anchors ^ and $ force the pattern to match the entire string, not just a substring. \d{3} means exactly three digits; inside an R string literal the backslash must be doubled, so it is written "\\d{3}". Without the anchors, "555-123-4567 ext 99" would also match.
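A minimal sketch of the function, together with the test vector from the exercise and the unanchored case mentioned above:

```r
library(stringr)

# TRUE only when the whole string has the form ddd-ddd-dddd.
is_valid_phone <- function(x) {
  str_detect(x, "^\\d{3}-\\d{3}-\\d{4}$")
}

phones <- c("555-123-4567", "5551234567", "abc-def-ghij", "555-12-4567")
is_valid_phone(phones)
# TRUE FALSE FALSE FALSE

# Without anchors, a valid number embedded in a longer string still matches.
str_detect("555-123-4567 ext 99", "\\d{3}-\\d{3}-\\d{4}")
# TRUE
is_valid_phone("555-123-4567 ext 99")
# FALSE
```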
Summary
- Trim with str_trim()/str_squish(). Pad with str_pad(). Change case with str_to_lower(), str_to_upper(), str_to_title().
- Detect with str_detect(), count with str_count(); both return vectors the same length as the input.
- Extract with str_extract() (first match) or str_extract_all() (all matches, returns a list).
- Replace with str_replace() (first) or str_replace_all() (all). Split with str_split().
- Light regex essentials: \\d digit, \\s whitespace, \\S non-whitespace, {n} exactly n, + one or more, * zero or more, ^/$ anchors.