stringr str_order() in R: Sort Index for Character Vectors
stringr str_order() returns the integer permutation that puts a character vector into sorted order, like base order() but with locale-aware, predictable Unicode collation. It powers row reordering, ranking, and any task where you need the sorting positions rather than the sorted values themselves.
    str_order(x)                     # ascending sort index
    str_order(x, decreasing = TRUE)  # descending sort index
    str_order(x, na_last = FALSE)    # NAs first
    str_order(x, na_last = NA)       # drop NAs from index
    str_order(x, locale = "en")      # English collation (default)
    str_order(x, locale = "sv")      # Swedish locale rules
    str_order(x, numeric = TRUE)     # "a10" after "a2", natural sort
    x[str_order(x)]                  # equivalent to str_sort(x)
Need explanation? Read on for examples and pitfalls.
What str_order() does in one sentence
str_order() turns a character vector into the integer positions that, when used as an index, produce a sorted vector. It is the stringr-flavored sibling of base order() with two upgrades: locale handling is explicit and consistent across operating systems, and a numeric = TRUE flag enables natural sort. The output is a permutation; pass it to [ ] to reorder the original vector, a data frame, or any parallel structure.
The integer vector c(2, 1, 3) reads as "to sort x, take element 2 first, then 1, then 3." Indexing the original vector with that permutation reproduces what str_sort() would return directly. Use str_order() when you need to reorder a different parallel object using the same sort.
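As a minimal sketch of that reading, using a hypothetical three-element vector (the values are illustrative, not from any dataset):

```r
library(stringr)

x <- c("banana", "apple", "cherry")  # hypothetical input

# The permutation: "take element 2 first, then 1, then 3"
idx <- str_order(x)
idx
#> [1] 2 1 3

# Indexing with the permutation gives the same result as str_sort()
x[idx]
#> [1] "apple"  "banana" "cherry"
identical(x[idx], str_sort(x))
#> [1] TRUE
```

The same `idx` can reorder any other vector of the same length, which is the case where str_order() earns its place over str_sort().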
Syntax
str_order() accepts one required argument and four tuning arguments. Every option is keyword-driven, so call sites stay readable.
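A sketch of a call with every tuning argument spelled out at its default value. The input vector is hypothetical, and the argument list assumes a recent stringr release; `names(formals(str_order))` shows what the installed version actually accepts.

```r
library(stringr)

# Inspect the argument names of the installed version
names(formals(str_order))

x <- c("b10", "a2", NA, "a10")  # hypothetical input

# Every tuning argument written out with its default value
str_order(
  x,
  decreasing = FALSE,  # ascending order
  na_last    = TRUE,   # NAs go to the end
  locale     = "en",   # English collation
  numeric    = FALSE   # compare digits character by character
)
#> [1] 4 2 1 3
```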
The default locale of "en" produces English collation that is stable across Windows, macOS, and Linux, which is the main reason to prefer str_order() over order() for cross-platform code. The next blocks walk through each argument.
na_last controls only NA placement, not ordering direction. Pair it with decreasing to get any of the four corner cases.
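The four corner cases can be sketched on a hypothetical vector with one missing value:

```r
library(stringr)

x <- c("b", NA, "a")  # hypothetical input with one NA

str_order(x)                                      # ascending, NA last
#> [1] 3 1 2
str_order(x, na_last = FALSE)                     # ascending, NA first
#> [1] 2 3 1
str_order(x, decreasing = TRUE)                   # descending, NA last
#> [1] 1 3 2
str_order(x, decreasing = TRUE, na_last = FALSE)  # descending, NA first
#> [1] 2 1 3
```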
Five common str_order() scenarios
Five patterns cover almost every real call to str_order(). Each block is independent, so you can run it on its own.
Reorder a character vector
Use str_order() when you need the sort positions, not just the sorted output. Indexing with the result is what makes it different from str_sort().
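A minimal sketch of the pattern on the built-in state.name vector:

```r
library(stringr)

# state.name is a built-in character vector of the 50 US state names
idx <- str_order(state.name)

# Index the original vector with the permutation
head(state.name[idx])

# The result matches what str_sort() returns directly
identical(state.name[idx], str_sort(state.name))
#> [1] TRUE
```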
This block happens to be a no-op because state.name already ships sorted. Shuffle it first and the difference is obvious. The pattern matters more than this specific result.
Order a data frame by a string column
Pass the permutation to [idx, ] to reorder rows. This is the dplyr-free version of arrange(df, col).
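A sketch with a small hypothetical data frame (`fruit_stock` and its columns are made up for illustration):

```r
library(stringr)

# Hypothetical data frame
fruit_stock <- data.frame(
  name = c("pear", "apple", "fig"),
  qty  = c(3, 1, 2)
)

# Compute the row order from the string column, then apply it to the rows
idx <- str_order(fruit_stock$name)
sorted <- fruit_stock[idx, ]

sorted$name
#> [1] "apple" "fig"   "pear"
sorted$qty   # the other column travels with its row
#> [1] 1 2 3
```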
str_order() gives the row order; the bracket subscript applies it. For several sort keys, stack permutations or, more conveniently, use dplyr::arrange().
Sort in descending order
decreasing = TRUE flips the index without changing locale rules. Tie-breaking remains stable.
The decreasing = TRUE argument is preferred over rev(str_order(x)) because it preserves stable order when input contains duplicates.
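A short sketch on a hypothetical vector without duplicates. In this case the two spellings agree; per the note above, they can differ only when the input contains ties.

```r
library(stringr)

x <- c("pear", "apple", "fig")  # hypothetical input, no duplicates

idx_desc <- str_order(x, decreasing = TRUE)
idx_desc
#> [1] 1 3 2
x[idx_desc]
#> [1] "pear"  "fig"   "apple"

# With all-distinct values, reversing the ascending index gives the same result
identical(idx_desc, rev(str_order(x)))
#> [1] TRUE
```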
Locale-aware sorting
Different languages sort letters differently. Swedish puts å, ä, and ö after z; standard German collation and default English collation both sort ö alongside o.
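A minimal sketch comparing English and Swedish collation on a few hypothetical words. The accented letters are written as Unicode escapes so the script does not depend on file encoding.

```r
library(stringr)

# "\u00e5ska" is "åska", "\u00f6rn" is "örn"
x <- c("zebra", "\u00e5ska", "apple", "\u00f6rn")

# English: å sorts with a, ö sorts with o
idx_en <- str_order(x, locale = "en")
idx_en
#> [1] 3 2 4 1   (apple, åska, örn, zebra)

# Swedish: å and ö come after z
idx_sv <- str_order(x, locale = "sv")
idx_sv
#> [1] 3 1 2 4   (apple, zebra, åska, örn)
```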
locale = "sv" (Swedish) ranks å and ö after z, matching how a Swedish dictionary orders entries. Pick the locale that matches your readers, not your server.
The "en" locale is consistent across operating systems. Base R order() falls back to the system locale (LC_COLLATE), which is why a CSV sorted on a developer laptop sometimes reorders on a production Linux box. str_order() removes that surprise.
Natural sort for filenames and versions
numeric = TRUE compares embedded digit runs numerically. This is what users mean by "natural sort" or "human sort".
Lexicographic sort puts file10 before file2 because "1" precedes "2" character by character. Natural sort understands that 10 > 2 and reorders accordingly. Use it for filenames, version strings, and any IDs with embedded counters.
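A minimal sketch of the difference, using hypothetical filenames:

```r
library(stringr)

files <- c("file10", "file2", "file1")  # hypothetical filenames

# Default: digits are compared character by character
files[str_order(files)]
#> [1] "file1"  "file10" "file2"

# numeric = TRUE: digit runs are compared as numbers
files[str_order(files, numeric = TRUE)]
#> [1] "file1"  "file2"  "file10"
```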
str_order() vs order() vs str_sort() vs sort()
Four functions order strings, but they differ along two axes: what they return and how they choose a locale. Picking the wrong one is a common source of bugs in sorting code.
| Function | Returns | Locale behavior | Best for |
|---|---|---|---|
| str_order(x) | integer permutation | explicit locale = arg, consistent across OS | reorder parallel objects, ranks |
| str_sort(x) | sorted character vector | same explicit locale | quick "give me the sorted values" |
| order(x) | integer permutation | uses system LC_COLLATE, varies by OS | base-only code; same idea as str_order |
| sort(x) | sorted character vector | uses system LC_COLLATE | base-only code; same idea as str_sort |
Use str_order() when you need to reorder something other than x itself (a data frame row, a parallel vector, a ranking). Reach for str_sort() when all you want is the sorted strings. The choice is about what you do next, not what the values look like.
Common pitfalls
Three pitfalls account for most surprises with str_order(). Each has a one-line fix.
Using order() and expecting cross-platform stability
Base order() honors the system locale. A file that sorts cleanly on macOS can shuffle on Linux because the OS-level LC_COLLATE differs.
If the team ships R code that runs on developer laptops, CI runners, and production servers, default to str_order() with an explicit locale.
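One way to see the kind of divergence involved without switching machines is a sketch like the following. It relies on base R's documented behavior that `method = "radix"` sorts character vectors in the C locale, which stands in here for a server whose LC_COLLATE is "C"; the input vector is hypothetical.

```r
library(stringr)

x <- c("b", "A", "a", "B")  # hypothetical mixed-case input

# C-locale ordering: all uppercase letters sort before lowercase ones
x[order(x, method = "radix")]
#> [1] "A" "B" "a" "b"

# ICU English collation: letters group together, lowercase first within a letter
x[str_order(x, locale = "en")]
#> [1] "a" "A" "b" "B"
```

A plain `order(x)` would match one result or the other depending on the machine's locale, which is the instability the explicit `locale` argument avoids.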
Ignoring NA placement
NAs land at the end by default, even with decreasing = TRUE. That is rarely what you want when ranking real data.
Pass na_last = NA when you want the index to ignore missing values, or filter the input first with v[!is.na(v)] if downstream code expects a full permutation.
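A sketch of the two options on a hypothetical vector. Note that the indices refer to different vectors: the first set points into the original `v`, the second into the filtered copy.

```r
library(stringr)

v <- c("pear", NA, "apple", NA, "fig")  # hypothetical input

# Option 1: drop NAs from the index; positions still refer to v
idx <- str_order(v, na_last = NA)
idx
#> [1] 3 5 1
v[idx]
#> [1] "apple" "fig"   "pear"

# Option 2: filter first; the result is a full permutation of the filtered vector
clean <- v[!is.na(v)]
idx_clean <- str_order(clean)
idx_clean
#> [1] 2 3 1
clean[idx_clean]
#> [1] "apple" "fig"   "pear"
```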
Lexicographic vs natural sort
Filenames and version strings need numeric = TRUE to sort the way users read them. Defaults treat digit characters one at a time.
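A sketch of the two calls side by side, with hypothetical IDs:

```r
library(stringr)

ids <- c("A10", "A2", "A1")  # hypothetical IDs with embedded counters

# First call: default character-by-character comparison
ids[str_order(ids)]
#> [1] "A1"  "A10" "A2"

# Second call: digit runs compared as numbers
ids[str_order(ids, numeric = TRUE)]
#> [1] "A1"  "A2"  "A10"
```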
The first call puts A10 before A2. The second compares the trailing digits as numbers and puts them in the natural order users expect.
numeric = TRUE only kicks in on digit runs. Strings with no digits sort identically with both settings; strings that mix digits and letters get the natural treatment only for the digit segments. Test on representative inputs before relying on it for mixed payloads.
Try it yourself
Try it: Use state.name to find the alphabetically last five state names (ascending order would put them at the end). Use str_order() and a slice, not str_sort().
Explanation: str_order() gives the ascending permutation of state.name. Indexing into the vector with that permutation produces the sorted names, and tail(., 5) keeps the last five, which are the alphabetically last five state names.
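A sketch of one solution consistent with that explanation (the variable names are illustrative):

```r
library(stringr)

# Ascending permutation of the built-in state.name vector
idx <- str_order(state.name)

# Apply the permutation, then keep the last five entries
last_five <- tail(state.name[idx], 5)
last_five
#> [1] "Virginia"      "Washington"    "West Virginia" "Wisconsin"     "Wyoming"
```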
Related stringr functions
When str_order() is not quite what you need, these are the next stops:
- str_sort() returns the sorted vector directly instead of the permutation.
- str_rank() returns the rank of each element (the inverse of the str_order() permutation when there are no ties).
- str_detect() filters strings by a pattern before sorting.
- str_to_lower() normalizes case for case-insensitive sorting.
- arrange() is the dplyr verb that sorts a whole data frame by one or more columns.
- The official stringr reference for str_order covers every argument and edge case.
FAQ
What is the difference between str_order() and order() in R?
Both functions return an integer permutation that sorts a vector, but they differ in how locale is chosen. str_order() takes an explicit locale = argument and uses ICU, which gives identical results on every operating system. Base order() falls back to the system LC_COLLATE, so the same code can produce different sorts on a developer laptop and a CI runner. Use str_order() in production code where reproducibility matters.
How do I sort a data frame by a string column with str_order()?
Compute the permutation, then subset rows with that index. For example, df[str_order(df$name), ] reorders df by name. This is the base-R equivalent of dplyr::arrange(df, name). For descending order, pass decreasing = TRUE. For multi-key sorts, prefer arrange() or chain order() with multiple columns since str_order() takes only one vector at a time.
Does str_order() handle natural sort for filenames?
Yes. Pass numeric = TRUE to compare embedded digit runs as numbers. That sorts c("file10", "file2") as c("file2", "file10") instead of the lexicographic c("file10", "file2"). The flag only affects digit segments; letters still sort by the chosen locale. It is the simplest way to handle version strings and counters embedded in IDs.
Why does str_order() put NAs at the end?
The default na_last = TRUE places NAs after every non-missing value, which keeps the result a complete permutation of the input. Set na_last = FALSE to put NAs first or na_last = NA to drop them from the result. Choose based on whether your downstream code wants a full-length index or a missing-free one.
Can str_order() sort case-insensitively?
Yes, by normalizing case before ordering. Wrap the argument in str_to_lower() (or str_to_upper()), e.g., str_order(str_to_lower(x)). ICU collation also exposes locale-level case folding, but the lowercase-then-order pattern is the easiest to read and works for most reporting needs.