dplyr arrange() in R: Sort Rows by Column
The arrange() function in dplyr sorts the rows of a data frame by one or more columns. The default order is ascending; wrap a column in desc() to sort descending. The data frame structure stays the same; only the row order changes.
    arrange(df, mpg)                                    # ascending by mpg
    arrange(df, desc(mpg))                              # descending by mpg
    arrange(df, cyl, desc(mpg))                         # cyl asc, then mpg desc
    arrange(df, desc(is.na(mpg)), mpg)                  # put NAs first
    arrange(df, .by_group = TRUE)                       # respect group_by()
    arrange(df, factor(grade, levels = c("A","B","C"))) # custom order
    arrange(df, pick(starts_with("date")))              # sort by tidyselect
Need explanation? Read on for examples and pitfalls.
What arrange() does in one sentence
arrange() is a row sorter. You pass a data frame and one or more sort keys; it returns the same rows reordered by those keys. Sort keys can be column names (ascending) or desc(column) for descending. Multi-column sort is implicit when you pass several keys: ties on the first key break by the second, and so on.
Compared with base R df[order(df$x), ], arrange() reads more cleanly inside a pipeline and handles missing values consistently: NA always goes last, even under desc(). Recent dplyr versions preserve row names on plain data frames; tibbles do not carry row names.
Syntax
arrange() takes a data frame plus sort keys. Use desc() for descending. Use .by_group = TRUE to sort within group_by groups. Use pick() and tidyselect helpers for column-set sorting.
The full signature is:
arrange(.data, ..., .by_group = FALSE, .locale = NULL)
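.data is the data frame. The ... argument takes one or more sort keys. .by_group = TRUE makes arrange() sort by the grouping variables first, so rows are ordered within each group set by group_by(). .locale controls collation for character sorting; when it is NULL (the default), dplyr 1.1.0 and later use the C locale.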
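The pick() form of a sort key can be sketched as follows. This is a minimal illustration that assumes dplyr 1.1.0 or later; the events table and its column names are invented for the example.

```r
library(dplyr)

# Hypothetical event table; the column names are invented for illustration.
events <- tibble(
  id         = 1:4,
  date_start = as.Date(c("2024-03-01", "2024-01-15", "2024-01-15", "2024-02-10")),
  date_end   = as.Date(c("2024-03-05", "2024-02-01", "2024-01-20", "2024-02-12"))
)

# Sort by every column whose name starts with "date", in column order:
# date_start is the primary key, date_end breaks ties.
sorted_events <- events |> arrange(pick(starts_with("date")))
sorted_events
```

Because the keys come from a tidyselect helper, newly added date columns would join the sort without editing the call.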
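NA values sort last by default, the same behaviour as na.last = TRUE in order(); arrange() itself has no na.last argument. To force NAs to the top, sort by desc(is.na(col)) first: arrange(df, desc(is.na(x)), x).
Six common patterns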
1. Sort ascending by one column
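A minimal sketch of an ascending sort, assuming dplyr is installed and using the built-in mtcars data set; the name by_mpg is only illustrative.

```r
library(dplyr)

# Sort mtcars by mpg, lowest value first (ascending is the default).
by_mpg <- mtcars |> arrange(mpg)

# Show the first rows of two columns to confirm the order.
head(by_mpg[, c("mpg", "cyl")], 3)
```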
2. Sort descending with desc()
desc() reverses the sort order for that key only. Other keys remain ascending.
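As a small example of this pattern, again assuming the built-in mtcars data set (by_mpg_desc is an illustrative name):

```r
library(dplyr)

# Wrap the key in desc() to put the highest mpg first.
by_mpg_desc <- mtcars |> arrange(desc(mpg))

head(by_mpg_desc[, c("mpg", "cyl")], 3)
```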
3. Sort by multiple columns
The first key (cyl) defines the primary order. Ties on cyl break by the second key (desc(mpg)).
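The two-key sort described above might look like this on mtcars; by_cyl_mpg is an illustrative name.

```r
library(dplyr)

# Primary key: cyl ascending. Tie-breaker: mpg descending within each cyl value.
by_cyl_mpg <- mtcars |> arrange(cyl, desc(mpg))

head(by_cyl_mpg[, c("cyl", "mpg")], 5)
```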
4. Put NAs first
is.na(mass) is TRUE for rows where mass is missing, and desc() sorts TRUE ahead of FALSE, so those rows come first. The second key, mass, then sorts the remaining rows ascending.
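A sketch of the NAs-first pattern, assuming the starwars data set that ships with dplyr (its mass column contains missing values); nas_first is an illustrative name.

```r
library(dplyr)

# First key: missing-ness of mass, with TRUE (missing) rows on top.
# Second key: mass itself, ascending, for the non-missing rows.
nas_first <- starwars |>
  arrange(desc(is.na(mass)), mass) |>
  select(name, mass)

head(nas_first, 3)
```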
5. Sort within groups using .by_group
Without .by_group = TRUE, arrange ignores groupings and sorts globally.
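The difference can be seen by running the same grouped sort with and without the flag. This sketch uses mtcars; the names global_sort and within_groups are illustrative.

```r
library(dplyr)

grouped <- mtcars |> group_by(cyl)

# Without .by_group: grouping is ignored, rows are sorted by mpg across the whole table.
global_sort <- grouped |> arrange(desc(mpg))

# With .by_group = TRUE: rows are sorted by cyl first, then by mpg within each cyl.
within_groups <- grouped |> arrange(desc(mpg), .by_group = TRUE)

head(within_groups[, c("cyl", "mpg")], 5)
```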
6. Custom sort order with factor()
Wrapping the sort key in factor() with explicit levels lets you define any order you want, including non-alphabetical or custom domain orders like "Low/Medium/High".
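A small sketch of a custom order, using a made-up tasks table (the table, its columns, and the priority labels are invented for illustration):

```r
library(dplyr)

# Hypothetical data: priorities stored as plain character strings.
tasks <- tibble(
  task     = c("a", "b", "c", "d"),
  priority = c("High", "Low", "Medium", "Low")
)

# The factor levels define the sort order; alphabetical order would give High, Low, Medium.
by_priority <- tasks |>
  arrange(factor(priority, levels = c("Low", "Medium", "High")))

by_priority
```

The priority column itself stays a character vector; the factor exists only for the duration of the sort.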
Sorted output is the starting point for top-N selection (arrange() |> head()) and for window functions that depend on row order (lead(), lag(), cumsum()).
arrange() vs base R sorting
Base R uses order() and bracket subsetting; arrange wraps that into a single readable call. The semantics are nearly identical. The main practical differences are pipe-friendliness, NA handling consistency, and the readable desc() helper.
| Task | dplyr | Base R |
|---|---|---|
| Sort ascending | arrange(df, mpg) | df[order(df$mpg), ] |
| Sort descending | arrange(df, desc(mpg)) | df[order(-df$mpg), ] |
| Multi-key | arrange(df, cyl, desc(mpg)) | df[order(df$cyl, -df$mpg), ] |
| NAs first | arrange(df, desc(is.na(x)), x) | df[order(df$x, na.last=FALSE), ] |
| Custom order | arrange(df, factor(g, levels=...)) | df[order(factor(df$g, levels=...)), ] |
When to use which:
- Use arrange() inside any dplyr pipeline.
- Use base R order() for one-line scripts or when sorting matrices and vectors that are not data frames.
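To check that the two approaches agree, the multi-key sort can be run both ways on mtcars. This is only a sketch; with_dplyr and with_base are illustrative names.

```r
library(dplyr)

# Same sort expressed with dplyr and with base R.
with_dplyr <- mtcars |> arrange(cyl, desc(mpg))
with_base  <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]

# Compare the sorted key columns from both results.
identical(with_dplyr$cyl, with_base$cyl)
identical(with_dplyr$mpg, with_base$mpg)
```

Note that the base R version relies on unary minus, so it only works this way for numeric keys.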
Common pitfalls
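Pitfall 1: arrange does not stick. Later verbs can discard the order: summarise() on grouped data returns one row per group ordered by the grouping keys, whatever order the input rows were in. group_by() on its own does not move rows. If you need a guaranteed final order, place arrange() last in your pipeline, just before saving or displaying.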
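A short sketch of this pitfall on mtcars, showing that an early sort does not survive a grouped summarise(); the names summary_tbl and final_tbl are illustrative.

```r
library(dplyr)

# The early arrange() has no effect on the summary's row order:
# the result comes back ordered by the grouping key, cyl.
summary_tbl <- mtcars |>
  arrange(mpg) |>
  group_by(cyl) |>
  summarise(avg_mpg = mean(mpg))

# Sorting as the final step gives the order you actually want.
final_tbl <- summary_tbl |> arrange(avg_mpg)
final_tbl
```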
Pitfall 2: forgetting .by_group after group_by(). mtcars |> group_by(cyl) |> arrange(desc(mpg)) ignores the grouping and sorts globally. To sort within groups, add .by_group = TRUE.
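desc() only reverses the sort order; it does not negate or otherwise change the values. arrange(df, desc(name)) sorts character names Z to A and leaves the column untouched. For numeric columns, arrange(df, -mpg) produces the same descending order, but it also leaves the stored values unchanged; negating the data itself is a job for mutate(). desc() works on any type; unary minus works only on numeric columns.
Pitfall 3: locale surprises with character sorting. Since dplyr 1.1.0, arrange() sorts character columns in the C locale by default, so uppercase letters sort before lowercase ones and non-ASCII characters sort by code point rather than by language rules. Pass .locale = "en" (or another explicit locale; this requires the stringi package) to get language-aware ordering that is reproducible across machines.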
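A minimal sketch of how the .locale argument affects character ordering, assuming dplyr 1.1.0 or later. The words table is made up, and the "en" call is guarded because it needs the stringi package.

```r
library(dplyr)

# Hypothetical data mixing upper- and lowercase words.
words <- tibble(word = c("banana", "Apple", "apple", "Banana"))

# C locale: strings are compared by code point, so uppercase letters come first.
c_sorted <- words |> arrange(word, .locale = "C")
c_sorted$word

# Language-aware ordering needs the stringi package, so guard the call.
if (requireNamespace("stringi", quietly = TRUE)) {
  en_sorted <- words |> arrange(word, .locale = "en")
  print(en_sorted$word)
}
```

Setting the locale explicitly in the call keeps the result independent of package defaults and machine settings.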
Try it yourself
Try it: Sort mtcars by cyl ascending, then by mpg descending within each cyl group. Save the result to ex_sorted and print the first 5 rows of cyl and mpg.
Explanation: When you pass multiple arguments to arrange(), the first is the primary sort key. Ties on cyl are broken by the second key, desc(mpg), which sorts descending. So all 4-cyl cars appear first, with the highest mpg at the top.
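One possible solution to the exercise, written as a sketch rather than the only correct answer:

```r
library(dplyr)

# cyl ascending as the primary key, mpg descending as the tie-breaker.
ex_sorted <- mtcars |> arrange(cyl, desc(mpg))

# Print the first 5 rows of the two columns of interest.
ex_sorted |>
  select(cyl, mpg) |>
  head(5)
```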
Related dplyr functions
After mastering arrange(), look at:
- desc(): descending sort wrapper
- slice_max(), slice_min(): select top/bottom N rows by value (sorts implicitly)
- pick(): tidyselect inside arrange() for column-set sorting
- with_order(): arrange-and-compute, useful for window functions
- group_by() plus .by_group = TRUE: per-group sorting
When you want the sort position as a new column rather than reordered rows, also consider mutate() with rank functions like min_rank(), dense_rank(), and row_number().
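A brief sketch of ranking instead of sorting, using mtcars; the column name mpg_rank is invented for the example.

```r
library(dplyr)

# Add a rank column (1 = highest mpg) without changing the row order.
ranked <- mtcars |>
  mutate(mpg_rank = min_rank(desc(mpg))) |>
  select(mpg, mpg_rank)

head(ranked, 3)
```

The rows stay in their original order, which is useful when the position matters later in the pipeline but the layout of the table should not change.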
FAQ
How do I sort by multiple columns in dplyr?
List them comma-separated: arrange(df, cyl, desc(mpg)) sorts cyl ascending then mpg descending within each cyl group. The first column is the primary sort key.
What is the difference between arrange and sort in R?
sort() works on a single vector and returns a vector. arrange() works on a data frame and returns a data frame with rows reordered. For multi-column sorting in a data frame, only arrange() (or order()) makes sense.
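The contrast can be sketched with a tiny vector containing a missing value; x and arranged are illustrative names. One practical difference worth knowing is that sort() drops NA by default, while arrange() keeps the row.

```r
library(dplyr)

x <- c(3, NA, 1)

# sort() returns a plain vector and removes NA by default.
sorted_vec <- sort(x)
sorted_vec

# arrange() returns a data frame and keeps the NA row, placed last.
arranged <- tibble(x = x) |> arrange(x)
arranged
```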
How do I sort descending in dplyr?
Wrap the column in desc(): arrange(df, desc(mpg)). For numeric columns you can also use arrange(df, -mpg). The desc() helper is more readable and works on any type.
Where do NA values go when I arrange?
NA values always sort to the END regardless of ascending or descending. This matches base R's default na.last = TRUE. To put NAs at the top, sort by desc(is.na(col)) first, then by the column.
Can I sort within groups using dplyr arrange?
Yes. Add .by_group = TRUE after group_by(): df |> group_by(cyl) |> arrange(mpg, .by_group = TRUE). Without that flag, arrange ignores grouping.