apply() in R: Apply a Function Over Matrix Rows or Columns
The apply() function in base R applies a function over rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix or data frame. It is the matrix-specific cousin of lapply and sapply.
apply(m, MARGIN = 1, FUN = sum)          # row sums
apply(m, MARGIN = 2, FUN = mean)         # column means
apply(m, 2, function(x) sum(is.na(x)))   # NA count per col
apply(m, 1, max)                         # max per row
apply(m, c(1,2), fn)                     # element-wise (rare)
rowSums(m); colSums(m)                   # faster for sum/mean
matrixStats::rowMaxs(m)                  # faster specialized
Need explanation? Read on for examples and pitfalls.
What apply() does in one sentence
apply(X, MARGIN, FUN) repeatedly calls FUN on each row (MARGIN = 1) or column (MARGIN = 2) of a matrix or 2D array, returning the results as a vector or matrix. For data frames, apply implicitly converts to a matrix, which loses type information.
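To make the "vector or matrix" return shape concrete, here is a minimal sketch. The matrix m and the variable names are made up for illustration and are not from the article's own examples.

```r
# Hypothetical 2 x 3 matrix: row 1 = 1, 3, 5 and row 2 = 2, 4, 6
m <- matrix(1:6, nrow = 2)

# FUN returns one value per row, so the result is a vector of length 2
row_totals <- apply(m, 1, sum)
print(row_totals)

# FUN returns two values per row, so the result is a matrix
# with one COLUMN per row of m (note the transposed layout)
row_ranges <- apply(m, 1, range)
print(row_ranges)
```

When FUN returns more than one value, each call's result becomes a column of the output, which is why row-wise results often need t() before they line up with the input.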
apply() is matrix-specific. For lists and vectors, use lapply or sapply. For column-wise transforms on data frames, prefer dplyr::mutate(across(...)) or lapply(df, fn).
Syntax
apply(X, MARGIN, FUN, ...). MARGIN: 1 = rows, 2 = columns, c(1,2) = each cell.
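The trailing ... in the signature is passed on to FUN on every call. A small sketch, assuming a made-up matrix with one missing value:

```r
# Hypothetical matrix with an NA in column 1
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)

# na.rm = TRUE is not an argument of apply(); it is forwarded to mean()
col_means <- apply(m, MARGIN = 2, FUN = mean, na.rm = TRUE)
print(col_means)  # 1.0 3.5 5.5
```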
Use rowSums, colSums, rowMeans, colMeans instead of apply for sums and means. They are faster and clearer. apply(m, 2, sum) and colSums(m) produce the same result; the latter is specialized C code.
Five common patterns
1. Row sums, column sums
For these specific cases, rowSums(m) and colSums(m) are the recommended fast alternatives.
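A minimal sketch of this pattern, using a made-up matrix, shows that the apply() form and the specialized functions agree:

```r
# Hypothetical 2 x 3 matrix: row 1 = 1, 3, 5 and row 2 = 2, 4, 6
m <- matrix(1:6, nrow = 2)

row_sums_apply <- apply(m, 1, sum)  # general form
row_sums_fast  <- rowSums(m)        # specialized form

col_sums_apply <- apply(m, 2, sum)
col_sums_fast  <- colSums(m)

print(row_sums_fast)  # 9 12
print(col_sums_fast)  # 3 7 11
```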
2. Column-wise NA count
apply with a custom function is the standard tool for arbitrary column-wise computations.
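As an illustration, the NA count per column might look like this. The matrix is invented for the example.

```r
# Hypothetical matrix: column 1 has one NA, column 2 none, column 3 two
m <- matrix(c(1, NA, 3, 4, NA, NA), nrow = 2)

# The anonymous function receives one column at a time as a vector
na_per_col <- apply(m, 2, function(x) sum(is.na(x)))
print(na_per_col)  # 1 0 2

# For this particular computation, a vectorized equivalent also exists
na_per_col_fast <- colSums(is.na(m))
```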
3. Max per row
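A short sketch of the row-wise maximum, again on a made-up matrix. The which.max call is an extra illustration, not something the article itself shows.

```r
# Hypothetical matrix: row 1 = 3, 1, 9 and row 2 = 8, 5, 2
m <- matrix(c(3, 8, 1, 5, 9, 2), nrow = 2)

row_max <- apply(m, 1, max)
print(row_max)  # 9 8

# The same pattern works for any function of a vector,
# for example the column position of each row's maximum
row_argmax <- apply(m, 1, which.max)
print(row_argmax)  # 3 1
```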
4. Element-wise transformation (MARGIN = c(1, 2))
MARGIN = c(1, 2) calls FUN on every element. For a vectorized operation such as squaring, the result is the same as m^2, so this form is only useful for non-vectorized functions.
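The following sketch contrasts the two cases. The function label_cell is a hypothetical helper chosen because if/else does not vectorize.

```r
# Hypothetical 2 x 3 matrix: row 1 = 1, 3, 5 and row 2 = 2, 4, 6
m <- matrix(1:6, nrow = 2)

# Vectorized operation: apply() adds nothing over plain arithmetic
squared_apply <- apply(m, c(1, 2), function(x) x^2)
squared_direct <- m^2

# Non-vectorized helper (hypothetical): if/else handles one value at a time
label_cell <- function(x) if (x > 3) "high" else "low"
labels <- apply(m, c(1, 2), label_cell)
print(labels)
```

The result keeps the dimensions of m, so labels is a 2 x 3 character matrix.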
5. apply on data frame (with caution)
This works only because all columns are numeric. With a character column, apply converts everything to character, breaking the operation. For data frames, prefer lapply(df, fn) or dplyr::summarise(across(...)).
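A minimal sketch of the all-numeric case, with an invented data frame, next to the lapply and sapply alternatives:

```r
# Hypothetical all-numeric data frame
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# Works here because as.matrix(df) is a numeric matrix
col_means_apply <- apply(df, 2, mean)
print(col_means_apply)  # a = 2, b = 20

# Column-wise alternatives that never convert to a matrix
col_means_list   <- lapply(df, mean)  # list with elements a and b
col_means_vector <- sapply(df, mean)  # named numeric vector
```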
apply() is for MATRICES; for DATA FRAMES, use lapply() or purrr::map(). Data frames may have mixed column types. apply() silently converts everything to the most general type (usually character), which breaks numeric operations. The matrix-specific apply is only safe when all data are the same type.
apply vs sapply vs lapply vs purrr::map
These R iteration functions are each tuned to a different input shape and output expectation. Knowing which one to reach for is essentially a matter of "what shape is my input, and what shape do I want back?"
| Function | Input | Output | Best for |
|---|---|---|---|
| apply() | Matrix / 2D array | Vector or matrix | Row/column-wise on matrix |
| sapply() | List or vector | Vector or matrix | Quick interactive simplification |
| lapply() | List or vector | List | Type-predictable list output |
| vapply() | List or vector | Type-strict vector | Production code |
| purrr::map_*() | List or vector | Type-strict per variant | Tidyverse |
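The differences in output shape can be seen with base R alone. This is a small sketch on invented inputs; the purrr variants are left out so the example has no package dependency.

```r
# Hypothetical inputs: a named list and a small matrix
x <- list(a = c(1, 2, 3), b = c(4, 5, 6))
m <- matrix(1:6, nrow = 2)

as_list    <- lapply(x, sum)                           # always a list
simplified <- sapply(x, sum)                           # simplified to a named vector
strict     <- vapply(x, sum, FUN.VALUE = numeric(1))   # errors unless each result is one number
by_column  <- apply(m, 2, sum)                         # matrix input, one value per column

str(as_list)
print(simplified)  # a = 6, b = 15
print(by_column)   # 3 7 11
```

vapply's FUN.VALUE template is what makes it "type-strict": a result of the wrong type or length stops with an error instead of silently changing the output shape.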
When to use which:
- Use apply() only on matrices and 2D numeric arrays.
- Use lapply/sapply/vapply for lists and vectors.
- Use dplyr::mutate(across(...)) for data frame column transforms.
Common pitfalls
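Pitfall 1: applying to a data frame with mixed types. apply first converts the data frame to a matrix, so a single character column turns every value into a string. apply(df, 2, mean) then returns NA with a warning for every column, including the numeric ones. The coercion is silent and it changes the data.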
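The coercion can be checked directly. This sketch uses an invented two-column data frame; the column names are assumptions for illustration.

```r
# Hypothetical mixed-type data frame
df <- data.frame(name = c("a", "b"), score = c(1.5, 2.5))

# This is the matrix apply() actually works on: everything is character
mat <- as.matrix(df)
print(typeof(mat))  # "character"

# mean() of character data gives NA plus a warning, even for 'score'
coerced_means <- suppressWarnings(apply(df, 2, mean))
print(coerced_means)

# Safer: select numeric columns first, then iterate column by column
numeric_means <- sapply(df[sapply(df, is.numeric)], mean)
print(numeric_means)  # score = 2
```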
Pitfall 2: forgetting MARGIN. apply(m, sum) (no MARGIN) errors. Always specify MARGIN = 1 or 2.
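The failure is easy to reproduce. In this sketch, tryCatch is used only so the script keeps running after the error; the matrix is made up.

```r
# Hypothetical 2 x 3 matrix
m <- matrix(1:6, nrow = 2)

# Without MARGIN, 'sum' is matched to MARGIN positionally and FUN is missing
call_failed <- tryCatch({
  apply(m, sum)
  FALSE
}, error = function(e) TRUE)
print(call_failed)  # TRUE

# Naming the arguments avoids the ambiguity
row_sums <- apply(m, MARGIN = 1, FUN = sum)
print(row_sums)  # 9 12
```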
apply() is SLOWER than rowSums() / colSums() for sum / mean. For large matrices, the specialized functions can be roughly an order of magnitude faster, so use them in performance-sensitive code.
Try it yourself
Try it: Compute the standard deviation of each column in matrix m using apply. Save to ex_sds.
Explanation: apply(m, 2, sd) runs sd() on each of the 5 columns (MARGIN = 2 means columns). Each call returns one number; the result is a numeric vector of length 5.
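The solution described above can be sketched as follows. The article's own matrix m is not shown, so a random 10 x 5 matrix is assumed here.

```r
# Assumed stand-in for the exercise matrix: 10 rows, 5 columns
set.seed(1)
m <- matrix(rnorm(50), nrow = 10, ncol = 5)

# One standard deviation per column
ex_sds <- apply(m, 2, sd)
print(length(ex_sds))  # 5
print(ex_sds)
```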
Related apply functions
After mastering apply, look at:
- rowSums(), colSums(), rowMeans(), colMeans(): fast specialized versions
- lapply(), sapply(), vapply(): for lists and vectors
- mapply(): multi-argument apply
- purrr::map() family: tidyverse alternatives
- matrixStats package: many specialized matrix operations
For data frame transforms, dplyr::summarise(across(...)) and dplyr::mutate(across(...)) are more idiomatic than apply.
FAQ
What does the MARGIN argument do in apply?
MARGIN = 1 applies the function to each ROW. MARGIN = 2 applies to each COLUMN. MARGIN = c(1, 2) applies to each ELEMENT.
What is the difference between apply and sapply in R?
apply() works on matrices and 2D arrays, applying a function over rows or columns. sapply() works on lists and vectors, applying a function over elements. They serve different data shapes.
Can I use apply on a data frame in R?
Yes, but it converts the data frame to a matrix first. If your columns are all numeric, this works. If any column is character, all data become character. For mixed data frames, prefer lapply(df, fn) or dplyr::summarise(across(...)).
How do I compute row means in R?
Use rowMeans(m) for fast specialized computation, or apply(m, 1, mean) for the general approach. For data frames with mixed types, restrict the computation to the numeric columns, for example df %>% dplyr::rowwise() %>% dplyr::mutate(rm = mean(dplyr::c_across(where(is.numeric)))).
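A base R sketch of the same idea, on invented data. The data frame variant below uses rowMeans on the numeric columns rather than dplyr, which is a substitution made here to keep the example free of package dependencies.

```r
# Hypothetical matrix: row 1 = 1, 3, 5 and row 2 = 2, 4, 6
m <- matrix(1:6, nrow = 2)

row_means_fast  <- rowMeans(m)
row_means_apply <- apply(m, 1, mean)
print(row_means_fast)  # 3 4

# Hypothetical mixed-type data frame: keep only numeric columns first
df <- data.frame(id = c("x", "y"), a = c(1, 3), b = c(3, 5))
df_row_means <- rowMeans(df[sapply(df, is.numeric)])
print(unname(df_row_means))  # 2 4
```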
Why is apply slow on large matrices?
apply() is implemented in R, so it loops in interpreted code. For sum / mean / etc., specialized C-level functions like rowSums and colMeans are much faster. For other operations on large matrices, consider the matrixStats package.
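One rough way to see the gap on your own machine is to time both forms. The matrix size below is an arbitrary assumption, and the timings will vary by hardware and R version, so no expected numbers are given.

```r
# Hypothetical large matrix: 1000 rows, 100 columns of random values
set.seed(42)
big <- matrix(runif(1e5), nrow = 1000)

# Elapsed time for the interpreted loop versus the C-level function
time_apply <- system.time(sums_apply <- apply(big, 1, sum))
time_fast  <- system.time(sums_fast  <- rowSums(big))

print(time_apply)
print(time_fast)

# Both approaches give the same numbers, up to floating-point tolerance
print(isTRUE(all.equal(sums_apply, sums_fast)))
```

For more reliable measurements than a single system.time call, repeat each timing several times or use a benchmarking package.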