mean() in R: Arithmetic Mean With Trim and NA Handling
The mean() function in base R computes the arithmetic average of a numeric vector. Pass na.rm = TRUE to ignore missing values and trim to drop the tails for a robust central tendency.
    mean(x)                 # arithmetic mean
    mean(x, na.rm = TRUE)   # ignore NA values
    mean(x, trim = 0.1)     # 10% trimmed mean (robust to outliers)
    mean(x[x > 0])          # conditional mean (positive values only)
    weighted.mean(x, w)     # weighted mean (separate function)
    mean(mtcars$mpg)        # mean of a data frame column
    colMeans(mtcars)        # column-wise mean of a numeric data frame
Need explanation? Read on for examples and pitfalls.
What mean() does in one sentence
mean() returns the sum of x divided by length(x) as a single numeric value. It accepts numeric, integer, logical (where TRUE = 1), and complex vectors. For other input types it returns NA with a warning.
mean() is a generic function that dispatches on the class of its argument, so Date and POSIXct objects also have a defined mean. mean(Sys.Date() + 0:9) returns the midpoint as a Date, not a number.
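The type handling described above can be checked directly. This is a minimal sketch; the input vectors and variable names are made up for illustration, and a fixed start date replaces Sys.Date() so the result is reproducible.

```r
# Hypothetical inputs, chosen only to show how mean() treats each type.
int_mean <- mean(1:4)                          # integer input
lgl_mean <- mean(c(TRUE, FALSE, TRUE, TRUE))   # logical input: TRUE counts as 1
cpx_mean <- mean(c(1+2i, 3+4i))                # complex input

# A character vector is not averaged; suppressWarnings() hides the warning here.
chr_mean <- suppressWarnings(mean(c("1", "2")))

# Dates dispatch to a Date method, so the result keeps its class.
dates <- as.Date("2024-01-01") + 0:9
date_mean <- mean(dates)

print(list(int_mean, lgl_mean, cpx_mean, chr_mean, class(date_mean)))
```

The Date result sits 4.5 days after the first date internally, even though printing shows only the calendar day.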
Syntax
mean(x, trim = 0, na.rm = FALSE, ...) takes a vector plus two optional arguments. trim drops a fraction of values from each end; na.rm controls how NA values are handled.
The three arguments:
- x: numeric, logical, or date/time vector
- trim: fraction (between 0 and 0.5) of values to trim from each end before averaging
- na.rm: if TRUE, drop NA before computing; default is FALSE
Use na.rm = TRUE whenever you suspect missing values. With the default na.rm = FALSE, a single NA makes the entire result NA. This is the most common base-R surprise; check for it first when a summary returns NA.
Five common patterns
1. Plain arithmetic mean
The sum is 15, the length is 5, the mean is 3.
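The original code for this pattern is not shown, so the vector below is an assumption chosen to match the numbers quoted above (sum 15, length 5).

```r
# Assumed example vector: five values that sum to 15.
x <- c(1, 2, 3, 4, 5)

total <- sum(x)      # 15
n     <- length(x)   # 5
avg   <- mean(x)     # same as total / n

print(c(total = total, n = n, mean = avg))
```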
2. Ignore missing values
Without na.rm = TRUE, any NA poisons the result. With it, the function drops NA and averages the remaining four values.
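A small sketch of that behavior, using a hypothetical five-element vector with one missing value:

```r
# Hypothetical data: one NA among five elements.
x_na <- c(1, 2, NA, 4, 5)

with_na    <- mean(x_na)                # NA: the missing value propagates
without_na <- mean(x_na, na.rm = TRUE)  # averages 1, 2, 4, 5

print(with_na)
print(without_na)
```

Note that the divisor shrinks along with the data: with na.rm = TRUE the sum of 12 is divided by 4, not 5.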
3. Trimmed mean for outlier robustness
trim = 0.1 drops the lowest 10% and highest 10% of the sorted values before averaging; the number of observations dropped from each end is rounded down to a whole number. The single outlier of 100 pulls the plain mean to 15.4; the trimmed mean of 6.5 better reflects the bulk of the data.
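The data behind these figures is not shown in the text. The vector below is one reconstruction that reproduces both quoted values (15.4 and 6.5), so treat it as an assumption rather than the original example.

```r
# Assumed data: the integers 2 through 10 plus one outlier of 100.
x_out <- c(2:10, 100)

plain   <- mean(x_out)              # pulled upward by the outlier
trimmed <- mean(x_out, trim = 0.1)  # drops one value from each end (10% of 10)

# The same trimmed mean computed by hand: sort, drop first and last, average.
sorted <- sort(x_out)
manual <- mean(sorted[2:9])

print(c(plain = plain, trimmed = trimmed, manual = manual))
```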
4. Mean of a data frame column
For a single column, pass it directly. For every numeric column, use colMeans() or sapply(df, mean).
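As a quick illustration with the built-in mtcars data set (the variable names here are only for the example):

```r
# Mean of one column.
mpg_mean <- mean(mtcars$mpg)

# Mean of every column; mtcars is all numeric, so both approaches work.
col_means_fast <- colMeans(mtcars)
col_means_loop <- sapply(mtcars, mean)

print(mpg_mean)
print(round(col_means_fast, 2))
```

Both calls return a named numeric vector with one entry per column. sapply() is the more flexible of the two because it accepts any function, while colMeans() requires an all-numeric input.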
5. Mean of a logical vector returns a proportion
R coerces logical to numeric (TRUE = 1, FALSE = 0). The mean of a logical vector is the proportion that is TRUE. This is the cleanest one-liner for proportions in base R.
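The sketch below applies the logical-to-numeric coercion to a few typical proportion questions. The scores and answers vectors are hypothetical.

```r
# Hypothetical data for illustration.
scores  <- c(55, 72, 88, 91, 64, NA)
answers <- c("yes", "no", "yes", "yes")

# Share of non-missing scores above 70 (NA comparisons are dropped by na.rm).
share_above_70 <- mean(scores > 70, na.rm = TRUE)

# Share of missing values.
missing_rate <- mean(is.na(scores))

# Share of "yes" answers.
yes_rate <- mean(answers == "yes")

print(c(share_above_70, missing_rate, yes_rate))
```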
mean(condition) is the idiomatic way to compute a proportion in R. Because logicals coerce to 0 and 1, mean(x > threshold) returns the share of values above the threshold without a separate counter. The same trick gives mean(is.na(x)) for missingness rate and mean(x == "yes") for the yes-rate.
mean vs median vs trimmed mean vs weighted mean
Pick the central-tendency function that matches your data shape. The table compares mean, median, trimmed mean, and weighted mean.
| Function | What it computes | When to use |
|---|---|---|
| mean(x) | Arithmetic average | Symmetric data with no outliers |
| mean(x, trim = 0.1) | Trimmed mean | Skewed data with extreme tails |
| median(x) | Middle value | Heavily skewed data; ordinal scales |
| weighted.mean(x, w) | Weighted average | Observations differ in importance |
| colMeans(df) | Column-wise mean | All numeric columns of a data frame |
For symmetric, well-behaved data the plain mean is the standard summary. The moment outliers appear, switch to a median or a trimmed mean for a more honest center.
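To see how far the summaries can diverge, here is a small comparison on a made-up right-skewed sample. The incomes vector is hypothetical, not data from this guide.

```r
# Hypothetical right-skewed sample: nine similar values and one extreme value.
incomes <- c(28, 31, 33, 35, 36, 38, 40, 42, 45, 400)

center_mean    <- mean(incomes)              # dragged toward the extreme value
center_median  <- median(incomes)            # average of the two middle values
center_trimmed <- mean(incomes, trim = 0.1)  # drops 28 and 400 before averaging

print(c(mean = center_mean, median = center_median, trimmed = center_trimmed))
```

The median and the trimmed mean land close together, while the plain mean ends up far above nine of the ten observations.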
Common pitfalls
Pitfall 1: mean() returns NA when any element is NA. Always set na.rm = TRUE for real-world data. If you forget, downstream summaries silently propagate NA and break plots and tables.
Pitfall 2: mean of a character vector returns NA with a warning. R does not auto-convert strings to numbers. Convert first with as.numeric(), then handle the NA values from parse failures.
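A minimal sketch of the convert-then-average workflow, assuming a hypothetical character vector in which one entry cannot be parsed:

```r
# Hypothetical raw input: numbers stored as text, with one unparseable entry.
raw <- c("12", "7.5", "n/a", "3")

# mean() on character data returns NA (with a warning, suppressed here).
direct <- suppressWarnings(mean(raw))

# Convert first; "n/a" becomes NA and as.numeric() warns about it.
nums       <- suppressWarnings(as.numeric(raw))
n_failed   <- sum(is.na(nums))          # how many entries failed to parse
clean_mean <- mean(nums, na.rm = TRUE)  # average of 12, 7.5, and 3

print(c(failed = n_failed, mean = clean_mean))
```

Counting the parse failures before averaging is a cheap way to notice when na.rm = TRUE is silently discarding more data than expected.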
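Pitfall 3: trim is a fraction, not a count. Valid values lie between 0 and 0.5, so use trim = 0.1 for 10%, not trim = 10. Be aware that mean(x, trim = 2) or mean(x, trim = 10) does not raise an error: base R treats any value of 0.5 or more as 0.5 and quietly returns the median, which makes the mistake easy to miss.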
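The following check shows what happens when trim is given as a percentage by mistake. In current base R, mean() with a trim of 0.5 or more returns the median instead of stopping with an error. The vector is hypothetical.

```r
# Hypothetical data with extreme values at both ends.
x_trim <- c(1, 2, 3, 4, 50, 60, 70, 80, 90, 1000)

intended <- mean(x_trim, trim = 0.1)  # drops one value from each end
mistaken <- mean(x_trim, trim = 10)   # out of range: treated like trim = 0.5
med      <- median(x_trim)

print(c(intended = intended, mistaken = mistaken, median = med))
```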
Do not pass a whole data frame to mean(). mean(mtcars) returns NA with a warning in modern R. Use colMeans(mtcars) for numeric columns or sapply(mtcars, mean) for a column-by-column mean. This mistake catches most R beginners once.
Try it yourself
Try it: Compute the mean of mtcars$mpg for cars with exactly 4 cylinders. Save the result to ex_mean_4cyl.
Explanation: The subset mtcars$mpg[mtcars$cyl == 4] keeps mpg values only where cylinder count is 4. Passing that filtered vector to mean() gives the conditional average. The same logical-subset pattern powers most "mean of a subgroup" questions in base R.
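The solution code itself is not preserved here; based on the explanation above, it can be sketched as follows.

```r
# Keep mpg only for rows where cyl equals 4, then average.
ex_mean_4cyl <- mean(mtcars$mpg[mtcars$cyl == 4])

print(ex_mean_4cyl)
```

The built-in mtcars data set contains eleven 4-cylinder cars, so the result is the average of eleven mpg values.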
Related base R summary functions
After mastering mean(), look at:
- median(): middle value, robust to outliers
- sd() and var(): spread around the mean
- range(), min(), max(): extremes
- summary(): five-number summary plus mean in one call
- colMeans() and rowMeans(): bulk averages for matrices and data frames
- aggregate() and dplyr::summarise(): group-wise means
For descriptive statistics across many variables at once, see the descriptive statistics in R guide. For weighted averages, weighted.mean() extends the toolkit; for running or rolling means, zoo::rollmean() is the go-to.
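As a brief tour of some of the base R functions listed above, the sketch below uses only built-in functions and the mtcars data set; zoo and dplyr are left out so that it runs without extra packages.

```r
# One-call overview: min, quartiles, median, mean, max.
mpg_summary <- summary(mtcars$mpg)

# Spread around the mean.
mpg_sd <- sd(mtcars$mpg)

# Group-wise means, two base R ways.
by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)                    # named vector
agg    <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)        # data frame

print(mpg_summary)
print(by_cyl)
print(agg)
```

tapply() is convenient when a named vector is enough; aggregate() returns a data frame, which is easier to merge or plot.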
FAQ
How do I calculate the mean in R while ignoring missing values?
Pass na.rm = TRUE: mean(x, na.rm = TRUE). By default, mean() returns NA if any element is NA, which silently propagates into downstream summaries. Always set na.rm = TRUE when working with real-world data, or impute the missing values first using a domain-appropriate rule before averaging.
What is the difference between mean and median in R?
mean() is the arithmetic average (sum divided by length); median() is the middle value when data is sorted. The mean is sensitive to outliers and skew; the median is far less so. For symmetric data the two roughly agree; for skewed data the median is the safer summary of central tendency.
How do I compute the mean of every column in a data frame?
For numeric-only frames, use colMeans(df, na.rm = TRUE). For mixed types, use sapply(df, function(x) if (is.numeric(x)) mean(x, na.rm = TRUE) else NA). The tidyverse equivalent is dplyr::summarise(df, across(where(is.numeric), function(x) mean(x, na.rm = TRUE))). Recent dplyr releases deprecate passing extra arguments such as na.rm = TRUE through across() directly, so wrap the call in a function as shown.
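Here is a base R sketch of both approaches on the built-in iris data set, which mixes four numeric columns with one factor column. The variable names are illustrative.

```r
# Which columns are numeric?
is_num <- sapply(iris, is.numeric)

# Option 1: subset to numeric columns, then use colMeans().
num_means <- colMeans(iris[is_num], na.rm = TRUE)

# Option 2: visit every column, returning NA for non-numeric ones.
mixed_means <- sapply(iris, function(col) {
  if (is.numeric(col)) mean(col, na.rm = TRUE) else NA
})

print(num_means)
print(mixed_means)
```

Option 1 gives a clean numeric result; option 2 keeps one entry per original column, which can be handy when the output must line up with the column names.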
Can I compute a weighted mean in base R?
Yes, with weighted.mean(x, w) where w is a numeric vector of weights the same length as x. The function returns sum(x * w) / sum(w). Use it when observations contribute unequally, such as survey weights or grades with different credit values.
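A short sketch using hypothetical course grades weighted by credit hours, with the formula above checked by hand:

```r
# Hypothetical grades and credit values.
grades  <- c(90, 80, 70)
credits <- c(3, 2, 1)

weighted   <- weighted.mean(grades, credits)
by_hand    <- sum(grades * credits) / sum(credits)  # (270 + 160 + 70) / 6
unweighted <- mean(grades)

print(c(weighted = weighted, by_hand = by_hand, unweighted = unweighted))
```

The weighted result is higher than the plain mean because the best grade carries the most credits.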
Why does mean(mtcars) return NA?
Because mean() expects a vector, not a data frame. Older R versions had a data frame method that returned column-wise means; that method was removed in R 3.0.0, so current R returns NA with a warning. Use colMeans(mtcars) for column-wise means or mean(mtcars$mpg) for a single column.
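The behavior is easy to confirm in a current R session. The sketch below captures the warning text with tryCatch() instead of printing it.

```r
# Passing the whole data frame: NA plus a warning (suppressed here).
df_result <- suppressWarnings(mean(mtcars))

# Capture the warning message to see what R reports.
warn_msg <- tryCatch(mean(mtcars), warning = function(w) conditionMessage(w))

# The two working alternatives.
all_cols <- colMeans(mtcars)
one_col  <- mean(mtcars$mpg)

print(df_result)
print(warn_msg)
print(one_col)
```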