Outlier Detection
Outliers can quietly distort means, standard deviations, and regression slopes. Grubbs, ESD, Hampel (MAD), and Tukey IQR each flag suspicious values with different rules. Paste your data, pick a method, and see exactly which points get flagged, by what statistic, and at what alpha level.
Flag values that drift far from the bulk of your data using Grubbs, Generalized ESD, the Hampel (MAD) filter, or the Tukey IQR rule. Reproducible R code, runs in your browser.
What is an outlier?
An outlier is a value that sits unusually far from the centre of a sample relative to its spread. The word "unusually" hides a choice: a method, a threshold, and an assumption about the underlying distribution.
Parametric tests like Grubbs and Generalized ESD assume the bulk of the data is normally distributed and compare the largest studentized deviation, |x - mean| / sd, against a critical value. They are powerful when the assumption holds and tend to over-flag when it does not.
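As a rough sketch of what a parametric test computes, the base-R snippet below evaluates the two-sided Grubbs statistic and its critical value by hand. The sample vector and the variable names are illustrative assumptions, not the tool's internals.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(2, 3, 3, 4, 4, 5, 5, 6, 7, 30)
n <- length(x)
alpha <- 0.05

# Grubbs statistic: largest absolute deviation from the mean, in sd units
G <- max(abs(x - mean(x))) / sd(x)

# Two-sided critical value from the t distribution with n - 2 degrees of freedom
t_crit <- qt(1 - alpha / (2 * n), df = n - 2)
G_crit <- ((n - 1) / sqrt(n)) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

# The candidate is the point furthest from the mean
suspect <- x[which.max(abs(x - mean(x)))]
flagged <- G > G_crit

cat("G =", round(G, 3), "critical =", round(G_crit, 3), "flagged:", flagged, "\n")
```

For this sample G comes out at roughly 2.80 against a critical value of roughly 2.29, so the value 30 is flagged at alpha = 0.05.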
Robust rules like the Hampel filter (median absolute deviation) and the Tukey IQR rule use rank-based dispersion measures that the outliers themselves cannot inflate. They make weaker assumptions and accept that they may flag a few extra points in heavy-tailed but legitimate data.
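To get a feel for how little a single extreme value moves the robust measures, the sketch below compares sd, MAD, and IQR on a small sample with and without one added extreme point. The data and the helper name `spread` are assumptions made for illustration.

```r
# Illustrative data: the same sample with and without one extreme value
clean <- c(2, 3, 3, 4, 4, 5, 5, 6, 7)
dirty <- c(clean, 30)

# Hypothetical helper: three dispersion measures side by side
spread <- function(v) c(sd = sd(v), mad = mad(v), iqr = IQR(v))

tab <- rbind(clean = spread(clean), dirty = spread(dirty))
print(round(tab, 3))

# Ratio of each measure after vs. before adding the extreme value
print(round(tab["dirty", ] / tab["clean", ], 2))
```

The standard deviation grows several-fold once the extreme point is added, while MAD and IQR change far less, which is why thresholds built on them stay usable.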
None of these tests answers "should this point be removed?" They answer "does this point look unusual under this model?" Removal is an editorial decision; this tool flags and explains, never auto-deletes.
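One way to keep flagging separate from deletion in your own scripts is to store the flag next to each value and compare summaries before deciding anything. This is a minimal sketch with made-up data and an assumed column layout, using a Hampel-style score with threshold 3.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(2, 3, 3, 4, 4, 5, 5, 6, 7, 30)

# Keep every observation; record the score and the flag alongside it
report <- data.frame(
  value = x,
  hampel_score = abs(x - median(x)) / mad(x)
)
report$flagged <- report$hampel_score > 3

# Compare summaries with and without the flagged rows; nothing is deleted
summary_means <- c(
  mean_all = mean(report$value),
  mean_unflagged = mean(report$value[!report$flagged])
)
print(report)
print(summary_means)
```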
    # Outlier detection in R
    library(outliers)

    x <- c(2, 3, 3, 4, 4, 5, 5, 6, 7, 30)

    # Grubbs (single outlier, two-sided)
    grubbs.test(x, two.sided = TRUE)

    # Generalized ESD (manual loop, k = max outliers)
    esd <- function(x, k = 5, alpha = 0.05) {
      xx <- x
      idx <- seq_along(x)
      removed <- integer(0)   # indices in removal order
      n_out <- 0              # largest step whose statistic was significant
      for (i in seq_len(k)) {
        m <- mean(xx); s <- sd(xx); ni <- length(xx)
        j <- which.max(abs(xx - m))
        R <- abs(xx[j] - m) / s
        p <- 1 - alpha / (2 * ni)
        t <- qt(p, ni - 2)
        lam <- (ni - 1) * t / sqrt(ni * (ni - 2 + t^2))
        # Always remove the most extreme point, even when this step is not
        # significant, so that later steps can expose masked outliers
        removed <- c(removed, idx[j]); xx <- xx[-j]; idx <- idx[-j]
        if (R > lam) n_out <- i
      }
      removed[seq_len(n_out)]   # indices of the flagged points
    }
    esd(x, k = 5)

    # Hampel filter (median absolute deviation, k = 3)
    # mad() already applies the 1.4826 scaling constant
    mads <- abs(x - median(x)) / mad(x)
    which(mads > 3)

    # Tukey IQR rule (1.5 * IQR fences)
    boxplot.stats(x)$out
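The `boxplot.stats()` call above hides the fences themselves. The sketch below makes them explicit for two multipliers, 1.5 and 3. The helper name `tukey_fences` is hypothetical, and it uses `quantile()` with R's default quartile definition, which can differ slightly from the hinges that `boxplot.stats()` uses.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(2, 3, 3, 4, 4, 5, 5, 6, 7, 30)

# Hypothetical helper: lower and upper fences at k * IQR beyond the quartiles
tukey_fences <- function(v, k = 1.5) {
  q <- quantile(v, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

mild <- tukey_fences(x, k = 1.5)
extreme <- tukey_fences(x, k = 3)

# Values outside each pair of fences
mild_out <- x[x < mild["lower"] | x > mild["upper"]]
extreme_out <- x[x < extreme["lower"] | x > extreme["upper"]]

print(mild)
print(extreme)
print(mild_out)
print(extreme_out)
```

Widening k from 1.5 to 3 moves both fences outward, so the second set can only flag the same points or fewer.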
Read more: Anatomy of each test
- Grubbs · Compute G = max|x - mean(x)| / sd(x) and reject when G > ((n-1)/sqrt(n)) * sqrt(t² / (n-2+t²)) with t = qt(1 - α/(2n), n-2). Tests one outlier per call; iterate manually if needed. Assumes normality.
- Generalized ESD · Compute the same statistic, remove the most extreme point, and repeat k times. Compare each against a Bonferroni-style critical value. Detects up to k outliers without prior knowledge of how many.
- Hampel (MAD) · Flag points where |x - median(x)| / (1.4826 * MAD) > k. k = 3 approximates a 3-sigma rule for normal data because 1.4826 * MAD ≈ sd. Robust because the outliers cannot inflate the dispersion measure.
- Tukey IQR · Flag points below Q1 - k * IQR or above Q3 + k * IQR. k = 1.5 for “mild” outliers, k = 3 for “extreme”. Distribution-free and tolerant of moderate skew, but tends to over-flag in small samples and in long-tailed but legitimate distributions.
Caveats: When each method goes wrong
Failure mode · What to do
- Grubbs on heavy-tailed data · Over-rejects. Use Hampel or IQR instead, or apply a log transform.
- ESD with k too low · Masks real outliers (the masking effect). Set k generously; the iterative test handles the surplus.
- Hampel on data with zero MAD · MAD is zero when more than half the values are identical. The tool falls back to no detection and warns.
- Tukey IQR on small n · Q1 and Q3 are unstable for n < 10. Prefer Grubbs at small sample sizes.
- Removing without thought · An outlier is a question, not a verdict. Investigate the source before deletion.
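The zero-MAD case is easy to hit with rounded or discrete data. The sketch below shows one way such a fallback could look in your own code. The function name `hampel_flags` and the warning text are assumptions, not the tool's actual implementation.

```r
# Hypothetical helper: Hampel flags with a guard for zero MAD
hampel_flags <- function(v, k = 3) {
  scale <- mad(v)  # already includes the 1.4826 constant
  if (scale == 0) {
    warning("MAD is zero; no points flagged")
    return(rep(FALSE, length(v)))
  }
  abs(v - median(v)) / scale > k
}

# More than half the values are identical, so the MAD is zero
y <- c(5, 5, 5, 5, 5, 5, 9)
flags_y <- suppressWarnings(hampel_flags(y))

# A sample with a non-zero MAD goes through the normal path
x <- c(2, 3, 3, 4, 4, 5, 5, 6, 7, 30)
flags_x <- hampel_flags(x)

print(any(flags_y))
print(which(flags_x))
```

Without the guard, dividing by a zero MAD yields Inf for every value that differs from the median and NaN for the rest, so every non-median value would be flagged.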
Further reading: Related calculators & posts
- Normality test picker · check whether Grubbs and ESD assumptions hold before trusting them.
- Confidence interval calculator · recompute your CI after deciding what to do with flagged points.
- t-test calculator · compare means once you have settled on the cleaned sample.
- Outlier Treatment in R · longer treatment of detection, capping, and Winsorisation.