R solve() Error 'singular matrix': Diagnose Multicollinearity and Fix It
Error in solve.default(...) : system is computationally singular means R tried to invert a matrix whose columns carry duplicate information: one column can be written as a weighted sum of the others. In regression language, that is multicollinearity, and the fix is to find which columns collide and then drop, combine, or regularise them.
Why does solve() complain that my matrix is 'computationally singular'?
solve() inverts a matrix by running Gaussian elimination. If a column repeats information already carried by other columns, one elimination step ends up dividing by zero, or by a number so small that the result explodes into machine noise. R refuses to return nonsense and throws the error instead. The quickest way to see it is to feed solve() a 3×3 matrix whose second column is exactly twice the first, then watch R call out the zero determinant and refuse the inverse.
Let's build that matrix and hit the error on purpose. We'll measure how singular it is with det() and rcond(), then call solve() inside tryCatch() so the error text prints instead of stopping the page.
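Since the interactive cell isn't shown here, a minimal sketch of that experiment looks like this (the exact entries are arbitrary; only the dependency between columns 1 and 2 matters):

```r
# 3x3 matrix whose second column is exactly twice the first
A <- matrix(c(1, 2, 3,    # column 1
              2, 4, 6,    # column 2 = 2 * column 1
              1, 0, 1),   # column 3
            nrow = 3)

det(A)    # 0: the columns span only a 2D plane
rcond(A)  # 0: reciprocal condition number, 1 = well-conditioned, 0 = singular

# tryCatch() lets the error text print instead of stopping the script
tryCatch(solve(A),
         error = function(e) message("solve() failed: ", conditionMessage(e)))
```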
The determinant is zero because column 2 is a scalar multiple of column 1, so the three columns span only a 2D plane instead of the full 3D space. rcond() confirms this with a value of 0: the reciprocal condition number is LAPACK's answer to "how far is this matrix from singular?", on a scale where 1 is perfectly invertible and 0 is hopelessly singular. solve() reports the exact elimination step where the zero pivot appeared (U[2,2] = 0), which is LAPACK-speak for "the second diagonal entry of the upper-triangular factor was zero."
coef(lm_fit) is the friendlier face of the same singular-matrix error. When lm() silently drops a predictor and returns NA, R has internally detected the exact collinearity that would have crashed solve(t(X) %*% X) had you called it yourself. Treat the NA as a warning, not a feature.
Try it: Compute the numerical rank of X using qr(X)$rank and store it in ex_rank. The rank should be less than ncol(X); that's what "singular" means numerically.
Click to reveal solution
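One way to solve it, assuming the singular 3×3 matrix from above is stored in X:

```r
# The singular 3x3 matrix from above (column 2 = 2 * column 1)
X <- matrix(c(1, 2, 3,
              2, 4, 6,
              1, 0, 1), nrow = 3)

ex_rank <- qr(X)$rank
ex_rank            # 2: only two linearly independent columns
ex_rank < ncol(X)  # TRUE: the matrix is rank-deficient, i.e. singular
```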
Explanation: qr() factors the matrix and reports its numerical rank, the count of linearly independent columns. Rank 2 on a 3-column matrix confirms one column is a combination of the others.
What kinds of R models quietly trigger a singular matrix?
The 3×3 toy above is easy to spot. Real bugs hide inside regression design matrices where a hidden collinearity only shows up a few rows deep. The situation most R users meet first is a linear model with a predictor that's a scaled copy of another, say, the same measurement in two units. lm() tries to mask it by dropping the redundant term and returning NA for its coefficient, but the underlying solve(t(X) %*% X) call still fails if you reach for it yourself.
Below, we simulate body-height data in both centimetres and metres. Both columns carry the same information (one is the other divided by 100), so the design matrix is guaranteed to be singular.
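A sketch of that simulation; the sample size, the response name weight, and the noise levels are placeholder choices, but the two height columns match the text:

```r
set.seed(42)
n <- 40
height_cm <- rnorm(n, mean = 170, sd = 10)
height_m  <- height_cm / 100   # identical information in different units
weight    <- 0.5 * height_cm + rnorm(n, sd = 4)   # simulated response

fit <- lm(weight ~ height_cm + height_m)
coef(fit)   # height_m comes back NA: lm() dropped the redundant column

# The same collinearity crashes the hand-rolled normal equations
X <- model.matrix(fit)
tryCatch(solve(t(X) %*% X),
         error = function(e) message(conditionMessage(e)))
```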
lm() prints an NA for height_m because it detected the perfect dependency during its QR-based fit and dropped the column. The NA is R's polite way of telling you "I couldn't separate these two predictors." If you skip lm() and compute the normal-equations solution by hand, you see the raw LAPACK message with a reciprocal condition number near $10^{-18}$, which is effectively zero in double-precision arithmetic.
Try it: Use alias(fit) to print the exact linear dependency between height_cm and height_m. The output shows which column was dropped and the coefficients of the dependency equation.
Click to reveal solution
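A possible solution; the data-generation lines simply rebuild the fit from the previous example so the snippet runs on its own:

```r
set.seed(42)
height_cm <- rnorm(40, mean = 170, sd = 10)
height_m  <- height_cm / 100
weight    <- 0.5 * height_cm + rnorm(40, sd = 4)
fit <- lm(weight ~ height_cm + height_m)

# Reports the dropped column as a linear combination of the kept ones:
# height_m = 0.01 * height_cm
alias(fit)
```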
Explanation: alias() reports each dropped column as a linear combination of the remaining ones. Here height_m = 0.01 * height_cm, which is just the unit conversion in numeric form.
How do I find the exact column causing the problem?
Knowing that a design matrix is singular is only half the job; you still want the name of the guilty column. QR decomposition does both things at once: it reports the numerical rank, and its pivot vector orders columns so that the independent ones come first and the redundant ones trail at the end. That means you can read off both "which columns to keep" and "which column is the culprit" in two lines.
We'll build a 5-column matrix where the fifth column is the sum of the first two, then ask qr() to locate it for us.
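A sketch of that setup, with arbitrary random data for the four independent columns:

```r
set.seed(1)
X5 <- cbind(a = rnorm(20), b = rnorm(20), c = rnorm(20), d = rnorm(20))
X5 <- cbind(X5, e = X5[, "a"] + X5[, "b"])   # column 5 = column 1 + column 2

qr_X5 <- qr(X5)
qr_X5$rank    # 4 on a 5-column matrix: exactly one redundant column
qr_X5$pivot   # independent columns first, redundant column last

# Name the culprit: whatever sits past the rank in the pivot order
colnames(X5)[qr_X5$pivot[(qr_X5$rank + 1):ncol(X5)]]   # "e"
```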
The rank is 4 on a 5-column matrix, which tells us exactly one column can be removed. The pivot splits naturally: the first four pivot positions name the keepers, and whatever sits past the rank is the redundant column we need to drop. This works for any rectangular matrix: wherever the collinearity sits, qr() floats the redundant columns to the back.
qr(X)$rank is the fastest rank check in base R. No package needed, and it works for matrices of any shape. If qr(X)$rank < ncol(X), your matrix is singular and the pivot vector names the problem columns.
Try it: Drop the redundant column from X5 and verify that solve(t(X_fixed) %*% X_fixed) succeeds. Store the fixed matrix in ex_X_fixed.
Click to reveal solution
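A possible solution; the first lines rebuild the X5 matrix from above so the snippet is self-contained:

```r
set.seed(1)
X5 <- cbind(a = rnorm(20), b = rnorm(20), c = rnorm(20), d = rnorm(20))
X5 <- cbind(X5, e = X5[, "a"] + X5[, "b"])

qr_X5 <- qr(X5)
keep  <- qr_X5$pivot[seq_len(qr_X5$rank)]   # independent columns only
ex_X_fixed <- X5[, keep, drop = FALSE]

solve(t(ex_X_fixed) %*% ex_X_fixed)   # succeeds: X'X is now full rank (4x4)
```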
Explanation: Indexing with pivot[seq_len(rank)] keeps only the independent columns. The resulting 4×4 X'X is full rank, so solve() returns its inverse.
When should I use ridge regularisation versus a generalised inverse?
Dropping a column is the right fix when the redundancy is accidental: a duplicate, a unit-converted copy, a lagged version of the same series. It's the wrong fix when many predictors are each mildly correlated with one another, because then every column carries unique signal and you'd throw some of it away. Two alternatives keep all the columns: ridge regression, which adds a small penalty to the diagonal of $X^TX$ so it's always invertible, and the Moore–Penrose pseudoinverse, which uses the singular value decomposition to define a unique solution even when the matrix is exactly singular.
Ridge changes the normal equation from $\hat{\beta} = (X^TX)^{-1} X^T y$ to:
$$\hat{\beta}_{ridge} = (X^TX + \lambda I)^{-1} X^T y$$
Where:
- $X$ is the $n \times p$ predictor matrix
- $y$ is the length-$n$ response vector
- $\lambda$ is a small positive constant (the ridge penalty)
- $I$ is the $p \times p$ identity matrix
Adding $\lambda I$ lifts the diagonal just enough to push every eigenvalue away from zero, which is exactly what solve() needs. Below, we build a 60×6 matrix where columns a and b are almost identical, close enough that rcond() reports a value near $10^{-8}$, then compute both fixes and compare the coefficients.
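A sketch of that comparison; the seed, the noise scales, and the column names other than a and b are placeholder choices, and MASS is a recommended package that ships with every R installation:

```r
set.seed(7)
n <- 60
a <- rnorm(n)
b <- a + rnorm(n, sd = 1e-8)   # near-perfect copy of a
X_near <- cbind(a = a, b = b,
                matrix(rnorm(n * 4), n, 4,
                       dimnames = list(NULL, c("c", "d", "e", "f"))))
y <- 1.5 * a + rnorm(n, sd = 0.1)   # the signal lives in the a/b direction

rcond(X_near)   # tiny: the matrix is numerically singular

# Fix 1: ridge -- add lambda to the diagonal of X'X before solving
lambda <- 0.1
beta_ridge <- drop(solve(t(X_near) %*% X_near + lambda * diag(ncol(X_near)),
                         t(X_near) %*% y))

# Fix 2: Moore-Penrose pseudoinverse via SVD, no tuning parameter
beta_ginv <- drop(MASS::ginv(t(X_near) %*% X_near) %*% t(X_near) %*% y)

round(cbind(ridge = beta_ridge, ginv = beta_ginv), 4)
```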
The reciprocal condition number is so small that a naive solve() on X'X would refuse the job. Ridge splits the shared signal (true coefficient 1.5) roughly evenly between the near-duplicate columns a and b, about 0.75 each, and leaves the other coefficients near zero. MASS::ginv() does almost the same thing via SVD with no tuning parameter to set. On this extreme, near-perfect collinearity the two methods agree to four decimal places.
a and b are nearly perfect copies here, but when correlations are moderate (0.7–0.9) the two methods produce visibly different coefficients, and ridge usually generalises better out of sample. Use cross-validation to pick lambda on real data; don't default to 0.1.
Try it: Recompute the ridge coefficients with lambda <- 1.0 and store the result in ex_beta_ridge_big. Larger lambda means more shrinkage toward zero.
Click to reveal solution
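A possible solution; the data lines rebuild X_near and y from the ridge example so the snippet runs on its own:

```r
set.seed(7)
n <- 60
a <- rnorm(n)
b <- a + rnorm(n, sd = 1e-8)
X_near <- cbind(a = a, b = b,
                matrix(rnorm(n * 4), n, 4,
                       dimnames = list(NULL, c("c", "d", "e", "f"))))
y <- 1.5 * a + rnorm(n, sd = 0.1)

lambda <- 1.0
ex_beta_ridge_big <- drop(solve(t(X_near) %*% X_near + lambda * diag(ncol(X_near)),
                                t(X_near) %*% y))
round(ex_beta_ridge_big, 4)   # all coefficients pulled closer to zero
```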
Explanation: Increasing lambda pulls every coefficient closer to zero. The shrinkage is mild on the signal-carrying a and b but the effect grows as lambda rises, which is exactly how ridge trades a bit of bias for a lot of variance reduction.
How do I avoid the dummy-variable trap with factor-heavy models?
Categorical variables open a second route to singularity. If you expand a factor into one dummy column per level and also keep the intercept, the dummy columns sum to the intercept, a perfect linear dependency hiding in plain sight. R's model.matrix() knows this and automatically drops one level when you write ~ factor_name, but hand-built design matrices or ~ 0 + factor_name formulas bypass the drop and re-introduce the problem.
Let's build both versions on a 3-level factor and compare their ranks.
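A sketch with six observations (the group sizes are an arbitrary choice):

```r
group <- factor(rep(c("A", "B", "C"), each = 2))

# Hand-built trap: an intercept plus one dummy per level
X_bad  <- cbind("(Intercept)" = 1, model.matrix(~ 0 + group))
# R's default treatment coding: the baseline level A is dropped
X_good <- model.matrix(~ group)

c(cols = ncol(X_bad),  rank = qr(X_bad)$rank)    # 4 columns, rank 3
c(cols = ncol(X_good), rank = qr(X_good)$rank)   # 3 columns, rank 3
```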
Both matrices have rank 3, but X_bad has 4 columns and X_good has 3. The fourth column in X_bad is redundant, adding groupA + groupB + groupC gives the intercept column exactly. Writing ~ group (without the 0 + prefix) tells model.matrix() to use level A as the baseline and skip its dummy, which is the standard way to parametrise a factor with an intercept.
lm(y ~ 0 + group) gives you one coefficient per level (the group means) instead of an intercept plus K-1 contrasts. The rank stays the same, and the interpretation flips from "differences from baseline" to "absolute level means."
Try it: Build the design matrix for a two-factor interaction group * rep(c("pre","post"), 3) and check its rank with qr()$rank. Store the matrix in ex_X_inter.
Click to reveal solution
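A possible solution; the second factor is stored as time_fac to match the explanation below:

```r
group    <- factor(rep(c("A", "B", "C"), each = 2))
time_fac <- factor(rep(c("pre", "post"), 3))

ex_X_inter <- model.matrix(~ group * time_fac)
ncol(ex_X_inter)      # 6
qr(ex_X_inter)$rank   # 6: full column rank, no dummy-variable trap
```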
Explanation: group * time_fac expands to main effects plus the interaction. model.matrix() drops the baseline level for each factor, leaving six columns: intercept, two group contrasts, one time_fac contrast, and two interaction terms, all linearly independent.
Practice Exercises
Exercise 1: Build a diagnose-and-fix wrapper
Write a function my_diagnose(X) that takes a numeric matrix, checks whether it's singular, finds the redundant columns if any, and returns a list with three fields: rank, redundant_cols (names of columns to drop), and X_fixed (the matrix with redundant columns removed). Test it on a matrix where col3 = col1 + col2.
Click to reveal solution
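A possible implementation; the test matrix uses arbitrary random data for col1 and col2:

```r
my_diagnose <- function(X) {
  qr_X <- qr(X)
  rank <- qr_X$rank
  keep <- qr_X$pivot[seq_len(rank)]
  redundant <- if (rank < ncol(X)) qr_X$pivot[(rank + 1):ncol(X)] else integer(0)
  list(rank           = rank,
       redundant_cols = colnames(X)[redundant],
       X_fixed        = X[, keep, drop = FALSE])
}

# Test on a matrix where col3 = col1 + col2
set.seed(3)
M <- cbind(col1 = rnorm(10), col2 = rnorm(10))
M <- cbind(M, col3 = M[, "col1"] + M[, "col2"])
my_diagnose(M)   # rank 2, redundant_cols "col3", X_fixed has 2 columns
```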
Explanation: qr() returns the numerical rank and a pivot vector that sorts columns with the independent ones first. Everything after position rank is redundant and gets dropped. Using drop = FALSE protects against the single-column edge case.
Exercise 2: Closed-form ridge regression from scratch
Write a function my_ridge(X, y, lambda) that implements ridge regression via the closed-form formula $\hat{\beta} = (X^TX + \lambda I)^{-1} X^T y$ and returns the coefficient vector. Compare your output against MASS::lm.ridge(y ~ X - 1, lambda = lambda) on the X_near dataset from the ridge section.
Click to reveal solution
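A possible implementation, checked on data shaped like the ridge section's X_near (regenerated here so the snippet is self-contained). Note that MASS::lm.ridge() centres and scales the predictors internally, so its coefficients will match this closed form only approximately unless you standardise first:

```r
my_ridge <- function(X, y, lambda) {
  p <- ncol(X)
  # drop() converts the one-column solution matrix to a named numeric vector
  drop(solve(t(X) %*% X + lambda * diag(p), t(X) %*% y))
}

# Quick check on X_near-style data
set.seed(7)
n <- 60
a <- rnorm(n)
b <- a + rnorm(n, sd = 1e-8)
X_near <- cbind(a = a, b = b,
                matrix(rnorm(n * 4), n, 4,
                       dimnames = list(NULL, c("c", "d", "e", "f"))))
y <- 1.5 * a + rnorm(n, sd = 0.1)

round(my_ridge(X_near, y, lambda = 0.1), 4)
```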
Explanation: The formula is a direct translation of the math. Wrapping the result in drop() converts the one-column matrix to a named numeric vector, which matches the output format readers expect from base R regression helpers.
Exercise 3: Build a safe_solve() wrapper
Write safe_solve(A) that tries solve(A) normally, but falls back to MASS::ginv(A) with a warning if the reciprocal condition number is below 1e-12. The function should always return an inverse (or pseudoinverse), never throw an error.
Click to reveal solution
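A possible implementation; the two test matrices are arbitrary examples of a well-conditioned and an exactly singular input:

```r
safe_solve <- function(A, tol = 1e-12) {
  # rcond() is cheap, so this early check adds no meaningful overhead
  if (rcond(A) < tol) {
    warning("matrix is (near-)singular: falling back to MASS::ginv()")
    return(MASS::ginv(A))
  }
  solve(A)
}

A_ok   <- diag(2)
A_sing <- matrix(c(1, 2, 2, 4), 2, 2)   # column 2 = 2 * column 1

safe_solve(A_ok)                       # ordinary inverse
suppressWarnings(safe_solve(A_sing))   # pseudoinverse, no error thrown
</imports>
```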
Explanation: rcond() costs almost nothing to compute, so the early-exit check adds no meaningful overhead. The fallback gives you a usable pseudoinverse whenever the matrix would otherwise crash solve(), which is the behaviour you usually want in a long-running pipeline.
Complete Example: Full diagnose-then-fix workflow
Let's walk through an end-to-end case that mirrors what you'd do on real data. We'll simulate 100 observations with 6 predictors where two predictors are near-copies of a third, detect the problem with rcond(), name the offender with qr(), and fix it two ways: drop-and-solve for the exact redundancy, and ridge for the remaining near-collinearity.
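A sketch of the whole pipeline. The seed and noise scales are placeholder choices; the structure (x3 an exact copy of x1, x4 a near copy, true coefficients 2 for x1 and -1.5 for x2) follows the description:

```r
# Step 1: simulate 100 observations, 6 predictors with built-in collinearity
set.seed(123)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
x3 <- x1                        # exact duplicate of x1
x4 <- x1 + rnorm(n, sd = 1e-6)  # near duplicate of x1
x5 <- rnorm(n); x6 <- rnorm(n)
X  <- cbind(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5, x6 = x6)
y  <- 2 * x1 - 1.5 * x2 + rnorm(n, sd = 0.1)

# Step 2: flag the problem
rcond(t(X) %*% X)               # effectively zero

# Step 3: name the offender
qr_X <- qr(X)
qr_X$rank                       # 5 on a 6-column matrix
bad  <- qr_X$pivot[(qr_X$rank + 1):ncol(X)]
colnames(X)[bad]                # "x3"

# Step 4a: drop the exact duplicate
X_drop <- X[, qr_X$pivot[seq_len(qr_X$rank)], drop = FALSE]
rcond(t(X_drop) %*% X_drop)     # far better, but still small: x4 tracks x1

# Step 4b: ridge absorbs the remaining near-collinearity
lambda <- 0.1
beta <- drop(solve(t(X_drop) %*% X_drop + lambda * diag(ncol(X_drop)),
                   t(X_drop) %*% y))
round(beta, 3)                  # x1 and x4 share the signal, x2 near -1.5
```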
The workflow is deliberately layered. rcond() at step 2 flags the matrix as singular (reciprocal condition number near $10^{-18}$). qr() at step 3 names x3 as the exact-duplicate offender. Dropping x3 lifts rcond() by fourteen orders of magnitude, but the matrix is still mildly ill-conditioned because x4 remains nearly identical to x1. A ridge penalty absorbs that residual near-singularity and returns coefficients that split the shared signal between x1 and x4 (each around 0.99, summing to the true value of 2) while recovering x2's true coefficient of -1.5 cleanly.
Summary
| Symptom | Diagnosis | Fix |
|---|---|---|
| Error in solve.default: system is computationally singular | det(X) == 0; rcond(X) near 0 | Diagnose with qr() before fixing |
| NA coefficient in lm() output | alias(fit) shows the dependency | Drop the duplicated column |
| Redundant column after feature engineering | qr(X)$pivot[(rank+1):ncol(X)] names it | Drop via X[, qr(X)$pivot[seq_len(rank)]] |
| Many mildly correlated predictors | kappa(X) very large, rcond() small | Ridge: solve(X'X + lambda*I) |
| p > n matrix | Rank capped at n | MASS::ginv() or ridge + cross-validation |
| Dummy variable trap | qr()$rank < ncol on factor design | Use ~ factor (not ~ 0 + factor + intercept) |
References
- R Core Team, ?solve, ?rcond, and ?qr, R documentation.
- Venables, W. N. & Ripley, B. D., Modern Applied Statistics with S, 4th ed. (MASS package reference, especially ginv()).
- Hastie, T., Tibshirani, R. & Friedman, J., The Elements of Statistical Learning, 2nd ed., Chapter 3 (ridge regression).
- Wickham, H., Advanced R, 2nd ed., numerical linear algebra notes, adv-r.hadley.nz.
- Wikipedia, "Condition number".
- Wikipedia, "Moore–Penrose inverse".
- Statistics Globe, "R Error in solve.default: system is exactly singular".
Continue Learning
- R Common Errors, the full reference of R errors, with quick fixes for the most frequent crashes.
- Linear Regression in R, the method where singular design matrices appear most often; includes diagnostic plots and assumption checks.
- Ridge and Lasso Regression in R, a deeper treatment of regularisation, including how to tune lambda with cross-validation.