Ridge & Lasso Exercises in R: 8 Regularization Practice Problems, Solved Step-by-Step
These 8 ridge and lasso exercises in R take you from a first glmnet() fit through cross-validated lambda selection, elastic-net tuning, and a sparse-signal simulation that shows how well lasso recovers the true model. Every problem includes a runnable starter, a hint, and a click-to-reveal solution with explanation.
How do you run your first Ridge or Lasso fit in R?
The glmnet package handles ridge, lasso, and elastic net through one function, switched by the alpha argument. It expects a numeric predictor matrix and a numeric response vector, not a formula. Get those two shapes right and the fit is a one-liner. Here is a first lasso fit on the classic Boston housing data so you can see the zeroed coefficients that make lasso famous.
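A sketch of that first fit, assuming the Boston data ships with the MASS package (the object names x, y, and lasso_fit match the ones the later exercises reuse):

```r
# First lasso fit on Boston: predict median home value (medv)
# from the other predictors. glmnet wants a numeric matrix and a
# numeric vector, not a formula.
library(glmnet)
library(MASS)

x <- model.matrix(medv ~ ., Boston)[, -1]  # numeric matrix, intercept column dropped
y <- Boston$medv

lasso_fit <- glmnet(x, y, alpha = 1)  # alpha = 1 selects the lasso penalty
coef(lasso_fit, s = 0.5)              # coefficients at lambda = 0.5
```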
Four predictors, zn, indus, age, and rad, are pinned to exactly zero at lambda 0.5. The nine survivors are the variables the L1 penalty thinks carry genuine signal. Flip alpha to 0 and the same call becomes ridge, which shrinks every coefficient but zeroes none.
Every block is copy-paste ready for your own R session. Install the package once with install.packages("glmnet"). The #> lines show the output you will see locally. Try it: Refit the same matrix with alpha = 0 (ridge) and count the non-zero coefficients at s = 0.5. Ridge should never zero any coefficient.
Click to reveal solution
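One way to run the check, assuming the x and y pair from the setup:

```r
ridge_fit <- glmnet(x, y, alpha = 0)      # alpha = 0 selects ridge
ridge_coefs <- coef(ridge_fit, s = 0.5)   # coefficients at lambda = 0.5
sum(ridge_coefs != 0)                     # intercept + all predictors: no zeros
```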
Explanation: Ridge's L2 penalty is smooth, so no coefficient lands on exactly zero. Every predictor plus the intercept stays in the model, which is why Df in a ridge fit is always equal to the number of columns in x.
How do you pick the right lambda with cv.glmnet()?
Picking lambda by eye is guesswork. cv.glmnet() runs K-fold cross-validation across the lambda path and hands you two values: the error-minimising lambda.min, and the more conservative lambda.1se that is still within one standard error of the minimum. Most exercises below lean on one of these two numbers, so set the pattern first.
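A sketch of the pattern, assuming x and y from the first section (the exact lambda values depend on the seed and glmnet version):

```r
set.seed(1)  # fold assignment is random, so seed before every CV call
cv_lasso <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

cv_lasso$lambda.min   # lambda with the lowest cross-validated error
cv_lasso$lambda.1se   # largest lambda within one SE of that minimum
coef(cv_lasso, s = "lambda.1se")  # the simpler, more conservative model
```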
lambda.1se keeps seven predictors: chas, nox, rm, dis, ptratio, black, and lstat. That is the model to report when you want a simpler story and a fit that is less tuned to the particular folds you happened to draw.
Without set.seed(), cv.glmnet() draws different folds on every run and can return different lambdas. Seeded folds make your exercise solutions line up with the ones shown here. Try it: Run cv.glmnet with alpha = 0 (ridge) on the same matrix. Is ridge's lambda.1se larger or smaller than lasso's 0.3177?
Click to reveal solution
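One way to compare, assuming x and y from the setup; the exact values depend on the seeded folds:

```r
set.seed(1)
cv_ridge <- cv.glmnet(x, y, alpha = 0, nfolds = 10)
cv_ridge$lambda.1se   # compare against the lasso fit's lambda.1se
```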
Explanation: Ridge needs a bigger lambda because its penalty is squared, so the per-coefficient pull at a given lambda is weaker than lasso's. The two lambdas are not directly comparable across methods; always read them inside their own fit.
Practice Exercises
Each exercise below uses the Boston objects x, y, lasso_fit, or cv_lasso built in the setup sections, unless it introduces its own simulated data. Every exercise has distinct ex{N}_ prefixes so running the solutions does not pollute your working state.
Exercise 1: Ridge coefficient comparison across lambdas
Fit ridge on Boston (alpha = 0). Extract the coefficient of rm (rooms per dwelling) at s = 0.1 and at s = 10. Report the two values and which lambda produces the larger magnitude.
Click to reveal solution
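One possible solution, assuming x and y from the setup (the ex1_ names are this sketch's own):

```r
ex1_fit <- glmnet(x, y, alpha = 0)             # ridge on Boston
ex1_rm_small <- coef(ex1_fit, s = 0.1)["rm", ] # rm coefficient at lambda 0.1
ex1_rm_large <- coef(ex1_fit, s = 10)["rm", ]  # rm coefficient at lambda 10
c(lambda_0.1 = ex1_rm_small, lambda_10 = ex1_rm_large)
# the smaller lambda leaves rm nearer its OLS value,
# so its magnitude is larger there
```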
Explanation: At the smaller lambda, ridge is close to OLS and the rm coefficient is near the unpenalised value. At the larger lambda the penalty dominates the loss and every coefficient is pulled toward zero, so rm shrinks by an order of magnitude. Ridge moves smoothly between these two regimes rather than dropping variables.
Exercise 2: Lasso variable list at target sparsity
Find the largest lambda on lasso_fit$lambda that keeps exactly five non-zero coefficients (not counting the intercept). Save the five predictor names to ex2_five and print them.
Click to reveal solution
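A sketch, assuming lasso_fit from the setup. glmnet stores the non-zero coefficient count (excluding the intercept) at each path step in fit$df; if no step keeps exactly five, the max() call below warns about an empty set.

```r
ex2_lambda <- max(lasso_fit$lambda[lasso_fit$df == 5])  # weakest such penalty
ex2_coefs  <- coef(lasso_fit, s = ex2_lambda)
ex2_five   <- setdiff(rownames(ex2_coefs)[ex2_coefs[, 1] != 0], "(Intercept)")
ex2_five
```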
Explanation: We want the weakest possible regularisation, the largest lambda, that still holds sparsity at five. Picking the largest such lambda gives you the most stable five-predictor model, because any smaller lambda would admit a sixth variable. These five are the predictors the literature has long called the dominant drivers of Boston home values.
Exercise 3: lambda.min vs lambda.1se on mtcars
Move to the mtcars dataset. Build a predictor matrix with model.matrix(mpg ~ ., mtcars)[, -1]. Run cv.glmnet with alpha = 1 and set.seed(3). Report the coefficient count at both lambdas and the predicted mpg for the first row under each lambda.
Click to reveal solution
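One possible solution, with ex3_ names of this sketch's own:

```r
ex3_x <- model.matrix(mpg ~ ., mtcars)[, -1]
ex3_y <- mtcars$mpg

set.seed(3)
ex3_cv <- cv.glmnet(ex3_x, ex3_y, alpha = 1)

sum(coef(ex3_cv, s = "lambda.min") != 0)   # non-zero count, incl. intercept
sum(coef(ex3_cv, s = "lambda.1se") != 0)

ex3_row1 <- ex3_x[1, , drop = FALSE]       # keep matrix shape for newx
predict(ex3_cv, newx = ex3_row1, s = "lambda.min")
predict(ex3_cv, newx = ex3_row1, s = "lambda.1se")
```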
Explanation: lambda.min keeps seven predictors and predicts closer to the actual 21.0; lambda.1se keeps only three (intercept plus two) and lands nearly on the actual. Simpler models often win on small datasets like mtcars (32 rows), where lambda.min can overfit.
Exercise 4: Elastic Net alpha tuning
Sweep alpha across c(0, 0.25, 0.5, 0.75, 1) and run cv.glmnet(x, y, alpha = a, nfolds = 10) for each, with set.seed(11) before every call. Report the minimum CV error per alpha and the winning alpha.
Click to reveal solution
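One possible solution, assuming x and y from the setup. Reseeding before each call gives every alpha identical folds, so the CV errors are comparable:

```r
ex4_alphas <- c(0, 0.25, 0.5, 0.75, 1)
ex4_cv_err <- sapply(ex4_alphas, function(a) {
  set.seed(11)                         # identical folds for every alpha
  min(cv.glmnet(x, y, alpha = a, nfolds = 10)$cvm)
})
names(ex4_cv_err) <- ex4_alphas
ex4_cv_err                             # minimum CV error per alpha
ex4_alphas[which.min(ex4_cv_err)]      # winning alpha
```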
Explanation: Pure ridge (alpha 0) loses because it cannot drop the weakest predictors. Pure lasso (alpha 1) sometimes over-prunes correlated groups. Alpha 0.5 wins on this split because it keeps the dominant predictors but still lets several weak ones go. The grid search is the standard way to tune elastic net when you do not have a strong prior on the mix.
Exercise 5: Train/test RMSE showdown
Split Boston 70/30 with set.seed(2026). Fit ridge and lasso with cv.glmnet on the training 70%. Predict on the held-out 30%. Save both RMSEs to a named vector ex5_rmse and name the winner.
Click to reveal solution
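A sketch, assuming x and y from the setup; exact RMSEs vary with the seeded split and folds:

```r
set.seed(2026)
ex5_train <- sample(nrow(x), round(0.7 * nrow(x)))   # 70% training rows

ex5_ridge <- cv.glmnet(x[ex5_train, ], y[ex5_train], alpha = 0)
ex5_lasso <- cv.glmnet(x[ex5_train, ], y[ex5_train], alpha = 1)

ex5_rmse <- c(
  ridge = sqrt(mean((y[-ex5_train] -
    predict(ex5_ridge, newx = x[-ex5_train, ], s = "lambda.min"))^2)),
  lasso = sqrt(mean((y[-ex5_train] -
    predict(ex5_lasso, newx = x[-ex5_train, ], s = "lambda.min"))^2))
)
ex5_rmse
names(which.min(ex5_rmse))   # the winner on this split
```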
Explanation: Lasso edges out ridge by about 0.03 RMSE, a hair of a win. It comes from lasso dropping two weak predictors that would otherwise have added test-set noise. On most splits of Boston the two methods are within a standard error of each other, which is a real-world lesson: the choice between ridge and lasso is often decided by interpretability, not pure accuracy.
Exercise 6: Sparse signal recovery simulation
Simulate 200 observations with 15 predictors, where only the first three carry true effect (betas 2, -1.5, 1) and the rest are pure noise. Fit cross-validated lasso and, at lambda.1se, count true positives (non-zero coefs among the first three) and false positives (non-zero coefs among the remaining twelve).
Click to reveal solution
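One possible solution; the seed is this sketch's own assumption, so TP/FP counts can differ run to run:

```r
set.seed(6)                               # assumed seed; the prompt fixes none
ex6_x <- matrix(rnorm(200 * 15), 200, 15)
ex6_beta <- c(2, -1.5, 1, rep(0, 12))     # only the first three are real
ex6_y <- as.vector(ex6_x %*% ex6_beta + rnorm(200))

ex6_cv <- cv.glmnet(ex6_x, ex6_y, alpha = 1)
ex6_coefs <- coef(ex6_cv, s = "lambda.1se")[-1, 1]   # drop the intercept

c(TP = sum(ex6_coefs[1:3] != 0),          # true signals recovered
  FP = sum(ex6_coefs[-(1:3)] != 0))       # noise predictors kept by mistake
```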
Explanation: Lasso recovered all three true predictors (TP = 3) and kept one noise predictor by accident (FP = 1). This kind of simulation is the cleanest way to judge a selection method: you know the ground truth, so you can count the errors directly. Under more noise (larger sd) or smaller n, expect false positives and false negatives to rise.
Exercise 7: First-entry lambda for a predictor
On the Boston lasso_fit, find the largest lambda at which the rm coefficient is still zero. In other words, the step right before rm enters the model. Compare it to the lambda at which lstat enters. Which predictor enters first as lambda shrinks?
Click to reveal solution
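A sketch, assuming lasso_fit from the setup. coef() without s returns the full coefficient path, one column per lambda, so a row scan finds each predictor's last all-zero step:

```r
ex7_path <- as.matrix(coef(lasso_fit))[-1, ]   # predictors x lambda steps
ex7_last_zero <- function(var) max(lasso_fit$lambda[ex7_path[var, ] == 0])

ex7_last_zero("rm")
ex7_last_zero("lstat")
# the predictor with the larger "last zero" lambda enters the path first
```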
Explanation: lstat has the larger "last zero" lambda, which means it enters the model first as you walk lambda down from huge to small. rm follows close behind. That entry order mirrors the variable-importance ranking that stepwise selection would give on this data, and it is why a two-predictor lasso model usually picks exactly these two.
Exercise 8: Relaxed Lasso (select with lasso, refit with OLS)
Use cv.glmnet with alpha = 1 on a 70% train split of Boston. Extract the non-zero predictor names at lambda.1se. Refit plain lm() on just those columns of the training data. Compare the test RMSE of the OLS refit to the test RMSE of the original lasso predictions.
Click to reveal solution
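One possible solution, assuming x and y from the setup; the ex8_ names are this sketch's own:

```r
set.seed(2026)
ex8_train <- sample(nrow(x), round(0.7 * nrow(x)))

ex8_cv <- cv.glmnet(x[ex8_train, ], y[ex8_train], alpha = 1)
ex8_coefs <- coef(ex8_cv, s = "lambda.1se")[-1, 1]
ex8_keep <- names(ex8_coefs)[ex8_coefs != 0]   # lasso-selected predictors

# refit the selected columns with unpenalised OLS
ex8_df <- data.frame(medv = y[ex8_train], x[ex8_train, ex8_keep])
ex8_ols <- lm(medv ~ ., data = ex8_df)

c(
  lasso = sqrt(mean((y[-ex8_train] -
    predict(ex8_cv, newx = x[-ex8_train, ], s = "lambda.1se"))^2)),
  relaxed = sqrt(mean((y[-ex8_train] -
    predict(ex8_ols, newdata = data.frame(x[-ex8_train, ex8_keep])))^2))
)
```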
Explanation: Relaxed lasso drops the shrinkage on the selected coefficients and refits them with unbiased OLS. Here it beats plain lasso on test RMSE by about 0.23, a typical win when the selected variables are truly useful. Use this pattern when you trust lasso's selection but want stronger coefficient estimates for interpretation.
Complete Example
Put the moves from all eight exercises into one end-to-end run on a simulated dataset with a known sparse truth. Only the first five of twenty predictors carry effect; the rest are noise. A good cross-validated lasso should recover them and give a test RMSE close to the irreducible noise.
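A sketch of such a run; the seed, sample size of 400, and true betas (smallest 0.8) are this sketch's assumptions:

```r
library(glmnet)
set.seed(42)

n <- 400; p <- 20
sim_x <- matrix(rnorm(n * p), n, p)
colnames(sim_x) <- paste0("x", 1:p)
true_beta <- c(3, -2, 1.5, 1, 0.8, rep(0, p - 5))   # five real signals
sim_y <- as.vector(sim_x %*% true_beta + rnorm(n, sd = 1))  # noise sd = 1

train <- sample(n, 0.7 * n)
sim_cv <- cv.glmnet(sim_x[train, ], sim_y[train], alpha = 1)

sim_sel <- coef(sim_cv, s = "lambda.1se")[-1, 1]
names(sim_sel)[sim_sel != 0]                  # which predictors survived

sim_pred <- predict(sim_cv, newx = sim_x[-train, ], s = "lambda.1se")
sqrt(mean((sim_y[-train] - sim_pred)^2))      # test RMSE vs noise sd of 1
```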
Lasso recovered exactly the five true predictors and dropped all fifteen noise predictors. The RMSE of 1.04 sits close to the irreducible noise standard deviation of 1, meaning the model is nearly as good as the oracle that knows the true coefficients. Repeat this simulation with sd = 2 and you will see lasso start missing the smallest beta (0.8); the signal-to-noise ratio is what determines recovery, not the sample size alone.
Summary
| # | Exercise | Key move | Difficulty |
|---|---|---|---|
| 1 | Ridge coef across lambdas | coef(fit, s = ...) shrinkage | Medium |
| 2 | Lasso at target sparsity | Scan fit$lambda, pick endpoint | Medium |
| 3 | mtcars CV: min vs 1se | model.matrix, two-lambda compare | Medium |
| 4 | Elastic net alpha tuning | sapply over alpha grid | Hard |
| 5 | Train/test RMSE showdown | Hold-out predict + RMSE | Medium |
| 6 | Sparse signal recovery | Ground-truth simulation | Hard |
| 7 | First-entry lambda | Full path scan for a variable | Hard |
| 8 | Relaxed lasso | Select with lasso, refit with OLS | Hard |
Five moves to carry forward:
- Build a numeric matrix with model.matrix(formula, data)[, -1].
- Fit the path with glmnet(x, y, alpha = ...) and the CV with cv.glmnet().
- Extract coefficients with coef(fit, s = "lambda.min") or s = "lambda.1se".
- Predict with predict(fit, newx = ..., s = ...) (never plug a data.frame in).
- Always set.seed() before any cv.glmnet() call.
References
- glmnet package documentation, Stanford Statistics.
- Hastie, T., Tibshirani, R., Friedman, J. The Elements of Statistical Learning, 2nd ed., Chapter 3.4: Shrinkage Methods.
- Tibshirani, R. Regression Shrinkage and Selection via the Lasso. JRSS Series B (1996).
- Zou, H., Hastie, T. Regularization and Variable Selection via the Elastic Net. JRSS Series B (2005).
- James, G., Witten, D., Hastie, T., Tibshirani, R. An Introduction to Statistical Learning, 2nd ed., Chapter 6.2: Shrinkage Methods.
- glmnet CRAN reference manual.
Continue Learning
- Ridge and Lasso Regression in R is the explainer these exercises drill. Read it first if any concept above felt unfamiliar.
- Linear Regression is the OLS baseline that ridge and lasso improve on. Understanding the unpenalised fit makes the shrinkage story concrete.
- Multicollinearity in R covers the problem ridge was designed to solve. Read it if your regression coefficients flip signs or have large standard errors.