SEM Exercises in R: 8 lavaan Path Model Practice Problems, Solved Step-by-Step
These 8 SEM exercises in R take you from your first lavaan path model on the bundled PoliticalDemocracy data through standardized estimates, fit indices, a hybrid measurement-plus-structural model, mediation with indirect effects, bootstrap confidence intervals, modification indices, and a chi-square nested-model test. Every problem is solved step by step with runnable R code and an explanation of the result.
How do you specify and fit a path model with lavaan?
lavaan turns a path diagram into a small string of model syntax: ~ for regression arrows, =~ for measurement (factor loading) arrows, and ~~ for variances and covariances. Once you write the model, sem() fits it and summary() prints the estimates. We start with a two-equation regression-style path model on PoliticalDemocracy so you can see the syntax-to-output round trip end to end before any latent variables get involved.
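Here is a minimal sketch of that round trip. It assumes lavaan is installed; lavaan also ships the PoliticalDemocracy data. The names model_basic and fit_basic are our own. pe_basic is the name the Try it exercise below expects.

```r
library(lavaan)  # provides sem(), parameterEstimates() and the PoliticalDemocracy data

# Two regression equations: y1 on x1, and y5 on x1 plus lagged democracy y1
model_basic <- '
  y1 ~ x1
  y5 ~ x1 + y1
'

fit_basic <- sem(model_basic, data = PoliticalDemocracy)

# standardized = TRUE adds the Std.lv and Std.all columns to the printout
summary(fit_basic, standardized = TRUE)

# The same estimates as a data frame, ready for scripting
pe_basic <- parameterEstimates(fit_basic)
```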
Two equations, two sets of coefficients. Industrialization in 1960 (x1) raises both democracy in 1960 (y1 ~ x1 = 1.47) and democracy in 1965 (y5 ~ x1 = 0.75), and lagged democracy strongly predicts current democracy (y5 ~ y1 = 0.62). The standardized column on the right (Std.all) puts every coefficient on a comparable scale, which is the version you usually want when comparing predictor importance.
Working with parameterEstimates() rather than reading the printed summary() output is what lets you script post-hoc analyses: filter by op to isolate regressions (~), variances (~~), or factor loadings (=~), and extract est, se, or pvalue directly into downstream code.
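The next sketch filters the parameter table by operator. The object names fit, pe, regs and vars are illustrative. The model is refit so the block runs on its own.

```r
library(lavaan)

fit <- sem('
  y1 ~ x1
  y5 ~ x1 + y1
', data = PoliticalDemocracy)
pe <- parameterEstimates(fit)

# Regression paths only (op "~"), keeping the columns most reports need
regs <- pe[pe$op == "~", c("lhs", "rhs", "est", "se", "pvalue")]

# Variances and covariances (op "~~")
vars <- pe[pe$op == "~~", c("lhs", "rhs", "est")]

print(regs)
print(vars)
```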
~ says "regress on", =~ says "is measured by", and ~~ says "covaries with". Each arrow in a path diagram becomes one term in the model string, with one line per outcome; the three arrows of the basic model fit on two lines. Every fancier SEM later in this article is a longer combination of those same three operators.

Use sem() for path models and cfa() for measurement-only models. Both wrap lavaan() with the same sensible defaults: the first loading of each latent variable is fixed to 1, and exogenous latent variables are allowed to covary freely. cfa() is simply named and documented for measurement models, so the two produce identical fits when the syntax matches.

Try it: From pe_basic, extract the unstandardized regression coefficient on x1 for the y5 equation and save it to ex_coef. One subset is enough.
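Here is one way to write it, as a self-contained sketch that refits the basic model first. The cross-check against coef() is our own addition. coef() names free parameters by pasting lhs, op and rhs together, for example "y5~x1".

```r
library(lavaan)

fit_basic <- sem('
  y1 ~ x1
  y5 ~ x1 + y1
', data = PoliticalDemocracy)
pe_basic <- parameterEstimates(fit_basic)

# One subset: outcome y5, regression operator, predictor x1
ex_coef <- subset(pe_basic, lhs == "y5" & op == "~" & rhs == "x1")$est
ex_coef

# Same number straight from the coefficient vector
coef(fit_basic)[["y5~x1"]]
```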
Explanation: Three logical conditions pin down the exact row: outcome y5, operator ~ (regression), predictor x1. The $est column then returns the point estimate. This is the same value as the Estimate printed by summary(), just grabbed programmatically.
How do you judge whether a lavaan model actually fits?
A path model returns coefficients no matter what, but only fit indices tell you whether the model's implied covariance matrix actually matches the data covariance matrix. lavaan reports chi-square, CFI, TLI, RMSEA, and SRMR. Knowing which indices to trust, and the conventional Hu-Bentler (1999) cutoffs (CFI and TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08), is what separates a defensible SEM from a number salad.
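The sketch below pulls those indices in one call with fitMeasures(), applied to the basic model from above. The fit_idx name matches the Try it exercise below. The extra chisq, df and pvalue entries are included so you can check the df = 0 discussion directly.

```r
library(lavaan)

fit_basic <- sem('
  y1 ~ x1
  y5 ~ x1 + y1
', data = PoliticalDemocracy)

# Request only the indices discussed in the text, plus df
fit_idx <- fitMeasures(fit_basic,
                       c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
round(fit_idx, 3)
```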
The basic two-equation model is just-identified (df = 0). It has exactly as many free parameters as unique variances and covariances in the data, so it reproduces the observed covariance matrix perfectly by construction. Its CFI = 1, RMSEA = 0, and chi-square = 0 are therefore artifacts of saturation, not evidence of good fit; fit indices only carry information when df > 0. Their job begins once you start imposing constraints, which is what several of the later exercises do.
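The df = 0 result follows from a simple count. There are p = 3 observed variables (x1, y1, y5), and here Var(x1) is counted as one of the parameters:

```latex
\begin{aligned}
\text{unique moments} &= \frac{p(p+1)}{2} = \frac{3 \times 4}{2} = 6 \\
\text{free parameters} &= \underbrace{3}_{\text{paths}} + \underbrace{2}_{\text{residual variances}} + \underbrace{1}_{\operatorname{Var}(x_1)} = 6 \\
df &= 6 - 6 = 0
\end{aligned}
```

lavaan's default fixed.x = TRUE fixes Var(x1) at its sample value rather than estimating it. In effect this removes that term from both sides of the count, so df stays at 0.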
With df = 0 there is nothing to test. In Exercises 3 and 4 below you will fit over-identified models where the chi-square actually carries information, and where CFI and RMSEA are the primary indices to lean on (chi-square rejects too eagerly when N is large).
Try it: From fit_idx, pull just the CFI value (single scalar) and save it to ex_cfi.
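A sketch of one solution, again refitting the basic model so the block is self-contained:

```r
library(lavaan)

fit_basic <- sem('
  y1 ~ x1
  y5 ~ x1 + y1
', data = PoliticalDemocracy)
fit_idx <- fitMeasures(fit_basic, c("cfi", "tli", "rmsea", "srmr"))

# [[ ]] drops the name and returns a plain scalar
ex_cfi <- fit_idx[["cfi"]]
ex_cfi
```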
Explanation: Double brackets on a named numeric vector return an unnamed scalar, which is usually what downstream code (e.g., comparison tables) wants. Single brackets, fit_idx["cfi"], return a length-1 named vector instead. The name then travels into printed output, c() results, and identical() comparisons, which can be surprising.
Practice Exercises
The eight problems below progress from a basic two-equation path model to bootstrap CIs, modification-index respecification, and a nested-model chi-square test. Variables are prefixed my_ to avoid colliding with the tutorial state above.
Exercise 1: Specify a basic two-equation path model
Fit a path model on PoliticalDemocracy with y1 ~ x1 and y5 ~ x1 + y1. Print the unstandardized coefficient on y1 from the y5 equation. Expected: roughly 0.62.
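Here is a self-contained sketch of one solution, assuming lavaan. my_fit1 is the name Exercise 2 reuses. my_model1, my_pe1 and my_b_y1 are illustrative names of our own.

```r
library(lavaan)

my_model1 <- '
  y1 ~ x1
  y5 ~ x1 + y1
'
my_fit1 <- sem(my_model1, data = PoliticalDemocracy)
my_pe1  <- parameterEstimates(my_fit1)

# Lagged-democracy coefficient from the y5 equation
my_b_y1 <- my_pe1$est[my_pe1$lhs == "y5" & my_pe1$op == "~" & my_pe1$rhs == "y1"]
round(my_b_y1, 2)
```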
Explanation: The two regression lines compose into one structural model. parameterEstimates() returns every parameter; subsetting by lhs, op, and rhs isolates the lagged-democracy coefficient, which is the autoregressive effect of 1960 democracy on 1965 democracy after controlling for industrialization.
Exercise 2: Pull a standardized coefficient
Using my_fit1 from Exercise 1, get the standardized (std.all) estimate for y5 ~ y1. Use standardizedSolution() rather than reading from summary().
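A sketch of one solution. The model is refit so the block runs on its own, and my_std and my_std_y1 are illustrative names.

```r
library(lavaan)

my_fit1 <- sem('
  y1 ~ x1
  y5 ~ x1 + y1
', data = PoliticalDemocracy)

# standardizedSolution() defaults to type = "std.all"
my_std <- standardizedSolution(my_fit1)
my_std_y1 <- my_std$est.std[my_std$lhs == "y5" & my_std$op == "~" & my_std$rhs == "y1"]
round(my_std_y1, 2)
```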
Explanation: standardizedSolution() rescales every variable to unit variance, so coefficients become comparable across predictors with different units. The standardized coefficient 0.63 means a one-SD increase in 1960 democracy raises 1965 democracy by 0.63 SD, holding industrialization constant.
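To see what std.all does, you can reproduce it by hand. This sketch assumes a model that reproduces the sample covariances exactly, which holds here because the two-equation model is saturated. Under that assumption, the standardized slope is the raw slope times SD(predictor) / SD(outcome).

```r
library(lavaan)

my_fit1 <- sem('
  y1 ~ x1
  y5 ~ x1 + y1
', data = PoliticalDemocracy)

b_raw <- coef(my_fit1)[["y5~y1"]]

# Rescale by predictor SD over outcome SD; the divisor (N or N - 1)
# cancels in the ratio
b_manual <- b_raw * sd(PoliticalDemocracy$y1) / sd(PoliticalDemocracy$y5)

std <- standardizedSolution(my_fit1)
b_lavaan <- std$est.std[std$lhs == "y5" & std$op == "~" & std$rhs == "y1"]

c(manual = b_manual, lavaan = b_lavaan)
```

In models that do not fit perfectly, lavaan standardizes with the model-implied SDs. The hand calculation with sample SDs then only approximates std.all.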
Exercise 3: Compute global fit indices
Fit a constrained version of the path model that drops y5 ~ x1 (forcing only an indirect path from x1 through y1). This makes the model over-identified with df > 0. Report CFI, RMSEA, and SRMR. Does it pass Hu-Bentler cutoffs (CFI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08)?
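A sketch of one solution. my_model3, my_fit3 and my_fit_idx3 are illustrative names. The three comparisons at the end simply apply the cutoffs stated in the exercise.

```r
library(lavaan)

# Drop the direct path y5 ~ x1: x1 can now reach y5 only through y1
my_model3 <- '
  y1 ~ x1
  y5 ~ y1
'
my_fit3 <- sem(my_model3, data = PoliticalDemocracy)

my_fit_idx3 <- fitMeasures(my_fit3, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))
round(my_fit_idx3, 3)

# Hu-Bentler style checks (TRUE means the cutoff is met)
my_fit_idx3[["cfi"]]   >= 0.95
my_fit_idx3[["rmsea"]] <= 0.06
my_fit_idx3[["srmr"]]  <= 0.08
```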
Explanation: Removing the direct path y5 ~ x1 costs the model badly: CFI 0.87 (well below 0.95), RMSEA 0.39 (way above 0.06), SRMR 0.12 (above 0.08). The chi-square is also significant. All four indices say the constrained model misfits, which is evidence that industrialization has a direct effect on 1965 democracy on top of its indirect path through 1960 democracy.
Exercise 4: Add a measurement model and fit a hybrid SEM
Build a hybrid SEM with two latent variables: ind60 measured by x1, x2, x3 and dem60 measured by y1, y2, y3, y4, plus a structural path dem60 ~ ind60. Fit it, then print CFI and RMSEA.
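A sketch of one solution, using the indicator assignment given in the exercise. my_fit_hyb is the name Exercise 7 refers to.

```r
library(lavaan)

my_model_hyb <- '
  # measurement part
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  # structural part
  dem60 ~ ind60
'
my_fit_hyb <- sem(my_model_hyb, data = PoliticalDemocracy)
round(fitMeasures(my_fit_hyb, c("cfi", "rmsea", "srmr")), 3)
```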
Explanation: The latent variables ind60 and dem60 are inferred from their indicators using the =~ operator. CFI 0.97 and SRMR 0.05 look good; RMSEA 0.11 is high (the cutoff is 0.06), which is a hint there is residual misfit somewhere, exactly the situation Exercise 7 will diagnose with modification indices.
Exercise 5: Test indirect effects (mediation)
Simulate a small mediation dataset and fit M ~ a*X; Y ~ b*M + c*X; ab := a*b. Print the indirect effect ab. The := operator defines a new parameter as a function of others, which is how lavaan tests indirect effects.
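A sketch of one solution. The data-generating values (a = 0.5, b = 0.4, c = 0.2, n = 200, seed 42) are hypothetical choices of ours, so the estimates will not match any numbers quoted elsewhere in this article.

```r
library(lavaan)

set.seed(42)
n <- 200
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)            # hypothetical a-path of 0.5
Y <- 0.4 * M + 0.2 * X + rnorm(n)  # hypothetical b = 0.4, direct c = 0.2
my_med_data <- data.frame(X = X, M = M, Y = Y)

my_med_model <- '
  M ~ a*X
  Y ~ b*M + c*X
  ab := a*b   # indirect effect of X on Y through M
'
my_fit_med <- sem(my_med_model, data = my_med_data)

my_pe_med <- parameterEstimates(my_fit_med)
my_pe_med[my_pe_med$op == ":=", c("lhs", "est", "se", "pvalue")]
```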
Explanation: Labels a, b, c give names to specific paths. The ab := a*b line creates a defined parameter equal to the product, which is the indirect effect of X on Y through M. lavaan reports its estimate, standard error (delta-method by default), and p-value alongside the regular parameters.
Exercise 6: Bootstrap CI for the indirect effect
Refit the mediation model from Exercise 5 with se = "bootstrap" and bootstrap = 200. Use parameterEstimates(..., boot.ci.type = "perc") to extract the 95% percentile bootstrap CI on ab. Bootstrap CIs are recommended over delta-method SEs for indirect effects because the sampling distribution of a product is rarely normal.
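A sketch of one solution. It regenerates a hypothetical simulated mediation dataset (a = 0.5, b = 0.4, c = 0.2, n = 200, seed 42, all our own choices) so the block runs on its own. The interval it produces will not match the one quoted in the explanation below.

```r
library(lavaan)

set.seed(42)
n <- 200
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.4 * M + 0.2 * X + rnorm(n)
my_med_data <- data.frame(X = X, M = M, Y = Y)

my_med_model <- '
  M ~ a*X
  Y ~ b*M + c*X
  ab := a*b
'

# 200 resamples keeps this quick; use 1000-5000 in real work
my_fit_boot <- sem(my_med_model, data = my_med_data,
                   se = "bootstrap", bootstrap = 200)

# Percentile bootstrap CI for every parameter, including the defined ab
my_pe_boot <- parameterEstimates(my_fit_boot, boot.ci.type = "perc", level = 0.95)
my_ab_ci <- my_pe_boot[my_pe_boot$op == ":=", c("lhs", "est", "ci.lower", "ci.upper")]
my_ab_ci
```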
Explanation: With 200 bootstrap resamples, the 95% percentile CI on the indirect effect lies entirely above zero (0.11, 0.31), supporting a non-zero mediation effect. In real analyses you would use 1000-5000 resamples; 200 keeps this exercise fast. The boot.ci.type = "perc" argument selects the percentile CI; bias-corrected ("bca.simple") is also available.
Exercise 7: Use modification indices to respecify
The hybrid SEM in Exercise 4 had RMSEA 0.105, above the 0.06 cutoff. Call modificationIndices() on my_fit_hyb, find the largest mi value among residual covariances (op == "~~"), free that one parameter, refit, and compare CFI before and after.
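A sketch of one solution. Restricting the candidates to pairs of observed variables (via lavNames()) is our own reading of "residual covariances". The refit appends the chosen term to the model string. Object names are illustrative.

```r
library(lavaan)

my_model_hyb <- '
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem60 ~ ind60
'
my_fit_hyb <- sem(my_model_hyb, data = PoliticalDemocracy)

# Modification indices for covariances between observed variables
ov <- lavNames(my_fit_hyb, type = "ov")
my_mi <- modificationIndices(my_fit_hyb)
my_mi_cov <- my_mi[my_mi$op == "~~" & my_mi$lhs %in% ov & my_mi$rhs %in% ov,
                   c("lhs", "op", "rhs", "mi", "epc")]
my_mi_cov <- my_mi_cov[order(my_mi_cov$mi, decreasing = TRUE), ]
head(my_mi_cov, 5)

# Free the single largest one and refit
top <- my_mi_cov[1, ]
my_model_mod <- paste0(my_model_hyb, "\n  ", top$lhs, " ~~ ", top$rhs, "\n")
my_fit_mod <- sem(my_model_mod, data = PoliticalDemocracy)

# CFI before and after
c(before = fitMeasures(my_fit_hyb, "cfi")[["cfi"]],
  after  = fitMeasures(my_fit_mod, "cfi")[["cfi"]])
```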
Explanation: modificationIndices() reports two things for every fixed parameter: the approximate chi-square drop (mi) you would get by freeing it, and the expected parameter change (epc), i.e., how far the estimate would be expected to move from its fixed value. Freeing y1 ~~ y3 (a likely measurement-error correlation between two democracy items) lifts CFI from 0.97 to 0.99 and pushes RMSEA from 0.11 down to 0.09. Always justify modifications theoretically; chasing the largest mi blindly is how spurious models get published.
Exercise 8: Compare nested models with anova()
Fit two versions of the hybrid SEM: a constrained version where the structural path dem60 ~ ind60 is fixed to 0, and the unconstrained version (Exercise 4). Run anova(constrained, unconstrained) for the chi-square difference test and decide which model wins.
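A sketch of one solution, in which the 0* modifier fixes the structural path to zero. The model and object names are illustrative.

```r
library(lavaan)

my_model_free <- '
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem60 ~ ind60
'
# Same model with the structural path fixed to zero via the 0* modifier
my_model_zero <- '
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem60 ~ 0*ind60
'
my_fit_free <- sem(my_model_free, data = PoliticalDemocracy)
my_fit_zero <- sem(my_model_zero, data = PoliticalDemocracy)

# Chi-square difference (likelihood-ratio) test
my_lrt <- anova(my_fit_zero, my_fit_free)
my_lrt
```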
Explanation: The constrained model adds 1 df by fixing one path to 0 and incurs a chi-square increase of 28.5, which is highly significant (p < 0.001 against a chi-square distribution with 1 df). The constraint is rejected, so the unconstrained model wins: industrialization (ind60) does have a non-zero effect on democracy (dem60). This is the SEM analogue of comparing nested regression models with an F-test.
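You can check the p-value for the reported difference directly against the chi-square distribution with 1 df. A base-R sketch:

```r
# Upper-tail probability of the reported chi-square difference (28.5) on 1 df
p_diff <- pchisq(28.5, df = 1, lower.tail = FALSE)
p_diff

# 5% critical value for 1 df, for comparison
qchisq(0.95, df = 1)
```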
Complete Example
The mini-study below ties the core pieces together: fit a hybrid SEM with a measurement model and structural paths, judge global fit, pull standardized estimates, and report the structural coefficients with their 95% CIs.
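Here is a sketch of such a mini-study. It assumes Bollen's three-factor specification for PoliticalDemocracy without correlated errors: ind60, plus democracy in 1960 and 1965 measured by y1-y4 and y5-y8. The dem65 factor name and that indicator assignment are our assumptions, inferred from the discussion that follows.

```r
library(lavaan)

my_model_full <- '
  # measurement model
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8
  # structural model
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
'
my_fit_full <- sem(my_model_full, data = PoliticalDemocracy)

# Global fit
round(fitMeasures(my_fit_full, c("chisq", "df", "cfi", "rmsea", "srmr")), 3)

# Standardized structural paths with their 95% CIs
my_std_full <- standardizedSolution(my_fit_full, level = 0.95)
my_std_full[my_std_full$op == "~", c("lhs", "rhs", "est.std", "ci.lower", "ci.upper")]
```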
CFI = 0.95 just clears the cutoff; RMSEA = 0.10 still flags residual misfit (modification indices would point to correlated errors among the democracy indicators, which is the standard fix in the literature). The standardized coefficients are clean: industrialization in 1960 has a moderate effect on 1960 democracy (0.45) and a small direct effect on 1965 democracy (0.18), and 1960 democracy strongly predicts 1965 democracy (0.88) net of industrialization.
standardizedSolution() returns ci.lower and ci.upper by default. Reviewers usually expect the standardized point estimate plus its interval rather than just the unstandardized coefficient and p-value, especially when predictors are on different measurement scales.
Summary
| # | Exercise | Key function | What it teaches |
|---|---|---|---|
| 1 | Two-equation path model | sem() + parameterEstimates() | Specify regressions with ~, fit, extract coefficients programmatically |
| 2 | Standardized coefficient | standardizedSolution() | Rescale to unit variance for cross-predictor comparison |
| 3 | Global fit on a constrained model | fitMeasures() | Compute CFI, RMSEA, SRMR; apply Hu-Bentler cutoffs |
| 4 | Hybrid measurement + structural model | =~ operator + sem() | Combine latent variables and structural paths in one model |
| 5 | Indirect effect with := | Defined parameter + label syntax | Test mediation via the product of two paths |
| 6 | Bootstrap CI on indirect effect | se = "bootstrap" + boot.ci.type | Build CIs that don't rely on normality of the product |
| 7 | Modification indices | modificationIndices() | Identify and (carefully) free the parameter most hurting fit |
| 8 | Nested-model chi-square test | anova() on two sem() fits | Decide whether a constraint is statistically defensible |
References
- Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36.
- lavaan tutorial. Official package tutorial covering syntax and worked examples.
- lavaan documentation. Function reference for sem(), cfa(), fitMeasures(), and friends.
- Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis. Structural Equation Modeling, 6(1), 1-55.
- Kline, R. B. (2023). Principles and Practice of Structural Equation Modeling (5th ed.). Guilford Press.
- Bollen, K. A. (1989). Structural Equations with Latent Variables. Wiley. (Source of the PoliticalDemocracy dataset.)
- UCLA Statistical Consulting. Introduction to SEM with lavaan.
- MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence Limits for the Indirect Effect: Distribution of the Product and Resampling Methods. Multivariate Behavioral Research, 39(1), 99-128.
Continue Learning
- SEM and CFA in R With lavaan: From Path Diagram to Fit Statistics, the full conceptual walkthrough of lavaan model syntax, identification, and interpretation that these exercises drill on.
- SEM Fit Indices in R: CFI, RMSEA, SRMR, What Counts as Good Fit? A deeper dive on how each fit index is computed, when it lies, and which to report together.
- Factor Analysis in R, companion piece on EFA and CFA that pairs naturally with the measurement-model half of any SEM.