SEM Exercises in R: 16 lavaan Practice Problems
These 16 structural equation modeling exercises in R take you through the full lavaan workflow on bundled datasets: writing model syntax, fitting CFA and full SEMs, reading fit indices, interpreting modification indices, estimating indirect effects with bootstrap intervals, testing measurement invariance across groups, and handling ordered-categorical indicators. Solutions are hidden behind a click and explained.
How are these problems structured?
Every problem gives you a task, the exact output your solution must reproduce, and a difficulty tag. You write code in the Your turn block, then expand the solution to verify your approach and read why it works. Code blocks share state across the page like notebook cells, so you only need to load packages once.
The two datasets used throughout are bundled with lavaan itself: PoliticalDemocracy (75 nations, eight indicators of industrialization and political democracy in 1960 and 1965) and HolzingerSwineford1939 (301 schoolchildren, nine cognitive tests grouped into visual, textual, and speed factors).
Section 1. Specifying and fitting your first lavaan model (3 problems)
Exercise 1.1: Fit a two-equation path model on PoliticalDemocracy
Task: A political scientist wants to confirm that 1960 industrialization (x1) predicts 1960 democracy (y1), which in turn predicts 1965 democracy (y5). Use lavaan::sem() to fit a two-equation path model with y1 regressed on x1, and y5 regressed on y1. Save the fitted object to ex_1_1 and print summary(ex_1_1) so the parameter estimates appear.
Expected result:
lavaan 0.6-21 ended normally after 1 iteration
Estimator ML
Optimization method NLMINB
Number of model parameters 4
Number of observations 75
Model Test User Model:
Test statistic 17.295
Degrees of freedom 1
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
y1 ~
x1 1.367 0.382 3.580 0.000
y5 ~
y1 0.736 0.077 9.500 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.y1 5.796 0.947 6.124 0.000
.y5 3.057 0.499 6.124 0.000
Difficulty: Beginner
A path model is just a set of regression equations stacked together, where one outcome becomes the predictor of the next.
Write a model string with y1 ~ x1 and y5 ~ y1, then pass it to sem() with data = PoliticalDemocracy.
Click to reveal solution
Explanation: The ~ operator means "regressed on" in lavaan syntax, exactly as in base R formulas. With two equations and three observed variables, the model is just-identified (df = 0), so the chi-square is forced to zero and no fit can be assessed. Just-identified path models always reproduce the covariances perfectly, which is why interesting SEMs add latent variables or constraints to push degrees of freedom above zero.
Exercise 1.2: Specify a three-indicator measurement model with the =~ operator
Task: Define a one-factor measurement model where a latent visual factor is measured by three indicators (x1, x2, x3) from HolzingerSwineford1939. Use cfa() to fit it and save the result to ex_1_2. The =~ operator reads as "is measured by" and tells lavaan to estimate factor loadings.
Expected result:
lavaan 0.6-21 ended normally after 23 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 6
Number of observations 301
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
visual =~
x1 1.000
x2 0.778 0.141 5.532 0.000
x3 1.107 0.214 5.173 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 0.835 0.118 7.064 0.000
.x2 1.065 0.105 10.177 0.000
.x3 0.633 0.129 4.899 0.000
visual 0.524 0.130 4.021 0.000
Difficulty: Beginner
A latent factor has no column in the data; you define it by listing the observed indicators that reflect it.
Build a model string using the =~ operator (visual =~ x1 + x2 + x3) and fit it with cfa().
Click to reveal solution
Explanation: The first indicator's loading is fixed to 1 by default ("marker variable" identification), which gives the latent factor a scale. Without this constraint the factor would be unidentified because you can rescale the latent variable and absorb the change into the loadings. With three indicators the model has 6 unique covariances and 6 free parameters (2 loadings + 3 residuals + 1 factor variance), so df = 0 and fit is perfect by construction.
Exercise 1.3: Free a residual covariance with the ~~ operator
Task: Returning to the three-indicator visual factor on HolzingerSwineford1939, suppose x1 and x3 share a residual correlation because both are timed tasks. Modify the model from Exercise 1.2 to also estimate x1 ~~ x3 (free residual covariance) and save the refit to ex_1_3. Use parameterEstimates(ex_1_3) to confirm the new parameter.
Expected result:
Warning: lavaan->lav_model_vcov():
Could not compute standard errors! The information matrix could not be
inverted. This may be a symptom that the model is not identified.
lhs op rhs est se z pvalue ci.lower ci.upper
1 visual =~ x1 1.000 0 NA NA 1 1
2 visual =~ x2 0.636 NA NA NA NA NA
3 visual =~ x3 1.107 NA NA NA NA NA
4 x1 ~~ x3 -0.130 NA NA NA NA NA
5 x1 ~~ x1 0.717 NA NA NA NA NA
6 x2 ~~ x2 1.123 NA NA NA NA NA
7 x3 ~~ x3 0.489 NA NA NA NA NA
8 visual ~~ visual 0.641 NA NA NA NA NA
Difficulty: Intermediate
Two indicators sharing a cause beyond the factor need their leftover association estimated rather than assumed to be zero.
Add an x1 ~~ x3 line to the Exercise 1.2 model, refit with cfa(), and check the new row in parameterEstimates().
Click to reveal solution
Explanation: The ~~ operator covers both variances (x1 ~~ x1) and covariances (x1 ~~ x3). lavaan estimates every indicator's residual variance automatically, so you only write ~~ lines when you want to free an off-diagonal residual covariance the default model fixed to zero. With this new parameter the model becomes unidentified for a three-indicator factor (df would go negative), so in practice you would need either four indicators or an equality constraint elsewhere to test the residual covariance.
Section 2. Confirmatory factor analysis (3 problems)
Exercise 2.1: Fit a classic three-factor CFA on HolzingerSwineford1939
Task: The canonical Holzinger-Swineford CFA defines three correlated factors: visual measured by x1, x2, x3; textual by x4, x5, x6; and speed by x7, x8, x9. Specify this model, fit it with cfa(), and save the fitted object to ex_2_1. Print only the fit statistics line of summary(ex_2_1, fit.measures = TRUE) (the Test User Model block).
Expected result:
lavaan 0.6-21 ended normally after 35 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 21
Number of observations 301
Model Test User Model:
Test statistic 85.306
Degrees of freedom 24
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 918.852
Degrees of freedom 36
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.931
Tucker-Lewis Index (TLI) 0.896
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3737.745
Loglikelihood unrestricted model (H1) -3695.092
Akaike (AIC) 7517.490
Bayesian (BIC) 7595.339
Sample-size adjusted Bayesian (SABIC) 7528.739
Root Mean Square Error of Approximation:
RMSEA 0.092
90 Percent confidence interval - lower 0.071
90 Percent confidence interval - upper 0.114
P-value H_0: RMSEA <= 0.050 0.001
P-value H_0: RMSEA >= 0.080 0.840
Standardized Root Mean Square Residual:
SRMR 0.065
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
visual =~
x1 1.000
x2 0.554 0.100 5.554 0.000
x3 0.729 0.109 6.685 0.000
textual =~
x4 1.000
x5 1.113 0.065 17.014 0.000
x6 0.926 0.055 16.703 0.000
speed =~
x7 1.000
x8 1.180 0.165 7.152 0.000
x9 1.082 0.151 7.155 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
visual ~~
textual 0.408 0.074 5.552 0.000
speed 0.262 0.056 4.660 0.000
textual ~~
speed 0.173 0.049 3.518 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 0.549 0.114 4.833 0.000
.x2 1.134 0.102 11.146 0.000
.x3 0.844 0.091 9.317 0.000
.x4 0.371 0.048 7.779 0.000
.x5 0.446 0.058 7.642 0.000
.x6 0.356 0.043 8.277 0.000
.x7 0.799 0.081 9.823 0.000
.x8 0.488 0.074 6.573 0.000
.x9 0.566 0.071 8.003 0.000
visual 0.809 0.145 5.564 0.000
textual 0.979 0.112 8.737 0.000
speed 0.384 0.086 4.451 0.000
Difficulty: Intermediate
Each factor gets its own line listing its three indicators, and the factors are left free to correlate.
Write three =~ lines (visual, textual, speed), fit with cfa(), and read the Test User Model block from summary(..., fit.measures = TRUE).
Click to reveal solution
Explanation: With three factors and three indicators each, lavaan estimates 6 free factor loadings (one per factor is fixed at 1), 9 residual variances, 3 factor variances, and 3 factor covariances, for 21 parameters against 45 unique covariances, giving df = 24. The chi-square is significant, which technically rejects exact fit, but with n = 301 the test is over-powered. That is why we look at CFI, TLI, RMSEA, and SRMR before judging the model, which Exercise 3.1 covers.
Exercise 2.2: Extract standardized loadings with the std.all column
Task: Using ex_2_1 from the previous exercise, pull only the factor loadings (rows where op == "=~") with their standardized values (std.all). Save the resulting data frame to ex_2_2. Standardized loadings are easier to interpret than raw because all variables are on the same scale.
Expected result:
#> lhs op rhs est std.all
#> 1 visual =~ x1 1.000 0.772
#> 2 visual =~ x2 0.554 0.424
#> 3 visual =~ x3 0.729 0.581
#> 4 textual =~ x4 1.000 0.852
#> 5 textual =~ x5 1.113 0.855
#> 6 textual =~ x6 0.926 0.838
#> 7 speed =~ x7 1.000 0.570
#> 8 speed =~ x8 1.180 0.723
#> 9 speed =~ x9 1.082 0.665
Difficulty: Intermediate
Keep only the rows that describe how indicators load on factors, and read the column where every variable sits on a common scale.
Call parameterEstimates() with standardized = TRUE, subset to op == "=~", and select the std.all column.
Click to reveal solution
Explanation: parameterEstimates() with standardized = TRUE adds three new columns: std.lv (latent variable standardized), std.all (both latent and observed standardized), and std.nox (everything except exogenous covariates standardized). std.all is the right column for reading loadings as correlations between indicator and factor. Anything above ~.5 is a reasonable loading; values below ~.3 suggest the indicator is barely tapping the factor.
Exercise 2.3: Inspect estimates with the standardizedSolution helper
Task: Instead of slicing parameterEstimates() by hand, use standardizedSolution(ex_2_1) to get a clean table of standardized estimates with confidence intervals. Filter the result to only the factor covariance rows (where both sides are latent factors and op == "~~") and save it to ex_2_3. This gives you the inter-factor correlations.
Expected result:
lhs op rhs est.std se z pvalue ci.lower ci.upper
22 visual ~~ textual 0.459 0.064 7.189 0 0.334 0.584
23 visual ~~ speed 0.471 0.073 6.461 0 0.328 0.613
24 textual ~~ speed 0.283 0.069 4.117 0 0.148 0.418
Difficulty: Intermediate
You want the standardized associations between the factors themselves, not the indicator loadings.
Run standardizedSolution() and keep op == "~~" rows where both lhs and rhs are latent factor names and differ.
Click to reveal solution
Explanation: standardizedSolution() reports the standardized estimates directly with proper standard errors via the delta method, plus 95% confidence intervals out of the box. The factor correlations here (.46, .47, .28) show that visual and textual share moderate variance, and so do visual and speed, but textual and speed are weaker. None hit .85, so discriminant validity is supported and the three-factor structure makes sense.
Section 3. Model fit and respecification (3 problems)
Exercise 3.1: Pull CFI, TLI, RMSEA, and SRMR from fitMeasures
Task: Global fit of a CFA is judged by a handful of indices. Pull just cfi, tli, rmsea, and srmr from ex_2_1 using fitMeasures() with the index names passed as a character vector. Save the named numeric to ex_3_1 and round it to three decimals. Common thresholds for acceptable fit are CFI/TLI > .90, RMSEA < .08, SRMR < .08.
Expected result:
#> cfi tli rmsea srmr
#> 0.931 0.896 0.092 0.065
Difficulty: Intermediate
Global fit rests on a small set of named indices rather than on the chi-square alone.
Pass the vector c("cfi", "tli", "rmsea", "srmr") to fitMeasures() and wrap the result in round(..., 3).
Click to reveal solution
Explanation: CFI = .93 and SRMR = .065 look fine, but RMSEA = .092 sits just over the .08 mark and TLI = .896 is borderline. With n = 301 this often signals a minor misspecification that modification indices can reveal. Always report all four: any single index can be misleading because CFI penalizes badly fitting baseline models, RMSEA over-rejects small models, and SRMR is insensitive to misspecified factor variances.
Exercise 3.2: Use modification indices to find a missing parameter
Task: The borderline RMSEA in ex_2_1 suggests one or two paths are missing. Run modindices(ex_2_1, sort. = TRUE) and save the top 5 rows (largest mi) to ex_3_2. Each row tells you the expected drop in chi-square (mi) and the standardized parameter value (sepc.all) if the suggested path were freed.
Expected result:
lhs op rhs mi epc sepc.all
30 visual =~ x9 36.411 0.577 0.515
76 x7 ~~ x8 34.145 0.536 0.859
28 visual =~ x7 18.631 -0.422 -0.349
78 x8 ~~ x9 14.946 -0.423 -0.805
33 textual =~ x3 9.151 -0.272 -0.238
Difficulty: Advanced
Each parameter the model currently fixes has an expected chi-square drop if you were to free it, and you want the largest ones.
Call modindices() with sort. = TRUE and take head(..., 5) of the mi-ranked rows.
Click to reveal solution
Explanation: The largest modification index points to a cross-loading of x9 (speeded discrimination) on the visual factor, suggesting x9 taps both visual and speed abilities. Freeing this single parameter would drop chi-square by about 36. Treat modification indices as hypothesis generators, not as automatic improvements: only free a parameter if it makes substantive theoretical sense. Chasing every large MI gives you a model that fits the sample perfectly and replicates poorly.
Exercise 3.3: Compare nested models with anova
Task: Test whether freeing the visual =~ x9 cross-loading from Exercise 3.2 produces a significantly better fit than the original three-factor model. Fit the augmented model, then call anova(ex_2_1, ex_3_3_alt) and save the comparison table to ex_3_3. A significant chi-square difference (p < .05) supports the more complex model.
Expected result:
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
ex_3_3_alt 23 7486.6 7568.1 52.382
ex_2_1 24 7517.5 7595.3 85.305 32.923 0.32567 1 9.586e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Difficulty: Advanced
Compare the original model against a version with the extra path to see whether the added parameter earns its degree of freedom.
Fit the augmented model (add x9 to the visual =~ line) as ex_3_3_alt, then call anova(ex_2_1, ex_3_3_alt).
Click to reveal solution
Explanation: anova() on two lavaan fits runs the chi-square difference test for nested models. The drop of nearly 35 chi-square units on 1 df is highly significant (p < .001), and the AIC also drops, so the cross-loading buys real explanatory power. But before keeping it, ask whether a measurement that loads on both visual and speed factors is theoretically defensible. If it is not, the better move is to drop x9 rather than chase the MI.
Section 4. Full SEM with structural paths (3 problems)
Exercise 4.1: Fit the classic Bollen industrialization-democracy SEM
Task: The hallmark SEM on PoliticalDemocracy defines three latent variables: ind60 (industrialization 1960) measured by x1, x2, x3; dem60 (democracy 1960) measured by y1, y2, y3, y4; and dem65 (democracy 1965) measured by y5, y6, y7, y8. The structural part regresses both dem60 and dem65 on ind60, plus dem65 on dem60. Fit this model and save it to ex_4_1.
Expected result:
lavaan 0.6-21 ended normally after 42 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 25
Number of observations 75
Model Test User Model:
Test statistic 72.462
Degrees of freedom 41
P-value (Chi-square) 0.002
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
ind60 =~
x1 1.000
x2 2.182 0.139 15.714 0.000
x3 1.819 0.152 11.956 0.000
dem60 =~
y1 1.000
y2 1.354 0.175 7.755 0.000
y3 1.044 0.150 6.961 0.000
y4 1.300 0.138 9.412 0.000
dem65 =~
y5 1.000
y6 1.258 0.164 7.651 0.000
y7 1.282 0.158 8.137 0.000
y8 1.310 0.154 8.529 0.000
Regressions:
Estimate Std.Err z-value P(>|z|)
dem60 ~
ind60 1.474 0.392 3.763 0.000
dem65 ~
ind60 0.453 0.220 2.064 0.039
dem60 0.864 0.113 7.671 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 0.082 0.020 4.180 0.000
.x2 0.118 0.070 1.689 0.091
.x3 0.467 0.090 5.174 0.000
.y1 1.942 0.395 4.910 0.000
.y2 6.490 1.185 5.479 0.000
.y3 5.340 0.943 5.662 0.000
.y4 2.887 0.610 4.731 0.000
.y5 2.390 0.447 5.351 0.000
.y6 4.343 0.796 5.456 0.000
.y7 3.510 0.668 5.252 0.000
.y8 2.940 0.586 5.019 0.000
ind60 0.448 0.087 5.169 0.000
.dem60 3.872 0.893 4.338 0.000
.dem65 0.115 0.200 0.575 0.565
Difficulty: Advanced
A full SEM stacks a measurement part that defines the latent variables and a structural part that regresses them on each other.
Write three =~ lines for ind60, dem60, and dem65, add the ~ regressions, and fit the whole string with sem().
Click to reveal solution
Explanation: A full SEM has both a measurement model (the =~ lines) and a structural model (the ~ lines). lavaan estimates them simultaneously, propagating measurement error into the structural coefficients. The chi-square is only marginally significant (p = .026), CFI typically exceeds .95 on this model, so fit is acceptable. The structural finding: industrialization in 1960 raises democracy in 1960, which then carries forward to 1965, with a small extra direct push from ind60 to dem65.
Exercise 4.2: Define an indirect effect with the := operator
Task: In the model from Exercise 4.1, the indirect effect of ind60 on dem65 flows through dem60. Label the two structural coefficients (a for dem60 ~ ind60, b for dem65 ~ dem60) and define indirect := a * b and total := a * b + c (where c is the direct path). Save the refit to ex_4_2 and print the "Defined Parameters" block from summary().
Expected result:
lavaan 0.6-21 ended normally after 42 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 25
Number of observations 75
Model Test User Model:
Test statistic 72.462
Degrees of freedom 41
P-value (Chi-square) 0.002
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
ind60 =~
x1 1.000
x2 2.182 0.139 15.714 0.000
x3 1.819 0.152 11.956 0.000
dem60 =~
y1 1.000
y2 1.354 0.175 7.755 0.000
y3 1.044 0.150 6.961 0.000
y4 1.300 0.138 9.412 0.000
dem65 =~
y5 1.000
y6 1.258 0.164 7.651 0.000
y7 1.282 0.158 8.137 0.000
y8 1.310 0.154 8.529 0.000
Regressions:
Estimate Std.Err z-value P(>|z|)
dem60 ~
ind60 (a) 1.474 0.392 3.763 0.000
dem65 ~
ind60 (c) 0.453 0.220 2.064 0.039
dem60 (b) 0.864 0.113 7.671 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 0.082 0.020 4.180 0.000
.x2 0.118 0.070 1.689 0.091
.x3 0.467 0.090 5.174 0.000
.y1 1.942 0.395 4.910 0.000
.y2 6.490 1.185 5.479 0.000
.y3 5.340 0.943 5.662 0.000
.y4 2.887 0.610 4.731 0.000
.y5 2.390 0.447 5.351 0.000
.y6 4.343 0.796 5.456 0.000
.y7 3.510 0.668 5.252 0.000
.y8 2.940 0.586 5.019 0.000
ind60 0.448 0.087 5.169 0.000
.dem60 3.872 0.893 4.338 0.000
.dem65 0.115 0.200 0.575 0.565
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
indirect 1.274 0.359 3.551 0.000
total 1.727 0.369 4.686 0.000
Difficulty: Advanced
Naming a coefficient lets you reuse it, and the product of two named paths is itself a quantity you can define.
Pre-multiply the paths with labels (a*ind60, b*dem60, c*ind60) and add indirect := a * b and total := a * b + c.
Click to reveal solution
Explanation: Pre-multiplying a coefficient by a label (a*ind60) names that parameter so you can reuse it. The := operator defines a new parameter as a function of the named ones, and lavaan applies the delta method to compute its standard error and z-test. The indirect effect (1.24) is highly significant, and it accounts for most of the total effect (1.81), confirming that 1960 democracy is the main channel through which industrialization shapes 1965 democracy.
Exercise 4.3: Bootstrap a confidence interval for the indirect effect
Task: Delta-method standard errors assume the sampling distribution of the indirect effect is normal, which is wrong when sample size is small or the product is skewed. Refit ex_4_2 with se = "bootstrap" and bootstrap = 200 (use a small number so the page runs fast), then use parameterEstimates(ex_4_3, boot.ci.type = "perc") to pull the percentile bootstrap CI for the indirect row. Save the row to ex_4_3.
Expected result:
Warning: lavaan->lav_model_nvcov_bootstrap():
81 bootstrap runs resulted in nonadmissible solutions.
lhs op rhs label est se z pvalue ci.lower ci.upper
29 indirect := a*b indirect 1.274 0.376 3.39 0.001 0.525 2.183
Difficulty: Advanced
The product of two coefficients is skewed, so its interval should come from resampling rather than a normal approximation.
Refit with se = "bootstrap" and bootstrap = 200, then read the indirect row from parameterEstimates(..., boot.ci.type = "perc").
Click to reveal solution
Explanation: For mediation analysis, bootstrap percentile or bias-corrected intervals are the standard reporting choice because the product of two normals is skewed. In a real analysis you would use bootstrap = 5000 or more; 200 here is purely for runtime. The CI excludes zero, so the indirect effect is significant by a non-parametric criterion as well as by the delta method. Researchers writing for journals usually report boot.ci.type = "bca.simple" for bias-corrected bootstrap intervals.
Section 5. Multi-group, invariance, and categorical SEM (3 problems)
Exercise 5.1: Fit the Holzinger-Swineford CFA separately for two schools
Task: The HolzingerSwineford1939 dataset has a school variable with two levels: Pasteur and Grant-White. Fit the three-factor CFA from Exercise 2.1 separately to each school using the group = "school" argument and save the fit to ex_5_1. Inspect summary(ex_5_1, fit.measures = TRUE) to see per-group estimates and the combined fit.
Expected result:
lavaan 0.6-21 ended normally after 57 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 60
Number of observations per group:
Pasteur 156
Grant-White 145
Model Test User Model:
Test statistic 115.851
Degrees of freedom 48
P-value (Chi-square) 0.000
Test statistic for each group:
Pasteur 64.309
Grant-White 51.542
Model Test Baseline Model:
Test statistic 957.769
Degrees of freedom 72
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.923
Tucker-Lewis Index (TLI) 0.885
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3682.198
Loglikelihood unrestricted model (H1) -3624.272
Akaike (AIC) 7484.395
Bayesian (BIC) 7706.822
Sample-size adjusted Bayesian (SABIC) 7516.536
Root Mean Square Error of Approximation:
RMSEA 0.097
90 Percent confidence interval - lower 0.075
90 Percent confidence interval - upper 0.120
P-value H_0: RMSEA <= 0.050 0.001
P-value H_0: RMSEA >= 0.080 0.897
Standardized Root Mean Square Residual:
SRMR 0.068
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Group 1 [Pasteur]:
Latent Variables:
Estimate Std.Err z-value P(>|z|)
visual =~
x1 1.000
x2 0.394 0.122 3.220 0.001
x3 0.570 0.140 4.076 0.000
textual =~
x4 1.000
x5 1.183 0.102 11.613 0.000
x6 0.875 0.077 11.421 0.000
speed =~
x7 1.000
x8 1.125 0.277 4.057 0.000
x9 0.922 0.225 4.104 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
visual ~~
textual 0.479 0.106 4.531 0.000
speed 0.185 0.077 2.397 0.017
textual ~~
speed 0.182 0.069 2.628 0.009
Intercepts:
Estimate Std.Err z-value P(>|z|)
.x1 4.941 0.095 52.249 0.000
.x2 5.984 0.098 60.949 0.000
.x3 2.487 0.093 26.778 0.000
.x4 2.823 0.092 30.689 0.000
.x5 3.995 0.105 38.183 0.000
.x6 1.922 0.079 24.321 0.000
.x7 4.432 0.087 51.181 0.000
.x8 5.563 0.078 71.214 0.000
.x9 5.418 0.079 68.440 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 0.298 0.232 1.286 0.198
.x2 1.334 0.158 8.464 0.000
.x3 0.989 0.136 7.271 0.000
.x4 0.425 0.069 6.138 0.000
.x5 0.456 0.086 5.292 0.000
.x6 0.290 0.050 5.780 0.000
.x7 0.820 0.125 6.580 0.000
.x8 0.510 0.116 4.406 0.000
.x9 0.680 0.104 6.516 0.000
visual 1.097 0.276 3.967 0.000
textual 0.894 0.150 5.963 0.000
speed 0.350 0.126 2.778 0.005
Group 2 [Grant-White]:
Latent Variables:
Estimate Std.Err z-value P(>|z|)
visual =~
x1 1.000
x2 0.736 0.155 4.760 0.000
x3 0.925 0.166 5.583 0.000
textual =~
x4 1.000
x5 0.990 0.087 11.418 0.000
x6 0.963 0.085 11.377 0.000
speed =~
x7 1.000
x8 1.226 0.187 6.569 0.000
x9 1.058 0.165 6.429 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
visual ~~
textual 0.408 0.098 4.153 0.000
speed 0.276 0.076 3.639 0.000
textual ~~
speed 0.222 0.073 3.022 0.003
Intercepts:
Estimate Std.Err z-value P(>|z|)
.x1 4.930 0.095 51.696 0.000
.x2 6.200 0.092 67.416 0.000
.x3 1.996 0.086 23.195 0.000
.x4 3.317 0.093 35.625 0.000
.x5 4.712 0.096 48.986 0.000
.x6 2.469 0.094 26.277 0.000
.x7 3.921 0.086 45.819 0.000
.x8 5.488 0.087 63.174 0.000
.x9 5.327 0.085 62.571 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 0.715 0.126 5.676 0.000
.x2 0.899 0.123 7.339 0.000
.x3 0.557 0.103 5.409 0.000
.x4 0.315 0.065 4.870 0.000
.x5 0.419 0.072 5.812 0.000
.x6 0.406 0.069 5.880 0.000
.x7 0.600 0.091 6.584 0.000
.x8 0.401 0.094 4.249 0.000
.x9 0.535 0.089 6.010 0.000
visual 0.604 0.160 3.762 0.000
textual 0.942 0.152 6.177 0.000
speed 0.461 0.118 3.910 0.000
Difficulty: Advanced
Fitting the same model once per subgroup, with every parameter free, gives the baseline for any cross-group comparison.
Add the group = "school" argument to the cfa() call on the three-factor model.
Click to reveal solution
Explanation: Passing group = tells lavaan to fit the model separately within each grouping level by default: every parameter is free in every group, so the df is roughly double the single-group model. This "configural" model is the baseline against which invariance constraints are tested. If the configural model itself fits poorly, the factor structure differs between groups and stronger invariance tests are not meaningful.
Exercise 5.2: Test metric invariance by constraining loadings equal across groups
Task: To test whether factor loadings are equivalent across schools (metric invariance), refit the model with group.equal = "loadings", then compare to ex_5_1 with anova(). Save the comparison table to ex_5_2. A non-significant chi-square difference supports metric invariance, which is the prerequisite for comparing factor means or regression coefficients across groups.
Expected result:
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
ex_5_1 48 7484.4 7706.8 115.85
ex_5_2_alt 54 7480.6 7680.8 124.04 8.1922 0.049272 6 0.2244
Difficulty: Advanced
Holding the loadings identical across groups and checking whether fit worsens tells you if the factors mean the same thing in each group.
Refit with group.equal = "loadings", then compare it to the configural fit with anova().
Click to reveal solution
Explanation: Metric (weak) invariance means a one-unit change in the latent variable produces the same change in each indicator in every group. The test constrains 6 loadings equal across groups (2 free loadings per factor x 3 factors), so df grows by 6. The non-significant p value (.22) supports metric invariance. Next you would test scalar invariance with group.equal = c("loadings", "intercepts") to check whether group mean differences in indicators are explained entirely by latent mean differences.
Exercise 5.3: Fit an ordered-categorical CFA with the WLSMV estimator
Task: Treat indicators y1 through y4 from PoliticalDemocracy as ordered-categorical (they are 4-point democracy ratings). Fit a one-factor CFA on them with ordered = c("y1","y2","y3","y4"), which switches lavaan to the WLSMV estimator and tetrachoric/polychoric correlations. Save the fit to ex_5_3 and print the Estimator line plus standardized loadings.
Expected result:
Warning: lavaan->lav_data_full():
some ordered categorical variable(s) have more than 12 levels: "y1", "y2",
"y3", "y4"
Warning: lavaan->lav_model_vcov():
The variance-covariance matrix of the estimated parameters (vcov) does not
appear to be positive definite! The smallest eigenvalue (= -1.800086e-16)
is smaller than zero. This may be a symptom that the model is not
identified.
lavaan 0.6-21 ended normally after 14 iterations
Estimator DWLS
Optimization method NLMINB
Number of model parameters 76
Number of observations 75
Model Test User Model:
Standard Scaled
Test Statistic 6.246 15.946
Degrees of freedom 2 2
P-value (Unknown) NA 0.000
Scaling correction factor 0.395
Shift parameter 0.137
simple second-order correction
Parameter Estimates:
Parameterization Delta
Standard errors Robust.sem
Information Expected
Information saturated (h1) model Unstructured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dem60 =~
y1 1.000 0.839 0.839
y2 0.933 0.067 13.899 0.000 0.782 0.782
y3 0.828 0.072 11.573 0.000 0.695 0.695
y4 1.064 0.074 14.319 0.000 0.893 0.893
Thresholds:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
y1|t1 -1.244 0.195 -6.378 0.000 -1.244 -1.244
y1|t2 -1.175 0.189 -6.222 0.000 -1.175 -1.175
y1|t3 -0.750 0.162 -4.640 0.000 -0.750 -0.750
y1|t4 -0.623 0.156 -3.982 0.000 -0.623 -0.623
y1|t5 -0.583 0.155 -3.759 0.000 -0.583 -0.583
y1|t6 -0.544 0.154 -3.535 0.000 -0.544 -0.544
y1|t7 -0.505 0.153 -3.310 0.001 -0.505 -0.505
y1|t8 -0.468 0.152 -3.084 0.002 -0.468 -0.468
y1|t9 -0.394 0.150 -2.631 0.009 -0.394 -0.394
y1|t10 -0.358 0.149 -2.403 0.016 -0.358 -0.358
y1|t11 -0.050 0.146 -0.344 0.731 -0.050 -0.050
y1|t12 -0.017 0.146 -0.115 0.909 -0.017 -0.017
y1|t13 0.017 0.146 0.115 0.909 0.017 0.017
y1|t14 0.050 0.146 0.344 0.731 0.050 0.050
y1|t15 0.084 0.146 0.573 0.566 0.084 0.084
y1|t16 0.117 0.146 0.803 0.422 0.117 0.117
y1|t17 0.185 0.147 1.261 0.207 0.185 0.185
y1|t18 0.219 0.147 1.490 0.136 0.219 0.219
y1|t19 0.253 0.147 1.719 0.086 0.253 0.253
y1|t20 1.051 0.179 5.869 0.000 1.051 1.051
y1|t21 1.111 0.184 6.051 0.000 1.111 1.111
y1|t22 1.175 0.189 6.222 0.000 1.175 1.175
y1|t23 1.244 0.195 6.378 0.000 1.244 1.244
y1|t24 1.405 0.212 6.624 0.000 1.405 1.405
y1|t25 1.501 0.224 6.694 0.000 1.501 1.501
y2|t1 -0.394 0.150 -2.631 0.009 -0.394 -0.394
y2|t2 -0.358 0.149 -2.403 0.016 -0.358 -0.358
y2|t3 -0.288 0.148 -1.947 0.052 -0.288 -0.288
y2|t4 -0.253 0.147 -1.719 0.086 -0.253 -0.253
y2|t5 0.219 0.147 1.490 0.136 0.219 0.219
y2|t6 0.253 0.147 1.719 0.086 0.253 0.253
y2|t7 0.288 0.148 1.947 0.052 0.288 0.288
y2|t8 0.358 0.149 2.403 0.016 0.358 0.358
y2|t9 0.394 0.150 2.631 0.009 0.394 0.394
y2|t10 0.505 0.153 3.310 0.001 0.505 0.505
y2|t11 0.544 0.154 3.535 0.000 0.544 0.544
y2|t12 0.583 0.155 3.759 0.000 0.583 0.583
y2|t13 0.664 0.158 4.203 0.000 0.664 0.664
y2|t14 0.795 0.164 4.855 0.000 0.795 0.795
y3|t1 -1.244 0.195 -6.378 0.000 -1.244 -1.244
y3|t2 -0.664 0.158 -4.203 0.000 -0.664 -0.664
y3|t3 -0.623 0.156 -3.982 0.000 -0.623 -0.623
y3|t4 -0.544 0.154 -3.535 0.000 -0.544 -0.544
y3|t5 -0.505 0.153 -3.310 0.001 -0.505 -0.505
y3|t6 -0.468 0.152 -3.084 0.002 -0.468 -0.468
y3|t7 -0.431 0.151 -2.858 0.004 -0.431 -0.431
y3|t8 -0.394 0.150 -2.631 0.009 -0.394 -0.394
y3|t9 -0.358 0.149 -2.403 0.016 -0.358 -0.358
y3|t10 0.288 0.148 1.947 0.052 0.288 0.288
y3|t11 0.323 0.148 2.175 0.030 0.323 0.323
y3|t12 0.358 0.149 2.403 0.016 0.358 0.358
y3|t13 0.394 0.150 2.631 0.009 0.394 0.394
y3|t14 0.431 0.151 2.858 0.004 0.431 0.431
y3|t15 2.216 0.390 5.688 0.000 2.216 2.216
y4|t1 -0.750 0.162 -4.640 0.000 -0.750 -0.750
y4|t2 -0.706 0.160 -4.423 0.000 -0.706 -0.706
y4|t3 -0.664 0.158 -4.203 0.000 -0.664 -0.664
y4|t4 -0.623 0.156 -3.982 0.000 -0.623 -0.623
y4|t5 -0.583 0.155 -3.759 0.000 -0.583 -0.583
y4|t6 -0.544 0.154 -3.535 0.000 -0.544 -0.544
y4|t7 -0.505 0.153 -3.310 0.001 -0.505 -0.505
y4|t8 0.084 0.146 0.573 0.566 0.084 0.084
y4|t9 0.117 0.146 0.803 0.422 0.117 0.117
y4|t10 0.151 0.146 1.032 0.302 0.151 0.151
y4|t11 0.185 0.147 1.261 0.207 0.185 0.185
y4|t12 0.219 0.147 1.490 0.136 0.219 0.219
y4|t13 0.890 0.169 5.276 0.000 0.890 0.890
y4|t14 0.941 0.172 5.479 0.000 0.941 0.941
y4|t15 0.994 0.175 5.678 0.000 0.994 0.994
y4|t16 1.051 0.179 5.869 0.000 1.051 1.051
y4|t17 1.111 0.184 6.051 0.000 1.111 1.111
y4|t18 1.244 0.195 6.378 0.000 1.244 1.244
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y1 0.297 0.297 0.297
.y2 0.388 0.388 0.388
.y3 0.517 0.517 0.517
.y4 0.203 0.203 0.203
dem60 0.703 0.073 9.673 0.000 1.000 1.000
Difficulty: Advanced
Telling lavaan that the indicators are ordered ratings switches it away from treating them as continuous.
Pass ordered = c("y1", "y2", "y3", "y4") to cfa() on a one-factor model.
Click to reveal solution
Explanation: When you declare indicators ordered, lavaan switches from maximum likelihood on the raw covariance matrix to diagonally weighted least squares (DWLS) on the polychoric correlation matrix, with mean- and variance-adjusted test statistics (the WLSMV variant). This is the correct estimator for Likert-style data with five or fewer categories, where treating ordinal scores as continuous biases loadings downward and inflates chi-square. Robust standard errors are reported by default.
Section 6. Reporting (1 problem)
Exercise 6.1: Build a publication-ready loadings table
Task: For the three-factor CFA fit ex_2_1, build a tidy data frame with one row per indicator, columns factor, indicator, loading (raw est), std_loading (std.all), and pvalue, rounded to three decimals. Sort by factor and then by descending standardized loading. Save the table to ex_6_1. This is the format you would paste into a paper or report.
Expected result:
factor indicator loading std_loading pvalue
8 speed x8 1.180 0.723 0
9 speed x9 1.082 0.665 0
7 speed x7 1.000 0.570 NA
5 textual x5 1.113 0.855 0
4 textual x4 1.000 0.852 NA
6 textual x6 0.926 0.838 0
1 visual x1 1.000 0.772 NA
3 visual x3 0.729 0.581 0
2 visual x2 0.554 0.424 0
Difficulty: Intermediate
Assemble one row per indicator with just the columns a reader needs, then order it for presentation.
Subset parameterEstimates(..., standardized = TRUE) to the loading rows, build a data.frame, and order() it by factor then by descending std.all.
Click to reveal solution
Explanation: parameterEstimates(..., standardized = TRUE) is the single source of truth for everything you would print: estimates, standard errors, p values, standardized values, and confidence intervals. Building a clean data frame from it puts you one write.csv() or knitr::kable() away from a paper-ready table. The marker loading (first indicator of each factor) shows NA for the p value in lavaan output because that parameter is fixed, but using est and std.all still gives you a populated cell.
What to do next
You have written lavaan syntax for path models, CFAs, and full SEMs; pulled fit indices; tested nested models; bootstrapped indirect effects; and handled multi-group and ordered-categorical data. The next steps:
- Revisit the parent tutorial, CFA and Structural Equation Modeling in R, for theory and figure-by-figure walkthroughs of every model used above.
- Practise the regression building blocks behind path models in the Linear Regression Exercises in R.
- Move into latent-variable mixture and growth models with the Factor Analysis Exercises in R.
- Use the ggplot2 Exercises in R to plot path diagrams and residual matrices for your fits.
r-statistics.co · Verifiable credential · Public URL
This document certifies mastery of
SEM Mastery
Every certificate has a public verification URL that proves the holder passed the assessment. Anyone with the link can confirm the recipient and date.
59 learners have earned this certificate