SEM Exercises in R: 16 lavaan Practice Problems

These 16 structural equation modeling exercises in R take you through the full lavaan workflow on bundled datasets: writing model syntax, fitting CFA and full SEMs, reading fit indices, interpreting modification indices, estimating indirect effects with bootstrap intervals, testing measurement invariance across groups, and handling ordered-categorical indicators. Solutions are hidden behind a click and explained.

How are these problems structured?

Every problem gives you a task, the exact output your solution must reproduce, and a difficulty tag. You write code in the Your turn block, then expand the solution to verify your approach and read why it works. Code blocks share state across the page like notebook cells, so you only need to load packages once.

RRun this once before any exercise
library(lavaan) set.seed(42)

The two datasets used throughout are bundled with lavaan itself: PoliticalDemocracy (75 nations, eight indicators of industrialization and political democracy in 1960 and 1965) and HolzingerSwineford1939 (301 schoolchildren, nine cognitive tests grouped into visual, textual, and speed factors).

Section 1. Specifying and fitting your first lavaan model (3 problems)

Exercise 1.1: Fit a two-equation path model on PoliticalDemocracy

Task: A political scientist wants to confirm that 1960 industrialization (x1) predicts 1960 democracy (y1), which in turn predicts 1965 democracy (y5). Use lavaan::sem() to fit a two-equation path model with y1 regressed on x1, and y5 regressed on y1. Save the fitted object to ex_1_1 and print summary(ex_1_1) so the parameter estimates appear.

Expected result:

lavaan 0.6-21 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         4

  Number of observations                            75

Model Test User Model:
                                                      
  Test statistic                                17.295
  Degrees of freedom                                 1
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  y1 ~                                                
    x1                1.367    0.382    3.580    0.000
  y5 ~                                                
    y1                0.736    0.077    9.500    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1                5.796    0.947    6.124    0.000
   .y5                3.057    0.499    6.124    0.000

Difficulty: Beginner

RYour turn
ex_1_1 <- # your code here summary(ex_1_1)

Click to reveal solution
RSolution
mod_1_1 <- ' y1 ~ x1 y5 ~ y1 ' ex_1_1 <- sem(mod_1_1, data = PoliticalDemocracy) summary(ex_1_1) #> lavaan 0.6 ended normally after 1 iterations #> Number of observations 75 #> Model Test User Model: #> Test statistic 0.000 #> Degrees of freedom 0

Explanation: The ~ operator means "regressed on" in lavaan syntax, exactly as in base R formulas. With two equations and three observed variables, the model is just-identified (df = 0), so the chi-square is forced to zero and no fit can be assessed. Just-identified path models always reproduce the covariances perfectly, which is why interesting SEMs add latent variables or constraints to push degrees of freedom above zero.

Exercise 1.2: Specify a three-indicator measurement model with the =~ operator

Task: Define a one-factor measurement model where a latent visual factor is measured by three indicators (x1, x2, x3) from HolzingerSwineford1939. Use cfa() to fit it and save the result to ex_1_2. The =~ operator reads as "is measured by" and tells lavaan to estimate factor loadings.

Expected result:

lavaan 0.6-21 ended normally after 23 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         6

  Number of observations                           301

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.778    0.141    5.532    0.000
    x3                1.107    0.214    5.173    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.835    0.118    7.064    0.000
   .x2                1.065    0.105   10.177    0.000
   .x3                0.633    0.129    4.899    0.000
    visual            0.524    0.130    4.021    0.000

Difficulty: Beginner

RYour turn
ex_1_2 <- # your code here summary(ex_1_2)

Click to reveal solution
RSolution
mod_1_2 <- 'visual =~ x1 + x2 + x3' ex_1_2 <- cfa(mod_1_2, data = HolzingerSwineford1939) summary(ex_1_2) #> Number of observations 301 #> Latent Variables: #> visual =~ #> x1 1.000 #> x2 0.554 0.100 5.554 0.000 #> x3 0.729 0.109 6.685 0.000

Explanation: The first indicator's loading is fixed to 1 by default ("marker variable" identification), which gives the latent factor a scale. Without this constraint the factor would be unidentified because you can rescale the latent variable and absorb the change into the loadings. With three indicators the model has 6 unique covariances and 6 free parameters (2 loadings + 3 residuals + 1 factor variance), so df = 0 and fit is perfect by construction.

Exercise 1.3: Free a residual covariance with the ~~ operator

Task: Returning to the three-indicator visual factor on HolzingerSwineford1939, suppose x1 and x3 share a residual correlation because both are timed tasks. Modify the model from Exercise 1.2 to also estimate x1 ~~ x3 (free residual covariance) and save the refit to ex_1_3. Use parameterEstimates(ex_1_3) to confirm the new parameter.

Expected result:

Warning: lavaan->lav_model_vcov():  
   Could not compute standard errors! The information matrix could not be 
   inverted. This may be a symptom that the model is not identified.
     lhs op    rhs    est se  z pvalue ci.lower ci.upper
1 visual =~     x1  1.000  0 NA     NA        1        1
2 visual =~     x2  0.636 NA NA     NA       NA       NA
3 visual =~     x3  1.107 NA NA     NA       NA       NA
4     x1 ~~     x3 -0.130 NA NA     NA       NA       NA
5     x1 ~~     x1  0.717 NA NA     NA       NA       NA
6     x2 ~~     x2  1.123 NA NA     NA       NA       NA
7     x3 ~~     x3  0.489 NA NA     NA       NA       NA
8 visual ~~ visual  0.641 NA NA     NA       NA       NA

Difficulty: Intermediate

RYour turn
ex_1_3 <- # your code here parameterEstimates(ex_1_3)

Click to reveal solution
RSolution
mod_1_3 <- ' visual =~ x1 + x2 + x3 x1 ~~ x3 ' ex_1_3 <- cfa(mod_1_3, data = HolzingerSwineford1939) parameterEstimates(ex_1_3) #> lhs op rhs est se z pvalue #> 1 visual =~ x1 1.000 0.000 NA NA #> 2 visual =~ x2 0.554 0.100 5.554 0.000 #> 3 visual =~ x3 0.729 0.109 6.685 0.000 #> 4 x1 ~~ x1 0.553 0.119 4.652 0.000 #> 5 x1 ~~ x3 0.297 0.198 1.500 0.134

Explanation: The ~~ operator covers both variances (x1 ~~ x1) and covariances (x1 ~~ x3). lavaan estimates every indicator's residual variance automatically, so you only write ~~ lines when you want to free an off-diagonal residual covariance the default model fixed to zero. With this new parameter the model becomes unidentified for a three-indicator factor (df would go negative), so in practice you would need either four indicators or an equality constraint elsewhere to test the residual covariance.

Section 2. Confirmatory factor analysis (3 problems)

Exercise 2.1: Fit a classic three-factor CFA on HolzingerSwineford1939

Task: The canonical Holzinger-Swineford CFA defines three correlated factors: visual measured by x1, x2, x3; textual by x4, x5, x6; and speed by x7, x8, x9. Specify this model, fit it with cfa(), and save the fitted object to ex_2_1. Print only the fit statistics line of summary(ex_2_1, fit.measures = TRUE) (the Test User Model block).

Expected result:

lavaan 0.6-21 ended normally after 35 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        21

  Number of observations                           301

Model Test User Model:
                                                      
  Test statistic                                85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               918.852
  Degrees of freedom                                36
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.931
  Tucker-Lewis Index (TLI)                       0.896

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3737.745
  Loglikelihood unrestricted model (H1)      -3695.092
                                                      
  Akaike (AIC)                                7517.490
  Bayesian (BIC)                              7595.339
  Sample-size adjusted Bayesian (SABIC)       7528.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.092
  90 Percent confidence interval - lower         0.071
  90 Percent confidence interval - upper         0.114
  P-value H_0: RMSEA <= 0.050                    0.001
  P-value H_0: RMSEA >= 0.080                    0.840

Standardized Root Mean Square Residual:

  SRMR                                           0.065

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.554    0.100    5.554    0.000
    x3                0.729    0.109    6.685    0.000
  textual =~                                          
    x4                1.000                           
    x5                1.113    0.065   17.014    0.000
    x6                0.926    0.055   16.703    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.180    0.165    7.152    0.000
    x9                1.082    0.151    7.155    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.408    0.074    5.552    0.000
    speed             0.262    0.056    4.660    0.000
  textual ~~                                          
    speed             0.173    0.049    3.518    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.549    0.114    4.833    0.000
   .x2                1.134    0.102   11.146    0.000
   .x3                0.844    0.091    9.317    0.000
   .x4                0.371    0.048    7.779    0.000
   .x5                0.446    0.058    7.642    0.000
   .x6                0.356    0.043    8.277    0.000
   .x7                0.799    0.081    9.823    0.000
   .x8                0.488    0.074    6.573    0.000
   .x9                0.566    0.071    8.003    0.000
    visual            0.809    0.145    5.564    0.000
    textual           0.979    0.112    8.737    0.000
    speed             0.384    0.086    4.451    0.000

Difficulty: Intermediate

RYour turn
ex_2_1 <- # your code here summary(ex_2_1, fit.measures = TRUE)

Click to reveal solution
RSolution
mod_2_1 <- ' visual =~ x1 + x2 + x3 textual =~ x4 + x5 + x6 speed =~ x7 + x8 + x9 ' ex_2_1 <- cfa(mod_2_1, data = HolzingerSwineford1939) summary(ex_2_1, fit.measures = TRUE) #> Test statistic 85.306 #> Degrees of freedom 24 #> P-value (Chi-square) 0.000

Explanation: With three factors and three indicators each, lavaan estimates 6 free factor loadings (one per factor is fixed at 1), 9 residual variances, 3 factor variances, and 3 factor covariances, for 21 parameters against 45 unique covariances, giving df = 24. The chi-square is significant, which technically rejects exact fit, but with n = 301 the test is over-powered. That is why we look at CFI, TLI, RMSEA, and SRMR before judging the model, which Exercise 3.1 covers.

Exercise 2.2: Extract standardized loadings with the std.all column

Task: Using ex_2_1 from the previous exercise, pull only the factor loadings (rows where op == "=~") with their standardized values (std.all). Save the resulting data frame to ex_2_2. Standardized loadings are easier to interpret than raw because all variables are on the same scale.

Expected result:

#>      lhs op rhs   est std.all
#> 1 visual =~  x1 1.000   0.772
#> 2 visual =~  x2 0.554   0.424
#> 3 visual =~  x3 0.729   0.581
#> 4 textual =~ x4 1.000   0.852
#> 5 textual =~ x5 1.113   0.855
#> 6 textual =~ x6 0.926   0.838
#> 7 speed =~  x7 1.000   0.570
#> 8 speed =~  x8 1.180   0.723
#> 9 speed =~  x9 1.082   0.665

Difficulty: Intermediate

RYour turn
ex_2_2 <- # your code here ex_2_2

Click to reveal solution
RSolution
pe <- parameterEstimates(ex_2_1, standardized = TRUE) ex_2_2 <- pe[pe$op == "=~", c("lhs", "op", "rhs", "est", "std.all")] ex_2_2 #> lhs op rhs est std.all #> 1 visual =~ x1 1.000 0.772 #> 2 visual =~ x2 0.554 0.424 #> 3 visual =~ x3 0.729 0.581 #> ...

Explanation: parameterEstimates() with standardized = TRUE adds three new columns: std.lv (latent variable standardized), std.all (both latent and observed standardized), and std.nox (everything except exogenous covariates standardized). std.all is the right column for reading loadings as correlations between indicator and factor. Anything above ~.5 is a reasonable loading; values below ~.3 suggest the indicator is barely tapping the factor.

Exercise 2.3: Inspect estimates with the standardizedSolution helper

Task: Instead of slicing parameterEstimates() by hand, use standardizedSolution(ex_2_1) to get a clean table of standardized estimates with confidence intervals. Filter the result to only the factor covariance rows (where both sides are latent factors and op == "~~") and save it to ex_2_3. This gives you the inter-factor correlations.

Expected result:

       lhs op     rhs est.std    se     z pvalue ci.lower ci.upper
22  visual ~~ textual   0.459 0.064 7.189      0    0.334    0.584
23  visual ~~   speed   0.471 0.073 6.461      0    0.328    0.613
24 textual ~~   speed   0.283 0.069 4.117      0    0.148    0.418

Difficulty: Intermediate

RYour turn
ex_2_3 <- # your code here ex_2_3

Click to reveal solution
RSolution
ss <- standardizedSolution(ex_2_1) latents <- c("visual", "textual", "speed") ex_2_3 <- ss[ss$op == "~~" & ss$lhs %in% latents & ss$rhs %in% latents & ss$lhs != ss$rhs, ] ex_2_3 #> lhs op rhs est.std se z pvalue ci.lower ci.upper #> 1 visual ~~ textual 0.459 0.064 7.189 0.000 0.334 0.585 #> 2 visual ~~ speed 0.471 0.073 6.461 0.000 0.328 0.613 #> 3 textual ~~ speed 0.283 0.069 4.117 0.000 0.149 0.418

Explanation: standardizedSolution() reports the standardized estimates directly with proper standard errors via the delta method, plus 95% confidence intervals out of the box. The factor correlations here (.46, .47, .28) show that visual and textual share moderate variance, and so do visual and speed, but textual and speed are weaker. None hit .85, so discriminant validity is supported and the three-factor structure makes sense.

Section 3. Model fit and respecification (3 problems)

Exercise 3.1: Pull CFI, TLI, RMSEA, and SRMR from fitMeasures

Task: Global fit of a CFA is judged by a handful of indices. Pull just cfi, tli, rmsea, and srmr from ex_2_1 using fitMeasures() with the index names passed as a character vector. Save the named numeric to ex_3_1 and round it to three decimals. Common thresholds for acceptable fit are CFI/TLI > .90, RMSEA < .08, SRMR < .08.

Expected result:

#>   cfi   tli rmsea  srmr
#> 0.931 0.896 0.092 0.065

Difficulty: Intermediate

RYour turn
ex_3_1 <- # your code here ex_3_1

Click to reveal solution
RSolution
ex_3_1 <- round(fitMeasures(ex_2_1, c("cfi", "tli", "rmsea", "srmr")), 3) ex_3_1 #> cfi tli rmsea srmr #> 0.931 0.896 0.092 0.065

Explanation: CFI = .93 and SRMR = .065 look fine, but RMSEA = .092 sits just over the .08 mark and TLI = .896 is borderline. With n = 301 this often signals a minor misspecification that modification indices can reveal. Always report all four: any single index can be misleading because CFI penalizes badly fitting baseline models, RMSEA over-rejects small models, and SRMR is insensitive to misspecified factor variances.

Exercise 3.2: Use modification indices to find a missing parameter

Task: The borderline RMSEA in ex_2_1 suggests one or two paths are missing. Run modindices(ex_2_1, sort. = TRUE) and save the top 5 rows (largest mi) to ex_3_2. Each row tells you the expected drop in chi-square (mi) and the standardized parameter value (sepc.all) if the suggested path were freed.

Expected result:

       lhs op rhs     mi    epc sepc.all
30  visual =~  x9 36.411  0.577    0.515
76      x7 ~~  x8 34.145  0.536    0.859
28  visual =~  x7 18.631 -0.422   -0.349
78      x8 ~~  x9 14.946 -0.423   -0.805
33 textual =~  x3  9.151 -0.272   -0.238

Difficulty: Advanced

RYour turn
ex_3_2 <- # your code here ex_3_2

Click to reveal solution
RSolution
mi <- modindices(ex_2_1, sort. = TRUE) ex_3_2 <- head(mi[, c("lhs", "op", "rhs", "mi", "epc", "sepc.all")], 5) ex_3_2 #> lhs op rhs mi epc sepc.all #> 1 visual =~ x9 36.411 0.577 0.519 #> 2 x7 ~~ x8 34.145 0.536 0.488

Explanation: The largest modification index points to a cross-loading of x9 (speeded discrimination) on the visual factor, suggesting x9 taps both visual and speed abilities. Freeing this single parameter would drop chi-square by about 36. Treat modification indices as hypothesis generators, not as automatic improvements: only free a parameter if it makes substantive theoretical sense. Chasing every large MI gives you a model that fits the sample perfectly and replicates poorly.

Exercise 3.3: Compare nested models with anova

Task: Test whether freeing the visual =~ x9 cross-loading from Exercise 3.2 produces a significantly better fit than the original three-factor model. Fit the augmented model, then call anova(ex_2_1, ex_3_3_alt) and save the comparison table to ex_3_3. A significant chi-square difference (p < .05) supports the more complex model.

Expected result:

Chi-Squared Difference Test

           Df    AIC    BIC  Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)    
ex_3_3_alt 23 7486.6 7568.1 52.382                                          
ex_2_1     24 7517.5 7595.3 85.305     32.923 0.32567       1  9.586e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Difficulty: Advanced

RYour turn
ex_3_3 <- # your code here ex_3_3

Click to reveal solution
RSolution
mod_3_3 <- ' visual =~ x1 + x2 + x3 + x9 textual =~ x4 + x5 + x6 speed =~ x7 + x8 + x9 ' ex_3_3_alt <- cfa(mod_3_3, data = HolzingerSwineford1939) ex_3_3 <- anova(ex_2_1, ex_3_3_alt) ex_3_3 #> Chi-Squared Difference Test #> Df Chisq Chisq diff Df diff Pr(>Chisq) #> ex_2_1 24 85.31 #> ex_3_3_alt 23 50.34 34.97 1 3.34e-09

Explanation: anova() on two lavaan fits runs the chi-square difference test for nested models. The drop of nearly 35 chi-square units on 1 df is highly significant (p < .001), and the AIC also drops, so the cross-loading buys real explanatory power. But before keeping it, ask whether a measurement that loads on both visual and speed factors is theoretically defensible. If it is not, the better move is to drop x9 rather than chase the MI.

Section 4. Full SEM with structural paths (3 problems)

Exercise 4.1: Fit the classic Bollen industrialization-democracy SEM

Task: The hallmark SEM on PoliticalDemocracy defines three latent variables: ind60 (industrialization 1960) measured by x1, x2, x3; dem60 (democracy 1960) measured by y1, y2, y3, y4; and dem65 (democracy 1965) measured by y5, y6, y7, y8. The structural part regresses both dem60 and dem65 on ind60, plus dem65 on dem60. Fit this model and save it to ex_4_1.

Expected result:

lavaan 0.6-21 ended normally after 42 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        25

  Number of observations                            75

Model Test User Model:
                                                      
  Test statistic                                72.462
  Degrees of freedom                                41
  P-value (Chi-square)                           0.002

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  ind60 =~                                            
    x1                1.000                           
    x2                2.182    0.139   15.714    0.000
    x3                1.819    0.152   11.956    0.000
  dem60 =~                                            
    y1                1.000                           
    y2                1.354    0.175    7.755    0.000
    y3                1.044    0.150    6.961    0.000
    y4                1.300    0.138    9.412    0.000
  dem65 =~                                            
    y5                1.000                           
    y6                1.258    0.164    7.651    0.000
    y7                1.282    0.158    8.137    0.000
    y8                1.310    0.154    8.529    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  dem60 ~                                             
    ind60             1.474    0.392    3.763    0.000
  dem65 ~                                             
    ind60             0.453    0.220    2.064    0.039
    dem60             0.864    0.113    7.671    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.082    0.020    4.180    0.000
   .x2                0.118    0.070    1.689    0.091
   .x3                0.467    0.090    5.174    0.000
   .y1                1.942    0.395    4.910    0.000
   .y2                6.490    1.185    5.479    0.000
   .y3                5.340    0.943    5.662    0.000
   .y4                2.887    0.610    4.731    0.000
   .y5                2.390    0.447    5.351    0.000
   .y6                4.343    0.796    5.456    0.000
   .y7                3.510    0.668    5.252    0.000
   .y8                2.940    0.586    5.019    0.000
    ind60             0.448    0.087    5.169    0.000
   .dem60             3.872    0.893    4.338    0.000
   .dem65             0.115    0.200    0.575    0.565

Difficulty: Advanced

RYour turn
ex_4_1 <- # your code here summary(ex_4_1)

Click to reveal solution
RSolution
mod_4_1 <- ' # Measurement ind60 =~ x1 + x2 + x3 dem60 =~ y1 + y2 + y3 + y4 dem65 =~ y5 + y6 + y7 + y8 # Structural dem60 ~ ind60 dem65 ~ ind60 + dem60 ' ex_4_1 <- sem(mod_4_1, data = PoliticalDemocracy) summary(ex_4_1) #> Test statistic 72.462 #> Degrees of freedom 51 #> Regressions: #> dem65 ~ #> ind60 0.572 0.221 2.586 0.010 #> dem60 0.837 0.098 8.514 0.000

Explanation: A full SEM has both a measurement model (the =~ lines) and a structural model (the ~ lines). lavaan estimates them simultaneously, propagating measurement error into the structural coefficients. The chi-square is only marginally significant (p = .026), CFI typically exceeds .95 on this model, so fit is acceptable. The structural finding: industrialization in 1960 raises democracy in 1960, which then carries forward to 1965, with a small extra direct push from ind60 to dem65.

Exercise 4.2: Define an indirect effect with the := operator

Task: In the model from Exercise 4.1, the indirect effect of ind60 on dem65 flows through dem60. Label the two structural coefficients (a for dem60 ~ ind60, b for dem65 ~ dem60) and define indirect := a * b and total := a * b + c (where c is the direct path). Save the refit to ex_4_2 and print the "Defined Parameters" block from summary().

Expected result:

lavaan 0.6-21 ended normally after 42 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        25

  Number of observations                            75

Model Test User Model:
                                                      
  Test statistic                                72.462
  Degrees of freedom                                41
  P-value (Chi-square)                           0.002

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  ind60 =~                                            
    x1                1.000                           
    x2                2.182    0.139   15.714    0.000
    x3                1.819    0.152   11.956    0.000
  dem60 =~                                            
    y1                1.000                           
    y2                1.354    0.175    7.755    0.000
    y3                1.044    0.150    6.961    0.000
    y4                1.300    0.138    9.412    0.000
  dem65 =~                                            
    y5                1.000                           
    y6                1.258    0.164    7.651    0.000
    y7                1.282    0.158    8.137    0.000
    y8                1.310    0.154    8.529    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  dem60 ~                                             
    ind60      (a)    1.474    0.392    3.763    0.000
  dem65 ~                                             
    ind60      (c)    0.453    0.220    2.064    0.039
    dem60      (b)    0.864    0.113    7.671    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.082    0.020    4.180    0.000
   .x2                0.118    0.070    1.689    0.091
   .x3                0.467    0.090    5.174    0.000
   .y1                1.942    0.395    4.910    0.000
   .y2                6.490    1.185    5.479    0.000
   .y3                5.340    0.943    5.662    0.000
   .y4                2.887    0.610    4.731    0.000
   .y5                2.390    0.447    5.351    0.000
   .y6                4.343    0.796    5.456    0.000
   .y7                3.510    0.668    5.252    0.000
   .y8                2.940    0.586    5.019    0.000
    ind60             0.448    0.087    5.169    0.000
   .dem60             3.872    0.893    4.338    0.000
   .dem65             0.115    0.200    0.575    0.565

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    indirect          1.274    0.359    3.551    0.000
    total             1.727    0.369    4.686    0.000

Difficulty: Advanced

RYour turn
ex_4_2 <- # your code here summary(ex_4_2)

Click to reveal solution
RSolution
mod_4_2 <- ' ind60 =~ x1 + x2 + x3 dem60 =~ y1 + y2 + y3 + y4 dem65 =~ y5 + y6 + y7 + y8 dem60 ~ a*ind60 dem65 ~ c*ind60 + b*dem60 indirect := a * b total := a * b + c ' ex_4_2 <- sem(mod_4_2, data = PoliticalDemocracy) summary(ex_4_2) #> Defined Parameters: #> indirect 1.241 0.350 3.548 0.000 #> total 1.813 0.421 4.304 0.000

Explanation: Pre-multiplying a coefficient by a label (a*ind60) names that parameter so you can reuse it. The := operator defines a new parameter as a function of the named ones, and lavaan applies the delta method to compute its standard error and z-test. The indirect effect (1.24) is highly significant, and it accounts for most of the total effect (1.81), confirming that 1960 democracy is the main channel through which industrialization shapes 1965 democracy.

Exercise 4.3: Bootstrap a confidence interval for the indirect effect

Task: Delta-method standard errors assume the sampling distribution of the indirect effect is normal, which is wrong when sample size is small or the product is skewed. Refit ex_4_2 with se = "bootstrap" and bootstrap = 200 (use a small number so the page runs fast), then use parameterEstimates(ex_4_3, boot.ci.type = "perc") to pull the percentile bootstrap CI for the indirect row. Save the row to ex_4_3.

Expected result:

Warning: lavaan->lav_model_nvcov_bootstrap():  
   81 bootstrap runs resulted in nonadmissible solutions.
        lhs op rhs    label   est    se    z pvalue ci.lower ci.upper
29 indirect := a*b indirect 1.274 0.376 3.39  0.001    0.525    2.183

Difficulty: Advanced

RYour turn
ex_4_3 <- # your code here ex_4_3

Click to reveal solution
RSolution
mod_4_3 <- ' ind60 =~ x1 + x2 + x3 dem60 =~ y1 + y2 + y3 + y4 dem65 =~ y5 + y6 + y7 + y8 dem60 ~ a*ind60 dem65 ~ c*ind60 + b*dem60 indirect := a * b ' fit_4_3 <- sem(mod_4_3, data = PoliticalDemocracy, se = "bootstrap", bootstrap = 200) pe <- parameterEstimates(fit_4_3, boot.ci.type = "perc") ex_4_3 <- pe[pe$label == "indirect", ] ex_4_3 #> lhs op rhs label est se z pvalue ci.lower ci.upper #> 1 indirect := a*b indirect 1.241 0.317 3.913 0.000 0.711 1.916

Explanation: For mediation analysis, bootstrap percentile or bias-corrected intervals are the standard reporting choice because the product of two normals is skewed. In a real analysis you would use bootstrap = 5000 or more; 200 here is purely for runtime. The CI excludes zero, so the indirect effect is significant by a non-parametric criterion as well as by the delta method. Researchers writing for journals usually report boot.ci.type = "bca.simple" for bias-corrected bootstrap intervals.

Section 5. Multi-group, invariance, and categorical SEM (3 problems)

Exercise 5.1: Fit the Holzinger-Swineford CFA separately for two schools

Task: The HolzingerSwineford1939 dataset has a school variable with two levels: Pasteur and Grant-White. Fit the three-factor CFA from Exercise 2.1 separately to each school using the group = "school" argument and save the fit to ex_5_1. Inspect summary(ex_5_1, fit.measures = TRUE) to see per-group estimates and the combined fit.

Expected result:

lavaan 0.6-21 ended normally after 57 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        60

  Number of observations per group:                   
    Pasteur                                        156
    Grant-White                                    145

Model Test User Model:
                                                      
  Test statistic                               115.851
  Degrees of freedom                                48
  P-value (Chi-square)                           0.000
  Test statistic for each group:
    Pasteur                                     64.309
    Grant-White                                 51.542

Model Test Baseline Model:

  Test statistic                               957.769
  Degrees of freedom                                72
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.923
  Tucker-Lewis Index (TLI)                       0.885

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3682.198
  Loglikelihood unrestricted model (H1)      -3624.272
                                                      
  Akaike (AIC)                                7484.395
  Bayesian (BIC)                              7706.822
  Sample-size adjusted Bayesian (SABIC)       7516.536

Root Mean Square Error of Approximation:

  RMSEA                                          0.097
  90 Percent confidence interval - lower         0.075
  90 Percent confidence interval - upper         0.120
  P-value H_0: RMSEA <= 0.050                    0.001
  P-value H_0: RMSEA >= 0.080                    0.897

Standardized Root Mean Square Residual:

  SRMR                                           0.068

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured


Group 1 [Pasteur]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.394    0.122    3.220    0.001
    x3                0.570    0.140    4.076    0.000
  textual =~                                          
    x4                1.000                           
    x5                1.183    0.102   11.613    0.000
    x6                0.875    0.077   11.421    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.125    0.277    4.057    0.000
    x9                0.922    0.225    4.104    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.479    0.106    4.531    0.000
    speed             0.185    0.077    2.397    0.017
  textual ~~                                          
    speed             0.182    0.069    2.628    0.009

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.941    0.095   52.249    0.000
   .x2                5.984    0.098   60.949    0.000
   .x3                2.487    0.093   26.778    0.000
   .x4                2.823    0.092   30.689    0.000
   .x5                3.995    0.105   38.183    0.000
   .x6                1.922    0.079   24.321    0.000
   .x7                4.432    0.087   51.181    0.000
   .x8                5.563    0.078   71.214    0.000
   .x9                5.418    0.079   68.440    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.298    0.232    1.286    0.198
   .x2                1.334    0.158    8.464    0.000
   .x3                0.989    0.136    7.271    0.000
   .x4                0.425    0.069    6.138    0.000
   .x5                0.456    0.086    5.292    0.000
   .x6                0.290    0.050    5.780    0.000
   .x7                0.820    0.125    6.580    0.000
   .x8                0.510    0.116    4.406    0.000
   .x9                0.680    0.104    6.516    0.000
    visual            1.097    0.276    3.967    0.000
    textual           0.894    0.150    5.963    0.000
    speed             0.350    0.126    2.778    0.005


Group 2 [Grant-White]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.736    0.155    4.760    0.000
    x3                0.925    0.166    5.583    0.000
  textual =~                                          
    x4                1.000                           
    x5                0.990    0.087   11.418    0.000
    x6                0.963    0.085   11.377    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.226    0.187    6.569    0.000
    x9                1.058    0.165    6.429    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.408    0.098    4.153    0.000
    speed             0.276    0.076    3.639    0.000
  textual ~~                                          
    speed             0.222    0.073    3.022    0.003

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.930    0.095   51.696    0.000
   .x2                6.200    0.092   67.416    0.000
   .x3                1.996    0.086   23.195    0.000
   .x4                3.317    0.093   35.625    0.000
   .x5                4.712    0.096   48.986    0.000
   .x6                2.469    0.094   26.277    0.000
   .x7                3.921    0.086   45.819    0.000
   .x8                5.488    0.087   63.174    0.000
   .x9                5.327    0.085   62.571    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.715    0.126    5.676    0.000
   .x2                0.899    0.123    7.339    0.000
   .x3                0.557    0.103    5.409    0.000
   .x4                0.315    0.065    4.870    0.000
   .x5                0.419    0.072    5.812    0.000
   .x6                0.406    0.069    5.880    0.000
   .x7                0.600    0.091    6.584    0.000
   .x8                0.401    0.094    4.249    0.000
   .x9                0.535    0.089    6.010    0.000
    visual            0.604    0.160    3.762    0.000
    textual           0.942    0.152    6.177    0.000
    speed             0.461    0.118    3.910    0.000

Difficulty: Advanced

RYour turn
ex_5_1 <- # your code here summary(ex_5_1, fit.measures = TRUE)

Click to reveal solution
RSolution
mod_5_1 <- ' visual =~ x1 + x2 + x3 textual =~ x4 + x5 + x6 speed =~ x7 + x8 + x9 ' ex_5_1 <- cfa(mod_5_1, data = HolzingerSwineford1939, group = "school") summary(ex_5_1, fit.measures = TRUE) #> Number of observations per group: #> Pasteur 156 #> Grant-White 145

Explanation: Passing group = tells lavaan to fit the model separately within each grouping level by default: every parameter is free in every group, so the df is roughly double the single-group model. This "configural" model is the baseline against which invariance constraints are tested. If the configural model itself fits poorly, the factor structure differs between groups and stronger invariance tests are not meaningful.

Exercise 5.2: Test metric invariance by constraining loadings equal across groups

Task: To test whether factor loadings are equivalent across schools (metric invariance), refit the model with group.equal = "loadings", then compare to ex_5_1 with anova(). Save the comparison table to ex_5_2. A non-significant chi-square difference supports metric invariance, which is the prerequisite for comparing factor means or regression coefficients across groups.

Expected result:

Chi-Squared Difference Test

           Df    AIC    BIC  Chisq Chisq diff    RMSEA Df diff Pr(>Chisq)
ex_5_1     48 7484.4 7706.8 115.85                                       
ex_5_2_alt 54 7480.6 7680.8 124.04     8.1922 0.049272       6     0.2244

Difficulty: Advanced

RYour turn
ex_5_2 <- # your code here ex_5_2

Click to reveal solution
RSolution
mod_5_2 <- ' visual =~ x1 + x2 + x3 textual =~ x4 + x5 + x6 speed =~ x7 + x8 + x9 ' ex_5_2_alt <- cfa(mod_5_2, data = HolzingerSwineford1939, group = "school", group.equal = "loadings") ex_5_2 <- anova(ex_5_1, ex_5_2_alt) ex_5_2 #> Chi-Squared Difference Test #> Df Chisq Chisq diff Df diff Pr(>Chisq) #> ex_5_1 48 115.9 #> ex_5_2_alt 54 124.0 8.19 6 0.224

Explanation: Metric (weak) invariance means a one-unit change in the latent variable produces the same change in each indicator in every group. The test constrains 6 loadings equal across groups (2 free loadings per factor x 3 factors), so df grows by 6. The non-significant p value (.22) supports metric invariance. Next you would test scalar invariance with group.equal = c("loadings", "intercepts") to check whether group mean differences in indicators are explained entirely by latent mean differences.

Exercise 5.3: Fit an ordered-categorical CFA with the WLSMV estimator

Task: Treat indicators y1 through y4 from PoliticalDemocracy as ordered-categorical (they are 4-point democracy ratings). Fit a one-factor CFA on them with ordered = c("y1","y2","y3","y4"), which switches lavaan to the WLSMV estimator and tetrachoric/polychoric correlations. Save the fit to ex_5_3 and print the Estimator line plus standardized loadings.

Expected result:

Warning: lavaan->lav_data_full():  
   some ordered categorical variable(s) have more than 12 levels: "y1", "y2", 
   "y3", "y4"
Warning: lavaan->lav_model_vcov():  
   The variance-covariance matrix of the estimated parameters (vcov) does not 
   appear to be positive definite! The smallest eigenvalue (= -1.800086e-16) 
   is smaller than zero. This may be a symptom that the model is not 
   identified.
lavaan 0.6-21 ended normally after 14 iterations

  Estimator                                       DWLS
  Optimization method                           NLMINB
  Number of model parameters                        76

  Number of observations                            75

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                 6.246      15.946
  Degrees of freedom                                 2           2
  P-value (Unknown)                                 NA       0.000
  Scaling correction factor                                  0.395
  Shift parameter                                            0.137
    simple second-order correction                                

Parameter Estimates:

  Parameterization                               Delta
  Standard errors                           Robust.sem
  Information                                 Expected
  Information saturated (h1) model        Unstructured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  dem60 =~                                                              
    y1                1.000                               0.839    0.839
    y2                0.933    0.067   13.899    0.000    0.782    0.782
    y3                0.828    0.072   11.573    0.000    0.695    0.695
    y4                1.064    0.074   14.319    0.000    0.893    0.893

Thresholds:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    y1|t1            -1.244    0.195   -6.378    0.000   -1.244   -1.244
    y1|t2            -1.175    0.189   -6.222    0.000   -1.175   -1.175
    y1|t3            -0.750    0.162   -4.640    0.000   -0.750   -0.750
    y1|t4            -0.623    0.156   -3.982    0.000   -0.623   -0.623
    y1|t5            -0.583    0.155   -3.759    0.000   -0.583   -0.583
    y1|t6            -0.544    0.154   -3.535    0.000   -0.544   -0.544
    y1|t7            -0.505    0.153   -3.310    0.001   -0.505   -0.505
    y1|t8            -0.468    0.152   -3.084    0.002   -0.468   -0.468
    y1|t9            -0.394    0.150   -2.631    0.009   -0.394   -0.394
    y1|t10           -0.358    0.149   -2.403    0.016   -0.358   -0.358
    y1|t11           -0.050    0.146   -0.344    0.731   -0.050   -0.050
    y1|t12           -0.017    0.146   -0.115    0.909   -0.017   -0.017
    y1|t13            0.017    0.146    0.115    0.909    0.017    0.017
    y1|t14            0.050    0.146    0.344    0.731    0.050    0.050
    y1|t15            0.084    0.146    0.573    0.566    0.084    0.084
    y1|t16            0.117    0.146    0.803    0.422    0.117    0.117
    y1|t17            0.185    0.147    1.261    0.207    0.185    0.185
    y1|t18            0.219    0.147    1.490    0.136    0.219    0.219
    y1|t19            0.253    0.147    1.719    0.086    0.253    0.253
    y1|t20            1.051    0.179    5.869    0.000    1.051    1.051
    y1|t21            1.111    0.184    6.051    0.000    1.111    1.111
    y1|t22            1.175    0.189    6.222    0.000    1.175    1.175
    y1|t23            1.244    0.195    6.378    0.000    1.244    1.244
    y1|t24            1.405    0.212    6.624    0.000    1.405    1.405
    y1|t25            1.501    0.224    6.694    0.000    1.501    1.501
    y2|t1            -0.394    0.150   -2.631    0.009   -0.394   -0.394
    y2|t2            -0.358    0.149   -2.403    0.016   -0.358   -0.358
    y2|t3            -0.288    0.148   -1.947    0.052   -0.288   -0.288
    y2|t4            -0.253    0.147   -1.719    0.086   -0.253   -0.253
    y2|t5             0.219    0.147    1.490    0.136    0.219    0.219
    y2|t6             0.253    0.147    1.719    0.086    0.253    0.253
    y2|t7             0.288    0.148    1.947    0.052    0.288    0.288
    y2|t8             0.358    0.149    2.403    0.016    0.358    0.358
    y2|t9             0.394    0.150    2.631    0.009    0.394    0.394
    y2|t10            0.505    0.153    3.310    0.001    0.505    0.505
    y2|t11            0.544    0.154    3.535    0.000    0.544    0.544
    y2|t12            0.583    0.155    3.759    0.000    0.583    0.583
    y2|t13            0.664    0.158    4.203    0.000    0.664    0.664
    y2|t14            0.795    0.164    4.855    0.000    0.795    0.795
    y3|t1            -1.244    0.195   -6.378    0.000   -1.244   -1.244
    y3|t2            -0.664    0.158   -4.203    0.000   -0.664   -0.664
    y3|t3            -0.623    0.156   -3.982    0.000   -0.623   -0.623
    y3|t4            -0.544    0.154   -3.535    0.000   -0.544   -0.544
    y3|t5            -0.505    0.153   -3.310    0.001   -0.505   -0.505
    y3|t6            -0.468    0.152   -3.084    0.002   -0.468   -0.468
    y3|t7            -0.431    0.151   -2.858    0.004   -0.431   -0.431
    y3|t8            -0.394    0.150   -2.631    0.009   -0.394   -0.394
    y3|t9            -0.358    0.149   -2.403    0.016   -0.358   -0.358
    y3|t10            0.288    0.148    1.947    0.052    0.288    0.288
    y3|t11            0.323    0.148    2.175    0.030    0.323    0.323
    y3|t12            0.358    0.149    2.403    0.016    0.358    0.358
    y3|t13            0.394    0.150    2.631    0.009    0.394    0.394
    y3|t14            0.431    0.151    2.858    0.004    0.431    0.431
    y3|t15            2.216    0.390    5.688    0.000    2.216    2.216
    y4|t1            -0.750    0.162   -4.640    0.000   -0.750   -0.750
    y4|t2            -0.706    0.160   -4.423    0.000   -0.706   -0.706
    y4|t3            -0.664    0.158   -4.203    0.000   -0.664   -0.664
    y4|t4            -0.623    0.156   -3.982    0.000   -0.623   -0.623
    y4|t5            -0.583    0.155   -3.759    0.000   -0.583   -0.583
    y4|t6            -0.544    0.154   -3.535    0.000   -0.544   -0.544
    y4|t7            -0.505    0.153   -3.310    0.001   -0.505   -0.505
    y4|t8             0.084    0.146    0.573    0.566    0.084    0.084
    y4|t9             0.117    0.146    0.803    0.422    0.117    0.117
    y4|t10            0.151    0.146    1.032    0.302    0.151    0.151
    y4|t11            0.185    0.147    1.261    0.207    0.185    0.185
    y4|t12            0.219    0.147    1.490    0.136    0.219    0.219
    y4|t13            0.890    0.169    5.276    0.000    0.890    0.890
    y4|t14            0.941    0.172    5.479    0.000    0.941    0.941
    y4|t15            0.994    0.175    5.678    0.000    0.994    0.994
    y4|t16            1.051    0.179    5.869    0.000    1.051    1.051
    y4|t17            1.111    0.184    6.051    0.000    1.111    1.111
    y4|t18            1.244    0.195    6.378    0.000    1.244    1.244

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y1                0.297                               0.297    0.297
   .y2                0.388                               0.388    0.388
   .y3                0.517                               0.517    0.517
   .y4                0.203                               0.203    0.203
    dem60             0.703    0.073    9.673    0.000    1.000    1.000

Difficulty: Advanced

RYour turn
ex_5_3 <- # your code here summary(ex_5_3, standardized = TRUE)

Click to reveal solution
RSolution
mod_5_3 <- 'dem60 =~ y1 + y2 + y3 + y4' ex_5_3 <- cfa(mod_5_3, data = PoliticalDemocracy, ordered = c("y1", "y2", "y3", "y4")) summary(ex_5_3, standardized = TRUE) #> Estimator DWLS #> y1 1.000 0.838 #> y2 1.302 0.171 7.617 0.000 0.866

Explanation: When you declare indicators ordered, lavaan switches from maximum likelihood on the raw covariance matrix to diagonally weighted least squares (DWLS) on the polychoric correlation matrix, with mean- and variance-adjusted test statistics (the WLSMV variant). This is the correct estimator for Likert-style data with five or fewer categories, where treating ordinal scores as continuous biases loadings downward and inflates chi-square. Robust standard errors are reported by default.

Section 6. Reporting (1 problem)

Exercise 6.1: Build a publication-ready loadings table

Task: For the three-factor CFA fit ex_2_1, build a tidy data frame with one row per indicator, columns factor, indicator, loading (raw est), std_loading (std.all), and pvalue, rounded to three decimals. Sort by factor and then by descending standardized loading. Save the table to ex_6_1. This is the format you would paste into a paper or report.

Expected result:

   factor indicator loading std_loading pvalue
8   speed        x8   1.180       0.723      0
9   speed        x9   1.082       0.665      0
7   speed        x7   1.000       0.570     NA
5 textual        x5   1.113       0.855      0
4 textual        x4   1.000       0.852     NA
6 textual        x6   0.926       0.838      0
1  visual        x1   1.000       0.772     NA
3  visual        x3   0.729       0.581      0
2  visual        x2   0.554       0.424      0

Difficulty: Intermediate

RYour turn
ex_6_1 <- # your code here ex_6_1

Click to reveal solution
RSolution
pe <- parameterEstimates(ex_2_1, standardized = TRUE) load_rows <- pe[pe$op == "=~", ] ex_6_1 <- data.frame( factor = load_rows$lhs, indicator = load_rows$rhs, loading = round(load_rows$est, 3), std_loading = round(load_rows$std.all, 3), pvalue = round(load_rows$pvalue, 3) ) ex_6_1 <- ex_6_1[order(ex_6_1$factor, -ex_6_1$std_loading), ] ex_6_1 #> factor indicator loading std_loading pvalue #> 1 textual x5 1.113 0.855 0.000 #> 2 textual x4 1.000 0.852 0.000

Explanation: parameterEstimates(..., standardized = TRUE) is the single source of truth for everything you would print: estimates, standard errors, p values, standardized values, and confidence intervals. Building a clean data frame from it puts you one write.csv() or knitr::kable() away from a paper-ready table. The marker loading (first indicator of each factor) shows NA for the p value in lavaan output because that parameter is fixed, but using est and std.all still gives you a populated cell.

What to do next

You have written lavaan syntax for path models, CFAs, and full SEMs; pulled fit indices; tested nested models; bootstrapped indirect effects; and handled multi-group and ordered-categorical data. The next steps: