SEM Fit Indices in R: CFI, RMSEA, SRMR, What Counts as Good Fit?
Three numbers decide whether reviewers take your CFA or SEM model seriously: CFI, RMSEA, and SRMR. CFI is close to 1 when your model beats a baseline of "everything is uncorrelated"; RMSEA is close to 0 when the per-degree-of-freedom misfit is small; SRMR is close to 0 when the average residual correlation is small. This guide computes all three from a real lavaan fit, explains where the conventional cutoffs come from, and shows what to do when the numbers say your model does not fit.
How do you compute CFI, RMSEA, and SRMR in R?
Before debating what counts as "good," let's see all three indices for a real CFA fit. The block below loads lavaan, fits the classic three-factor model on the HolzingerSwineford1939 data, and pulls just the indices that matter. Reading these numbers in this exact order is the workflow you will repeat for every model you fit.
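A minimal sketch of that block; the three-factor syntax is the standard specification for these data from the lavaan tutorial:

```r
library(lavaan)

# Classic three-factor model for the Holzinger & Swineford (1939) data
HS_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit_hs <- cfa(HS_model, data = HolzingerSwineford1939)

# Read the indices in this order: exact-fit test first, then the trio
fitMeasures(fit_hs, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))
```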
Three useful facts pop out. The chi-square test rejects exact fit (p < .001), which is the norm at N = 301 because chi-square is sensitive to sample size. CFI is 0.931, just below the conventional 0.95 line. RMSEA is 0.092, above the 0.06 line. SRMR is 0.065, comfortably under the 0.08 line. So the residuals look fine on average, but the model still leaves more structure on the table than you would want. We will fix that in Section 6.
Try it: Pull just the chi-square statistic and its degrees of freedom from fit_hs into a named numeric vector called ex_chisq. You will reuse these in Section 2.
Click to reveal solution
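One way to write the solution:

```r
# fitMeasures() returns a named numeric vector when given several names
ex_chisq <- fitMeasures(fit_hs, c("chisq", "df"))
ex_chisq
```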
Explanation: fitMeasures() accepts a character vector of names; pass exactly the indices you need and you get a named numeric vector back.
What does CFI actually compare your model to?
CFI stands for Comparative Fit Index. The "comparative" part is the whole story: CFI compares your model's misfit to the misfit of a worst-case baseline in which every observed variable is independent (no correlations at all). If your model is much better than that baseline, CFI is close to 1; if your model is barely better, CFI is close to 0.
The formula expresses that intuition directly:
$$\text{CFI} = 1 - \frac{\max(\chi^2_{\text{target}} - df_{\text{target}},\ 0)}{\max(\chi^2_{\text{target}} - df_{\text{target}},\ \chi^2_{\text{baseline}} - df_{\text{baseline}},\ 0)}$$
Where:
- $\chi^2_{\text{target}}$, $df_{\text{target}}$ are the chi-square and degrees of freedom for the model you fit.
- $\chi^2_{\text{baseline}}$, $df_{\text{baseline}}$ are the same quantities for the independence (baseline) model.
The max(..., 0) parts protect against negative numerators when a model fits better than its degrees of freedom would predict. lavaan computes the baseline for you and reports CFI; the next block reproduces the calculation by hand so the formula is not magic.
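A sketch of the hand calculation, pulling all four ingredients from fitMeasures():

```r
fm <- fitMeasures(fit_hs, c("chisq", "df", "baseline.chisq", "baseline.df"))

# Excess misfit (chi-square beyond its df) for the target and baseline models
excess_target   <- max(fm["chisq"] - fm["df"], 0)
excess_baseline <- max(fm["baseline.chisq"] - fm["baseline.df"], 0)

cfi_hand <- 1 - excess_target / max(excess_target, excess_baseline)
unname(cfi_hand)            # ~0.931
fitMeasures(fit_hs, "cfi")  # should match to three decimals
```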
The hand calculation matches lavaan's reported CFI to three decimals. The intuition to take away: the denominator is "how bad could things possibly get," and the ratio is "what fraction of that worst case does your model leave on the table." CFI = 0.931 means your model removes about 93 percent of the baseline's excess misfit. The textbook bar of 0.95 asks for 95 percent.
Try it: Compute TLI by hand for fit_hs using the formula TLI = (chi2_null/df_null - chi2_target/df_target) / (chi2_null/df_null - 1). Save it to ex_tli and check it against fitMeasures(fit_hs, "tli").
Click to reveal solution
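One possible solution:

```r
fm <- fitMeasures(fit_hs, c("chisq", "df", "baseline.chisq", "baseline.df"))

# TLI compares chi-square *per degree of freedom*, not raw chi-square
ratio_null   <- fm["baseline.chisq"] / fm["baseline.df"]
ratio_target <- fm["chisq"] / fm["df"]

ex_tli <- (ratio_null - ratio_target) / (ratio_null - 1)
unname(ex_tli)              # ~0.896
fitMeasures(fit_hs, "tli")  # check against lavaan
```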
Explanation: TLI rewards parsimony by dividing each chi-square by its df before comparing. That is why TLI (0.896) is below CFI (0.931) for the same model: TLI is harsher on models with many free parameters.
Why does RMSEA penalise complexity (and small N)?
RMSEA is the Root Mean Square Error of Approximation. It tries to estimate the per-degree-of-freedom population discrepancy between your model and the truth, scaled by sample size. The formula is:
$$\text{RMSEA} = \sqrt{\frac{\max(\chi^2 - df,\ 0)}{df \cdot (N - 1)}}$$
Where $\chi^2$ and $df$ come from the target model and $N$ is the sample size. Two implications matter for daily use. First, dividing by $df$ rewards parsimonious models: adding free parameters that do not improve fit will raise RMSEA. Second, dividing by $(N-1)$ means small samples produce inflated RMSEA estimates even for correct models, which is why the 90% confidence interval is more important than the point estimate.
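All four RMSEA quantities are available from fitMeasures() by name; the block below pulls them for fit_hs:

```r
# Point estimate, 90% CI bounds, and the p-value for H0: RMSEA <= 0.05
fitMeasures(fit_hs, c("rmsea", "rmsea.ci.lower",
                      "rmsea.ci.upper", "rmsea.pvalue"))
```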
Read the four numbers as one sentence: the best estimate of population RMSEA is 0.092, the 90% CI runs from 0.071 to 0.114, and the test of "RMSEA ≤ 0.05" returns p = 0.001 (we reject close fit). Even the lower CI bound (0.071) is above the 0.06 conventional threshold, so the misfit is not just sampling noise. For a small-N study (say N = 80) you might see a point estimate of 0.05 with a CI from 0.00 to 0.13; the CI tells you the data cannot distinguish good fit from bad and the point estimate is overconfident.
Try it: Extract just the upper bound of the 90% RMSEA CI into ex_rmsea_upper. A strict test rejects close fit when the upper CI exceeds 0.10. Print TRUE if the strict test rejects, FALSE if it does not.
Click to reveal solution
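One way to write the solution:

```r
ex_rmsea_upper <- fitMeasures(fit_hs, "rmsea.ci.upper")

# Strict test: reject close fit when the upper 90% bound exceeds 0.10
unname(ex_rmsea_upper) > 0.10  # TRUE
```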
Explanation: The upper 90% CI is 0.114, above 0.10, so even the optimistic edge of the interval rules out close fit. This is a stronger conclusion than reading the point estimate alone.
When is SRMR more honest than the other two?
SRMR is the Standardized Root Mean Square Residual. While CFI and RMSEA are derived from the chi-square statistic and inherit its quirks, SRMR is computed directly from the gap between the observed correlation matrix and the model-implied correlation matrix. It has no dependence on the baseline model and very little dependence on the estimator, which makes it the most stable of the three indices when you switch from ML to robust ML or to ordered-categorical estimators.
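One way to produce the residual matrix this section discusses (resid_cor is the name the exercise below assumes; printing one corner keeps the output readable):

```r
# Observed minus model-implied correlations
resid_cor <- residuals(fit_hs, type = "cor")$cov

# A corner of the matrix: zero diagonal, leftover correlations off-diagonal
round(resid_cor[1:4, 1:4], 3)
```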
The diagonal is zero by construction (each variable correlates 1.0 with itself, and the model reproduces variances exactly). The off-diagonal entries are the model's leftover correlations: how far each pair of indicators differs from what the model implies. Most are small (|r| < 0.05), which is why SRMR is comfortable at 0.065. The largest residual in the snippet above, 0.080 between x1 and x4, hints that the model could be improved by allowing those two to share variance beyond what their factors explain.
Try it: Find the indicator pair with the largest absolute off-diagonal residual in resid_cor. Save the value to ex_max_resid. The pair with the biggest residual is your first candidate for respecification.
Click to reveal solution
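A possible solution:

```r
tmp <- resid_cor
diag(tmp) <- NA  # keep the zero diagonal out of the max

ex_max_resid <- max(abs(tmp), na.rm = TRUE)
ex_max_resid

# Which pair produced it? (symmetric matrix, so the pair appears twice)
which(abs(tmp) == ex_max_resid, arr.ind = TRUE)
```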
Explanation: Setting the diagonal to NA keeps it out of the max. The largest absolute residual (≈ 0.11) is between x7 and x9, two speed indicators, suggesting their relationship goes beyond the latent speed factor.
What thresholds make a fit "good"?
The most cited cutoffs come from Hu and Bentler (1999), who studied how often each index correctly accepted true models and rejected misspecified ones across thousands of simulated datasets. Their two-index recommendation is the rule most reviewers expect today: report SRMR with either CFI or RMSEA, and meet the cutoffs for both.
| Index | Excellent | Acceptable | Poor | Source |
|---|---|---|---|---|
| CFI | ≥ 0.95 | 0.90 – 0.95 | < 0.90 | Hu & Bentler (1999); Bentler (1990) |
| RMSEA | ≤ 0.06 | 0.06 – 0.08 | > 0.10 | Hu & Bentler (1999); Browne & Cudeck (1992) |
| SRMR | ≤ 0.08 | 0.08 – 0.10 | > 0.10 | Hu & Bentler (1999) |
Three caveats kill any blanket use of this table. Hu and Bentler simulated continuous, normally distributed indicators with N ≥ 250 and small-to-moderate models; outside that envelope, the cutoffs are advisory rather than law. With ordered-categorical data fit by WLSMV, RMSEA and CFI tend to look artificially better, so use the scaled/robust versions reported by fitMeasures() and demand a tougher CFI (Xia & Yang, 2019). With very large models (50+ indicators), CFI drifts down even when the model is correct, so a CFI of 0.92 on a big model is not the same red flag as 0.92 on a small one (Shi, Lee & Maydeu-Olivares, 2019).
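The scaled versions are one fitMeasures() call away. A sketch (fit_mlr is an illustrative name, and the HS indicators are continuous, so MLR here only demonstrates the mechanics):

```r
# Robust ML: fitMeasures() now also reports *.scaled versions of the indices
fit_mlr <- cfa(HS_model, data = HolzingerSwineford1939, estimator = "MLR")
fitMeasures(fit_mlr, c("cfi", "cfi.scaled", "rmsea", "rmsea.scaled", "srmr"))
```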
Report the .scaled columns when you use a robust estimator. Calls like cfa(model, data, estimator = "MLR") or cfa(model, data, ordered = TRUE) add columns such as cfi.scaled and rmsea.scaled to fitMeasures(). Report those, not the unscaled defaults; otherwise reviewers familiar with robust SEM will flag the inconsistency.

Try it: A colleague reports CFI = 0.93, RMSEA = 0.07, SRMR = 0.05 for a CFA on continuous data with N = 400. Save the verdict ("acceptable", "excellent", or "poor") for each index to ex_verdict and decide whether the model passes the two-index rule.
Click to reveal solution
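A sketch of the solution:

```r
ex_verdict <- c(cfi   = "acceptable",  # 0.93 sits in the 0.90-0.95 band
                rmsea = "acceptable",  # 0.07 sits in the 0.06-0.08 band
                srmr  = "excellent")   # 0.05 is under 0.08

# Two-index rule: SRMR excellent AND (CFI excellent OR RMSEA excellent)
ex_verdict["srmr"] == "excellent" &&
  (ex_verdict["cfi"] == "excellent" || ex_verdict["rmsea"] == "excellent")
# FALSE: fails the strict two-index rule
```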
Explanation: The two-index rule (Hu & Bentler) wants either (CFI excellent AND SRMR excellent) or (RMSEA excellent AND SRMR excellent). Here both CFI and RMSEA are merely acceptable, so the model fails the strict version of the rule. It would pass the older Browne & Cudeck (1992) "adequate fit" bar.
What do you do when fit is bad?
When the indices say no, the next step is diagnosis, not deletion. lavaan's modindices() ranks the constraints (parameters fixed at zero) by the chi-square drop you would gain by freeing them. The biggest hits are candidates for respecification, but only if you can defend them theoretically. A residual covariance between two indicators is defensible when those indicators share method (same wording, same scale, same time of measurement); a cross-loading is defensible when the indicator is theoretically related to a second factor.
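A minimal call; sort. and maximum.number are lavaan's own arguments for ranking and truncating the list:

```r
# Top candidates for freeing, ranked by expected chi-square drop (mi);
# epc is the expected value of the parameter if it were freed
mi <- modindices(fit_hs, sort. = TRUE, maximum.number = 10)
mi[, c("lhs", "op", "rhs", "mi", "epc")]
```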
Two patterns show up. Indicator x9 wants to load on visual in addition to speed (mi = 36.4, epc = +0.58), which is plausible because x9 ("straight and curved capitals") has a visual-perception component. Indicators x7, x8, x9 also share residual covariance among themselves, hinting at a method effect from being administered as timed speeded tasks. Adding the cross-loading is the more defensible move; it has substantive justification beyond "this lowers chi-square."
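A sketch of the respecified fit (HS_model_v2 and fit_v2 are illustrative names):

```r
# x9 now loads on visual as well as speed
HS_model_v2 <- '
  visual  =~ x1 + x2 + x3 + x9
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit_v2 <- cfa(HS_model_v2, data = HolzingerSwineford1939)
fitMeasures(fit_v2, c("chisq", "df", "cfi", "rmsea", "srmr"))
```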
Adding the single cross-loading drops chi-square by 33 points for one degree of freedom, lifts CFI from 0.931 to 0.967 (now above 0.95), pulls RMSEA down to 0.066 (one notch above the 0.06 cutoff), and lowers SRMR to 0.052. All four numbers moved in the right direction, and two of the three indices now clear their conventional cutoffs. That is a publishable respecification, and crucially, it has a substantive story.
Try it: Try a different respecification: instead of the visual =~ x9 cross-loading, add the x7 ~~ x8 residual covariance. Fit the new model and store it as ex_fit_v3. Print its CFI to compare with v2's 0.967.
Click to reveal solution
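One way to write the solution:

```r
# Same three factors, plus a residual covariance between x7 and x8
HS_model_v3 <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
  x7 ~~ x8
'
ex_fit_v3 <- cfa(HS_model_v3, data = HolzingerSwineford1939)
fitMeasures(ex_fit_v3, "cfi")  # compare with v2's 0.967
```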
Explanation: The residual-covariance route improves fit too, but less than the cross-loading. When two respecifications both have a defensible story, prefer the one that improves fit more and changes parameter interpretation least; here that is the cross-loading because it adds substantive content (x9 measures both abilities).
Practice Exercises
Exercise 1: Fit on a subsample and read the indices
Subset HolzingerSwineford1939 to school = "Pasteur" and refit HS_model. Save the fit to fit_pasteur and report CFI, RMSEA, SRMR. Decide whether each index passes the Hu & Bentler cutoff.
Click to reveal solution
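A sketch of the solution:

```r
pasteur <- subset(HolzingerSwineford1939, school == "Pasteur")
fit_pasteur <- cfa(HS_model, data = pasteur)

fitMeasures(fit_pasteur, c("cfi", "rmsea", "srmr"))
fitMeasures(fit_pasteur, "rmsea.ci.upper")  # inspect before condemning
```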
Explanation: With N ≈ 156 the indices look similar to the full sample but slightly worse, and the RMSEA point estimate alone fails the 0.06 cutoff. Inspect the 90% CI before condemning the model: fitMeasures(fit_pasteur, "rmsea.ci.upper") will be near 0.12, so even the optimistic bound rules out close fit. Same conclusion as before: the three-factor model is close, not clean.
Exercise 2: Compare a one-factor g model to the three-factor model
Fit a single-factor g model where one latent variable explains all nine indicators. Save the fit to fit_g. Compare its CFI, RMSEA, SRMR to fit_hs in a compare_g data frame and decide which model wins.
Click to reveal solution
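One possible solution:

```r
# One general factor for all nine indicators
g_model <- 'g =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9'
fit_g <- cfa(g_model, data = HolzingerSwineford1939)

compare_g <- data.frame(
  model = c("three-factor", "one-factor g"),
  rbind(fitMeasures(fit_hs, c("cfi", "rmsea", "srmr")),
        fitMeasures(fit_g,  c("cfi", "rmsea", "srmr")))
)
compare_g
```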
Explanation: The one-factor model collapses on every index: CFI drops below 0.7, RMSEA jumps above 0.18, SRMR doubles. This is exactly the situation fit indices are designed to flag. The three-factor model wins decisively, even though it is not yet "excellent" by Hu & Bentler.
Putting It All Together
Here is the full workflow in one block: fit, read the trio, diagnose with modification indices, respecify, refit, and confirm improvement. This is the loop you will run on every CFA or SEM you build.
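A condensed sketch of that loop, reusing the illustrative v2 names from the respecification section:

```r
library(lavaan)

# 1. Fit the hypothesised three-factor model
HS_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit_hs <- cfa(HS_model, data = HolzingerSwineford1939)

# 2. Read the trio (plus the exact-fit test and the RMSEA CI)
fitMeasures(fit_hs, c("chisq", "df", "pvalue", "cfi",
                      "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "srmr"))

# 3. Diagnose: largest modification indices
mi <- modindices(fit_hs, sort. = TRUE, maximum.number = 5)
mi[, c("lhs", "op", "rhs", "mi", "epc")]

# 4. Respecify with the theoretically defensible cross-loading
HS_model_v2 <- '
  visual  =~ x1 + x2 + x3 + x9
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit_v2 <- cfa(HS_model_v2, data = HolzingerSwineford1939)

# 5. Confirm improvement
fitMeasures(fit_v2, c("cfi", "rmsea", "srmr"))
anova(fit_hs, fit_v2)  # chi-square difference test for nested models
```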
The respecified model passes the CFI and SRMR cutoffs and sits one notch above the strict RMSEA cutoff of 0.06. That combination is reportable as "good fit on two of three primary indices, marginal on RMSEA," which is honest and defensible.
Summary
| Index | What it asks | One-line formula | Cutoff (good) | Main caveat |
|---|---|---|---|---|
| CFI | How much better than uncorrelated baseline? | $1 - (\chi^2 - df)_T / (\chi^2 - df)_B$ | ≥ 0.95 | Drifts down for very large models |
| RMSEA | Per-df misfit, scaled by N | $\sqrt{(\chi^2 - df)/(df \cdot (N-1))}$ | ≤ 0.06 (with CI) | Inflated for small df and small N |
| SRMR | Average leftover correlation | RMS of off-diagonal standardised residuals | ≤ 0.08 | Less sensitive to misspecified loadings; averaging can hide localized misfit |
| TLI | Like CFI but penalises non-parsimony | $(\chi^2_B/df_B - \chi^2_T/df_T) / (\chi^2_B/df_B - 1)$ | ≥ 0.95 | Moves with CFI; report whichever your audience expects |
Three takeaways: report at least CFI, RMSEA (with its 90% CI), and SRMR for any CFA or SEM. Apply Hu & Bentler cutoffs by default but adjust for sample size, model size, and estimator. When fit is poor, use modification indices and residuals to diagnose, but only free parameters you can defend with theory.
References
- Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1-55. Link
- Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21(2), 230-258. Link
- Kenny, D. A. (2024). Measuring Model Fit. Link
- Kline, R. B. (2016). Principles and Practice of Structural Equation Modeling (4th ed.). Guilford Press.
- Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2). Link
- lavaan documentation, fitMeasures() reference. Link
- Kenny, D. A., Kaniskan, B., & McCoach, D. B. (2015). The performance of RMSEA in models with small degrees of freedom. Sociological Methods & Research, 44(3), 486-507. Link
- Xia, Y., & Yang, Y. (2019). RMSEA, CFI, and TLI in structural equation modeling with ordered categorical data. Behavior Research Methods, 51(1), 409-428. Link
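- Shi, D., Lee, T., & Maydeu-Olivares, A. (2019). Understanding the model size effect on SEM fit indices. Educational and Psychological Measurement, 79(2), 310-334.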
Continue Learning
- SEM and CFA in R With lavaan: the parent tutorial that walks the full path from model syntax to respecification.
- Exploratory Factor Analysis in R: when you do not have a hypothesised factor structure to test, EFA finds one.
- Linear Regression Assumptions in R: the OLS analogue of fit indices, covering residual checks, normality, and influence diagnostics.