Latent Variable Forests for Latent Variable Score Estimation

Educ Psychol Meas. 2024 Dec;84(6):1138-1172. doi: 10.1177/00131644241237502. Epub 2024 Apr 1.

Abstract

We develop a latent variable forest (LV Forest) algorithm for the estimation of latent variable scores with one or more latent variables. LV Forest estimates unbiased latent variable scores based on confirmatory factor analysis (CFA) models with ordinal and/or numerical response variables. Through parametric model restrictions paired with a nonparametric tree-based machine learning approach, LV Forest estimates latent variable scores using models that are unbiased with respect to relevant subgroups in the population. This way, estimated latent variable scores are interpretable with respect to systematic influences of covariates without being biased by these variables. By building a tree ensemble, LV Forest takes parameter heterogeneity in latent variable modeling into account to capture subgroups with both good model fit and stable parameter estimates. We apply LV Forest to simulated data with heterogeneous model parameters as well as to real large-scale survey data. We show that LV Forest improves the accuracy of score estimation if parameter heterogeneity is present.
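
For orientation, the kind of measurement model the abstract refers to can be written in standard CFA notation for numerical response variables. This is a sketch under stated assumptions, not the authors' specification: the symbols, the subgroup index g, and the choice of the regression (Thurstone) score estimator are all assumptions made for illustration, since the abstract does not say which score estimator LV Forest uses or how subgroups are indexed. Ordinal response variables would additionally require a threshold model, which is omitted here.

```latex
% Assumed notation (not taken from the abstract):
%   x_i      vector of numerical responses of person i
%   g        subgroup (e.g., a tree leaf) that person i belongs to
%   \eta_i   vector of latent variables
\begin{align}
  x_i &= \nu_g + \Lambda_g \eta_i + \varepsilon_i,
      \qquad \eta_i \sim (\alpha_g, \Phi_g),
      \quad \varepsilon_i \sim (0, \Theta_g), \\
  \Sigma_g &= \Lambda_g \Phi_g \Lambda_g^{\top} + \Theta_g, \\
  \hat{\eta}_i &= \alpha_g
      + \Phi_g \Lambda_g^{\top} \Sigma_g^{-1}
        \left( x_i - \nu_g - \Lambda_g \alpha_g \right).
\end{align}
```

In this notation, parameter heterogeneity means that intercepts, loadings, or residual variances differ across subgroups g. Scores computed from a single pooled set of parameters would then mix substantive differences in the latent variables with differences in the measurement parameters, which is the bias the abstract describes as arising from covariates.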

Keywords: confirmatory factor analysis; differential item functioning; factor scores; item response theory; machine learning.