The Impact of Sparse Datasets When Harmonizing Data from Studies with Different Measures of the Same Construct

Prev Sci. 2024 Aug;25(6):989-1002. doi: 10.1007/s11121-024-01704-8. Epub 2024 Jul 18.

Abstract

Prevention science has increasingly turned to integrative data analysis (IDA) to combine individual participant-level data from multiple studies of the same topic, allowing us to evaluate overall effect size, test and model heterogeneity, and examine mediation. Studies included in IDA often use different measures for the same construct, leading to sparse datasets. We introduce a graph theory method for summarizing patterns of sparseness and use simulations to explore the impact of different patterns on measurement bias within three different measurement models: a single common factor, a hierarchical model, and a bifactor model. We simulated 1000 datasets with varying levels of sparseness and used Bayesian methods to estimate model parameters and evaluate bias. Results showed that bias due to sparseness depends on the strength of the general factor, the measurement model employed, and the level of indirect linkage among measures. We provide an example using a synthesis dataset that combined depression data from 4146 youth who participated in 16 randomized field trials of prevention programs. Because different synthesis datasets will embody different patterns of sparseness, we conclude by recommending that investigators use simulation methods to explore the potential for bias given the sparseness patterns they encounter.
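
The abstract does not spell out the graph construction, so the following Python sketch shows only one plausible reading and should not be taken as the authors' method. It assumes that each measure is a node and that two measures are joined by an edge when at least one study administered both. Under that assumption, a path length of 1 indicates direct linkage, a longer path indicates indirect linkage through intermediate measures, and no path indicates measures that are never linked. The study and measure names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_measure_graph(study_measures):
    """Build an undirected graph: nodes are measures, and an edge joins two
    measures that were administered together in at least one study.
    (Assumed construction; the abstract does not give the exact definition.)"""
    graph = {}
    for measures in study_measures.values():
        unique = sorted(set(measures))
        for m in unique:
            graph.setdefault(m, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def linkage_distance(graph, source, target):
    """Breadth-first search for the shortest path between two measures.
    Returns 0 for the same measure, 1 for direct linkage, a larger integer
    for indirect linkage, and None when the measures are not linked."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def connected_components(graph):
    """Group measures into sets that are linked directly or indirectly."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical sparseness pattern: which measures each study administered.
studies = {
    "study_1": ["measure_A", "measure_B"],
    "study_2": ["measure_B", "measure_C"],
    "study_3": ["measure_D"],
}

graph = build_measure_graph(studies)
direct = linkage_distance(graph, "measure_A", "measure_B")
indirect = linkage_distance(graph, "measure_A", "measure_C")
unlinked = linkage_distance(graph, "measure_A", "measure_D")
components = connected_components(graph)
```

In this toy pattern, measure_A and measure_C never appear in the same study but are indirectly linked through measure_B, while measure_D is isolated. A summary of this kind could be used to describe a sparseness pattern before simulating its effect on bias, as the abstract recommends.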

Keywords: Data sparseness; Harmonization; Integrative data analysis.

MeSH terms

  • Adolescent
  • Bayes Theorem*
  • Data Analysis
  • Depression
  • Humans