BAYESIAN NESTED LATENT CLASS MODELS FOR CAUSE-OF-DEATH ASSIGNMENT USING VERBAL AUTOPSIES ACROSS MULTIPLE DOMAINS

Zehang Richard Li; Zhenke Wu; Irena Chen; Samuel J Clark

doi:10.1214/23-aoas1826


Ann Appl Stat. 2024 Jun;18(2):1137-1159. doi: 10.1214/23-aoas1826. Epub 2024 Apr 5.

Authors

Zehang Richard Li¹, Zhenke Wu², Irena Chen³, Samuel J Clark⁴

Affiliations

¹ Department of Statistics, University of California, Santa Cruz.
² Department of Biostatistics, University of Michigan.
³ Department of Digital and Computational Demography, Max Planck Institute for Demographic Research.
⁴ Department of Sociology, The Ohio State University.

Abstract

Understanding cause-specific mortality rates is crucial for monitoring population health and designing public health interventions. Worldwide, two-thirds of deaths do not have a cause assigned. Verbal autopsy (VA) is a well-established tool for collecting information about deaths that occur outside of hospitals by surveying the caregivers of deceased persons. It is routinely implemented in many low- and middle-income countries. Statistical algorithms that assign causes of death from VAs are typically vulnerable to distribution shift between the data used to train the model and the target population. This presents a major challenge for analyzing VAs, as labeled data are usually unavailable in the target population. This article proposes a latent class model framework for VA data (LCVA) that jointly models VAs collected over multiple heterogeneous domains, assigns causes of death for out-of-domain observations, and estimates cause-specific mortality fractions for a new domain. We introduce a parsimonious representation of the joint distribution of the collected symptoms using nested latent class models and develop a computationally efficient algorithm for posterior inference. We demonstrate that LCVA outperforms existing methods in predictive performance and scalability. Supplementary Material and reproducible analysis code are available online. The R package LCVA implementing the method is available on GitHub (https://github.com/richardli/LCVA).

Keywords: Domain adaptation; data shift; dependent binary data; mixture model; quantification learning.
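To make the idea of a nested latent class model for binary VA symptoms more concrete, the Python sketch below shows a generic version. It assumes each cause of death is a mixture of latent classes and that symptoms are conditionally independent given the class. The class-specific symptom probabilities `theta` and within-cause class weights `lam` are assumed to be already known, and the cause-specific mortality fraction (CSMF) of an unlabeled target domain is then estimated by a simple EM prior-adjustment loop. This is not the LCVA package's API and not the authors' Bayesian posterior sampler. The function names, array shapes, parameter values, and the EM substitution are all hypothetical, chosen only to illustrate the quantification step described in the abstract.

```python
import numpy as np


def symptom_likelihood(x, theta, lam):
    """Hypothetical helper: P(x | cause c) for every cause c.

    x:     (J,) binary symptom vector (1 = present, 0 = absent)
    theta: (C, K, J) probability that symptom j is present in latent class k of cause c
    lam:   (C, K) latent class weights within each cause (each row sums to 1)
    """
    # Bernoulli probability of each observed symptom under each (cause, class)
    # pair, assuming symptoms are conditionally independent given the class.
    per_symptom = np.where(x == 1, theta, 1.0 - theta)  # (C, K, J)
    per_class = per_symptom.prod(axis=2)                # (C, K)
    # Marginalize over latent classes within each cause.
    return (lam * per_class).sum(axis=1)                # (C,)


def cause_posterior(x, pi, theta, lam):
    """Hypothetical helper: P(cause = c | x) given CSMF vector pi of shape (C,)."""
    joint = pi * symptom_likelihood(x, theta, lam)
    return joint / joint.sum()


def estimate_csmf(X, theta, lam, n_iter=500, tol=1e-10):
    """EM estimate of the target-domain CSMF with theta and lam held fixed
    (an assumed stand-in for full Bayesian posterior inference)."""
    C = theta.shape[0]
    lik = np.array([symptom_likelihood(x, theta, lam) for x in X])  # (N, C)
    pi = np.full(C, 1.0 / C)  # start from a uniform CSMF
    for _ in range(n_iter):
        post = pi * lik                          # E-step: unnormalized posteriors
        post /= post.sum(axis=1, keepdims=True)
        new_pi = post.mean(axis=0)               # M-step: average posterior
        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi


# Toy example: 2 causes, 2 latent classes per cause, 6 binary symptoms.
theta = np.array([
    [[0.9, 0.8, 0.7, 0.1, 0.2, 0.1], [0.8, 0.9, 0.6, 0.2, 0.1, 0.2]],  # cause 0
    [[0.1, 0.2, 0.1, 0.9, 0.8, 0.7], [0.2, 0.1, 0.3, 0.8, 0.9, 0.6]],  # cause 1
])
lam = np.full((2, 2), 0.5)

# Simulate an unlabeled target domain with a known CSMF.
rng = np.random.default_rng(0)
true_pi = np.array([0.7, 0.3])
N = 5000
causes = rng.choice(2, size=N, p=true_pi)
classes = np.array([rng.choice(2, p=lam[c]) for c in causes])
X = (rng.uniform(size=(N, 6)) < theta[causes, classes]).astype(int)

pi_hat = estimate_csmf(X, theta, lam)
post = cause_posterior(np.array([1, 1, 1, 0, 0, 0]), np.array([0.5, 0.5]), theta, lam)
```

In this toy setting `pi_hat` should land close to the simulated `true_pi`, and a death reporting only the first three symptoms is assigned almost entirely to cause 0. Nesting latent classes within each cause is what allows symptoms to be dependent within a cause while keeping the number of parameters small.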

