Estimating the Prevalence of Injection Drug Use Among Acute Hepatitis C Cases From a National Surveillance System: Application of Random Forest-Based Multiple Imputation

J Public Health Manag Pract. 2024 Sep-Oct;30(5):733-743. doi: 10.1097/PHH.0000000000002014. Epub 2024 Jul 22.

Abstract

Background: Injection drug use (IDU) is a major contributor to the syndemic of viral hepatitis, human immunodeficiency virus, and drug overdose. However, information on IDU is frequently missing in national viral hepatitis surveillance data, which limits our understanding of the full extent of IDU-associated infections. Multiple imputation by chained equations (MICE) has become a popular approach to address missing data, but its application to imputing IDU status has received little study.

Methods: Using the 2019-2021 National Notifiable Diseases Surveillance System acute hepatitis C case data and publicly available county-level measures, we evaluated listwise deletion (LD) and 3 models for imputing missing IDU data through MICE: parametric logistic regression, semi-parametric predictive mean matching (PMM), and nonparametric random forest (RF), the last in both a standard implementation (sRF) and a fast implementation (fRF).
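Whichever model fills in the missing IDU values, multiple imputation yields several completed datasets whose estimates must be combined. The abstract does not describe the pooling step, so the following is only a minimal sketch under the assumption that Rubin's rules are applied to per-imputation prevalence estimates with a simple binomial variance. The function name `pool_prevalence` and the input values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, variance


def pool_prevalence(estimates, n):
    """Pool per-imputation prevalence estimates with Rubin's rules.

    estimates: prevalence (a proportion) from each imputed dataset (hypothetical)
    n: number of cases in each completed dataset (assumed equal across datasets)
    Returns the pooled prevalence and its standard error.
    """
    m = len(estimates)
    q_bar = mean(estimates)  # pooled point estimate
    # Within-imputation variance: average binomial variance p(1 - p) / n.
    within = mean([p * (1 - p) / n for p in estimates])
    # Between-imputation variance: sample variance of the m estimates.
    between = variance(estimates)
    # Total variance combines both sources of uncertainty.
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Illustrative values only, not results from the study.
pooled, se = pool_prevalence([0.80, 0.82, 0.84], n=1000)
print(f"pooled prevalence = {pooled:.3f}, standard error = {se:.4f}")
```

The between-imputation term is what distinguishes this from treating a single imputed dataset as if it were observed: it widens the standard error to reflect uncertainty about the missing values.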

Results: The estimated IDU prevalence among acute hepatitis C cases increased from 63.5% by LD to 65.1% by logistic regression, 66.9% by PMM, 76.0% by sRF, and 85.1% by fRF. Evaluation studies showed that RF-based MICE imputation, especially fRF, had the highest accuracy (as measured by the smallest raw bias, percent bias, and root mean square error) and the highest efficiency (as measured by the smallest 95% confidence interval width) compared with LD and the other models. Sensitivity analyses indicated that fRF remained robust when data were missing not at random.
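The accuracy and efficiency measures named above can be computed from repeated evaluation runs in which the true prevalence is known. The abstract does not give the formulas, so this sketch assumes the conventional definitions: raw bias as the mean estimate minus the truth, percent bias as the absolute raw bias relative to the truth, root mean square error over the runs, and the average width of the 95% confidence intervals. The function name `evaluate_imputation` and all inputs are hypothetical.

```python
import math


def evaluate_imputation(estimates, ci_bounds, truth):
    """Summarize imputation performance across evaluation runs.

    estimates: prevalence estimate from each run (hypothetical)
    ci_bounds: (lower, upper) 95% confidence limits for each run
    truth: known prevalence in the evaluation data
    """
    n = len(estimates)
    raw_bias = sum(estimates) / n - truth
    percent_bias = 100 * abs(raw_bias / truth)
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    ci_width = sum(upper - lower for lower, upper in ci_bounds) / n
    return {
        "raw_bias": raw_bias,
        "percent_bias": percent_bias,
        "rmse": rmse,
        "ci_width": ci_width,
    }


# Illustrative values only, not results from the study.
summary = evaluate_imputation(
    estimates=[0.70, 0.74],
    ci_bounds=[(0.66, 0.74), (0.70, 0.78)],
    truth=0.80,
)
print(summary)
```

Under these definitions, a method is more accurate when the first three values are closer to zero and more efficient when the average interval width is smaller.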

Conclusion: Our analysis suggested that RF-based MICE imputation, especially fRF, could be a valuable approach for addressing missing IDU data in population-based surveillance systems such as the National Notifiable Diseases Surveillance System. The inclusion of imputed IDU data may enhance the effectiveness of future surveillance and prevention efforts for the IDU-driven syndemic.

MeSH terms

  • Epidemiological Monitoring
  • Hepatitis C* / epidemiology
  • Humans
  • Logistic Models
  • Population Surveillance / methods
  • Prevalence
  • Random Forest
  • Substance Abuse, Intravenous* / complications
  • Substance Abuse, Intravenous* / epidemiology