Generalizable pipeline for constructing HIV risk prediction models across electronic health record systems

J Am Med Inform Assoc. 2024 Feb 16;31(3):666-673. doi: 10.1093/jamia/ocad217.

Abstract

Objective: The HIV epidemic remains a significant public health issue in the United States. HIV risk prediction models could help reduce HIV transmission by enabling clinicians to identify patients at high risk for infection and refer them for testing. This would facilitate initiation of treatment for those unaware of their status and of pre-exposure prophylaxis for those uninfected but at high risk. Existing HIV risk prediction algorithms rely on manual construction of features and are limited in their application across diverse electronic health record systems. Furthermore, the accuracy of these models in predicting HIV in females has thus far been limited.

Materials and methods: We devised a pipeline that constructs prediction models through automatic feature engineering to predict HIV risk, and tested the pipeline on a local electronic health records system and a national claims dataset. We also compared the performance of general models to female-specific models.
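The abstract does not describe how the automatic feature engineering works. As a rough illustration only, the sketch below shows one common way to derive features from coded records without hand-picking them: count each code per patient and keep codes seen in enough patients. The function name `build_features`, the `min_patients` threshold, and the toy codes are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_features(events, min_patients=2):
    """Turn (patient_id, code) events into per-patient count vectors.

    Hypothetical sketch: the paper's actual feature engineering is not
    specified in the abstract. Codes observed in fewer than
    `min_patients` distinct patients are dropped.
    """
    # Count how often each code appears for each patient.
    counts = defaultdict(Counter)
    for patient_id, code in events:
        counts[patient_id][code] += 1

    # Count the number of distinct patients in which each code appears.
    support = Counter()
    for patient_counts in counts.values():
        support.update(patient_counts.keys())

    # Keep only sufficiently common codes, in a stable (sorted) order.
    vocab = sorted(code for code, n in support.items() if n >= min_patients)

    # One count vector per patient, aligned with `vocab`.
    matrix = {
        patient_id: [patient_counts[code] for code in vocab]
        for patient_id, patient_counts in counts.items()
    }
    return vocab, matrix


# Made-up toy events; the codes "A", "B", "C" carry no clinical meaning.
toy_events = [("p1", "A"), ("p1", "A"), ("p1", "B"), ("p2", "A"), ("p3", "C")]
vocab, matrix = build_features(toy_events)
```

Because the vocabulary is derived from whatever codes a dataset contains, the same function can be pointed at records from different systems without editing a feature list by hand.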

Results: Our models obtain similarly good performance on both health record datasets despite differences in represented populations and data availability (AUC = 0.87). Furthermore, our general models obtain good performance on females, and performance is further improved by constructing female-specific models (AUC between 0.81 and 0.86 across datasets).
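For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition and then restricts the evaluation to one subgroup, which is the kind of comparison reported for females. The helper names, the record layout `(sex, label, score)`, and the toy values are assumptions for illustration, not the authors' evaluation code or data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auc(records, sex):
    """AUC restricted to records of one sex.

    `records` is a list of (sex, label, score) tuples (hypothetical layout).
    """
    subset = [(y, s) for g, y, s in records if g == sex]
    return auc([y for y, _ in subset], [s for _, s in subset])


# Made-up scores for illustration only.
toy_records = [
    ("F", 0, 0.10), ("F", 0, 0.40), ("F", 1, 0.35), ("F", 1, 0.80),
    ("M", 0, 0.20), ("M", 1, 0.90),
]
overall = auc([y for _, y, _ in toy_records], [s for _, _, s in toy_records])
female_only = subgroup_auc(toy_records, "F")
```

The pairwise form is quadratic in the number of cases, which is fine for a small check like this; a rank-based formulation gives the same value more efficiently on large datasets.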

Discussion and conclusions: We demonstrated that flexible construction of prediction models performs well on HIV risk prediction across diverse health records systems and performs as well in predicting HIV risk in females, making deployment of such models into existing health care systems feasible.

Keywords: HIV; HIV prevention; electronic health records; predictive modeling; risk prediction.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Electronic Health Records*
  • Female
  • HIV Infections* / prevention & control
  • Humans
  • Machine Learning
  • Software
  • United States