Background: Invasive Escherichia coli disease (IED), also known as invasive extraintestinal pathogenic E. coli disease, is a leading cause of sepsis and bacteremia in older adults. It can result in hospitalization and sometimes death, and it is frequently associated with antimicrobial resistance. Certain patient characteristics may also increase the risk of developing IED. This study aimed to validate a machine learning approach for the unbiased identification of potential risk factors associated with an increased risk of IED.
Methods: Using electronic health records from 6.5 million people, an XGBoost model was trained to predict IED from 663 distinct patient features, and the most predictive features were identified as potential risk factors. SHapley Additive exPlanations (SHAP) values were then used to characterize the specific relationships between these features and the outcome of developing IED.
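SHAP values are Shapley values from cooperative game theory: each feature's contribution is its weighted average marginal effect on the model output over all subsets of the other features. The sketch below computes them exactly by brute force for a tiny hypothetical risk score. It is not the study's XGBoost model or its feature set; the function `toy_risk`, its three features, its coefficients, and the all-zero baseline are illustrative assumptions. Production tools such as the `shap` library use much faster tree-specific algorithms for gradient-boosted models rather than this exponential enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score: [age_over_65, n_prior_utis, any_ed_or_inpatient_visit]."""
    age, utis, visit = features
    return 0.5 * age + 0.3 * utis + 0.2 * visit + 0.1 * age * utis


patient = [1, 2, 1]
reference = [0, 0, 0]
print(shapley_values(toy_risk, patient, reference))
```

The contributions sum to `toy_risk(patient) - toy_risk(reference)` (the additivity property), and the age-by-UTI interaction term is split equally between the two features involved.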
Results: The model independently identified older age, a known risk factor for IED, as increasing the likelihood of developing IED. It also identified a history of ≥ 1 urinary tract infection (with more frequent and/or more recent urinary tract infections further raising risk) and ≥ 1 emergency department or inpatient visit as increasing the risk of IED. Observed outcomes were used to calculate risk ratios in selected subpopulations, demonstrating the impact of individual features, and of combinations of features, on the incidence of IED.
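A risk ratio compares the incidence of IED in a subpopulation that has a feature (or combination of features) with the incidence in one that does not. The sketch below shows the standard calculation with a log-scale (Katz) 95% confidence interval; the abstract does not state which interval method the study used, and the counts in the example are hypothetical, not study data.

```python
from math import exp, log, sqrt


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Risk ratio with a log-scale (Katz) confidence interval; returns (rr, lower, upper)."""
    if min(cases_exposed, cases_unexposed) <= 0:
        raise ValueError("each group needs at least one case for the log-scale interval")
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR)
    se = sqrt(1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


# Hypothetical subpopulations: with vs. without a prior urinary tract infection
print(risk_ratio(50, 1000, 25, 1000))
```

Here the risk ratio is 2.0: incidence in the exposed group is twice that in the unexposed group, and the interval quantifies the sampling uncertainty of that estimate.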
Conclusion: This study illustrates the viability and validity of using large electronic health record datasets and machine learning to identify correlating features and potential risk factors for infectious diseases, including IED. The next step is independent validation of these potential risk factors using conventional methods.
Keywords: Disease burden per status today; Electronic health records; Invasive E. coli disease; Machine learning.
© 2024. The Author(s).