Machine learning to improve the understanding of rabies epidemiology in low surveillance settings

Ravikiran Keshavamurthy; Cassandra Boutelle; Yoshinori Nakazawa; Haim Joseph; Dady W Joseph; Pierre Dilius; Andrew D Gibson; Ryan M Wallace

doi:10.1038/s41598-024-76089-3

Sci Rep. 2024 Oct 28;14(1):25851. doi: 10.1038/s41598-024-76089-3.

Authors

Ravikiran Keshavamurthy¹, Cassandra Boutelle², Yoshinori Nakazawa², Haim Joseph³, Dady W Joseph³, Pierre Dilius³, Andrew D Gibson⁴, Ryan M Wallace²

Affiliations

¹ Poxvirus and Rabies Branch, Division of High Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, USA.
² Poxvirus and Rabies Branch, Division of High Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, USA.
³ Ministère de l'Agriculture, des Ressources Naturelles et du Développement Rural, Port au Prince, Haiti.
⁴ Mission Rabies, Cranborne, Dorset, UK.

Abstract

In low- and middle-income countries, a large proportion of animal rabies investigations end without a conclusive diagnosis, leading to epidemiologic interpretations informed by clinical, rather than laboratory, data. We compared Extreme Gradient Boosting (XGB) with Logistic Regression (LR) for their ability to estimate the probability of rabies in animals investigated as part of an Integrated Bite Case Management (IBCM) program. To balance our training data, we used Random Oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE). We developed a risk stratification framework based on predicted rabies probabilities. XGB performed better at predicting rabies cases than LR. Oversampling strategies enhanced model sensitivity, making them the preferred techniques for predicting rare events like rabies in a biting animal. XGB-ROS classified most of the confirmed rabies cases and only a small proportion of non-cases as either high (confirmed cases = 85.2%, non-cases = 0.01%) or moderate (confirmed cases = 8.4%, non-cases = 4.0%) risk. Model-based risk stratification led to a 3.2-fold increase in epidemiologically useful data compared to a routine surveillance strategy using IBCM case definitions. Our study demonstrates the application of machine learning to strengthen zoonotic disease surveillance in resource-limited settings.
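
The two ideas the abstract relies on, balancing a rare outcome by random oversampling and binning predicted probabilities into risk tiers, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, the probability cutoffs (0.2 and 0.7), and the "low" tier label are all hypothetical placeholders, since the abstract does not report the thresholds used. Only the Python standard library is assumed.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes (sampling with
    replacement) until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def risk_tier(probability, moderate_cutoff=0.2, high_cutoff=0.7):
    """Map a predicted rabies probability to a risk tier.
    The cutoffs are hypothetical, not values taken from the study."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= high_cutoff:
        return "high"
    if probability >= moderate_cutoff:
        return "moderate"
    return "low"


# Toy data (made up): 1 = confirmed rabies case (rare), 0 = non-case.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9], [0.15], [0.25], [0.8]]
labels = [0, 0, 0, 0, 1, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)

# Example predicted probabilities, as a fitted classifier might return.
tiers = [risk_tier(p) for p in (0.05, 0.35, 0.92)]
```

In practice the oversampling step would be applied to the training split only, so that duplicated minority rows do not leak into the evaluation data, and the probabilities would come from a fitted classifier such as XGB or LR rather than from hand-picked values.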

Keywords: Extreme gradient boosting; Machine learning; Prediction; Rabies epidemiology; Risk stratification; Zoonotic disease surveillance.

MeSH terms

Animals
Bites and Stings / epidemiology
Bites and Stings / virology
Dogs
Epidemiological Monitoring
Humans
Logistic Models
Machine Learning*
Rabies* / epidemiology
Rabies* / veterinary