Machine learning to improve the understanding of rabies epidemiology in low surveillance settings

Sci Rep. 2024 Oct 28;14(1):25851. doi: 10.1038/s41598-024-76089-3.

Abstract

In low- and middle-income countries, a large proportion of animal rabies investigations end without a conclusive diagnosis, so epidemiologic interpretations are informed by clinical rather than laboratory data. We compared Extreme Gradient Boosting (XGB) with Logistic Regression (LR) for their ability to estimate the probability of rabies in animals investigated as part of an Integrated Bite Case Management (IBCM) program. To balance our training data, we used Random Oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE). We developed a risk stratification framework based on predicted rabies probabilities. XGB performed better at predicting rabies cases than LR. Oversampling strategies enhanced model sensitivity, making them the preferred technique for predicting rare events such as rabies in a biting animal. XGB-ROS classified most of the confirmed rabies cases and only a small proportion of non-cases as either high (confirmed cases = 85.2%, non-cases = 0.01%) or moderate (confirmed cases = 8.4%, non-cases = 4.0%) risk. Model-based risk stratification led to a 3.2-fold increase in epidemiologically useful data compared to a routine surveillance strategy using IBCM case definitions. Our study demonstrates the application of machine learning to strengthen zoonotic disease surveillance in resource-limited settings.
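
As a rough illustration of two steps named in the abstract, random oversampling of the minority class and risk stratification of predicted probabilities, the following is a minimal sketch in plain Python. It is not the authors' implementation: it trains no model, the toy records and feature names are hypothetical, and the probability cut-offs (0.75 and 0.25) are placeholder assumptions, since the abstract does not report the thresholds used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# ASSUMPTION: illustrative cut-offs only; not taken from the study.
HIGH_CUTOFF = 0.75
MODERATE_CUTOFF = 0.25


def risk_tier(prob, high=HIGH_CUTOFF, moderate=MODERATE_CUTOFF):
    """Map a predicted rabies probability to a risk tier."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if prob >= high:
        return "high"
    if prob >= moderate:
        return "moderate"
    return "low"


# Hypothetical toy data: 8 non-cases (label 0) and 2 confirmed cases (label 1).
rows = [{"aggression": i % 2, "hypersalivation": 0} for i in range(8)]
rows += [{"aggression": 1, "hypersalivation": 1} for _ in range(2)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels, seed=42)
class_counts = Counter(balanced_labels)

# Hypothetical predicted probabilities, standing in for model output.
tiers = [risk_tier(p) for p in (0.90, 0.40, 0.05)]
```

In a real workflow, oversampling would be applied to the training split only, so that duplicated records do not leak into the evaluation data, and the cut-offs would be chosen from the fitted model's performance rather than fixed in advance.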

Keywords: Extreme gradient boosting; Machine learning; Prediction; Rabies epidemiology; Risk stratification; Zoonotic disease surveillance.

MeSH terms

  • Animals
  • Bites and Stings / epidemiology
  • Bites and Stings / virology
  • Dogs
  • Epidemiological Monitoring
  • Humans
  • Logistic Models
  • Machine Learning*
  • Rabies* / epidemiology
  • Rabies* / veterinary