Machine learning mathematical models for incidence estimation during pandemics

PLoS Comput Biol. 2024 Dec 23;20(12):e1012687. doi: 10.1371/journal.pcbi.1012687. Online ahead of print.

Abstract

Accurate estimates of the incidence of infectious diseases are key for the control of epidemics. However, healthcare systems are often unable to test the population exhaustively, especially when asymptomatic and paucisymptomatic cases are widespread; this leads to significant and systematic under-reporting of the real incidence. Here, we propose a machine learning approach to estimate the incidence of a pandemic in real time, using reported cases and the overall test rate. In particular, we use Bayesian symbolic regression to automatically learn the closed-form mathematical models that most parsimoniously describe incidence. We develop and validate our models using COVID-19 incidence values for nine different countries, confirming their ability to accurately predict daily incidence. Remarkably, despite the differences in epidemic trajectories and dynamics across countries, we find that a single model for all countries offers a more parsimonious description and is more predictive of actual incidence than separate models for each country. Our results show the potential to accurately model incidence in real time using closed-form mathematical models, providing a valuable tool for public health decision-makers.
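
The comparison between a single model for all countries and separate models for each country can be illustrated with a small sketch. This is not the authors' method: the abstract does not state the learned model, so the closed form below (incidence = reported cases × exp(a) × test rate^b) is a hypothetical stand-in, the data are synthetic, and the Bayesian information criterion (BIC) is swapped in as a simple proxy for parsimony in place of Bayesian symbolic regression. All names and parameter values are assumptions made for illustration only.

```python
import numpy as np


def fit_log_linear(log_test_rate, log_ratio):
    """Least-squares fit of log_ratio = a + b * log_test_rate.

    Returns the fitted (a, b) and the residual sum of squares.
    """
    design = np.column_stack([np.ones_like(log_test_rate), log_test_rate])
    params, _, _, _ = np.linalg.lstsq(design, log_ratio, rcond=None)
    residuals = log_ratio - design @ params
    return params, float(residuals @ residuals)


def bic(rss, n_points, n_params):
    """BIC for Gaussian errors, up to an additive constant."""
    return n_points * np.log(rss / n_points) + n_params * np.log(n_points)


# Synthetic data (an assumption): nine "countries" that share one
# hypothetical relation between test rate and the incidence/reported ratio.
rng = np.random.default_rng(0)
n_countries, n_days = 9, 100
true_a, true_b = 1.0, -0.5

log_test_rate = rng.uniform(-3.0, 0.0, size=(n_countries, n_days))
noise = rng.normal(0.0, 0.2, size=(n_countries, n_days))
log_ratio = true_a + true_b * log_test_rate + noise

n_total = n_countries * n_days

# One pooled model: 2 parameters shared by all countries.
pooled_params, pooled_rss = fit_log_linear(log_test_rate.ravel(), log_ratio.ravel())
bic_pooled = bic(pooled_rss, n_total, 2)

# Separate models: 2 parameters per country.
separate_rss = sum(
    fit_log_linear(log_test_rate[c], log_ratio[c])[1] for c in range(n_countries)
)
bic_separate = bic(separate_rss, n_total, 2 * n_countries)

print(f"pooled:   RSS={pooled_rss:.2f}  BIC={bic_pooled:.1f}")
print(f"separate: RSS={separate_rss:.2f}  BIC={bic_separate:.1f}")
```

Separate fits always reach a residual at least as small as the pooled fit, because they contain it as a special case. The BIC penalty on the extra parameters is what lets the pooled model come out ahead when the countries really do share one relation, as they do by construction in this synthetic setup.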