Supervised learning using routine surveillance data improves outbreak detection of Salmonella and Campylobacter infections in Germany

PLoS One. 2022 May 5;17(5):e0267510. doi: 10.1371/journal.pone.0267510. eCollection 2022.

Abstract

The early detection of infectious disease outbreaks is a crucial task to protect population health. To this end, public health surveillance systems have been established to systematically collect and analyse infectious disease data. A variety of statistical tools are available that use these data to detect potential outbreaks as aberrations from an expected endemic level. Here, we present supervised hidden Markov models for disease outbreak detection, which use reported outbreaks that are routinely collected in the German infectious disease surveillance system and have not been leveraged so far. This allows labeled outbreak data to be integrated directly into a statistical time series model for outbreak detection. We evaluate our model using real Salmonella and Campylobacter data, as well as simulations. The proposed supervised learning approach performs substantially better than unsupervised learning and on par with or better than a state-of-the-art approach, which is applied in multiple European countries including Germany.
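To illustrate the general idea of a supervised hidden Markov model for outbreak detection, the following is a minimal sketch and not the authors' model. It assumes two hidden states (0 = endemic, 1 = outbreak), Poisson-distributed weekly case counts, and fully labeled training weeks. The function names, the smoothing parameter, and the toy counts are all hypothetical. Because the states are labeled, the parameters can be estimated by simple counting rather than by an unsupervised procedure such as expectation-maximization.

```python
import math

def poisson_logpmf(k, lam):
    # Log probability of observing k cases under a Poisson rate lam.
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def fit_supervised_hmm(counts, labels, smoothing=1.0):
    """Estimate a two-state Poisson HMM from labeled weekly counts.

    labels[t] is 0 (endemic) or 1 (outbreak); both states must occur.
    `smoothing` is an assumed additive prior on transition counts.
    """
    # Transition probabilities: count labeled state changes, then normalize.
    trans = [[smoothing, smoothing], [smoothing, smoothing]]
    for prev, cur in zip(labels, labels[1:]):
        trans[prev][cur] += 1
    trans = [[c / sum(row) for c in row] for row in trans]
    # Emission rates: mean case count per labeled state (floored above zero).
    rates = []
    for state in (0, 1):
        obs = [c for c, l in zip(counts, labels) if l == state]
        rates.append(max(sum(obs) / len(obs), 1e-6))
    # Initial distribution: relative frequency of each label.
    init = [labels.count(s) / len(labels) for s in (0, 1)]
    return init, trans, rates

def outbreak_probabilities(counts, init, trans, rates):
    """Forward filtering: P(outbreak at week t | counts up to week t)."""
    probs = []
    belief = list(init)
    for t, k in enumerate(counts):
        if t > 0:
            # Predict step: propagate the belief through the transitions.
            belief = [sum(belief[i] * trans[i][j] for i in (0, 1))
                      for j in (0, 1)]
        # Update step: weight by the Poisson likelihood, then renormalize.
        log_lik = [poisson_logpmf(k, rates[s]) for s in (0, 1)]
        m = max(log_lik)
        weighted = [belief[s] * math.exp(log_lik[s] - m) for s in (0, 1)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        probs.append(belief[1])
    return probs

# Toy, made-up data: weekly case counts with reported-outbreak labels.
train_counts = [3, 4, 2, 5, 3, 12, 15, 11, 4, 3, 2, 4]
train_labels = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
init, trans, rates = fit_supervised_hmm(train_counts, train_labels)

# Score new, unlabeled weeks; flag those with outbreak probability > 0.5.
new_counts = [3, 4, 14, 13, 3]
probs = outbreak_probabilities(new_counts, init, trans, rates)
alarms = [p > 0.5 for p in probs]
```

In this sketch an alarm is raised for a week when its filtered outbreak probability exceeds 0.5; that threshold is an illustrative choice, and a real surveillance model would also need to account for features such as seasonality, trend, and reporting delays.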

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Campylobacter Infections* / diagnosis
  • Campylobacter Infections* / epidemiology
  • Communicable Diseases* / epidemiology
  • Disease Outbreaks
  • Germany / epidemiology
  • Humans
  • Population Surveillance
  • Public Health Surveillance
  • Salmonella
  • Supervised Machine Learning

Grants and funding

BZ was supported by BMBF (Medical Informatics Initiative: HIGHmed) and the collaborative management platform for detection and analyses of (re-)emerging and foodborne outbreaks in Europe (COMPARE: European Union’s Horizon research and innovation programme, grant agreement No. 643476).