Predicting congenital syphilis cases: A performance evaluation of different machine learning models

PLoS One. 2023 Jun 2;18(6):e0276150. doi: 10.1371/journal.pone.0276150. eCollection 2023.

Abstract

Background: Communicable diseases impose a heavy economic burden on healthcare systems and on society. Sexually transmitted infections (STIs) are of particular concern in developing and underdeveloped countries, where environmental factors and other determinants of health contribute to their rapid spread. Against this background, machine learning techniques have been explored to assess the incidence of syphilis and to support epidemiological surveillance.

Objective: The main goal of this work is to evaluate the performance of different machine learning models in predicting undesirable outcomes of congenital syphilis, in order to assist resource allocation and optimize healthcare actions, especially in resource-constrained health settings.

Method: We use clinical and sociodemographic data from pregnant women who were assisted by a social program in Pernambuco, Brazil, the Mãe Coruja Pernambucana Program (PMCP). Following a rigorous methodology, we propose six experiments using three feature selection techniques to select the most relevant attributes; we pre-process and clean the data, apply hyperparameter optimization to tune the machine learning models, and train and test the models to enable a fair evaluation and discussion.

Results: The AdaBoost-BODS-Expert model, an Adaptive Boosting (AdaBoost) model that used attributes selected by health experts, presented the best results in terms of evaluation metrics and acceptance by health experts from PMCP. With this model, the results are more reliable, which allows its adoption in daily practice to classify possible outcomes of congenital syphilis using clinical and sociodemographic data.
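
For readers unfamiliar with Adaptive Boosting, the following is a minimal sketch of discrete AdaBoost with one-feature decision stumps. It is an illustration only and not the authors' implementation: the toy data, the function names (`fit_stump`, `adaboost_fit`, `adaboost_predict`) and the choice of stumps as weak learners are assumptions made here, and the single numeric feature stands in for the clinical and sociodemographic attributes used in the study.

```python
import math

# Hypothetical toy data: one numeric feature per sample, labels in {+1, -1}.
# No single threshold separates the classes, so one stump is not enough.
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(stump, row):
    """Predict +/-1 with a stump given as (error, feature, threshold, polarity)."""
    _, j, t, polarity = stump
    return polarity if row[j] <= t else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                candidate = (0.0, j, t, polarity)
                err = sum(
                    wi for wi, row, yi in zip(w, X, y)
                    if stump_predict(candidate, row) != yi
                )
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def adaboost_fit(X, y, n_rounds):
    """Train n_rounds weighted stumps; return a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, row))
            for wi, row, yi in zip(w, X, y)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Classify by the sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


model = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
accuracy = sum(p == yi for p, yi in zip(predictions, y)) / len(y)
print(f"training accuracy after 3 rounds: {accuracy:.2f}")
```

The sketch shows only the reweighting idea behind boosting: each round focuses the next weak learner on the samples the previous ones misclassified. Feature selection, data cleaning, hyperparameter optimization and held-out testing, which the study describes, are deliberately left out.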

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Brazil / epidemiology
  • Child
  • Child, Preschool
  • Epidemiological Monitoring
  • Humans
  • Incidence
  • Machine Learning*
  • Sociodemographic Factors
  • Syphilis, Congenital* / classification
  • Syphilis, Congenital* / epidemiology

Grants and funding

This work was supported, in whole or in part, by the Bill & Melinda Gates Foundation [OPP1202194]. Under the grant conditions of the Foundation, a Creative Commons Attribution 4.0 Generic License has already been assigned to the Author Accepted Manuscript version that might arise from this submission. There was no additional external funding received for this study.