Machine learning for predicting Chagas disease infection in rural areas of Brazil

Fabio De Rose Ghilardi; Gabriel Silva; Thallyta Maria Vieira; Ariela Mota; Ana Luiza Bierrenbach; Renata Fiuza Damasceno; Lea Campos de Oliveira; Alexandre Dias Porto Chiavegatto Filho; Ester Sabino

doi:10.1371/journal.pntd.0012026

PLoS Negl Trop Dis. 2024 Apr 16;18(4):e0012026. doi: 10.1371/journal.pntd.0012026. eCollection 2024 Apr.

Authors

Fabio De Rose Ghilardi¹, Gabriel Silva², Thallyta Maria Vieira³, Ariela Mota³, Ana Luiza Bierrenbach¹, Renata Fiuza Damasceno³, Lea Campos de Oliveira⁴, Alexandre Dias Porto Chiavegatto Filho², Ester Sabino¹,⁴

Affiliations

¹ Faculdade de Medicina da Universidade de São Paulo-FMUSP, São Paulo, Brazil.
² Faculdade de Saúde Pública da Universidade de São Paulo-FSP USP, São Paulo, Brazil.
³ Universidade Estadual de Montes Claros-Unimontes, Montes Claros, Minas Gerais, Brazil.
⁴ Instituto de Medicina Tropical da Faculdade de Medicina da USP-IMT USP, São Paulo, Brazil.

Abstract

Introduction: Chagas disease is a severe parasitic illness that is prevalent in Latin America and often goes unaddressed. Early detection and treatment are critical in preventing the progression of the illness and its associated life-threatening complications. In recent years, machine learning algorithms have emerged as powerful tools for disease prediction and diagnosis.

Methods: In this study, we developed machine learning algorithms to predict the risk of Chagas disease based on five general factors: age, gender, history of living in a mud or wooden house, history of being bitten by a triatomine bug, and family history of Chagas disease. We analyzed data from the Retrovirus Epidemiology Donor Study (REDS) to train five popular machine learning algorithms. The sample comprised 2,006 patients, divided into 75% for training and 25% for testing algorithm performance. We evaluated the model performance using precision, recall, and AUC-ROC metrics.
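The evaluation setup described in the Methods can be sketched as follows. This is a minimal, standard-library illustration, not the authors' code: the helper names (`train_test_split`, `precision_recall`, `auc_roc`), the 0.5 default threshold, and the toy labels and scores are all assumptions introduced here. It shows a 75%/25% split and how precision, recall, and AUC-ROC could be computed from predicted risk scores.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of `rows` and return (train, test).

    Hypothetical helper: the abstract states a 75%/25% split but not
    how it was drawn, so a simple seeded shuffle is assumed here.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def precision_recall(y_true, y_score, threshold=0.5):
    """Precision and recall when scores >= threshold are called positive."""
    tp = fp = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def auc_roc(y_true, y_score):
    """AUC-ROC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores (synthetic, NOT the study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(precision_recall(toy_labels, toy_scores))
print(auc_roc(toy_labels, toy_scores))
```

The pairwise AUC computation is quadratic in sample size, which is acceptable for a test set of roughly 500 patients but would be replaced by a rank-based formula on larger data.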

Results: The AdaBoost algorithm yielded an AUC-ROC of 0.772, a precision of 0.199, and a recall of 0.612. We simulated the decision boundary using various thresholds and observed that, in this dataset, a threshold of 0.45 resulted in 100% recall. This finding suggests that employing such a threshold could potentially save 22.5% of the cost associated with mass testing for Chagas disease.
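The threshold simulation can be illustrated with a short sketch. It rests on an assumption not spelled out in the abstract: that people whose predicted risk falls below the threshold would not be tested, so the share of such people approximates the share of testing cost avoided. The function names and the toy data are hypothetical and do not reproduce the study's figures.

```python
def screening_tradeoff(y_true, y_score, threshold):
    """Return (recall, fraction_not_tested) when only people with
    score >= threshold are sent for testing.

    Assumption: cost avoided is proportional to the number of people
    who fall below the threshold and are therefore not tested.
    """
    if not y_true:
        raise ValueError("empty sample")
    n_pos = sum(1 for y in y_true if y == 1)
    flagged = [y for y, s in zip(y_true, y_score) if s >= threshold]
    caught = sum(1 for y in flagged if y == 1)
    recall = caught / n_pos if n_pos else 0.0
    fraction_not_tested = 1 - len(flagged) / len(y_true)
    return recall, fraction_not_tested


def highest_threshold_with_recall(y_true, y_score, target_recall=1.0):
    """Scan the observed scores as candidate thresholds and return the
    highest one that still reaches `target_recall`, as a tuple
    (threshold, recall, fraction_not_tested), or None if none does."""
    best = None
    for t in sorted(set(y_score)):
        recall, skipped = screening_tradeoff(y_true, y_score, t)
        if recall >= target_recall:
            best = (t, recall, skipped)
    return best


# Toy data (synthetic, NOT the study data): 3 infected among 8 people.
toy_labels = [0, 0, 0, 1, 0, 1, 1, 0]
toy_scores = [0.10, 0.20, 0.30, 0.45, 0.50, 0.60, 0.70, 0.40]
print(highest_threshold_with_recall(toy_labels, toy_scores))
```

A threshold chosen this way is tuned to the sample at hand, which is why the result above is qualified with "in this dataset"; recall on new data may fall below 100%.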

Conclusion: Our findings highlight the potential of applying machine learning to improve the sensitivity and effectiveness of Chagas disease diagnosis and prevention. Furthermore, we emphasize the importance of integrating socio-demographic and environmental factors into neglected disease prediction models to enhance their performance.

Copyright: © 2024 De Rose Ghilardi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Adolescent
Adult
Aged
Algorithms
Brazil / epidemiology
Chagas Disease* / diagnosis
Chagas Disease* / epidemiology
Child
Child, Preschool
Female
Humans
Machine Learning*
Male
Middle Aged
Risk Factors
Rural Population*
Young Adult

Grants and funding

The author(s) received no specific funding for this work.