Background: Numerous studies on noncommunicable diseases (NCDs) have been carried out worldwide, most of them centered on identifying risk factors, both modifiable (behavioral) and metabolic. The majority of NCDs are attributable to sociodemographic, lifestyle, and behavioral factors and are therefore largely preventable. Identifying these factors is thus both a public health challenge and a necessity.
Objectives: The objective is to conduct a systematic comparative analysis of diverse machine learning (ML) classifiers and to identify the best-performing model for studying the social determinants of NCDs.
Materials and methods: We used data from the Longitudinal Ageing Study in India to predict the prevalence of NCDs from a set of sociodemographic, lifestyle, and behavioral risk factors, and compared the performance of 25 different algorithms on this task.
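A comparison of this kind can be sketched as follows. This is only an illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the survey data, and compares four arbitrarily chosen classifiers rather than the 25 evaluated in the study. The feature count, fold count, and model settings are likewise assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a table of risk factors (X) and a binary NCD label (y).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Hypothetical shortlist of candidate classifiers (the study compared 25).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score every candidate with the same 5-fold cross-validation and metric.
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Rank the candidates from highest to lowest mean accuracy.
ranking = sorted(mean_accuracy.items(), key=lambda item: item[1], reverse=True)
for name, accuracy in ranking:
    print(f"{name}: {accuracy:.3f}")
```

Scoring every candidate with the same folds and the same metric keeps the comparison like-for-like; in practice, additional metrics such as precision, recall, or F1 would usually be reported alongside accuracy.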
Results: Based on the performance metrics, the random forest model was the best-suited method, with 87.9% accuracy, and was therefore chosen as the final model for the analysis. Its performance was optimized through hyperparameter tuning using grid search with 5-fold cross-validation, and the results suggested that it could make accurate predictions on new instances.
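The tuning step described here, a grid search with 5-fold cross-validation over random forest hyperparameters, might look roughly like the sketch below. It assumes scikit-learn and synthetic data; the parameter grid, the train/test split, and the variable names are illustrative assumptions, since the abstract does not report the values that were actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the risk-factor features and the NCD outcome.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

# Hold out a test set so the tuned model is checked on unseen instances.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search grid: 2 x 2 x 2 = 8 hyperparameter combinations.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}

# Evaluate every combination with 5-fold cross-validation on the training set.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_             # best combination found
cv_accuracy = search.best_score_              # its mean accuracy across folds
test_accuracy = search.score(X_test, y_test)  # accuracy on held-out data

print(best_params)
print(f"cv accuracy: {cv_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```

Comparing the cross-validated accuracy with the held-out accuracy gives a rough check on whether the tuned model generalizes to new instances rather than overfitting the training folds.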
Conclusion: The epidemic of chronic illness cannot be fully addressed without understanding its social determinants. With advances in the medical and health-care industry, ML has been applied to analyze diseases based on clinical parameters. This work attempts to explore and encourage the use of ML in the field of social epidemiology.
Copyright © 2024 Indian Journal of Public Health.