Machine Learning For Tuning, Selection, And Ensemble Of Multiple Risk Scores For Predicting Type 2 Diabetes

Yujia Liu; Shangyuan Ye; Xianchao Xiao; Chenglin Sun; Gang Wang; Guixia Wang; Bo Zhang

doi:10.2147/RMHP.S225762


Risk Manag Healthc Policy. 2019 Nov 5;12:189-198. doi: 10.2147/RMHP.S225762. eCollection 2019.

Authors

Yujia Liu¹, Shangyuan Ye², Xianchao Xiao¹, Chenglin Sun¹, Gang Wang¹, Guixia Wang¹, Bo Zhang³

Affiliations

¹ Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, Jilin 130021, People's Republic of China.
² Department of Population Medicine, Harvard Pilgrim Health Care and Harvard Medical School, Boston, MA, USA.
³ Department of Neurology and ICCTR Biostatistics and Research Design Center, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA.

Abstract

Background: This study proposes using machine learning algorithms to improve the accuracy of type 2 diabetes prediction based on non-invasive risk score systems.

Methods: We evaluated and compared the prediction accuracies of existing non-invasive risk score systems using data from the REACTION study (Risk Evaluation of Cancers in Chinese Diabetic Individuals: A Longitudinal Study). Two simple risk scores were established on the basis of logistic regression. Machine learning techniques (ensemble methods) were then used to improve prediction accuracy by combining the individual score systems.
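The Methods paragraph describes combining several individual risk scores into one prediction. The sketch below is a minimal illustration under stated assumptions: each subject already has a value on every score system, and the scores are combined in two generic ways, equal-weight soft voting (averaging z-standardized scores) and stacking (fitting a logistic-regression meta-learner on the scores). It uses plain Python so it runs without dependencies. The function names (`fit_scaler`, `transform`, `soft_vote`, `fit_stacker`, `predict_proba`) and the toy data are hypothetical and do not come from the paper, which may use different base learners, weights, or solvers.

```python
import math

def fit_scaler(score_matrix):
    """Per-system (mean, SD) estimated on the training cohort.
    score_matrix[i][j] is subject i's value on score system j."""
    stats = []
    for col in zip(*score_matrix):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0  # guard constant scores
        stats.append((mean, sd))
    return stats

def transform(score_matrix, stats):
    """Put every score system on a common z-score scale using training statistics."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in score_matrix]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_vote(X):
    """Equal-weight soft voting: mean of a subject's standardized scores."""
    return [sum(row) / len(row) for row in X]

def fit_stacker(X, labels, lr=0.1, epochs=2000):
    """Stacking: logistic-regression meta-learner on standardized scores,
    fitted by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, y in zip(X, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - y
            for j in range(k):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of type 2 diabetes from the meta-learner."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]

# Hypothetical toy data: rows are subjects, columns are three score systems.
scores = [[2, 10, 0.1], [3, 12, 0.2], [1, 9, 0.15], [2, 11, 0.3],
          [7, 20, 0.6], [8, 18, 0.5], [6, 22, 0.7], [9, 19, 0.4]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

stats = fit_scaler(scores)
X = transform(scores, stats)
votes = soft_vote(X)
w, b = fit_stacker(X, labels)
probs = predict_proba(X, w, b)
```

Because previously published score systems are fixed formulas, their outputs can feed the meta-learner directly. Scores fitted on the same cohort (such as the two new logistic-regression scores) would instead need out-of-fold predictions to avoid optimistic stacking weights, and accuracy should be judged on held-out data rather than in-sample as in this toy example.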

Results: In general, existing score systems developed in Western populations performed worse than those developed in Eastern populations. The two newly established score systems performed better than most existing score systems but slightly worse than the Chinese score system. Ensemble methods combined with model selection algorithms yielded better prediction accuracy than any of the simple score systems.
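The Results mention ensembles built with model selection algorithms, but the abstract does not name the algorithm. As one illustrative assumption, the sketch below performs greedy forward selection. Starting from an empty ensemble, it repeatedly adds the score system whose inclusion gives the highest AUC for an equal-weight average of standardized scores, and it stops when no remaining system raises the AUC. The helper names `auc`, `standardize`, and `forward_select` and the score names `score_x`, `score_y`, `score_z` are hypothetical.

```python
import math

def auc(risk, labels):
    """ROC AUC via the Mann-Whitney statistic; ties count as 1/2."""
    pos = [r for r, y in zip(risk, labels) if y == 1]
    neg = [r for r, y in zip(risk, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def standardize(values):
    """Z-score one score system so systems on different scales can be averaged."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / sd for v in values]

def forward_select(score_columns, labels):
    """Greedy forward selection of score systems into an equal-weight
    soft-voting ensemble; stops when no remaining system raises the AUC."""
    z = {name: standardize(vals) for name, vals in score_columns.items()}
    selected, best_auc = [], float("-inf")
    while True:
        best_name, best_candidate_auc = None, best_auc
        for name in sorted(z):  # sorted order makes tie-breaking deterministic
            if name in selected:
                continue
            members = selected + [name]
            vote = [sum(z[m][i] for m in members) / len(members)
                    for i in range(len(labels))]
            candidate_auc = auc(vote, labels)
            if candidate_auc > best_candidate_auc:  # require a strict improvement
                best_name, best_candidate_auc = name, candidate_auc
        if best_name is None:
            return selected, best_auc
        selected.append(best_name)
        best_auc = best_candidate_auc

# Hypothetical toy cohort: 3 non-diabetic (0) and 3 diabetic (1) subjects.
labels = [0, 0, 0, 1, 1, 1]
score_columns = {
    "score_x": [1, 2, 5, 4, 6, 7],
    "score_y": [5, 1, 2, 6, 7, 4],
    "score_z": [4, 5, 6, 1, 2, 3],
}
selected, best_auc = forward_select(score_columns, labels)
```

On this toy data, `score_x` and `score_y` each reach an AUC of 8/9 alone but separate the two classes perfectly when averaged, so both are selected, while `score_z` is never added. In practice the AUC used for selection should come from cross-validation or a validation split, since selecting on in-sample AUC overstates accuracy.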

Conclusion: Our proposed machine learning methods can be used to improve the accuracy of screening for undiagnosed type 2 diabetes and identifying high-risk patients.

Keywords: machine learning; prediction; risk score; stacking; type 2 diabetes; voting.

Grants and funding

The present work was part of the baseline survey of the REACTION study, which investigated the association between diabetes and cancer and was conducted among 259,657 adults aged 40 years and older in 25 communities across mainland China from 2011 to 2012. This research was supported by the Science and Technology Department of China through grants 20170623092TC-01 and 20180623083TC-01, and by China's National Development and Reform Commission through grant 2017C019.