Predicting the risk of diabetic retinopathy using explainable machine learning algorithms

Diabetes Metab Syndr. 2023 Dec;17(12):102919. doi: 10.1016/j.dsx.2023.102919. Epub 2023 Dec 4.

Abstract

Background and objective: Diabetic retinopathy (DR) is a global health concern among diabetic patients. The objective of this study was to propose an explainable machine learning (ML)-based system for predicting the risk of DR.

Materials and methods: This study used publicly available cross-sectional data from a Chinese cohort of 6374 respondents. We applied Boruta and least absolute shrinkage and selection operator (LASSO) feature selection to identify the predictors of DR common to both methods. Using these predictors, we trained and optimized four widely used models (artificial neural network, support vector machine, random forest, and extreme gradient boosting (XGBoost)) to predict patients with DR. In addition, Shapley additive explanations (SHAP) were used to show the contribution of each predictor to the prediction.
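
The "common predictors" step can be read as taking the intersection of the feature sets returned by the two selection methods. The sketch below illustrates only that reading; the function name `common_predictors` and both input lists are hypothetical placeholders, not the per-method results of the study, which the abstract does not report.

```python
def common_predictors(boruta_selected, lasso_selected):
    """Return the features chosen by both methods, in Boruta's order."""
    lasso_set = set(lasso_selected)
    return [name for name in boruta_selected if name in lasso_set]


# Hypothetical selections, made up for illustration only.
boruta_selected = ["HbA1c", "FPG", "duration", "age"]
lasso_selected = ["FPG", "weight", "HbA1c", "duration"]

common = common_predictors(boruta_selected, lasso_selected)
print(common)
```

Under this assumption, only the features retained by both methods are passed on to model training.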

Results: Combining the Boruta and LASSO methods revealed that community, TCTG, HDLC, BUN, FPG, HbA1c, weight, and duration were the most important predictors of DR. The XGBoost-based model outperformed the other models, with an accuracy of 90.01%, precision of 91.80%, recall of 97.91%, F1 score of 94.86%, and AUC of 0.850. Moreover, the SHAP method showed that HbA1c, community, FPG, TCTG, duration, and UA1b were the most influential predictors of DR.
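
For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, precision, recall, and F1 score are conventionally derived from a binary confusion matrix. The function name `classification_metrics` and the counts are hypothetical and are not taken from the study; the abstract does not state how its metrics were aggregated (for example, across cross-validation folds).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the formulas.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AUC is not included here because it depends on the ranking of predicted probabilities rather than on a single confusion matrix.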

Conclusion: The proposed integrated system will be helpful as a tool for selecting significant predictors and for identifying, at an early stage, patients in China who are at high risk of DR.

Keywords: Diabetic retinopathy; Machine learning; Prediction; Predictors.

MeSH terms

  • Algorithms
  • Cross-Sectional Studies
  • Diabetes Mellitus*
  • Diabetic Retinopathy* / diagnosis
  • Diabetic Retinopathy* / epidemiology
  • Diabetic Retinopathy* / etiology
  • Humans
  • Machine Learning
  • Risk Factors