Development of risk models for early detection and prediction of chronic kidney disease in clinical settings

Sci Rep. 2024 Dec 30;14(1):32136. doi: 10.1038/s41598-024-83973-5.

Abstract

Chronic kidney disease (CKD) imposes a heavy burden, with high mortality and morbidity rates. Early detection of CKD is imperative to prevent the adverse outcomes associated with its later stages. This study therefore aims to use machine learning techniques to predict CKD at early stages, using data obtained from a large longitudinal cohort study. The features comprise patients' sociodemographic and anthropometric characteristics and the laboratory tests most closely associated with CKD in national and international studies. Records with missing data were removed by listwise deletion, and outliers were removed using the interquartile range (IQR) method. The data were initially left imbalanced to investigate the ability of the models to handle imbalanced datasets. Stratified k-fold cross-validation, a robust approach that performs well on imbalanced data, was then applied to improve how the data were split into training and test folds. An interaction was found between age and gender that produced contrasting patterns in the data; to avoid this interaction, gender-specific algorithms were developed. Using the preprocessed data of 6855 participants, gender-specific Random Forest (RF) and feedforward Neural Network (NN) models were developed: four main models and four further models trained with stratified k-fold cross-validation. The RF model for women achieved the highest AUC (0.90), followed closely by the NN model for women (0.89). Both models for men yielded an AUC of 0.88. Sensitivity was higher in men than in women. Specificity was suboptimal; however, the high precision and F1 scores make the models highly valuable in a clinical setting for accurately identifying CKD cases while minimizing false-positive diagnoses. Moreover, the stratified k-fold cross-validation results indicated that the NN models were more sensitive to the imbalanced dataset and showed a marked increase in performance, particularly in specificity, once this approach was applied.
These data offer valuable insights for the development of future risk stratification models for CKD.
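The preprocessing and evaluation steps summarized above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code. The record layout (dicts with a hypothetical `gender` field), the Tukey multiplier k = 1.5 for the IQR rule, the 0.5 decision threshold, the round-robin fold assignment, and the hypothetical `fit_predict` callable standing in for the RF and NN models are all assumptions made for illustration. It shows listwise deletion, IQR outlier removal, partitioning records by gender for gender-specific models, stratified k-fold splitting, and the reported metrics (AUC, sensitivity, specificity, precision, F1).

```python
import random
from statistics import quantiles


def listwise_delete(rows):
    """Listwise deletion: drop every record with a missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def iqr_filter(rows, feature, k=1.5):
    """Drop records whose `feature` lies outside [Q1 - k*IQR, Q3 + k*IQR].
    k = 1.5 is an assumed (Tukey) default; the abstract does not state it."""
    q1, _, q3 = quantiles([r[feature] for r in rows], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[feature] <= high]


def group_by(rows, key):
    """Partition records by a field, e.g. to train one model per gender."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    return groups


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is dealt round-robin
    across the k folds so every fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision and F1 from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"sensitivity": sens, "specificity": spec, "precision": prec, "f1": f1}


def auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count 1/2). O(P*N): fine for a sketch, not for large data."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, fit_predict, k=5, seed=0, threshold=0.5):
    """Average the metrics over stratified folds. `fit_predict(X_train,
    y_train, X_test)` is a hypothetical stand-in for an RF or NN model and
    must return one score in [0, 1] per test record."""
    totals = {}
    for train, test in stratified_kfold(y, k, seed):
        scores = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        y_test = [y[i] for i in test]
        m = binary_metrics(y_test, [int(s >= threshold) for s in scores])
        m["auc"] = auc(y_test, scores)
        for name, value in m.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: value / k for name, value in totals.items()}
```

For instance, with 4 CKD cases among 20 records and k = 4, `stratified_kfold` places exactly 1 case and 4 non-cases in every test fold, so each fold mirrors the overall 20% prevalence. A plain random split could leave a fold with no cases at all, making sensitivity and AUC undefined for that fold. In practice a library implementation such as scikit-learn's `StratifiedKFold` would normally be used instead of a hand-written splitter.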

Keywords: Chronic kidney disease; Early diagnosis; Risk factors.

MeSH terms

  • Adult
  • Aged
  • Algorithms
  • Early Diagnosis*
  • Female
  • Humans
  • Longitudinal Studies
  • Machine Learning*
  • Male
  • Middle Aged
  • Neural Networks, Computer
  • Renal Insufficiency, Chronic* / diagnosis
  • Renal Insufficiency, Chronic* / epidemiology
  • Risk Assessment / methods
  • Risk Factors