Improving glomerular filtration rate estimation by semi-supervised learning: a development and external validation study

Int Urol Nephrol. 2021 Aug;53(8):1649-1658. doi: 10.1007/s11255-020-02771-w. Epub 2021 Mar 12.

Abstract

Background: Accurate estimation of glomerular filtration rate (GFR) is crucial in both clinical practice and epidemiological surveys. We incorporated semi-supervised learning technology to improve GFR estimation performance.

Methods: The AASK [African American Study of Kidney Disease and Hypertension], CRIC [Chronic Renal Insufficiency Cohort] and DCCT [Diabetes Control and Complications Trial] studies were pooled for model development, whereas the MDRD [Modification of Diet in Renal Disease] and CRISP [Consortium for Radiological Imaging Studies of Polycystic Kidney Disease] studies were used for external validation. Seven variables (serum creatinine, age, sex, Black race, diabetes status, hypertension and body mass index) were included as independent variables, while the outcome variable, GFR, was measured as the urinary clearance of 125I-iothalamate. The revised CKD-EPI [Chronic Kidney Disease Epidemiology Collaboration] creatinine equations were selected as the benchmark for performance comparisons. Head-to-head performance comparisons between the revised CKD-EPI equations and the semi-supervised models were conducted for each variable combination, from four to seven variables.

Results: For every combination of independent variables, the semi-supervised models achieved superior results on all three performance indicators compared with the corresponding revised CKD-EPI equations in the external validation data set. Furthermore, compared with the revised four-variable CKD-EPI equation, the seven-variable semi-supervised model was less biased (mean of difference: 0.03 [-0.28, 0.34] vs 1.53 [1.28, 1.85], P < 0.001), more precise (interquartile range of difference: 7.94 [7.37, 8.50] vs 8.28 [7.76, 8.83], P = 0.1) and more accurate (P30: 88.9% [87.4%, 90.2%] vs 86.0% [84.4%, 87.4%], P < 0.001).
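
The three performance indicators reported above (bias, precision and P30 accuracy) can be illustrated with a minimal sketch. This is not the authors' code. The function name `gfr_performance` is hypothetical, the sign convention (measured minus estimated GFR) and the quartile method are assumptions, and P30 is taken in its usual sense as the percentage of estimates falling within 30% of measured GFR. The confidence intervals in the abstract are not reproduced here.

```python
from statistics import mean, quantiles


def gfr_performance(measured, estimated):
    """Return bias, precision (IQR) and P30 for paired GFR values.

    Hypothetical helper for illustration only.
    """
    if len(measured) != len(estimated) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")

    # Assumption: difference is defined as measured minus estimated GFR.
    diffs = [m - e for m, e in zip(measured, estimated)]

    # Bias: mean of the differences.
    bias = mean(diffs)

    # Precision: interquartile range of the differences.
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _, q3 = quantiles(diffs, n=4, method="inclusive")
    iqr = q3 - q1

    # Accuracy (P30): percentage of estimates within 30% of measured GFR.
    within_30 = sum(abs(e - m) <= 0.30 * m for m, e in zip(measured, estimated))
    p30 = 100.0 * within_30 / len(measured)

    return {"bias": bias, "iqr": iqr, "p30": p30}
```

A bias closer to zero, a smaller interquartile range and a larger P30 each indicate better performance, which is the direction of the comparisons reported in the Results.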

Conclusions: The superior performance of the semi-supervised models during head-to-head comparisons supported the hypothesis that semi-supervised learning technology could improve GFR estimation performance.

Keywords: Chronic kidney disease (CKD); Estimating equation; Glomerular filtration rate (GFR); Semi-supervised learning; Serum creatinine.

Publication types

  • Validation Study

MeSH terms

  • Adult
  • Female
  • Glomerular Filtration Rate*
  • Humans
  • Kidney Function Tests / standards*
  • Male
  • Middle Aged
  • Supervised Machine Learning*