A Novel Chronic Kidney Disease Phenotyping Algorithm Using Combined Electronic Health Record and Claims Data

Clin Epidemiol. 2023 Mar 8:15:299-307. doi: 10.2147/CLEP.S397020. eCollection 2023.

Abstract

Purpose: Because chronic kidney disease (CKD) is often under-coded as a diagnosis in claims data, we aimed to develop claims-based prediction models for CKD phenotypes determined by laboratory results in electronic health records (EHRs).

Patients and methods: We linked EHR data from two networks (used as the training and validation cohorts, respectively) with Medicare claims data. The study cohort included individuals ≥65 years with a valid serum creatinine result in the EHR from 2007 to 2017, excluding those with end-stage kidney disease or on dialysis. We used LASSO regression to select among 134 candidate predictors of continuous estimated glomerular filtration rate (eGFR). We assessed model performance in predicting eGFR categories of <60, <45, and <30 mL/min/1.73 m² in terms of the area under the receiver operating characteristic curve (AUC).
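As a rough illustration of the evaluation step described above, the sketch below computes an AUC for one eGFR category when the score is a continuous predicted eGFR. It is not the authors' code: the function name `auc_for_threshold`, the toy values, and the pairwise (Mann-Whitney) formulation of the AUC are assumptions made for illustration only.

```python
def auc_for_threshold(observed_egfr, predicted_egfr, threshold):
    """AUC for classifying observed eGFR < threshold from predicted eGFR.

    Hypothetical helper (not from the paper). A lower predicted eGFR is
    treated as stronger evidence for the positive class, so the score is
    the negated prediction.
    """
    positives = [-p for o, p in zip(observed_egfr, predicted_egfr) if o < threshold]
    negatives = [-p for o, p in zip(observed_egfr, predicted_egfr) if o >= threshold]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute an AUC")

    # Mann-Whitney formulation: the share of (positive, negative) pairs in
    # which the positive has the higher score; ties count as one half.
    wins = 0.0
    for s_pos in positives:
        for s_neg in negatives:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up values in mL/min/1.73 m^2, not study data.
observed = [25, 40, 55, 70, 90]
predicted = [30, 50, 45, 80, 85]

auc_60 = auc_for_threshold(observed, predicted, 60)
auc_45 = auc_for_threshold(observed, predicted, 45)
```

The pairwise loop is quadratic in cohort size, so for cohorts as large as those in this study a rank-based implementation from a statistics library would be the practical choice.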

Results: The model training cohort included 117,476 patients (mean age 74.8 years, female 58.2%) and the validation cohort included 56,744 patients (mean age 73.8 years, female 59.6%). In the validation cohort, the AUC of the primary model (with 113 predictors and an adjusted R² of 0.35) for predicting the eGFR <60, eGFR <45, and eGFR <30 mL/min/1.73 m² categories was 0.81, 0.88, and 0.92, respectively. The corresponding positive predictive values for these 3 phenotypes were 0.80 (95% confidence interval: 0.79, 0.81), 0.79 (0.75, 0.84), and 0.38 (0.30, 0.45).
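For readers unfamiliar with how a positive predictive value and its interval are obtained, a minimal sketch follows. The abstract does not state which interval method or classification cutoff was used, so the normal-approximation (Wald) interval, the function name `ppv_with_ci`, and the counts below are all assumptions, not values from the study.

```python
import math


def ppv_with_ci(true_positives, false_positives, z=1.96):
    """Positive predictive value with a Wald (normal-approximation) interval.

    Hypothetical helper: the source does not say how its confidence
    intervals were computed, so the Wald method here is an assumption.
    """
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no patients were flagged by the model")
    ppv = true_positives / flagged
    half_width = z * math.sqrt(ppv * (1.0 - ppv) / flagged)
    # Clip to [0, 1] because the Wald interval can overshoot near the bounds.
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)


# Made-up counts: 100 patients flagged by the model, of whom 80 truly
# have a measured eGFR below the cutoff.
ppv, lower, upper = ppv_with_ci(true_positives=80, false_positives=20)
```

With small numbers of flagged patients, as may occur for the rarest category, a Wilson or exact interval would usually be preferred over the Wald approximation.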

Conclusion: We developed a claims-based model to determine clinical phenotypes of CKD stages defined by eGFR values. Researchers without access to laboratory results can use the model-predicted phenotypes as a proxy clinical endpoint or confounder and to enhance subgroup effect assessment.

Keywords: EHR; RPDR; prediction.