Latent space representation of electronic health records for clustering dialysis-associated kidney failure subtypes

Comput Biol Med. 2024 Oct 5:183:109243. doi: 10.1016/j.compbiomed.2024.109243. Online ahead of print.

Abstract

Objective: Kidney failure manifests in various forms, from sudden-onset conditions such as Acute Kidney Injury (AKI) to progressive conditions such as Chronic Kidney Disease (CKD). Because these forms share overlapping comorbidities and clinical features, including treatment modalities such as dialysis, we sought to design and validate an end-to-end framework for clustering kidney failure subtypes.

Materials and methods: We focused on dialysis, using a comprehensive dataset from the UK Biobank (UKB). We transformed raw Electronic Health Record (EHR) data into standardized matrices that incorporate patient demographics, clinical visit data, and the innovative feature of visit time-gaps. This matrix structure was obtained using a unique data cutting method. The matrices were mapped to a latent space by a convolutional autoencoder (ConvAE), and the resulting latent representations were then clustered using Principal Component Analysis (PCA) and K-means algorithms.
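
The matrix construction described above can be illustrated with a minimal sketch. The abstract does not specify the data cutting method, so the rule used here (keep the most recent `max_visits` visits and zero-pad shorter histories) is an assumption, as are the function name `visit_matrix`, the row layout, and the toy values.

```python
from datetime import date


def visit_matrix(visit_dates, visit_features, demographics, max_visits=4):
    """Build a fixed-size patient matrix with one row per visit.

    Each row is [demographics..., gap_days, visit features...], where
    gap_days is the number of days since the previous visit.
    """
    # Sort visits chronologically so every time-gap is non-negative.
    order = sorted(range(len(visit_dates)), key=lambda i: visit_dates[i])
    rows = []
    previous = None
    for i in order:
        current = visit_dates[i]
        gap_days = 0 if previous is None else (current - previous).days
        rows.append(list(demographics) + [gap_days] + list(visit_features[i]))
        previous = current

    # Hypothetical "cutting" rule: keep only the most recent max_visits visits.
    rows = rows[-max_visits:]

    # Zero-pad shorter histories at the top so all patients share one shape.
    n_visit_features = len(visit_features[0]) if visit_features else 0
    width = len(demographics) + 1 + n_visit_features
    padding = [[0] * width for _ in range(max_visits - len(rows))]
    return padding + rows


# Toy patient (invented values): demographics are [age, sex code],
# visit features are two binary diagnosis flags.
dates = [date(2010, 1, 1), date(2010, 1, 31), date(2010, 3, 2)]
features = [[1, 0], [0, 1], [1, 1]]
matrix = visit_matrix(dates, features, demographics=[63, 1], max_visits=4)
for row in matrix:
    print(row)
```

Matrices of identical shape can be stacked into a single tensor, which is the kind of input a convolutional autoencoder expects.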

Results: Our transformation model effectively reduced data dimensionality, thereby accelerating computational processes. The derived latent space demonstrated remarkable clustering capacity. Through cluster analysis, two distinct groups were identified: a CKD-majority group (cluster 1) and a mixed group of non-CKD and some CKD subtypes (cluster 0). Cluster 1 exhibited notably low survival probability, suggesting that it predominantly represented severe CKD. In contrast, cluster 0, with substantially higher survival probability, likely includes milder CKD forms and severe AKI. Our end-to-end framework effectively differentiates kidney failure subtypes using the UKB dataset, offering potential for nuanced therapeutic interventions.
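
Survival probabilities of the kind compared between clusters above are commonly estimated with the Kaplan-Meier method. The abstract does not state which estimator was used, so the following is only a sketch under that assumption; the function name `kaplan_meier` and the follow-up data are invented for illustration.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient.
    events: 1 if the event (death) was observed, 0 if the patient was censored.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Observed events exactly at time t.
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Invented follow-up times (years) for one hypothetical cluster.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
for time, probability in curve:
    print(time, round(probability, 3))
```

Running the estimator once per cluster yields two curves whose separation can then be assessed, for example with a log-rank test.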

Conclusions: This innovative approach integrates diverse data sources, providing a holistic understanding of kidney failure, which is imperative for patient management and targeted therapeutic interventions.

Keywords: Acute kidney injury; Chronic kidney disease; Clustering; Convolutional autoencoder; Electronic health record; Kidney failure.