Stratification of Alzheimer's Disease Patients Using Knowledge-Guided Unsupervised Latent Factor Clustering with Electronic Health Record Data

medRxiv [Preprint]. 2024 Dec 26:2024.12.23.24319588. doi: 10.1101/2024.12.23.24319588.

Abstract

Background: People with Alzheimer's disease (AD) exhibit varying clinical trajectories. There is a need to predict future AD-related outcomes, such as morbidity and mortality, using clinical profiles available at the point of care.

Objective: To stratify AD patients based on baseline clinical profiles (up to two years prior to AD diagnosis) and update the model after AD diagnosis to prognosticate future AD-related outcomes.

Methods: Using the electronic health record (EHR) data of a large healthcare system (2011-2022), we first identified patients with ≥1 diagnosis code for AD or related dementia and applied a validated unsupervised phenotyping algorithm to assign AD diagnosis status. Next, we applied an unsupervised latent factor clustering approach, guided by knowledge graph embeddings of relevant EHR features up to the baseline, to cluster patients into two groups at AD diagnosis. We then prognosticated the risk of two readily ascertainable and clinically relevant AD-related outcomes (i.e., nursing home admission indicating greater need for assistance and mortality), adjusting for baseline confounders (e.g., age, gender, race, ethnicity, healthcare utilization, and comorbidities). For patients remaining at risk one year post-diagnosis, we updated their group membership and repeated the prognostication.
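The grouping step can be pictured with a toy sketch. The abstract does not give the details of the latent factor clustering, so the code below swaps in a much simpler stand-in: each patient's feature counts are projected onto feature embeddings, and the projected points are split with plain 2-means. All names and data (`feature_embeddings`, `baseline_counts`, `project`, `two_means`) are hypothetical and are not taken from the study.

```python
# Toy sketch only: the embedding projection plus 2-means below stands in for
# the study's latent factor clustering, whose details the abstract does not give.

def project(counts, embeddings):
    """Map each patient's feature counts to a count-weighted sum of feature embeddings."""
    dim = len(embeddings[0])
    return [
        [sum(c * e[k] for c, e in zip(row, embeddings)) for k in range(dim)]
        for row in counts
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, n_iter=20):
    """Plain 2-means with a deterministic start: the first point and the point farthest from it."""
    centers = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each patient to the nearer center.
        labels = [min((0, 1), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Move each center to the mean of its current members.
        for j in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical data: 4 EHR features with 2-dimensional embeddings, 6 patients.
feature_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
baseline_counts = [
    [5, 4, 0, 1],
    [6, 3, 1, 0],
    [4, 5, 0, 0],
    [0, 1, 5, 4],
    [1, 0, 6, 3],
    [0, 0, 4, 5],
]

groups = two_means(project(baseline_counts, feature_embeddings))
print(groups)
```

Under the same assumptions, updating group membership one year after diagnosis would amount to rerunning the grouping on feature counts extended through that timepoint, restricted to patients still at risk.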

Results: We stratified 16,411 algorithm-identified AD patients into two groups based on their baseline clinical profiles (41% Group 1, 59% Group 2). Patients in Group 1 were marginally older at AD diagnosis (age Mean [SD]: 81.4 [9.3] vs 81.0 [8.7], p = .007), exhibited greater comorbidity burden (Elixhauser comorbidity index Mean [SD]: 11.3 [10.3] vs 7.5 [8.6], p < .0001), and more frequently received AD-related medications (47.7% vs 40.9%, p < .0001) than those in Group 2. Compared to Group 1, Group 2 had a lower risk of nursing home admission (HR [95% CI] = 0.804 [0.765, 0.844], p < .001), while the two groups had similar mortality risk (HR [95% CI] = 1.008 [0.963, 1.056], p = .733). One year after AD diagnosis, 12,606 patients remained at risk (45.7% Group 1, 54.3% Group 2). Consistent with baseline findings, in the updated model Group 2 had a lower risk of nursing home admission than Group 1 (HR [95% CI] = 0.815 [0.766, 0.868], p < .001) and a similar mortality risk (HR [95% CI] = 0.977 [0.922, 1.035], p = .430).
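As a reading aid, the reported hazard ratios, confidence intervals, and p-values can be checked against one another. The sketch below assumes the intervals are Wald-type intervals symmetric on the log-hazard scale, which the abstract does not state. It recovers the standard error of the log hazard ratio from each interval and derives an approximate two-sided p-value. The function name `wald_from_ci` is illustrative.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log-HR standard error, z statistic, and two-sided p-value from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail probability
    return se, z, p

# Hazard ratios and 95% CIs as reported in the Results (Group 2 vs Group 1).
reported = {
    "nursing home, baseline": (0.804, 0.765, 0.844),
    "mortality, baseline": (1.008, 0.963, 1.056),
    "nursing home, 1 year": (0.815, 0.766, 0.868),
    "mortality, 1 year": (0.977, 0.922, 1.035),
}

results = {name: wald_from_ci(*vals) for name, vals in reported.items()}
for name, (se, z, p) in results.items():
    print(f"{name}: SE={se:.4f}, z={z:.2f}, p={p:.3g}")
```

Under this approximation, the derived p-values for mortality come out close to the reported .733 and .430, and those for nursing home admission fall well below .001. Small differences are expected because the reported estimates are rounded.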

Conclusions: It is feasible to stratify patients based on readily available clinical profiles before AD diagnosis and, crucially, to update the model one year after diagnosis to effectively prognosticate future AD-related outcomes.

Short abstract: Prognostication for people with Alzheimer's disease (AD) at the point of care could improve clinical management. Applying a novel unsupervised latent factor clustering approach guided by knowledge graph embeddings of relevant clinical features from electronic health records, we stratified 16,411 AD patients into two groups at diagnosis and prognosticated their risk of AD-related outcomes (i.e., nursing home admission, mortality), adjusting for baseline confounders. To reflect real-world evolution in clinical trajectories, we updated patient stratification for 12,606 AD patients remaining at risk one year post-diagnosis and repeated prognostication. At both timepoints, one group had a higher nursing home admission risk and exhibited characteristics suggesting greater symptom burden, but the mortality risk remained comparable between groups. This study supports the use of patient stratification to enable outcome prognosis for AD patients. While baseline prognostication can guide early treatment and tailored management, dynamic prognostication may inform more timely interventions to improve long-term outcomes.

Publication types

  • Preprint