Rationale: Acute exacerbations of chronic obstructive pulmonary disease (AECOPDs) are heterogeneous. Machine learning (ML) has previously been used to dissect some of the heterogeneity in COPD. The widespread adoption of electronic health records (EHRs) has led to the rapid accumulation of large amounts of patient data as part of routine clinical care. However, it is unclear whether applying ML to EHR-derived data can identify subgroups of AECOPD.
Objectives: To determine whether ML implementation using EHR data from severe AECOPDs requiring hospitalization identifies relevant subgroups.
Methods: This study used 2 retrospective cohorts of patients with AECOPDs (non-COVID-19 and COVID-19) treated at Yale-New Haven Hospital. K-means clustering was used to identify patient subgroups.
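Illustration: the following is a minimal sketch of k-means clustering on toy data, not the study's actual pipeline. The two feature names (eosinophil count, creatinine), the synthetic values, the z-score standardization, and the deterministic farthest-first initialization are all assumptions made for this example; the abstract does not state which EHR variables, preprocessing, or initialization were used.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(rows, k):
    """Deterministic initialization (an assumption for this sketch):
    start at the first row, then repeatedly add the row farthest
    from all centroids chosen so far."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        )
    return centroids

def kmeans(rows, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the labels stop changing. Returns (centroids, labels)."""
    centroids = farthest_first(rows, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [mean(col) for col in zip(*members)]
    return centroids, labels

# Synthetic, well-separated toy patients: [eosinophil count, creatinine].
# These numbers are invented for illustration and are not study data.
rng = random.Random(42)
toy = (
    [[rng.gauss(0.1, 0.02), rng.gauss(0.9, 0.08)] for _ in range(30)]
    + [[rng.gauss(0.1, 0.02), rng.gauss(2.5, 0.08)] for _ in range(30)]
    + [[rng.gauss(0.6, 0.02), rng.gauss(0.9, 0.08)] for _ in range(30)]
)
centroids, labels = kmeans(standardize(toy), k=3)
```

In this toy setup each synthetic group ends up in its own cluster. With real EHR data the number of clusters would have to be chosen and justified separately, for example by comparing cluster quality across several values of k.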
Measurements and main results: We identified 3 subgroups in the non-COVID cohort (n=1736). Each subgroup had distinct clinical characteristics. The reference subgroup was the largest (n=904), followed by cardio-renal (n=548) and eosinophilic (n=284). The eosinophilic subgroup had milder severity of AECOPD, including a shorter hospital stay (p<0.01). The cardio-renal subgroup had the highest mortality both during hospitalization (5%) and in the year after hospitalization (30%). Validation of the severe AECOPD classifier in the COVID-19 cohort recapitulated the characteristics seen in the non-COVID cohort. AECOPD subgroups in the COVID-19 cohort had different interleukin (IL)-1 beta, IL-2R, and IL-8 levels (false discovery rate ≤ 0.05). These specific leukocyte and cytokine profiles resulted in inflammatory differences between the AECOPD subgroups, as measured by C-reactive protein levels.
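Illustration: a false discovery rate threshold such as the one reported for the cytokine comparisons is commonly applied with the Benjamini-Hochberg procedure. The abstract does not name the correction method, so its use here is an assumption, and the p-values below are made up for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, one per tested marker (not results from the study).
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [qi <= 0.05 for qi in q]
```

Here only the first two hypothetical markers pass the 0.05 threshold after adjustment, even though four of the raw p-values are below 0.05.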
Conclusions: Incorporating ML with EHR data allows the identification of specific clinical and biological subgroups for severe AECOPD.
Keywords: acute exacerbations of COPD; clusters; electronic health records; machine learning.
JCOPDF © 2024.