Novel Machine Learning Identifies 5 Asthma Phenotypes Using Cluster Analysis of Real-World Data

J Allergy Clin Immunol Pract. 2024 Aug;12(8):2084-2091.e4. doi: 10.1016/j.jaip.2024.04.035. Epub 2024 Apr 27.

Abstract

Background: Asthma classification into different subphenotypes is important to guide personalized therapy and improve outcomes.

Objectives: To further explore asthma heterogeneity through determination of multiple patient groups by using novel machine learning (ML) approaches and large-scale real-world data.

Methods: We used electronic health records of patients with asthma followed at the Cleveland Clinic between 2010 and 2021. We used k-prototype unsupervised ML to develop a clustering model in which the predictors were age, sex, race, body mass index, prebronchodilator and postbronchodilator spirometry measurements, and the usage of inhaled/systemic steroids. We applied elbow and silhouette plots to select the optimal number of clusters. To support the distinctiveness of these clusters, we then evaluated them with a supervised LightGBM model, using its cross-validated F1 score.
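The clustering step described in Methods can be illustrated with a minimal pure-Python sketch of the k-prototypes idea: squared Euclidean distance on numeric features plus a weight times the number of categorical mismatches, with prototypes updated by mean and mode. This is not the authors' code. The function names, the weight `gamma`, the seed, and the six toy rows (age, percent predicted FEV1, sex, steroid use) are all hypothetical and chosen only for illustration. The cost printed for each k is the quantity an elbow plot would display.

```python
import random
from collections import Counter


def mixed_distance(a, b, n_num, gamma):
    """Squared Euclidean distance on the first n_num (numeric) features
    plus gamma times the number of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(a[:n_num], b[:n_num]))
    mismatches = sum(x != y for x, y in zip(a[n_num:], b[n_num:]))
    return numeric + gamma * mismatches


def update_prototype(members, n_num):
    """Prototype = mean of each numeric column, mode of each categorical one."""
    columns = list(zip(*members))
    means = [sum(col) / len(col) for col in columns[:n_num]]
    modes = [Counter(col).most_common(1)[0][0] for col in columns[n_num:]]
    return tuple(means + modes)


def assign(rows, prototypes, n_num, gamma):
    """Label each row with the index of its nearest prototype."""
    return [
        min(range(len(prototypes)),
            key=lambda j: mixed_distance(row, prototypes[j], n_num, gamma))
        for row in rows
    ]


def k_prototypes(rows, k, n_num, gamma=1.0, n_iter=20, seed=0):
    """Alternate assignment and prototype updates until nothing changes."""
    rng = random.Random(seed)
    prototypes = rng.sample(rows, k)  # initialise from k distinct rows
    for _ in range(n_iter):
        labels = assign(rows, prototypes, n_num, gamma)
        updated = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            # Keep the old prototype if a cluster ends up empty.
            updated.append(update_prototype(members, n_num) if members
                           else prototypes[j])
        if updated == prototypes:
            break
        prototypes = updated
    labels = assign(rows, prototypes, n_num, gamma)
    cost = sum(mixed_distance(r, prototypes[lab], n_num, gamma)
               for r, lab in zip(rows, labels))
    return labels, prototypes, cost


# Hypothetical toy rows, NOT study data: (age, % predicted FEV1, sex, steroid).
# Numeric features are left unscaled here; real data would usually be
# standardised so that no single feature dominates the distance.
toy_rows = [
    (25.0, 95.0, "F", "none"),
    (30.0, 98.0, "F", "none"),
    (28.0, 92.0, "M", "none"),
    (62.0, 65.0, "M", "systemic"),
    (58.0, 70.0, "M", "systemic"),
    (65.0, 68.0, "F", "systemic"),
]

labels, prototypes, cost = k_prototypes(toy_rows, k=2, n_num=2)
print("labels for k=2:", labels)

# Within-cluster cost for several k; an elbow plot charts these values.
costs = {k: k_prototypes(toy_rows, k, n_num=2)[2] for k in (1, 2, 3)}
for k, c in costs.items():
    print(f"k={k}: cost={c:.2f}")
```

In practice a maintained implementation would normally be preferred over a hand-rolled loop; the sketch only makes the mixed numeric/categorical distance explicit, which is what distinguishes k-prototypes from k-means.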

Results: Data from 13,498 patients with asthma with available postbronchodilator spirometry measurements were extracted to identify 5 stable clusters. Cluster 1 included a young nonsevere asthma population with normal lung function and a higher frequency of acute exacerbations (0.8/patient-year). Cluster 2 had the highest body mass index (mean ± SD, 44.44 ± 7.83 kg/m²) and the highest proportions of female patients (77.5%) and Black patients (28.9%). Cluster 3 comprised patients with normal lung function. Cluster 4 included patients with a lower percent predicted FEV1 of 77.03 (12.79) and poor response to bronchodilators. Cluster 5 had the lowest percent predicted FEV1 of 68.08 (15.02), the highest postbronchodilator reversibility, and the highest proportions of severe asthma (44.9%) and blood eosinophilia (>300 cells/μL) (34.8%).

Conclusions: Using real-world data and unsupervised ML, we classified asthma into 5 clinically important subphenotypes for which group-specific asthma treatment and management strategies can be designed and deployed.

Keywords: Asthma; Asthma phenotypes; Cluster analysis; Machine learning.

MeSH terms

  • Adult
  • Aged
  • Asthma* / diagnosis
  • Asthma* / drug therapy
  • Asthma* / epidemiology
  • Asthma* / physiopathology
  • Cluster Analysis
  • Electronic Health Records
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Phenotype*
  • Spirometry
  • Young Adult