Prodromal clinical, demographic, and socio-ecological correlates of asthma in adults: a 10-year statewide big data multi-domain analysis

J Asthma. 2020 Nov;57(11):1155-1167. doi: 10.1080/02770903.2019.1642352. Epub 2019 Jul 26.

Abstract

Objectives: To identify prodromal correlates of asthma, as compared with chronic obstructive pulmonary disease and allied conditions (COPDAC), using a multi-domain analysis of socio-ecological, clinical, and demographic domains.

Methods: This is a retrospective case-risk-control study using data from Florida's statewide Healthcare Cost and Utilization Project (HCUP). Patients were divided into three groups: asthma, COPDAC (without asthma), and neither asthma nor COPDAC. To identify socio-ecological, clinical, and demographic predictors of asthma and COPDAC, we used univariate analysis, feature ranking by bootstrapped information gain ratio, multivariable logistic regression with LogitBoost selection, decision trees, and random forests.

Results: A total of 141,729 patients met inclusion criteria, of whom 56,052 were diagnosed with asthma and 85,677 with COPDAC; an additional 84,737 patients with neither asthma nor COPDAC formed the control group. The multi-domain approach was superior in distinguishing asthma from COPDAC and non-asthma/non-COPDAC controls (area under the receiver operating characteristic curve [AUROC], 84%). The best domain for distinguishing asthma from COPDAC without controls was prior clinical diagnoses (AUROC 82%). When variables from all domains were ranked together, the most important predictors of asthma versus COPDAC and controls were primarily socio-ecological, whereas for asthma versus COPDAC without controls, demographic and clinical variables such as age, Charlson Comorbidity Index (CCI), and prior clinical diagnoses ranked higher.

Conclusions: In this large statewide study using a machine learning approach, we found that a multi-domain approach combining demographic, clinical, and socio-ecological variables best predicted an asthma diagnosis. Future work should focus on integrating machine learning-generated predictive models into clinical practice to improve early detection of these common respiratory diseases.
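The Methods list feature ranking by bootstrapped information gain ratio. The abstract does not say how the authors implemented it (software, number of bootstrap replicates, or how continuous variables were discretized), so the following is only a minimal sketch under stated assumptions: all features are categorical, the gain ratio is the C4.5-style information gain divided by the split information of the feature, and each feature's score is its mean gain ratio over bootstrap resamples of the patients. The function names, feature names, and toy data are hypothetical and are not taken from the study.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """C4.5-style gain ratio of one categorical feature with respect to the labels."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Expected label entropy after splitting on the feature's values.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)  # intrinsic information of the split itself
    return info_gain / split_info if split_info > 0 else 0.0


def bootstrapped_gain_ratio(features, labels, n_boot=100, seed=0):
    """Mean gain ratio per feature over n_boot bootstrap resamples, ranked descending.

    `features` maps a (hypothetical) feature name to a list of categorical values,
    aligned row-by-row with `labels`.
    """
    rng = random.Random(seed)
    n = len(labels)
    totals = {name: 0.0 for name in features}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        y = [labels[i] for i in idx]
        for name, col in features.items():
            totals[name] += gain_ratio([col[i] for i in idx], y)
    means = {name: t / n_boot for name, t in totals.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Toy example with invented data (not from the study): age_band separates the
# labels perfectly, while sex is only weakly related to them.
labels = ["asthma", "asthma", "copdac", "copdac", "asthma", "copdac"]
features = {
    "age_band": ["<40", "<40", "65+", "65+", "<40", "65+"],
    "sex": ["F", "M", "F", "M", "F", "M"],
}
ranking = bootstrapped_gain_ratio(features, labels, n_boot=50)
print(ranking)
```

Dividing by the split information penalizes features with many distinct values (for example, fine-grained ZIP codes), which plain information gain would otherwise favor. Bootstrapping the ranking gives a sense of how stable each feature's score is across resamples of the cohort.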

Keywords: Asthma; COPD; machine learning; multi-domain; prediction.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Administrative Claims, Healthcare / statistics & numerical data
  • Adult
  • Asthma / diagnosis*
  • Asthma / epidemiology
  • Big Data
  • Case-Control Studies
  • Early Diagnosis
  • Female
  • Florida / epidemiology
  • Humans
  • Longitudinal Studies
  • Machine Learning*
  • Male
  • Middle Aged
  • Models, Biological*
  • ROC Curve
  • Retrospective Studies
  • Risk Assessment / methods
  • Risk Assessment / statistics & numerical data
  • Risk Factors
  • Socioeconomic Factors