Learning From Past Respiratory Infections to Predict COVID-19 Outcomes: Retrospective Study

J Med Internet Res. 2021 Feb 22;23(2):e23026. doi: 10.2196/23026.

Abstract

Background: For the clinical care of patients with well-established diseases, randomized trials, literature, and research are supplemented with clinical judgment to understand disease prognosis and inform treatment choices. In the void created by a lack of clinical experience with COVID-19, artificial intelligence (AI) may be an important tool to bolster clinical judgment and decision making. However, a lack of clinical data restricts the design and development of such AI tools, particularly in preparation for an impending crisis or pandemic.

Objective: This study aimed to develop and test the feasibility of a "patients-like-me" framework to predict the deterioration of patients with COVID-19 using a retrospective cohort of patients with similar respiratory diseases.

Methods: Our framework used COVID-19-like cohorts to design and train AI models that were then validated on the COVID-19 population. The COVID-19-like cohorts included patients diagnosed with bacterial pneumonia, viral pneumonia, unspecified pneumonia, influenza, and acute respiratory distress syndrome (ARDS) at an academic medical center from 2008 to 2019. In total, 15 training cohorts were created using different combinations of the COVID-19-like cohorts with the ARDS cohort for exploratory purposes. In this study, two machine learning models were developed: one to predict invasive mechanical ventilation (IMV) within 48 hours for each hospitalized day, and one to predict all-cause mortality at the time of admission. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value, and negative predictive value. We established model interpretability by calculating SHapley Additive exPlanations (SHAP) scores to identify important features.
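
The count of 15 training cohorts is consistent with taking every non-empty combination of the four non-ARDS cohorts (2^4 - 1 = 15). The abstract does not spell out how the combinations were formed, so the following is only a minimal sketch under that assumption; the function name and cohort labels are illustrative, and the exploratory ARDS cohort is deliberately left out.

```python
from itertools import combinations

# Cohort labels come from the Methods text. Treating the 15 training cohorts
# as all non-empty subsets of these four is an ASSUMPTION, not a stated fact.
BASE_COHORTS = [
    "bacterial pneumonia",
    "viral pneumonia",
    "unspecified pneumonia",
    "influenza",
]


def training_cohort_combinations(cohorts):
    """Return every non-empty subset of `cohorts` as a tuple of labels."""
    combos = []
    for size in range(1, len(cohorts) + 1):
        combos.extend(combinations(cohorts, size))
    return combos


combos = training_cohort_combinations(BASE_COHORTS)
print(len(combos))
for combo in combos:
    print(" + ".join(combo))
```

Under this reading, the single-cohort entries (for example, viral pneumonia alone) and the full four-cohort union are both among the candidate training sets.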

Results: Compared with the COVID-19-like cohorts (n=16,509), the patients hospitalized with COVID-19 (n=159) were significantly younger, with a higher proportion of patients of Hispanic ethnicity, a lower proportion of patients with smoking history, and fewer patients with comorbidities (P<.001). Patients with COVID-19 had a lower IMV rate (15.1% versus 23.2%, P=.02) and shorter time to IMV (2.9 versus 4.1 days, P<.001) compared with the COVID-19-like patients. In the COVID-19-like training data, the top models achieved excellent performance (AUROC>0.90). Validating in the COVID-19 cohort, the top-performing model for predicting IMV was the XGBoost model (AUROC=0.826) trained on the viral pneumonia cohort. Similarly, the XGBoost model trained on all 4 COVID-19-like cohorts without ARDS achieved the best performance (AUROC=0.928) in predicting mortality. Important predictors included demographic information (age), vital signs (oxygen saturation), and laboratory values (such as white blood cell count, cardiac troponin, and albumin). The outcome classes were imbalanced, which resulted in high negative predictive values and low positive predictive values.
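
The link between class imbalance and the reported pattern of high negative predictive value with low positive predictive value can be illustrated with a small worked example. The counts below are hypothetical and were chosen only to show the effect at 5% outcome prevalence; they are not data from the study, and the helper name is illustrative.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# HYPOTHETICAL counts: 1000 predictions, 50 true events (5% prevalence),
# with a classifier that has 80% sensitivity and 80% specificity.
metrics = confusion_metrics(tp=40, fp=190, tn=760, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Even with sensitivity and specificity both at 0.80, the rare positive class yields a positive predictive value of roughly 0.17 and a negative predictive value of roughly 0.99, because false positives drawn from the large negative class outnumber the true positives.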

Conclusions: We provided a feasible framework for modeling patient deterioration using existing data and AI technology to address data limitations during the onset of a novel, rapidly changing pandemic.

Keywords: COVID-19; all-cause mortality; artificial intelligence; data; feasibility; framework; infection; invasive mechanical ventilation; machine learning; outcome; respiratory.

MeSH terms

  • Aged
  • Area Under Curve
  • COVID-19 / diagnosis*
  • COVID-19 / mortality*
  • Cohort Studies
  • Comorbidity
  • Female
  • Hospitalization / statistics & numerical data
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Pandemics
  • Pneumonia, Viral / diagnosis*
  • Pneumonia, Viral / mortality
  • Predictive Value of Tests
  • Prognosis
  • ROC Curve
  • Respiration, Artificial / statistics & numerical data
  • Retrospective Studies
  • SARS-CoV-2
  • Treatment Outcome