Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach

Akhil Vaid; Suraj K Jaladanki; Jie Xu; Shelly Teng; Arvind Kumar; Samuel Lee; Sulaiman Somani; Ishan Paranjpe; Jessica K De Freitas; Tingyi Wanyan; Kipp W Johnson; Mesude Bicak; Eyal Klang; Young Joon Kwon; Anthony Costa; Shan Zhao; Riccardo Miotto; Alexander W Charney; Erwin Böttinger; Zahi A Fayad; Girish N Nadkarni; Fei Wang; Benjamin S Glicksberg

doi:10.2196/24207

JMIR Med Inform. 2021 Jan 27;9(1):e24207. doi: 10.2196/24207.

Authors

Akhil Vaid^# ¹ ², Suraj K Jaladanki^# ¹ ², Jie Xu³, Shelly Teng¹ ², Arvind Kumar¹ ², Samuel Lee¹ ², Sulaiman Somani¹ ², Ishan Paranjpe¹ ², Jessica K De Freitas¹ ² ⁴, Tingyi Wanyan¹ ⁵ ⁶, Kipp W Johnson¹ ², Mesude Bicak¹ ² ⁴, Eyal Klang⁷, Young Joon Kwon⁸, Anthony Costa⁸, Shan Zhao¹ ⁹, Riccardo Miotto¹ ⁴, Alexander W Charney² ⁴ ¹⁰ ¹¹, Erwin Böttinger¹ ² ¹², Zahi A Fayad¹³ ¹⁴, Girish N Nadkarni¹ ² ¹⁵ ¹⁶, Fei Wang³, Benjamin S Glicksberg¹ ² ⁴

Affiliations

¹ The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
² The Mount Sinai Clinical Intelligence Center, New York, NY, United States.
³ Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, United States.
⁴ Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
⁵ Intelligent System Engineering, Indiana University, Bloomington, IN, United States.
⁶ School of Information, University of Texas Austin, Austin, TX, United States.
⁷ Institute for Healthcare Delivery Science, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
⁸ Department of Neurological Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
⁹ Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
¹⁰ The Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
¹¹ Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
¹² Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany.
¹³ The BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
¹⁴ Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
¹⁵ Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
¹⁶ The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States.

^# Contributed equally.

PMID: 33400679
PMCID: PMC7842859
DOI: 10.2196/24207

Abstract

Background: Machine learning models require large datasets that may be siloed across different health care institutions. Machine learning studies that focus on COVID-19 have been limited to single-hospital data, which limits model generalizability.

Objective: We aimed to use federated learning, a machine learning technique that avoids aggregating raw clinical data from multiple institutions in a single location, to predict 7-day mortality in hospitalized patients with COVID-19.

Methods: Patient data were collected from the electronic health records of 5 hospitals within the Mount Sinai Health System. Logistic regression with L1 regularization/least absolute shrinkage and selection operator (LASSO) and multilayer perceptron (MLP) models were trained by using local data at each site. We developed a pooled model with combined data from all 5 sites, and a federated model that only shared parameters with a central aggregator.
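
The federated setup described in the Methods can be illustrated with a minimal sketch. It assumes a sample-size-weighted federated averaging rule and a proximal-gradient solver for the L1 penalty; the abstract specifies neither, so this is not the authors' implementation. All function names, hyperparameters, and the synthetic data are hypothetical, and no intercept term is fitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_update(w, X, y, lr=0.1, l1=0.01, epochs=5):
    """A few proximal-gradient steps of L1-regularized logistic regression on one site."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
        # Soft-thresholding is the proximal step for the L1 (LASSO) penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w

def federated_average(site_weights, site_sizes):
    """Central aggregator: average parameters, weighted by each site's sample count."""
    sizes = np.asarray(site_sizes, dtype=float)
    return np.average(np.stack(site_weights), axis=0, weights=sizes)

def train_federated(sites, n_features, rounds=20):
    """Only parameter vectors leave a site; raw (X, y) data stay local."""
    w = np.zeros(n_features)
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in sites]
        w = federated_average(updates, [len(y) for _, y in sites])
    return w

def make_synthetic_sites(n_sites=5, n_features=4, seed=0):
    """Hypothetical stand-in for per-hospital data (assumption: not real EHR data)."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=n_features)
    sites = []
    for _ in range(n_sites):
        n = int(rng.integers(50, 200))
        X = rng.normal(size=(n, n_features))
        y = (rng.random(n) < sigmoid(X @ true_w)).astype(float)
        sites.append((X, y))
    return sites

sites = make_synthetic_sites()
global_w = train_federated(sites, n_features=4)
```

A pooled model, by contrast, would concatenate every site's `X` and `y` and run `local_update` once on the combined arrays, which requires moving raw data to one place.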

Results: The LASSO_federated model outperformed the LASSO_local model at 3 hospitals, and the MLP_federated model performed better than the MLP_local model at all 5 hospitals, as determined by the area under the receiver operating characteristic curve. The LASSO_pooled model outperformed the LASSO_federated model at all hospitals, and the MLP_federated model outperformed the MLP_pooled model at 2 hospitals.
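
The comparison metric, the area under the receiver operating characteristic curve (AUROC), equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are illustrative assumptions, not study data.

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]
    neg = scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

Evaluating a local, pooled, and federated model at one hospital would then amount to calling `auroc` on that hospital's held-out labels with each model's predicted probabilities.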

Conclusions: The federated learning of COVID-19 electronic health record data shows promise in developing robust predictive models without compromising patient privacy.

Keywords: COVID-19; electronic health records; federated learning; machine learning.

©Akhil Vaid, Suraj K Jaladanki, Jie Xu, Shelly Teng, Arvind Kumar, Samuel Lee, Sulaiman Somani, Ishan Paranjpe, Jessica K De Freitas, Tingyi Wanyan, Kipp W Johnson, Mesude Bicak, Eyal Klang, Young Joon Kwon, Anthony Costa, Shan Zhao, Riccardo Miotto, Alexander W Charney, Erwin Böttinger, Zahi A Fayad, Girish N Nadkarni, Fei Wang, Benjamin S Glicksberg. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 27.01.2021.

Grants and funding

UL1 TR001433/TR/NCATS NIH HHS/United States