A multi-center study on the adaptability of a shared foundation model for electronic health records

Lin Lawrence Guo; Jason Fries; Ethan Steinberg; Scott Lanyon Fleming; Keith Morse; Catherine Aftandilian; Jose Posada; Nigam Shah; Lillian Sung

doi:10.1038/s41746-024-01166-w

NPJ Digit Med. 2024 Jun 27;7(1):171. doi: 10.1038/s41746-024-01166-w.

Affiliations

¹ Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada.
² Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA.
³ Division of Pediatric Hospital Medicine, Department of Pediatrics, Stanford University, Palo Alto, CA, USA.
⁴ Division of Hematology/Oncology, Department of Pediatrics, Stanford University, Palo Alto, CA, USA.
⁵ Universidad del Norte, Barranquilla, Colombia.
⁶ Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada.
⁷ Division of Haematology/Oncology, The Hospital for Sick Children, Toronto, ON, Canada.

# Contributed equally.

Abstract

Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components that can be adapted to various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, have demonstrated benefits including increased performance with fewer training labels and improved robustness to distribution shifts. However, questions remain about the feasibility of sharing these models across hospitals and about their performance on local tasks. This multi-center study examined the adaptability of a publicly accessible structured EHR foundation model (FM_SM), trained on 2.57 million patient records from Stanford Medicine. Experiments used EHR data from The Hospital for Sick Children (SickKids) and the Medical Information Mart for Intensive Care (MIMIC-IV). We assessed both adaptability via continued pretraining on local data and task adaptability, compared with baselines of models trained locally from scratch, including a local foundation model. Evaluations on 8 clinical prediction tasks showed that adapting the off-the-shelf FM_SM matched the performance of gradient boosting machines (GBM) trained locally on all data, while providing a 13% improvement in settings with few task-specific training labels. With continued pretraining on local data, FM_SM required fewer than 1% of training examples to match the fully trained GBM's performance and was 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings demonstrate that adapting EHR foundation models across hospitals improves prediction performance at lower cost, underscoring the utility of base foundation models as modular components that streamline the development of healthcare AI.