Background: Data-driven risk stratification models built using data from a single hospital often suffer from a paucity of training data. However, leveraging data from other hospitals can be challenging owing to institutional differences in patient populations and in data coding and capture.
Objective: To investigate three approaches to learning hospital-specific predictions about the risk of hospital-associated infection with Clostridium difficile, and to compare the value of different ways of using external data to enhance hospital-specific predictions.
Materials and methods: We evaluated each approach on 132 853 admissions from three hospitals, varying in size and location. The first approach was a single-task approach, in which only training data from the target hospital (ie, the hospital for which the model was intended) were used. The second used only data from the other two hospitals. The third approach jointly incorporated data from all hospitals while seeking a solution in the target space.
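The three approaches differ mainly in which hospitals contribute training rows. The following is a minimal sketch of that selection step only, not the authors' implementation. It assumes that "the target space" means the set of features observed at the target hospital, and that a source row lacking one of those features is filled with 0. The names `training_rows` and `project`, the approach labels, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

# One admission: (feature name -> value, label), label 1 = infection.
Row = Tuple[Dict[str, float], int]


def project(features: Dict[str, float], target_features: List[str]) -> Dict[str, float]:
    """Keep only target-hospital features; fill absent ones with 0 (an assumption)."""
    return {name: features.get(name, 0.0) for name in target_features}


def training_rows(data: Dict[str, List[Row]], target: str, approach: str) -> List[Row]:
    """Select training rows for one of the three approaches, in the target's feature space."""
    target_features = sorted({name for feats, _ in data[target] for name in feats})
    if approach == "target_only":      # first approach: target hospital data only
        hospitals = [target]
    elif approach == "source_only":    # second approach: the other hospitals only
        hospitals = [h for h in data if h != target]
    elif approach == "all":            # third approach: every hospital, jointly
        hospitals = list(data)
    else:
        raise ValueError(f"unknown approach: {approach}")
    return [
        (project(feats, target_features), label)
        for h in hospitals
        for feats, label in data[h]
    ]


# Hypothetical toy data for three hospitals with partly different coding.
toy = {
    "A": [({"age": 70.0, "unit_icu": 1.0}, 1), ({"age": 40.0}, 0)],
    "B": [({"age": 65.0, "code_x": 1.0}, 1)],
    "C": [({"age": 50.0, "unit_icu": 1.0}, 0)],
}

target_only = training_rows(toy, "A", "target_only")
source_only = training_rows(toy, "A", "source_only")
all_rows = training_rows(toy, "A", "all")
```

Any classifier could then be fit to each row set and evaluated on held-out admissions from the target hospital. The sketch deliberately leaves out model fitting, since the abstract does not specify the learning algorithm.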
Results: The relative performance of the three different approaches was found to be sensitive to the hospital selected as the target. However, incorporating data from all hospitals consistently had the highest performance.
Discussion: The results characterize the challenges and opportunities that come with (1) using data or models from collections of hospitals without adapting them to the site at which the model will be used, and (2) using only local data to build models for small institutions or rare events.
Conclusions: We show how external data from other hospitals can be successfully and efficiently incorporated into hospital-specific models.
Keywords: C. difficile; electronic health records; predictive models; risk stratification; transfer learning.
Published by the BMJ Publishing Group Limited.