Background: Racial disparities in health care are well documented in the United States. As machine learning methods become more common in health care settings, it is important to ensure that these methods do not contribute to racial disparities through biased predictions or differential accuracy across racial groups.
Objective: The goal of the research was to assess a machine learning algorithm intentionally developed to minimize bias in in-hospital mortality predictions between white and nonwhite patient groups.
Methods: Bias was minimized through preprocessing of algorithm training data. We performed a retrospective analysis of electronic health record data from patients admitted to the intensive care unit (ICU) at a large academic health center between 2001 and 2012, drawing data from the Medical Information Mart for Intensive Care-III database. Patients were included if they had at least 10 hours of available measurements after ICU admission, had at least one of every measurement used for model prediction, and had recorded race/ethnicity data. Bias was assessed through the equal opportunity difference. Model performance in terms of bias and accuracy was compared with the Modified Early Warning Score (MEWS), the Simplified Acute Physiology Score II (SAPS II), and the Acute Physiologic Assessment and Chronic Health Evaluation (APACHE).
Results: The machine learning algorithm was found to be more accurate than all comparators, with a higher sensitivity, specificity, and area under the receiver operating characteristic. The machine learning algorithm was found to be unbiased (equal opportunity difference 0.016, P=.20). APACHE was also found to be unbiased (equal opportunity difference 0.019, P=.11), while SAPS II and MEWS were found to have significant bias (equal opportunity difference 0.038, P=.006 and equal opportunity difference 0.074, P<.001, respectively).
Conclusions: This study indicates there may be significant racial bias in commonly used severity scoring systems and that machine learning algorithms may reduce bias while improving on the accuracy of these methods.
Keywords: health disparities; machine learning; mortality; prediction; racial disparities.
©Angier Allen, Samson Mataraso, Anna Siefkas, Hoyt Burdick, Gregory Braden, R Phillip Dellinger, Andrea McCoy, Emily Pellegrini, Jana Hoffman, Abigail Green-Saxena, Gina Barnes, Jacob Calvert, Ritankar Das. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 22.10.2020.