Assessing socioeconomic bias in machine learning algorithms in health care: a case study of the HOUSES index

Young J Juhn; Euijung Ryu; Chung-Il Wi; Katherine S King; Momin Malik; Santiago Romero-Brufau; Chunhua Weng; Sunghwan Sohn; Richard R Sharp; John D Halamka

doi:10.1093/jamia/ocac052

J Am Med Inform Assoc. 2022 Jun 14;29(7):1142-1151. doi: 10.1093/jamia/ocac052.

Authors

Young J Juhn¹˒², Euijung Ryu³, Chung-Il Wi¹˒², Katherine S King³, Momin Malik⁴, Santiago Romero-Brufau⁵, Chunhua Weng⁶, Sunghwan Sohn⁷, Richard R Sharp⁸, John D Halamka⁴˒⁹

Affiliations

¹ Precision Population Science Lab, Mayo Clinic, Rochester, Minnesota, USA.
² Artificial Intelligence Program of Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, Minnesota, USA.
³ Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA.
⁴ Center for Digital Health, Mayo Clinic, Rochester, Minnesota, USA.
⁵ Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota, USA.
⁶ Department of Biomedical Informatics, Columbia University, New York, New York, USA.
⁷ Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA.
⁸ Biomedical Ethics Program, Mayo Clinic, Rochester, Minnesota, USA.
⁹ Mayo Clinic Platform, Rochester, Minnesota, USA.

Abstract

Objective: Artificial intelligence (AI) models may propagate harmful biases in performance and hence negatively affect underserved populations. We aimed to assess the degree to which the quality of electronic health record (EHR) data, affected by inequities related to low socioeconomic status (SES), results in differential performance of AI models across SES levels.

Materials and methods: This study used existing machine learning models for predicting asthma exacerbation in children with asthma. We compared the balanced error rate (BER) across SES levels measured by the HOUsing-based SocioEconomic Status measure (HOUSES) index. As a possible mechanism for differential performance, we also compared the incompleteness of EHR information relevant to asthma care across SES levels.
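To make the BER comparison concrete, the sketch below computes BER separately for each SES group. It assumes the standard definition for thresholded binary predictions, BER = ½ (false-negative rate + false-positive rate), which equals 1 minus balanced accuracy; the abstract does not state the models' thresholds or exact grouping, and the function names and toy labels here are hypothetical illustrations, not the study's code or data.

```python
from collections import defaultdict


def balanced_error_rate(y_true, y_pred):
    """BER = 0.5 * (FNR + FPR) for binary labels coded 0/1 (assumed definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        # BER is undefined unless the group contains both positives and negatives
        raise ValueError("each group needs both positive and negative cases")
    fnr = fn / (tp + fn)  # share of true exacerbations the model missed
    fpr = fp / (fp + tn)  # share of non-exacerbations the model flagged
    return 0.5 * (fnr + fpr)


def ber_by_group(y_true, y_pred, groups):
    """Compute BER within each group label (hypothetical labels such as "Q1")."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: balanced_error_rate(t, p) for g, (t, p) in buckets.items()}


if __name__ == "__main__":
    # Toy data only: four children in the lowest quartile, four in the rest
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
    groups = ["Q1"] * 4 + ["Q2-Q4"] * 4
    ber = ber_by_group(y_true, y_pred, groups)
    print(ber)                          # {'Q1': 0.5, 'Q2-Q4': 0.25}
    print(ber["Q1"] / ber["Q2-Q4"])     # 2.0, a BER ratio of Q1 vs Q2-Q4
```

Averaging the two error rates, rather than using overall error, keeps the metric from being dominated by the majority class, which matters when exacerbations are relatively rare within a subgroup.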

Results: Asthmatic children with lower SES had a larger BER than those with higher SES (eg, ratio = 1.35 for HOUSES Q1, the lowest SES quartile, vs Q2-Q4) and a higher proportion of missing information relevant to asthma care (eg, 41% vs 24% for missing asthma severity and 12% vs 9.8% for undiagnosed asthma despite meeting asthma criteria).
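The missing-information comparison amounts to a per-group proportion. A minimal sketch follows, assuming each patient record is a dictionary in which a missing item (such as asthma severity) is absent or `None`; the field and group names are hypothetical, and the toy records do not reproduce the study's figures.

```python
from collections import Counter


def missing_rate_by_group(records, field, group_key):
    """Proportion of records in each group whose `field` is absent or None."""
    totals = Counter()
    missing = Counter()
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec.get(field) is None:
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}


if __name__ == "__main__":
    # Hypothetical toy records; "asthma_severity" and "ses_group" are illustrative names
    records = [
        {"ses_group": "Q1", "asthma_severity": None},
        {"ses_group": "Q1", "asthma_severity": "mild"},
        {"ses_group": "Q1"},
        {"ses_group": "Q1", "asthma_severity": "moderate"},
        {"ses_group": "Q2-Q4", "asthma_severity": "mild"},
        {"ses_group": "Q2-Q4", "asthma_severity": None},
        {"ses_group": "Q2-Q4", "asthma_severity": "severe"},
        {"ses_group": "Q2-Q4", "asthma_severity": "mild"},
    ]
    print(missing_rate_by_group(records, "asthma_severity", "ses_group"))
```

Treating an absent key and an explicit `None` the same way reflects the idea that both leave the model without that piece of information.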

Discussion: Our study suggests that lower SES is associated with worse predictive model performance. It also highlights the potential role of incomplete EHR data in this differential performance and suggests a way to mitigate this bias.

Conclusion: The HOUSES index allows AI researchers to assess bias in predictive model performance by SES. Although our case study was based on a small sample from a single site, the results highlight a potential strategy for identifying bias by using an innovative SES measure.

Keywords: HOUSES; algorithmic bias; artificial intelligence; electronic health records; social determinants of health.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Artificial Intelligence*
Asthma* / diagnosis
Bias
Child
Delivery of Health Care
Humans
Machine Learning
Social Class
