Evaluating bias in electronic health record data: using agent-based models to examine whether geographic disparities in community-acquired Methicillin-resistant Staphylococcus aureus are due to differential healthcare-seeking behaviors

Am J Epidemiol. 2025 Jan 6:kwae481. doi: 10.1093/aje/kwae481. Online ahead of print.

Abstract

Electronic health records (EHR) are increasingly used in public health research. However, bias may arise when using EHR data because not everyone is captured in the data. Assessing how much this bias contributes to disparities identified with EHR data is difficult because information about healthcare-seeking behaviors is not included in the record. We developed an agent-based model (ABM) to simulate healthcare-seeking behavior for community-acquired Methicillin-resistant Staphylococcus aureus (CA-MRSA) infection in a subregion of California. The ABM assumed no difference in prevalence across the study area. We modeled the healthcare-seeking process to see whether geographic differences in prevalence matching the empirical data would emerge from the ABM when only those who sought treatment were considered. The ABM reproduced the observed prevalence for nine of the 21 geographies. Simulated differences in prevalence across geographies did not reach the magnitude seen in the observed data, and spatial patterns had low to moderate agreement. Our results suggest that geographic disparities in CA-MRSA prevalence previously identified in California EHR data may be due to determinants beyond bias and healthcare-seeking behaviors. Future studies could adapt this model for other health outcomes by adjusting the healthcare-seeking behavior parameters and modifying the disease progression process.
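
The logic of the bias analysis can be illustrated with a minimal sketch. This is not the authors' model: the geography names, population sizes, true prevalence, and care-seeking probabilities below are all hypothetical values chosen for illustration, and the single care-seeking probability per geography stands in for what is presumably a richer behavioral process in the actual ABM. The sketch assumes the same true prevalence everywhere and shows how prevalence computed only from agents who sought care can still differ across geographies.

```python
import random

# Hypothetical inputs (not from the study): each geography has a population
# size and a probability that an infected agent seeks care and is therefore
# captured in the EHR.
GEOGRAPHIES = {
    "A": (20000, 0.9),
    "B": (20000, 0.3),
    "C": (20000, 0.0),
}

def simulate(geographies, true_prevalence=0.02, seed=0):
    """Return the EHR-captured prevalence for each geography.

    true_prevalence is identical across geographies, mirroring the
    assumption of no underlying geographic difference.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    observed = {}
    for name, (population, p_seek) in geographies.items():
        captured = 0
        for _ in range(population):
            infected = rng.random() < true_prevalence
            # Only infected agents who seek care appear in the record.
            if infected and rng.random() < p_seek:
                captured += 1
        observed[name] = captured / population
    return observed

observed = simulate(GEOGRAPHIES)
for name, prevalence in observed.items():
    print(f"{name}: captured prevalence = {prevalence:.4f}")
```

In this toy setup any geographic difference in captured prevalence arises purely from care-seeking, which is the mechanism the study tests. Comparing such simulated differences against the observed ones indicates whether care-seeking alone could plausibly account for the disparity.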

Keywords: Agent-based model; CA-MRSA; EHR disparities; bias analysis; geographic disparities.