Identifying Potential Factors Associated With Racial Disparities in COVID-19 Outcomes: Retrospective Cohort Study Using Machine Learning on Real-World Data

JMIR Public Health Surveill. 2024 Sep 26:10:e54421. doi: 10.2196/54421.

Abstract

Background: Racial disparities in COVID-19 incidence and outcomes have been widely reported. Non-Hispanic Black patients disproportionately experienced worse outcomes than non-Hispanic White patients, but the epidemiological basis for these observations was complex and multifaceted.

Objective: This study aimed to elucidate the potential reasons behind the worse COVID-19 outcomes experienced by non-Hispanic Black patients compared with non-Hispanic White patients, and to examine how the contributing factors interact, using an explainable machine learning approach.

Methods: In this retrospective cohort study, we examined 28,943 laboratory-confirmed COVID-19 cases from the OneFlorida Research Consortium's data trust of health care recipients in Florida through April 28, 2021. We assessed the prevalence of pre-existing comorbid conditions, geo-socioeconomic factors, and health outcomes in the structured electronic health records of COVID-19 cases. The primary outcome was a composite of hospitalization, intensive care unit admission, and mortality at index admission. We developed and validated a machine learning model using Extreme Gradient Boosting to evaluate predictors of worse outcomes of COVID-19 and rank them by importance.
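As an illustration of the composite primary outcome described above, the following minimal Python sketch flags a case as positive if any of the three component events occurred and then computes the outcome rate per group. The "any-of" reading of "composite," the `Case` record, its field names, and the toy records are all assumptions made for illustration; they are not the study's data schema, code, or results.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical record layout; field names are illustrative only.
    group: str
    hospitalized: bool
    icu_admitted: bool
    died: bool


def composite_outcome(case: Case) -> bool:
    """Assumed definition: positive if any component event occurred."""
    return case.hospitalized or case.icu_admitted or case.died


def outcome_rate_by_group(cases):
    """Return the fraction of cases with the composite outcome, per group."""
    totals = {}
    events = {}
    for c in cases:
        totals[c.group] = totals.get(c.group, 0) + 1
        events[c.group] = events.get(c.group, 0) + int(composite_outcome(c))
    return {g: events[g] / totals[g] for g in totals}


# Toy records for demonstration only (not study data).
toy_cases = [
    Case("A", True, False, False),
    Case("A", False, False, False),
    Case("B", False, True, False),
    Case("B", False, False, True),
]
rates = outcome_rate_by_group(toy_cases)
print(rates)
```

A binary label of this kind is what a classifier such as Extreme Gradient Boosting would then be trained to predict; the modeling step itself is not sketched here.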

Results: Compared with non-Hispanic White patients, non-Hispanic Black patients were younger, were more likely to be uninsured, had a higher prevalence of emergency department and inpatient visits, and lived in regions with higher area deprivation index rankings and pollutant concentrations. Non-Hispanic Black patients had the highest burden of comorbidities and the highest rates of the primary outcome. Age was a key predictor in all models, ranking highest in non-Hispanic White patients. For non-Hispanic Black patients, however, congestive heart failure was a primary predictor. Other variables, such as food environment measures and air pollution indicators, also ranked high. When comorbidities were consolidated into the Elixhauser Comorbidity Index, the index became the top predictor, providing a comprehensive risk measure.

Conclusions: The study reveals that individual and geo-socioeconomic factors significantly influence the outcomes of COVID-19. It also highlights varying risk profiles among different racial groups. While these findings suggest potential disparities, further causal inference and statistical testing are needed to fully substantiate these observations. Recognizing these relationships is vital for creating effective, tailored interventions that reduce disparities and enhance health outcomes across all racial and socioeconomic groups.

Keywords: COVID-19; COVID-19 outcomes; SARS-CoV-2; area deprivation index; health disparities; health outcomes; machine learning; racial disparities; real-world data; social determinants of health; socioeconomic status.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Black or African American*
  • COVID-19* / epidemiology
  • COVID-19* / ethnology
  • Cohort Studies
  • Female
  • Florida / epidemiology
  • Health Status Disparities*
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Retrospective Studies
  • Risk Factors
  • Socioeconomic Factors
  • White
  • Young Adult