Missingness and algorithmic bias: an example from the United States National Outbreak Reporting System, 2009-2019

J Public Health Policy. 2024 Jun;45(2):198-204. doi: 10.1057/s41271-024-00477-2. Epub 2024 May 3.

Abstract

Growing debates about algorithmic bias in public health surveillance lack specific examples. We tested the common assumption that exposure and illness periods coincide and demonstrated how algorithmic bias can arise when critical information on illness and exposure durations is missing. We examined 9407 outbreaks recorded by the United States National Outbreak Reporting System (NORS) from January 1, 2009 through December 31, 2019 and detected algorithmic bias: a systematic over- or under-estimation of foodborne disease outbreak (FBDO) durations due to missing start and end dates. Of the 7037 (75%) FBDOs with complete date-time information, ~60% reported that the exposure period ended before the illness period started. For the 2079 (87.7%) FBDOs with missing exposure dates, average illness durations were ~5.3 times longer (p < 0.001) than those of FBDOs with complete information, creating the potential for algorithmic bias. Modern surveillance systems must be equipped with investigative capacities to examine and assess structural data missingness that can lead to bias.
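The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the three records, the field names, and the inclusive day-counting rule are all assumptions made for illustration and do not reflect the actual NORS schema or data. The sketch splits outbreaks by whether exposure dates are present, checks how often exposure ended before illness began, and compares mean illness durations between the two groups.

```python
from datetime import date
from statistics import mean

# Hypothetical outbreak records; field names are illustrative, not the NORS schema.
outbreaks = [
    {"exposure_start": date(2015, 6, 1), "exposure_end": date(2015, 6, 1),
     "illness_start": date(2015, 6, 2), "illness_end": date(2015, 6, 4)},
    {"exposure_start": date(2016, 3, 10), "exposure_end": date(2016, 3, 12),
     "illness_start": date(2016, 3, 11), "illness_end": date(2016, 3, 15)},
    {"exposure_start": None, "exposure_end": None,
     "illness_start": date(2017, 8, 1), "illness_end": date(2017, 8, 21)},
]


def illness_days(rec):
    """Illness duration in days, counting first and last day (an assumed convention)."""
    return (rec["illness_end"] - rec["illness_start"]).days + 1


def has_exposure_dates(rec):
    """True when both exposure dates are recorded."""
    return rec["exposure_start"] is not None and rec["exposure_end"] is not None


complete = [r for r in outbreaks if has_exposure_dates(r)]
missing = [r for r in outbreaks if not has_exposure_dates(r)]

# Share of complete records in which exposure ended before illness began,
# i.e. the two periods do not coincide.
no_overlap = sum(r["exposure_end"] < r["illness_start"] for r in complete) / len(complete)

# Mean illness duration in each group, and the ratio between them.
mean_complete = mean(illness_days(r) for r in complete)
mean_missing = mean(illness_days(r) for r in missing)
ratio = mean_missing / mean_complete

print(f"exposure ended before illness began: {no_overlap:.0%} of complete records")
print(f"mean illness duration ratio (missing / complete): {ratio:.2f}")
```

An algorithm that treats both groups as interchangeable, or that silently drops or imputes records with missing exposure dates, would systematically misestimate outbreak duration whenever the ratio departs from 1. A real analysis would also need a significance test, which this sketch omits.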

Keywords: Foodborne disease outbreak; Missing data; National Outbreak Reporting System (NORS); Outbreak progression.

MeSH terms

  • Algorithms*
  • Bias*
  • Disease Outbreaks*
  • Foodborne Diseases* / epidemiology
  • Humans
  • Population Surveillance
  • Public Health Surveillance / methods
  • United States / epidemiology