Deciphering data anomalies in BioSense

MMWR Suppl. 2005 Aug 26:54:133-9.

Abstract

Introduction: Since June 2004, CDC's BioIntelligence Center has monitored daily nationwide syndromic data by using the BioSense surveillance application.

Objectives: The BioSense application has been monitored by a team of full-time CDC analysts. This report examines their role in identifying and deciphering data anomalies. It also discusses the limitations of the current surveillance application, lessons learned, and potential next steps to improve national syndromic surveillance methodology.

Methods: Data on clinical diagnoses (International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM]) and medical procedures (Current Procedural Terminology [CPT] codes) are provided by Department of Veterans Affairs and Department of Defense ambulatory-care clinics; data on select sales of over-the-counter health-care products are provided by participating retail pharmacies; and data on laboratory tests ordered are provided by Laboratory Corporation of America, Inc. All data are filtered to exclude information irrelevant to syndromic surveillance.

Results: During June-November 2004, none of the approximately 160 data anomalies examined was found to involve a disease outbreak or deliberate exposure to a pathogen. Data anomalies were detected by using a combination of statistical algorithms and analytical visualization features. The anomalies primarily reflected unusual changes in either daily data volume or the types of clinical diagnoses and procedures. This report describes steps taken in routine monitoring, including 1) detecting data anomalies, 2) estimating geographic and temporal scope of the anomalies, 3) gathering supplemental facts, 4) comparing data from multiple data sources, 5) developing hypotheses, and 6) ruling out or validating the existence of an actual event. To be useful for early detection, these steps must be completed quickly (i.e., in hours or days). Anomalies described are attributable to multiple causes, including miscoded data, effects of retail sales promotions, and smaller but explainable signals.
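The abstract does not name the statistical algorithms used for step 1 (detecting data anomalies). As a purely illustrative sketch, one simple way to flag an unusual change in daily data volume is to compare each day's count with the mean and standard deviation of a short trailing baseline. The function name `flag_anomalies`, the 7-day baseline, the threshold of 3 standard deviations, the `min_sd` floor, and the sample counts are all assumptions made for this example; none of them is taken from BioSense.

```python
from statistics import mean, stdev


def flag_anomalies(daily_counts, baseline_days=7, threshold=3.0, min_sd=1.0):
    """Return indices of days whose count deviates sharply from a trailing baseline.

    Hypothetical illustration only: the parameters are assumed values,
    not settings used by the BioSense application.
    """
    flagged = []
    for i in range(baseline_days, len(daily_counts)):
        # Baseline is the window of days immediately before day i.
        baseline = daily_counts[i - baseline_days:i]
        mu = mean(baseline)
        # Floor the standard deviation so a flat baseline cannot divide by zero.
        sd = max(stdev(baseline), min_sd)
        # Two-sided check: both spikes and drops in volume are flagged.
        if abs(daily_counts[i] - mu) / sd > threshold:
            flagged.append(i)
    return flagged


# Made-up daily visit counts with one spike on day index 7.
counts = [20, 22, 19, 21, 20, 23, 21, 48, 22, 20]
print(flag_anomalies(counts))  # prints [7]
```

A flag from a rule like this marks only the start of the workflow. The remaining steps (scoping, gathering supplemental facts, comparing data sources, and forming hypotheses) are still needed to decide whether the signal reflects miscoded data, a sales promotion, or an actual event.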

Conclusion: Making the best use of the public health data in BioSense requires an empirical learning process. This process can be made more effective by continued improvements to the user interface and collective input from local public health partners.

MeSH terms

  • Bioterrorism
  • Data Interpretation, Statistical
  • Disaster Planning
  • Disease Outbreaks / prevention & control*
  • Epidemiologic Measurements
  • Humans
  • Population Surveillance / methods*
  • Public Health Informatics / instrumentation*
  • Software