Information on the causative agent in an enteric disease outbreak can be used to generate hypotheses about the route of transmission and possible vehicles, to guide environmental assessments, and to target outbreak control measures. However, only about 40% of outbreaks reported in the United States include a confirmed etiology. The goal of this project was to identify clinical and demographic characteristics that can be used to predict the causative agent in an enteric disease outbreak and to use these data to develop an online tool that investigators can use during an outbreak to generate hypotheses about the causative agent. Using data on enteric disease outbreaks from all transmission routes (animal contact, environmental contamination, foodborne, person-to-person, waterborne, unknown) reported to the U.S. Centers for Disease Control and Prevention, we developed random forest models to predict the etiology of an outbreak from aggregated clinical and demographic characteristics at both the etiology category level (i.e., bacteria, parasites, toxins, viruses) and the individual etiology level (Clostridium perfringens, Campylobacter, Cryptosporidium, norovirus, Salmonella, Shiga toxin-producing Escherichia coli, and Shigella). The etiology category model had a kappa of 0.85 and an accuracy of 0.92, whereas the etiology-specific model had a kappa of 0.75 and an accuracy of 0.86. In the etiology category model, sensitivities were highest for bacteria and viruses; all categories had high specificities (>0.90). In the etiology-specific model, norovirus and Salmonella had the highest sensitivities, and all etiologies had high specificities. When laboratory confirmation is unavailable, information on the clinical signs and symptoms reported by people associated with the outbreak, together with other characteristics such as case demographics and illness severity, can be used to predict the etiology or etiology category. A publicly available online tool was developed to assist investigators in their enteric disease outbreak investigations.
Keywords: enteric; hypothesis generation; outbreaks; surveillance.
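
For readers less familiar with the performance measures quoted in the abstract, the following is a minimal sketch of how accuracy, Cohen's kappa, and per-class sensitivity and specificity can be derived from a multiclass confusion matrix. It is not the authors' code: the function name `summarize` and the counts in `toy_confusion` are hypothetical and chosen only for illustration, and they do not reproduce the study's data or results.

```python
from typing import Dict, List

def summarize(confusion: List[List[int]], labels: List[str]) -> Dict:
    """Compute accuracy, Cohen's kappa, and one-vs-rest sensitivity and
    specificity from a square confusion matrix (rows = true class,
    columns = predicted class)."""
    k = len(labels)
    n = sum(sum(row) for row in confusion)
    row_tot = [sum(confusion[i]) for i in range(k)]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    # Observed agreement: share of outbreaks on the diagonal.
    accuracy = sum(confusion[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)

    per_class = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = row_tot[i] - tp
        fp = col_tot[i] - tp
        tn = n - tp - fn - fp
        per_class[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return {"accuracy": accuracy, "kappa": kappa, "per_class": per_class}

# Hypothetical counts for illustration only (NOT data from the study).
labels = ["bacteria", "parasites", "toxins", "viruses"]
toy_confusion = [
    [40, 2, 3, 5],   # true bacteria
    [3, 6, 0, 1],    # true parasites
    [4, 0, 5, 1],    # true toxins
    [3, 0, 0, 27],   # true viruses
]
result = summarize(toy_confusion, labels)
print(result["accuracy"], round(result["kappa"], 3))
for name, stats in result["per_class"].items():
    print(name, stats)
```

Kappa adjusts accuracy for the agreement expected by chance given the class frequencies, which is why it is reported alongside accuracy when some etiologies are far more common than others.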