Objectives: Much has been written about the utility of genomic databases to public health. Within food safety, these databases contain data from two types of isolates: those from patients (i.e., clinical) and those from non-clinical sources (e.g., a food manufacturing environment). A genetic match between isolates from these two sources represents a signal of interest. We investigate the match rate within three large genomic databases (Listeria monocytogenes, Escherichia coli, and Salmonella) and the smaller Cronobacter database; all four are part of the Pathogen Detection project at the National Center for Biotechnology Information (NCBI).
Results: Currently, the match rate of clinical isolates to non-clinical isolates is 33% for L. monocytogenes, 46% for Salmonella, and 7% for E. coli. These match rates are associated with several database features, including the diversity of the organism, the database size, and the proportion of non-clinical BioSamples. A logistic regression model of match rate performed relatively well. Our prediction model illustrates the importance of populating databases with non-clinical isolates to better identify a match for clinical samples. Such information should help public health officials prioritize surveillance strategies, and it shows the critical need to populate fledgling databases (e.g., Cronobacter sakazakii).
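As an illustration of how a match rate of this kind could be computed, the following minimal sketch assumes that a "match" means a clinical isolate shares a genetic cluster with at least one non-clinical isolate. That definition, the record layout, the function name `match_rate`, and the toy records are all assumptions made for illustration; they are not taken from the study or from the NCBI Pathogen Detection data.

```python
from collections import defaultdict


def match_rate(isolates):
    """Fraction of clinical isolates that share a cluster with >= 1 non-clinical isolate.

    `isolates` is an iterable of (isolate_id, source_type, cluster_id) tuples.
    source_type is "clinical" or "non-clinical"; cluster_id is None for an
    isolate that belongs to no cluster. This layout is hypothetical.
    """
    sources_by_cluster = defaultdict(set)  # cluster_id -> set of source types seen
    clinical_clusters = []                 # cluster_id of every clinical isolate

    for _isolate_id, source_type, cluster_id in isolates:
        if cluster_id is not None:
            sources_by_cluster[cluster_id].add(source_type)
        if source_type == "clinical":
            clinical_clusters.append(cluster_id)

    if not clinical_clusters:
        return 0.0

    matched = sum(
        1
        for cluster_id in clinical_clusters
        if cluster_id is not None
        and "non-clinical" in sources_by_cluster[cluster_id]
    )
    return matched / len(clinical_clusters)


# Toy records, invented for illustration only.
toy = [
    ("A1", "clinical", "c1"),
    ("A2", "non-clinical", "c1"),
    ("A3", "clinical", "c2"),
    ("A4", "clinical", None),
    ("A5", "non-clinical", "c3"),
    ("A6", "clinical", "c3"),
]

print(match_rate(toy))
```

Under this assumed definition, unclustered clinical isolates count as unmatched, so adding non-clinical isolates to a database can only raise the rate or leave it unchanged.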
Keywords: Foodborne pathogen; Genomics; Surveillance.
© 2024. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.