Complete Whole Genome Sequences of Escherichia coli Surrogate Strains and Comparison of Sequence Methods with Application to the Food Industry

Dustin A Therrien; Kranti Konganti; Jason J Gill; Brian W Davis; Andrew E Hillhouse; Jordyn Michalik; H Russell Cross; Gary C Smith; Thomas M Taylor; Penny K Riggs

doi:10.3390/microorganisms9030608

Microorganisms. 2021 Mar 16;9(3):608. doi: 10.3390/microorganisms9030608.

Affiliations

¹ Department of Animal Science, Texas A&M University, College Station, TX 77843-2471, USA.
² Texas A&M Institute for Genome Sciences and Society, MS 2470, College Station, TX 77843-2470, USA.
³ Department of Veterinary Integrated Biosciences, Texas A&M University, College Station, TX 77843-4461, USA.

Abstract

In 2013, the U.S. Department of Agriculture Food Safety and Inspection Service (USDA-FSIS) began transitioning to whole genome sequencing (WGS) to identify isolates of select bacterial species associated with foodborne disease outbreaks and recalls. While WGS offers greater precision, certain hurdles must be overcome before widespread application within the food industry is plausible. Challenges include the diversity of sequencing platform outputs and the lack of standardized bioinformatics workflows for data analysis. We sequenced DNA from USDA-FSIS-approved, non-pathogenic E. coli surrogates and a derivative group of rifampicin-resistant (rif^R) mutants on both the Oxford Nanopore MinION and Illumina MiSeq platforms to generate and annotate complete genomes. Genome sequences from each clone were assembled separately so that long-read, short-read, and combined sequence assemblies could be directly compared. The combined sequence data approach provided more accurate completed genomes. The genomes of these isolates were verified to lack functional forms of key E. coli elements commonly associated with pathogenesis. Genetic alterations known to confer rif^R were also identified. As the food industry adopts WGS within its food safety programs, these data provide completed genomes for commonly used surrogate strains, along with a direct comparison of sequencing platforms and assembly strategies relevant to research and testing workflows for both processors and regulators.

Keywords: Escherichia coli; bacterial surrogate; closed genome; high throughput sequencing; long reads; short reads; whole genome sequence.

Grants and funding

AFRI 2012-68003-30155/U.S. Department of Agriculture