Comparison of SNP-based subtyping workflows for bacterial isolates using WGS data, applied to Salmonella enterica serotype Typhimurium and serotype 1,4,[5],12:i:-

PLoS One. 2018 Feb 6;13(2):e0192504. doi: 10.1371/journal.pone.0192504. eCollection 2018.

Abstract

Whole genome sequencing is a promising technology for subtyping bacterial pathogens. Alongside the technological advances that have driven the approach forward, recent years have seen considerable evolution in methods for analysing whole genome sequencing data. Before the technology can be applied as a routine epidemiological typing tool, however, reliable and efficient data analysis strategies need to be identified among the wide variety of methodologies that have emerged. In this work, we compared three existing SNP-based subtyping workflows using a benchmark dataset of 32 Salmonella enterica subsp. enterica serovar Typhimurium and serovar 1,4,[5],12:i:- isolates, including five isolates from a confirmed outbreak and three isolates obtained from the same patient at different time points. The analysis was carried out using the original (high-coverage) dataset and a down-sampled (low-coverage) dataset, each with two different reference genomes. All three tested workflows, namely the CSI Phylogeny-based, CFSAN-based and PHEnix-based workflows, correctly grouped the confirmed outbreak isolates and the isolates from the same patient with all combinations of reference genomes and datasets. However, the workflows differed strongly in the SNP distances between isolates and in their sensitivity to sequencing coverage, which could be linked to the specific data analysis strategies they use. To demonstrate the effect of particular data analysis steps, several modifications of the existing workflows were also tested. This allowed us to propose the data analysis schemes most suitable for routine SNP-based subtyping of S. Typhimurium and S. 1,4,[5],12:i:-. The results presented in this study illustrate the importance of using correct data analysis strategies and of defining, benchmarking and fine-tuning the parameters applied within routine data analysis pipelines to obtain optimal results.
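
The notion of a SNP distance between isolates, on which the workflows are compared above, can be illustrated with a minimal Python sketch. It is not the implementation used by the CSI Phylogeny, CFSAN or PHEnix workflows. It assumes the isolates are already available as equal-length aligned sequences, and it assumes that positions with an uncalled base ("N") or a gap ("-") in either isolate are skipped. The function names, isolate names and sequences are hypothetical and invented for illustration only.

```python
from itertools import combinations

# Characters treated as "no call" and skipped when counting differences
# (an assumption of this sketch, not a rule taken from the study).
IGNORED = {"N", "-"}


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different called bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in IGNORED and b not in IGNORED
    )


def pairwise_distances(alignment):
    """Return {(name_a, name_b): distance} for every unordered pair of isolates."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Toy, invented sequences; these are not data from the study.
toy_alignment = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAC",
    "isolate_C": "ACGTTCGNAC",
    "isolate_D": "TCGTTCGTAA",
}

distances = pairwise_distances(toy_alignment)
for pair, d in sorted(distances.items()):
    print(pair, d)
```

In this toy example, isolate_A and isolate_B are identical, which is the pattern expected of isolates from a single outbreak or patient, while isolate_D is more distant. Because uncalled positions are skipped, the choice of which positions count as callable changes the resulting distances, which is one plausible way for coverage and filtering settings to affect the results a workflow reports.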

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genes, Bacterial
  • Phylogeny
  • Polymorphism, Single Nucleotide*
  • Salmonella enterica / classification
  • Salmonella enterica / genetics*
  • Salmonella typhimurium / classification
  • Salmonella typhimurium / genetics*
  • Whole Genome Sequencing*

Grants and funding

This work was supported by RP/PJ WIV-ISP (NeXSplorer.iph), the Federal Public Service of Health, Food Chain Safety and Environment. The National Reference Centre for Salmonella and Shigella is partially supported by the Belgian Ministry of Social Affairs through a fund within the Health Insurance System. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.