An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data

Infect Genet Evol. 2020 Apr:79:104152. doi: 10.1016/j.meegid.2019.104152. Epub 2019 Dec 24.

Abstract

Whole-genome sequencing (WGS) data allow Mycobacterium tuberculosis (Mtb) clusters to be inferred using a pairwise genetic distance of ≤12 single nucleotide polymorphisms (SNPs) as a threshold. However, discrepancies in SNP counts and genetic distance measurements are a major concern when combining WGS data from different next-generation sequencing (NGS) platforms. We performed SNP variant calling on WGS data from 9 multidrug-resistant tuberculosis (MDR-TB) strains, 3 extensively drug-resistant tuberculosis (XDR-TB) strains and the standard M. tuberculosis strain H37Rv, generated on an Illumina NextSeq500 and an Ion Torrent PGM. Variant calls were obtained using four common variant calling approaches: Genome Analysis Toolkit (GATK) HaplotypeCaller alone (GATK-VCF workflow), GATK HaplotypeCaller followed by GenotypeGVCFs (GATK-GVCF workflow), SAMtools, and VarScan 2. Cross-platform pairwise SNP differences, minimum spanning networks and average nucleotide identity (ANI) were analysed to measure the performance of each approach. Minimum pairwise SNP differences ranged from 2 to 14 SNPs with the GATK-GVCF workflow, whereas maximum pairwise SNP differences ranged from 7 to 158 SNPs with VarScan 2. ANI comparison of SNP data from the NextSeq500 and the PGM showed maximum ANI values of 99.7% for MDR-TB and 99.0% for XDR-TB with the GATK-GVCF workflow, while the other SNP calling approaches gave lower ANI values, ranging from 98.6% down to 95.1%. These results suggest that the GATK-GVCF workflow was the best-performing variant calling approach for minimizing cross-platform pairwise SNP differences.
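To make the ≤12 SNP clustering threshold concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes each sample has already been reduced to an equal-length string of bases at the same variable sites (for example, a concatenated SNP alignment derived from the called variants). The function names, the handling of `N` and `-` as missing calls, the single-linkage grouping rule, and the toy sequences are all illustrative assumptions; only the 12-SNP cutoff comes from the abstract.

```python
from itertools import combinations

# Cutoff taken from the abstract: pairs differing by <= 12 SNPs are linked.
SNP_THRESHOLD = 12


def snp_distance(seq_a, seq_b):
    """Count sites where two equal-length SNP strings differ.

    Sites where either sample has 'N' or '-' are skipped (an assumption
    about how missing calls are treated).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("SNP strings must have the same length")
    missing = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in missing and b not in missing
    )


def cluster_samples(samples, threshold=SNP_THRESHOLD):
    """Group samples by single linkage: any pair within the threshold is joined.

    `samples` maps a sample name to its SNP string. Returns a sorted list of
    sorted name lists, one per cluster.
    """
    parent = {name: name for name in samples}

    def find(name):
        # Union-find lookup with path compression.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(samples, 2):
        if snp_distance(samples[a], samples[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy data: 40 variable sites per sample.
toy_samples = {
    "isolate_1": "A" * 40,
    "isolate_2": "C" * 5 + "A" * 35,   # 5 SNPs from isolate_1
    "isolate_3": "C" * 20 + "A" * 20,  # 20 SNPs from isolate_1, 15 from isolate_2
}

print(cluster_samples(toy_samples))
# [['isolate_1', 'isolate_2'], ['isolate_3']]
```

The same `snp_distance` function could be applied to the two SNP strings obtained for one strain on two platforms, which is the kind of cross-platform pairwise difference the abstract reports: a caller that inflates this number risks pushing truly linked samples past the 12-SNP cutoff.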

Keywords: Cluster analysis; Mycobacterium tuberculosis; Next-generation sequencing; Outbreak investigation; Tuberculosis; Whole-genome sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Drug Resistance, Multiple, Bacterial
  • Extensively Drug-Resistant Tuberculosis / microbiology
  • Genome, Bacterial
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Mycobacterium tuberculosis / classification*
  • Mycobacterium tuberculosis / genetics
  • Polymorphism, Single Nucleotide*
  • Tuberculosis / classification*
  • Tuberculosis / microbiology
  • Tuberculosis, Multidrug-Resistant / microbiology
  • Whole Genome Sequencing / methods*
  • Workflow