Evaluation of an optimized germline exomes pipeline using BWA-MEM2 and Dragen-GATK tools

PLoS One. 2023 Aug 3;18(8):e0288371. doi: 10.1371/journal.pone.0288371. eCollection 2023.

Abstract

The next-generation sequencing (NGS) technology represents a significant advance in genomics and medical diagnosis. Nevertheless, the time it takes to perform sequencing, data analysis, and variant interpretation is a bottleneck in using next-generation sequencing in precision medicine. For accurate and efficient performance in clinical diagnostic lab practice, a consistent data analysis pipeline is necessary to avoid false variant calls and achieve optimum accuracy. This study aims to compare the performance of two NGS data analysis pipeline compartments, including short-read mapping (BWA-MEM and BWA-MEM2) and variant calling (GATK-HaplotypeCaller and DRAGEN-GATK). On Whole Exome Sequencing (WES) data, computational performance was assessed using several criteria, including mapping efficiency, variant calling performance, false positive calls rate, and time. We examined four gold-standard WES data sets: Ashkenazim father (NA24149), Ashkenazim mother (NA24143), Ashkenazim son (NA24385), and Asian son (NA25631). In addition, eighteen exome samples were analyzed based on different read counts, and coverage was used precisely in the run-time assessment. By using BWA-MEM 2 and Dragen-GATK, this study achieved faster and more accurate detection for SNVs and indels than the standard GATK Best Practices workflow. This systematic comparison will enable the bioinformatics community to develop a more efficient and faster solution for analyzing NGS data.

MeSH terms

  • Computational Biology
  • Exome* / genetics
  • Genomics
  • High-Throughput Nucleotide Sequencing
  • Polymorphism, Single Nucleotide
  • Software*

Grants and funding

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.