V-pipe 3.0: a sustainable pipeline for within-sample viral genetic diversity estimation

Gigascience. 2024 Jan 2:13:giae065. doi: 10.1093/gigascience/giae065.

Abstract

The large amount and diversity of viral genomic datasets generated by next-generation sequencing technologies pose a set of challenges for computational data analysis workflows, including rigorous quality control, scaling to large sample sizes, and tailored steps for specific applications. Here, we present V-pipe 3.0, a computational pipeline designed for analyzing next-generation sequencing data of short viral genomes. It is developed to enable reproducible, scalable, adaptable, and transparent inference of genetic diversity of viral samples. By presenting 2 large-scale data analysis projects, we demonstrate the effectiveness of V-pipe 3.0 in supporting sustainable viral genomic data science.

Keywords: NGS data processing; benchmark; global haplotype reconstruction; next-generation sequencing; sustainable data analysis workflow; viral genetic diversity.

MeSH terms

  • Computational Biology / methods
  • Genetic Variation*
  • Genome, Viral*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • Software*
  • Viruses / genetics