ContigExtender: a new approach to improving de novo sequence assembly for viral metagenomics data

BMC Bioinformatics. 2021 Mar 12;22(1):119. doi: 10.1186/s12859-021-04038-2.

Abstract

Background: Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via Next-Generation Sequencing (NGS). Metagenome de novo sequence assembly is a crucial analytical step in which longer contigs, ideally whole chromosomes/genomes, are formed from shorter NGS reads. However, the contigs generated by de novo assembly are often highly fragmented and rarely longer than a few kilobases (kb). Therefore, a time-consuming extension process is routinely performed on the de novo assembled contigs.

Results: To facilitate this process, we propose a new tool for metagenome contig extension after de novo assembly. ContigExtender employs a novel recursive extending strategy that explores multiple extending paths to achieve longer, highly accurate contigs. We demonstrate that ContigExtender outperforms existing tools on synthetic, animal, and human metagenomics datasets.
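
As a rough illustration of what a recursive, multi-path extension can look like, the toy Python sketch below extends a contig to the right one base at a time. It is not ContigExtender's actual implementation, which the abstract does not describe: the function name extend_right, the parameters seed_len, min_support, and max_depth, and the exact-match seed lookup are all assumptions made for this example. Whenever the reads support more than one next base, every branch is followed and the longest resulting contig is kept.

```python
from collections import Counter


def extend_right(contig, reads, seed_len=4, min_support=1, max_depth=100):
    """Toy recursive extension (hypothetical helper, not the tool's API).

    The last `seed_len` bases of the contig are looked up in every read,
    and each base that follows a match becomes a candidate extension.
    All candidates are explored recursively; the longest contig wins.
    """
    if max_depth == 0:  # guard against endless extension through repeats
        return contig

    seed = contig[-seed_len:]
    next_bases = Counter()
    for read in reads:
        start = read.find(seed)
        while start != -1:
            nxt = start + len(seed)
            if nxt < len(read):
                next_bases[read[nxt]] += 1
            start = read.find(seed, start + 1)

    # Keep only bases seen at least `min_support` times.
    candidates = sorted(b for b, n in next_bases.items() if n >= min_support)

    best = contig
    for base in candidates:
        extended = extend_right(
            contig + base, reads, seed_len, min_support, max_depth - 1
        )
        if len(extended) > len(best):
            best = extended
    return best


# Made-up reads: three tile a short sequence, one ("CCGTAC") creates a
# dead-end branch after "CGTA".
reads = ["GCCGTAAG", "TAAGCTTG", "CTTGACCA", "CCGTAC"]
print(extend_right("ATGCCG", reads, seed_len=4))
```

In this example both branches after "CGTA" have equal read support, so a purely greedy choice could stop at the dead end, whereas exploring both paths recovers the longer contig. A practical tool would also need to extend leftward, tolerate sequencing errors, and bound the cost of branching, none of which this sketch attempts.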

Conclusions: A novel software tool, ContigExtender, has been developed to assist and enhance the performance of metagenome de novo assembly. ContigExtender effectively extends contigs from a variety of sources and can be incorporated into most viral metagenomics analysis pipelines for a wide range of applications, including pathogen detection and viral discovery.

Keywords: De novo assembly; Metagenomics; Next-Gen Sequencing; Pathogen detection; Viral discovery.

MeSH terms

  • Algorithms
  • Animals
  • Genome, Viral*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Metagenome*
  • Metagenomics*
  • Sequence Analysis, DNA
  • Software*