Benchmarking short-, long- and hybrid-read assemblers for metagenome sequencing of complex microbial communities

Microbiology (Reading). 2024 Jun;170(6):001469. doi: 10.1099/mic.0.001469.

Abstract

Metagenome community analyses, driven by the continued development of sequencing technology, are rapidly providing insights into many aspects of microbiology and are becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with its own advantages and drawbacks. Illumina provides accurate reads at a low cost, but the reads are too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been tested on a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short, long or both read types on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many current assemblers are not suited to handling such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with CANU and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.

Keywords: DNA mock metagenome; assembly; long-read sequencing; sequencer benchmark; software benchmark.

MeSH terms

  • Bacteria* / classification
  • Bacteria* / genetics
  • Bacteria* / isolation & purification
  • Genome, Bacterial / genetics
  • High-Throughput Nucleotide Sequencing* / methods
  • Metagenome*
  • Metagenomics* / methods
  • Microbiota / genetics
  • Sequence Analysis, DNA* / methods