LMAS: evaluating metagenomic short de novo assembly methods through defined communities

Gigascience. 2022 Dec 28:12:giac122. doi: 10.1093/gigascience/giac122.

Abstract

Background: The de novo assembly of raw sequence data is key in metagenomic analysis. It allows recovering draft genomes from a pool of mixed raw reads, yielding longer sequences that offer contextual information and provide a more complete picture of the microbial community.

Findings: To better compare de novo assemblers for metagenomic analysis, LMAS (Last Metagenomic Assembler Standing) was developed as a flexible platform allowing users to evaluate assembler performance given known standard communities. Overall, in our test datasets, k-mer De Bruijn graph assemblers outperformed the alternative approaches but came with a greater computational cost. Furthermore, assemblers branded as metagenomic specific did not consistently outperform other genomic assemblers in metagenomic samples. Some assemblers still in use, such as ABySS, MetaHipmer2, minia, and VelvetOptimiser, perform relatively poorly and should be used with caution when assembling complex samples. Meaningful strain resolution at the single-nucleotide polymorphism level was not achieved, even by the best assemblers tested.

Conclusions: The choice of a de novo assembler depends on the computational resources available, the replicon of interest, and the major goals of the analysis. No single assembler appeared an ideal choice for short-read metagenomic prokaryote replicon assembly, each showing specific strengths. The choice of metagenomic assembler should be guided by user requirements and characteristics of the sample of interest, and LMAS provides an interactive evaluation platform for this purpose. LMAS is open source, and the workflow and its documentation are available at https://github.com/B-UMMI/LMAS and https://lmas.readthedocs.io/, respectively.

Keywords: de novo assembly; benchmark; draft genome quality; shotgun metagenomics; simulation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Metagenome
  • Sequence Analysis, DNA / methods
  • Software*