SLIMM: species level identification of microorganisms from metagenomes

PeerJ. 2017 Mar 28;5:e3138. doi: 10.7717/peerj.3138. eCollection 2017.

Abstract

Identification and quantification of microorganisms is a significant step in studying the alpha and beta diversities within and between microbial communities, respectively. Both identification and quantification of a given microbial community can be carried out using whole-genome shotgun sequences with less bias than when using 16S-rDNA sequences. However, shared regions of DNA among reference genomes and taxonomic units pose a significant challenge in assigning reads correctly to their true origins. Existing microbial community profiling tools commonly deal with this problem by either preparing signature-based unique references or assigning an ambiguous read to its least common ancestor in a taxonomic tree. The former approach can use only the reads that map to the curated regions, while the latter suffers from a lack of uniquely mapped reads at lower (more specific) taxonomic ranks. Moreover, even though these tools perform well in calling the organisms present in a sample, there is still room for improvement in determining the correct relative abundance of the organisms. We present a new method, Species Level Identification of Microorganisms from Metagenomes (SLIMM), which addresses the above issues by using coverage information of reference genomes to remove unlikely genomes from the analysis and subsequently gain more uniquely mapped reads to assign at lower ranks of a taxonomic tree. SLIMM is based on a few seemingly easy steps that, when combined, create a tool that outperforms state-of-the-art tools in run-time and memory usage while being on par or better in computing quantitative and qualitative information at the species level.
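
The core idea described in the abstract (discard poorly covered reference genomes, then recount the reads that have become unique) can be illustrated with a minimal sketch. This is not SLIMM's implementation: the function names, the toy read-to-genome mapping, the fixed read length, and the average-depth threshold are all assumptions chosen for illustration, and the abstract does not specify the coverage measure or cutoff that SLIMM actually uses.

```python
from collections import defaultdict


def filter_by_coverage(read_hits, genome_lengths, read_length, min_coverage):
    """Return the genomes whose approximate average depth reaches min_coverage.

    read_hits maps a read ID to the set of genomes it aligns to.
    Depth is approximated as (hits * read_length) / genome length,
    which is an assumption made for this sketch.
    """
    mapped_bases = defaultdict(int)
    for genomes in read_hits.values():
        for genome in genomes:
            mapped_bases[genome] += read_length
    return {
        genome
        for genome, bases in mapped_bases.items()
        if bases / genome_lengths[genome] >= min_coverage
    }


def count_unique_reads(read_hits, candidates):
    """Count reads that align to exactly one genome within candidates."""
    counts = defaultdict(int)
    for genomes in read_hits.values():
        remaining = genomes & candidates
        if len(remaining) == 1:
            counts[next(iter(remaining))] += 1
    return dict(counts)


# Hypothetical toy data: read r3 is ambiguous between genomes A and B.
read_hits = {"r1": {"A"}, "r2": {"A"}, "r3": {"A", "B"}}
genome_lengths = {"A": 100, "B": 1000}

# Unique reads before any genome is removed.
before = count_unique_reads(read_hits, set(genome_lengths))

# Remove unlikely genomes, then recount.
kept = filter_by_coverage(read_hits, genome_lengths, read_length=50, min_coverage=0.5)
after = count_unique_reads(read_hits, kept)
```

In this toy case genome B is supported by a single shared read and falls below the threshold, so read r3 becomes uniquely assignable to genome A once B is removed. Reads that remain ambiguous after filtering would still need a fallback such as assignment to a common ancestor in the taxonomic tree, which this sketch leaves out.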

Keywords: Metagenomics; Microbial communities; Microbiology; Microorganisms; NGS data; Taxonomic profiling.

Grants and funding

This work is supported by the International Max Planck Research School for Computational Biology and Scientific Computing and by the InfectControl 2020 Project (TFP-TV4). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.