Comparative analysis of functional metagenomic annotation and the mappability of short reads

Rogan Carr; Elhanan Borenstein

doi:10.1371/journal.pone.0105776

PLoS One. 2014 Aug 22;9(8):e105776. doi: 10.1371/journal.pone.0105776. eCollection 2014.

Authors

Rogan Carr¹, Elhanan Borenstein²

Affiliations

¹ Department of Genome Sciences, University of Washington, Seattle, WA, United States of America.
² Department of Genome Sciences, University of Washington, Seattle, WA, United States of America; Department of Computer Science and Engineering, University of Washington, Seattle, WA, United States of America; Santa Fe Institute, Santa Fe, NM, United States of America.

Abstract

To assess the functional capacities of microbial communities, including those inhabiting the human body, shotgun metagenomic reads are often aligned to a database of known genes. Such homology-based annotation practices critically rely on the assumption that short reads can map to orthologous genes of similar function. This assumption, however, and the various factors that impact short read annotation, have not been systematically evaluated. To address this challenge, we generated an extremely large database of simulated reads (totaling 15.9 Gb), spanning over 500,000 microbial genes and 170 curated genomes and including, for many genomes, every possible read of a given length. We annotated each read using common metagenomic protocols, fully characterizing the effect of read length, sequencing error, phylogeny, database coverage, and mapping parameters. We additionally rigorously quantified gene-, genome-, and protocol-specific annotation biases. Overall, our findings provide a first comprehensive evaluation of the capabilities and limitations of functional metagenomic annotation, providing crucial goal-specific best-practice guidelines to inform future metagenomic research.
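The idea of including "every possible read of a given length" can be illustrated with a sliding window over a sequence. The following is only a minimal sketch under stated assumptions, not the authors' simulation pipeline: the function name `enumerate_reads`, the toy sequence, and the choice to ignore strand, sequencing error, and circular genomes are all hypothetical simplifications.

```python
from typing import Iterator, Tuple


def enumerate_reads(sequence: str, read_length: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, read) for every window of read_length in sequence.

    Assumption: the sequence is linear and error-free, and only the
    forward strand is considered.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    # A sequence of length n contains n - read_length + 1 such windows;
    # if the sequence is shorter than read_length, nothing is yielded.
    for start in range(len(sequence) - read_length + 1):
        yield start, sequence[start:start + read_length]


# Hypothetical toy gene, 14 bases long.
toy_gene = "ATGGCGTACGTTAG"
reads = list(enumerate_reads(toy_gene, 5))
```

A sequence of length n yields n - k + 1 reads of length k, so the number of reads grows almost linearly with sequence length. That is consistent with an exhaustive read set across many genomes becoming very large.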

Publication types

Comparative Study
Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Computational Biology / methods
Genome, Archaeal
Genome, Bacterial
Humans
Metagenomics / methods*
Microbial Consortia / genetics*
Microbiota / genetics
Molecular Sequence Annotation / methods*
Phylogeny
Sequence Analysis, DNA / methods
Software
Streptococcus / genetics
