MetaGen: reference-free learning with multiple metagenomic samples

Xin Xing; Jun S Liu; Wenxuan Zhong

doi:10.1186/s13059-017-1323-y

Genome Biol. 2017 Oct 3;18(1):187. doi: 10.1186/s13059-017-1323-y.

Authors

Xin Xing¹, Jun S Liu²,³, Wenxuan Zhong⁴

Affiliations

¹ Department of Statistics, University of Georgia, Athens, 30602, GA, USA.
² Department of Statistics, Harvard University, Cambridge, 02138, MA, USA.
³ Center for Statistical Science & Department of Industrial Engineering, Beijing, 100084, China.
⁴ Department of Statistics, University of Georgia, Athens, 30602, GA, USA.

Abstract

A major goal of metagenomics is to identify and study the entire collection of microbial species in a set of targeted samples. We describe a statistical metagenomic algorithm that simultaneously identifies microbial species and estimates their abundances without using reference genomes. As a trade-off, we require multiple metagenomic samples, usually ≥10 samples, to get highly accurate binning results. Compared to reference-free methods based primarily on k-mer distributions or coverage information, the proposed approach achieves a higher species binning accuracy and is particularly powerful when sequencing coverage is low. We demonstrated the performance of this new method through both simulation and real metagenomic studies. The MetaGen software is available at https://github.com/BioAlgs/MetaGen.

Keywords: Binning; Metagenomics; Mixture model; Multinomial; Unsupervised learning.
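
The abstract and keywords describe binning with a multinomial mixture model over multiple samples. As a rough illustration of that general idea only, the sketch below fits a generic multinomial mixture by EM to a contigs-by-samples read-count matrix. It is not the MetaGen implementation: the function name `multinomial_mixture_em`, the fixed number of bins, the random initialization, and the smoothing constant `eps` are all assumptions made for this example.

```python
import numpy as np


def multinomial_mixture_em(counts, n_bins, n_iter=100, seed=0, eps=1e-9):
    """Cluster rows of a (contigs x samples) read-count matrix.

    Hypothetical helper for illustration; a generic multinomial mixture
    fitted by EM, not the algorithm shipped in the MetaGen software.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_contigs, _ = counts.shape

    # Start from random soft assignments; each row sums to 1.
    resp = rng.dirichlet(np.ones(n_bins), size=n_contigs)

    for _ in range(n_iter):
        # M-step: mixing weights and each bin's profile across samples.
        weights = resp.sum(axis=0) / n_contigs
        profiles = resp.T @ counts + eps  # shape (n_bins, n_samples)
        profiles /= profiles.sum(axis=1, keepdims=True)

        # E-step: posterior bin membership, computed in log space.
        # The multinomial coefficient is the same for every bin, so it is omitted.
        log_post = np.log(weights + eps) + counts @ np.log(profiles).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

    return resp.argmax(axis=1), profiles, weights
```

Each row of `counts` holds one contig's read counts in each sample, so contigs whose counts rise and fall together across samples tend to land in the same bin. This is one reason more samples can help such a model: every additional sample adds a dimension along which bins can differ.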

Publication types

Research Support, U.S. Gov't, Non-P.H.S.
Research Support, N.I.H., Extramural

MeSH terms

Bayes Theorem
Diabetes Mellitus, Type 2 / microbiology
Humans
Inflammatory Bowel Diseases / microbiology
Metagenomics / methods*
Obesity / microbiology
Software
