A metagenomics portal for a democratized sequencing world

Methods Enzymol. 2013:531:487-523. doi: 10.1016/B978-0-12-407863-5.00022-8.

Abstract

The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, this has enabled the dramatic scaling up of pipeline throughput while remaining on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST.

Keywords: Automated analysis of metagenomes; Metadata-enabled data discovery; Metatranscriptomes and amplicon data; Next-generation sequence analysis; Public archive for data and analysis results; Scalable analysis pipeline.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bacteria / classification
  • Bacteria / genetics
  • Computational Biology / methods*
  • Genome, Bacterial
  • High-Throughput Nucleotide Sequencing
  • Internet
  • Metagenomics*
  • Software*