Computational methods for high-throughput comparative analyses of natural microbial communities

Methods Enzymol. 2013:531:353-70. doi: 10.1016/B978-0-12-407863-5.00018-6.

Abstract

One of the most widely employed methods in metagenomics is the amplification and sequencing of the highly conserved ribosomal RNA (rRNA) genes from organisms in complex microbial communities. rRNA surveys, typically using the 16S rRNA gene for prokaryotic identification, provide information about the total diversity and taxonomic affiliation of the organisms present in a sample. Greatly enhanced by high-throughput sequencing, these surveys have uncovered the remarkable diversity of uncultured organisms and revealed previously unappreciated ecological roles ranging from nutrient cycling to human health. This chapter outlines best practices for comparative analyses of microbial community surveys. We explain how to transform raw data into meaningful units for further analysis and discuss how to calculate sample diversity and community distance metrics. Finally, we outline how to find associations of species with specific metadata and how to infer true correlations between species from compositional data. We focus on data generated by next-generation sequencing platforms, using the Illumina platform as a test case because of its widespread use, especially among researchers just entering the field.
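The abstract does not say which diversity or distance metrics the chapter recommends, so the following is only a minimal sketch of three widely used quantities that operate on a per-sample vector of operational taxonomic unit (OTU) counts: the Shannon index (within-sample diversity), Bray-Curtis dissimilarity (between-sample community distance), and the centered log-ratio (CLR) transform (one common way to handle compositional data before correlation analysis). The function names, the pseudocount of 0.5, and the example counts are illustrative assumptions, not the chapter's method.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:  # zero-count OTUs contribute nothing (p * ln p -> 0)
            p = c / total
            h -= p * math.log(p)
    return h


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i).

    0 means identical counts; 1 means the samples share no OTUs.
    """
    if len(a) != len(b):
        raise ValueError("samples must be counted over the same OTUs")
    denom = sum(a) + sum(b)
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


def clr(counts: Sequence[int], pseudocount: float = 0.5) -> list[float]:
    """Centered log-ratio: ln(x_i) minus the mean of ln(x); pseudocount (assumed value) avoids ln(0)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


if __name__ == "__main__":
    # Hypothetical OTU count vectors for two samples over the same four OTUs.
    sample_a = [10, 10, 0, 0]
    sample_b = [0, 0, 10, 10]
    print(shannon_index(sample_a))
    print(bray_curtis(sample_a, sample_b))  # 1.0: no OTUs in common
    print(clr(sample_a))
```

Counts are used directly here for simplicity; in practice samples are often rarefied or otherwise normalized for sequencing depth before such comparisons, and CLR values are a preprocessing step rather than a complete method for inferring correlations from compositional data.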

Keywords: 16S ribosomal RNA survey; community distance metric; correlation analysis; diversity estimates; operational taxonomic units.

MeSH terms

  • Classification
  • Computational Biology / methods*
  • Genetic Variation
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Metagenomics*
  • Microbial Consortia / genetics*
  • Phylogeny
  • RNA, Ribosomal, 16S / genetics*
  • Sequence Analysis, DNA

Substances

  • RNA, Ribosomal, 16S