One of the most widely used methods in metagenomics is the amplification and sequencing of highly conserved ribosomal RNA (rRNA) genes from organisms in complex microbial communities. rRNA surveys, which typically use the 16S rRNA gene to identify prokaryotes, provide information about the total diversity and taxonomic affiliation of the organisms present in a sample. Greatly enhanced by high-throughput sequencing, these surveys have uncovered the remarkable diversity of uncultured organisms and revealed previously unappreciated ecological roles in processes ranging from nutrient cycling to human health. This chapter outlines best practices for comparative analyses of microbial community surveys. We explain how to transform raw data into meaningful units for further analysis and discuss how to calculate sample diversity and community distance metrics. Finally, we outline how to find associations between species and specific metadata and how to detect true correlations between species in compositional data. We focus on data generated by next-generation sequencing platforms, using the Illumina platform as a test case because of its widespread use, especially among researchers just entering the field.
Keywords: 16S ribosomal RNA survey; community distance metric; correlation analysis; diversity estimates; operational taxonomic units.
© 2013 Elsevier Inc. All rights reserved.