Next-generation sequencing is a common method for analysing microbial community diversity and composition. Choosing an appropriate sequence processing strategy from the variety of available tools and methods is a nontrivial task, and the choice can considerably influence the resulting community characteristics. We analysed the V4 region of 18S rRNA gene sequences from marine samples obtained by 454-pyrosequencing. In the process, we generated several data sets with QIIME, mothur, and a custom-made pipeline based on DNAStar and the phylogenetic tree-based PhyloAssigner. For all processing strategies, default parameter settings and selected variations of individual parameters were used. Our results revealed strong differences in the total number of operational taxonomic units (OTUs), indicating that sequence preprocessing and clustering had a major impact on protist diversity estimates. However, diversity estimates of the abundant biosphere (abundance of ≥1%) were reproducible across all pipeline versions tested. A qualitative comparison of diatom genera revealed strong differences between the pipelines; of these, phylogenetic placement of sequences came closest to light microscopy-based diatom identification. We conclude that diversity studies using different sequence processing strategies are comparable if the focus is on higher taxonomic levels and if abundance thresholds are used to filter out OTUs of the rare biosphere.
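The abundance threshold mentioned above can be illustrated with a minimal sketch. It assumes that "abundance of ≥1%" refers to an OTU's share of all reads in a data set; the function names, the OTU labels, and the read counts are hypothetical and are not taken from the study or from any of the pipelines it compares.

```python
from typing import Dict


def relative_abundances(otu_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts per OTU into fractions of the total read count."""
    total = sum(otu_counts.values())
    if total <= 0:
        raise ValueError("OTU table contains no reads")
    return {otu: count / total for otu, count in otu_counts.items()}


def abundant_otus(otu_counts: Dict[str, int], threshold: float = 0.01) -> Dict[str, float]:
    """Keep only OTUs whose relative abundance is at or above the threshold.

    The default of 0.01 mirrors the 1% cut-off named in the abstract;
    interpreting it as a per-data-set read fraction is an assumption.
    """
    fractions = relative_abundances(otu_counts)
    return {otu: frac for otu, frac in fractions.items() if frac >= threshold}


# Hypothetical read counts for illustration only (1000 reads in total).
example_counts = {"OTU_1": 600, "OTU_2": 370, "OTU_3": 20, "OTU_4": 9, "OTU_5": 1}

kept = abundant_otus(example_counts)
print(sorted(kept))
```

In this toy table, OTU_4 and OTU_5 fall below the cut-off and would be treated as part of the rare biosphere, while the remaining three OTUs would enter the comparison between pipelines.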
Keywords: amplicon metagenetics; biodiversity; diatoms; next-generation sequencing; protists.
© 2017 John Wiley & Sons Ltd.