Background: Identification of bacteria in human vaginal specimens is commonly performed using 16S ribosomal RNA (rRNA) gene sequences. However, studies utilize different 16S primer sets, sequence databases, and parameters for sample and database clustering. Our goal was to assess the ability of these methods to detect common species of vaginal bacteria.
Methods: We performed an in silico analysis of 16S rRNA gene primer sets targeting different hypervariable regions. Using vaginal samples from women with bacterial vaginosis, we sequenced 16S rRNA genes using the V1-V3, V3-V4, and V4 primer sets. For analysis, we used an extended Greengenes database supplemented with 16S rRNA gene sequences from vaginal bacteria that were not already present in it, and we compared the results with those obtained using the SILVA 16S database. Using multiple database and sample clustering parameters, we determined each primer set's ability to detect common vaginal bacteria at the species level. We also compared these methods with the use of DADA2 for denoising and clustering of sequence reads.
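As an illustration of what an in silico primer analysis involves, the sketch below scans a template sequence for sites that a degenerate primer could anneal to, expanding IUPAC ambiguity codes and allowing a set number of mismatches. It is a minimal example under stated assumptions, not the pipeline used in this study: the function names are hypothetical, the widely used 515F forward primer for the V4 region is included only as an example (the abstract does not state which primer sequences were evaluated), and the template is a synthetic toy sequence.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def find_primer_sites(primer, template, max_mismatches=0):
    """Return (start, mismatches) for every window within the mismatch limit."""
    hits = []
    k = len(primer)
    for start in range(len(template) - k + 1):
        mismatches = count_mismatches(primer, template[start:start + k])
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Assumption: 515F is used purely as an example of a degenerate V4 primer.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"

# Synthetic toy template: four flanking bases, a perfect binding site, four more bases.
toy_template = "ACGT" + "GTGCCAGCAGCCGCGGTAA" + "TACG"

print(find_primer_sites(PRIMER_515F, toy_template))  # [(4, 0)]
```

Running such a scan over every sequence in a reference database, for both the forward primer and the reverse complement of the reverse primer, gives an estimate of how many taxa a primer set would be expected to amplify.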
Results: V4 sequence reads clustered at 99% identity and classified against the extended Greengenes database, also clustered at 99% identity, provided optimal species-level identification of vaginal bacteria.
Conclusions: This study is a first step toward standardizing methods for 16S rRNA gene sequencing and bioinformatics analysis of vaginal microbiome data.