Background: The application of next-generation sequencing to the study of the vaginal microbiome is revealing the spectrum of microbial communities that inhabit the human vagina. High-resolution identification of bacterial taxa, at least to the species level, is necessary to fully understand the association of the vaginal microbiome with bacterial vaginosis, sexually transmitted infections, pregnancy complications, menopause, and other physiological and infectious conditions. However, most current taxonomic assignment strategies based on metagenomic 16S rDNA sequence analysis provide, at best, genus-level resolution. While surveys of 16S rRNA gene sequences are common in microbiome studies, few well-curated, body-site-specific reference databases of 16S rRNA gene sequences are available, and no such resource is available for vaginal microbiome studies.
Results: We constructed the Vaginal 16S rDNA Reference Database, a comprehensive and non-redundant database of 16S rDNA reference sequences for bacterial taxa likely to be associated with vaginal health. We also developed STIRRUPS, a new method that employs the USEARCH algorithm with a curated reference database for rapid species-level classification of partial 16S rDNA sequences. The method was applied to two datasets of V1-V3 16S rDNA reads: one generated from a mock community containing DNA from six bacterial strains associated with vaginal health, and a second generated from over 1,000 mid-vaginal samples collected as part of the Vaginal Human Microbiome Project at Virginia Commonwealth University. In both datasets, STIRRUPS, used in conjunction with the Vaginal 16S rDNA Reference Database, classified more than 95% of processed reads to a species-level taxon using a 97% global identity threshold for assignment.
Conclusions: This database and method provide accurate species-level classifications of metagenomic 16S rDNA sequence reads that will be useful for analysis and comparison of microbiome profiles from vaginal samples. STIRRUPS can be used to classify 16S rDNA sequence reads from other ecological niches if an appropriate reference database of 16S rDNA sequences is available.
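The threshold-based assignment described in the Results can be illustrated with a minimal sketch. It is not the STIRRUPS implementation: it assumes that alignment hits (for example, parsed from USEARCH output) are already available as records of read identifier, reference taxon, and percent global identity. The `Hit` type, the `classify_reads` function, the "unclassified" label, and the example values are all hypothetical and serve only to show how a 97% identity cutoff could be applied to the best hit per read.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One alignment hit (hypothetical record layout, not a USEARCH format)."""
    read_id: str
    taxon: str        # species-level label of the reference sequence
    identity: float   # percent global identity between read and reference


def classify_reads(hits: Iterable[Hit], threshold: float = 97.0) -> Dict[str, str]:
    """Assign each read to the taxon of its best hit if identity >= threshold."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.read_id)
        # Keep only the highest-identity hit seen so far for each read.
        if current is None or hit.identity > current.identity:
            best[hit.read_id] = hit
    # Reads whose best hit falls below the threshold are left unclassified.
    return {
        read_id: (hit.taxon if hit.identity >= threshold else "unclassified")
        for read_id, hit in best.items()
    }


# Illustrative values only; these are not data from the study.
example_hits = [
    Hit("read1", "Lactobacillus iners", 99.2),
    Hit("read1", "Lactobacillus crispatus", 93.5),
    Hit("read2", "Gardnerella vaginalis", 95.0),
]
assignments = classify_reads(example_hits)
```

In this sketch, a read with no hit at or above the cutoff is reported as unclassified rather than being forced onto its nearest reference; how ties and below-threshold reads are actually handled by STIRRUPS is not stated in this abstract.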