Cyanobacteria are photosynthetic bacteria that occupy diverse habitats across the globe, playing critical roles in many of Earth's biogeochemical cycles in both aquatic and terrestrial systems. Despite their well-known significance, their taxonomy remains problematic and is the subject of much research. These taxonomic issues have led to inaccurate curation within known reference databases, which in turn leads to problematic taxonomic assignment during diversity studies. Recent advances in sequencing technologies have increased our ability to characterize and understand microbial communities, generating thousands of sequences that require taxonomic assignment. We herein propose CyanoSeq (https://zenodo.org/record/7569105), a database of cyanobacterial 16S rRNA gene sequences with curated taxonomy. The taxonomy of CyanoSeq is based on the current state of cyanobacterial taxonomy, with ranks from the domain to the genus level. Files are provided for use with common naive Bayes taxonomic classifiers, such as those included in DADA2 or the QIIME2 platform. Additionally, FASTA files are provided for the creation of de novo phylogenetic trees with (near) full-length 16S rRNA gene sequences to determine the phylogenetic relationships of cyanobacterial strains and/or ASVs/OTUs. The database currently consists of 5410 cyanobacterial 16S rRNA gene sequences along with 123 Chloroplast, Bacterial, and Vampirovibrionia (formerly Melainabacteria) sequences.
Keywords: dataset; eDNA; metabarcoding; microbial diversity; phylogeny.
© 2023 Phycological Society of America.