ACDC, a global database of amphibian cytochrome-b sequences using reproducible curation for GenBank records

Sci Data. 2020 Aug 13;7(1):268. doi: 10.1038/s41597-020-00598-9.

Abstract

Genetic data are a crucial and exponentially growing resource across all biological sciences, yet curated databases are scarce. The widespread occurrence of sequence and (meta)data errors in public repositories calls for comprehensive improvements to curation protocols that support robust research and downstream analyses. We collated and curated all available GenBank cytochrome-b sequences for amphibians, a benchmark marker in this globally declining vertebrate clade. The Amphibia's Curated Database of Cytochrome-b (ACDC) consists of 36,514 sequences representing 2,309 species from 398 genera (median = 2 species per genus, interquartile range 1-7). We updated the taxonomic identity of >4,800 sequences (ca. 13%) and found 2,359 (6%) conflicting sequences, with 84% of the errors originating from taxonomic misidentifications. The database (accessible at https://doi.org/10.6084/m9.figshare.9944759) also includes an R script to replicate our study for other loci and taxonomic groups. We provide recommendations to improve genetic-data quality in public repositories and flag species that need taxonomic refinement in the face of the increased rate of amphibian extinctions in the Anthropocene.
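
The proportions quoted above can be checked with simple arithmetic. The short Python sketch below is illustrative only and is not part of the ACDC R script. It uses the counts from the abstract and makes two assumptions: ">4,800" is treated as a lower bound of 4,800, and the 84% share of errors is applied to the 2,359 conflicting sequences to get a rough count.

```python
# Counts quoted in the abstract.
TOTAL_SEQUENCES = 36_514
UPDATED_SEQUENCES = 4_800       # abstract says ">4,800"; treated here as a lower bound
CONFLICTING_SEQUENCES = 2_359
MISIDENTIFICATION_SHARE = 0.84  # assumption: applied to the conflicting sequences


def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


updated_pct = percent(UPDATED_SEQUENCES, TOTAL_SEQUENCES)
conflicting_pct = percent(CONFLICTING_SEQUENCES, TOTAL_SEQUENCES)
# Rough estimate only; the abstract does not report this count directly.
misidentified_estimate = round(MISIDENTIFICATION_SHARE * CONFLICTING_SEQUENCES)

print(f"taxonomy updated: at least {updated_pct:.1f}% of sequences")
print(f"conflicting: {conflicting_pct:.1f}% of sequences")
print(f"estimated misidentifications: about {misidentified_estimate} sequences")
```

Rounded to whole percentages, the two shares agree with the "ca. 13%" and "6%" reported in the abstract.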

Publication types

  • Dataset
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amphibians / classification
  • Amphibians / genetics*
  • Animals
  • Cytochromes b / genetics*
  • Databases, Nucleic Acid*

Substances

  • Cytochromes b