Value, but high costs in post-deposition data curation

Database (Oxford). 2016 Feb 9:2016:bav126. doi: 10.1093/database/bav126. Print 2016.

Abstract

Discoverability of sequence data in primary data archives is proportional to the richness of contextual information associated with the data. Here, we describe an exercise in the improvement of contextual information surrounding sample records associated with metagenomics sequence reads available in the European Nucleotide Archive. We outline the annotation process and summarize findings of this effort aimed at increasing usability of publicly available environmental data. Furthermore, we emphasize the benefits of such an exercise and detail its costs. We conclude that such a third-party annotation approach is expensive and has value as an element of curation, but should form only part of a more sustainable submitter-driven approach. Database URL: http://www.ebi.ac.uk/ena.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / economics*
  • Data Collection
  • Databases, Nucleic Acid / economics*
  • Ecosystem
  • Europe
  • Geography
  • Humans
  • Metagenomics*
  • Microbiota
  • Molecular Sequence Annotation
  • Semantics
  • Sequence Analysis