Bioinformatics issues for automating the annotation of genomic sequences

Genome Inform. 2001:12:204-11.

Abstract

The amount of biological data being generated worldwide is growing faster than efforts to manage its analysis. As part of an ongoing project to automate and manage bioinformatics analysis, we have designed and implemented a simple automated annotation system, which is described in this paper. The system is applied to existing GenBank/DDBJ/EMBL entries, and its output is compared with the existing annotations to show not only that they contain potential errors but also that they are generally out of date, because analysis tools have been released in new versions and genomic repositories have been updated. We highlight the important bioinformatics issues of information storage and management that must be addressed to keep data and results up to date as new information becomes available. Surprisingly, a significant number of new features were found in just four database entries. We describe the results and identify important issues that need to be addressed in order to automate the re-analysis and re-annotation of genomic sequences within a reasonable timeframe.
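
The comparison between existing and freshly computed annotations can be illustrated with a minimal sketch. The abstract does not say how the authors matched features, so everything here is an assumption made for illustration: the `Feature` record, the `new_features` helper, the rule that a predicted feature is "new" when it overlaps no existing feature of the same kind, and the toy coordinates.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """Hypothetical minimal feature record: a kind plus inclusive coordinates."""
    kind: str
    start: int
    end: int


def overlaps(a: Feature, b: Feature) -> bool:
    # Two features match if they share a kind and their intervals intersect.
    return a.kind == b.kind and a.start <= b.end and b.start <= a.end


def new_features(existing: List[Feature], predicted: List[Feature]) -> List[Feature]:
    # Keep only the predicted features with no matching existing feature.
    return [p for p in predicted if not any(overlaps(p, e) for e in existing)]


# Toy data (invented): annotations already in an entry vs. a fresh analysis run.
existing = [Feature("CDS", 100, 400), Feature("exon", 100, 250)]
predicted = [
    Feature("CDS", 120, 380),             # overlaps the annotated CDS
    Feature("repeat_region", 500, 620),   # no existing counterpart
    Feature("exon", 700, 820),            # no existing counterpart
]

novel = new_features(existing, predicted)
for feature in novel:
    print(feature.kind, feature.start, feature.end)
```

A real pipeline would also have to record which tool and database versions produced each prediction, since the abstract identifies those updates as the reason existing annotations fall out of date.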

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology*
  • Databases, Nucleic Acid
  • Expressed Sequence Tags
  • Genome, Human
  • Genomics / statistics & numerical data*
  • Humans
  • Sequence Alignment / statistics & numerical data
  • Software