Efficient secondary database driven annotation using model organism sequences

In Silico Biol. 2006;6(5):363-72.

Abstract

Restricting annotation to sequences from specific organisms is justified only if it does not entail a great loss of information and if the available sequences suffice for annotation. To investigate whether sequences from model organisms suffice for annotating sequences from the trematode Schistosoma mansoni, we performed local BLAST searches of S. mansoni sequences against the sequences of other organisms in the NCBI nr database. The results were loaded into a relational database, and hits to sequences from three model organisms (Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens) were counted and compared with hits to sequences from the other organisms in nr; the score of each alignment was also recorded. A large fraction of orthologous proteins is present in the sequence set of the three selected model organisms, so a similar fraction of transcripts can be annotated with either nr or the model organism dataset. Moreover, hits to model organism sequences are largely as informative as hits to nr sequences. These results suggest that model organisms provide a reliable reference set for S. mansoni sequence annotation, show that a restricted dataset of expected better quality can be used for functional annotation, and therefore support secondary database driven annotation approaches.
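
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the database system nor the schema, so the SQLite table `hit`, its columns, the helper names and the toy rows below are all assumptions made for illustration. The sketch stores one row per BLAST hit, then computes the fraction of query sequences with any hit in nr versus a hit in one of the three model organisms, and the best score per query in each set.

```python
import sqlite3

# The three model organisms named in the abstract.
MODEL_ORGANISMS = (
    "Caenorhabditis elegans",
    "Drosophila melanogaster",
    "Homo sapiens",
)
_PLACEHOLDERS = ",".join("?" for _ in MODEL_ORGANISMS)

# Invented toy rows (query, subject, organism, score); not data from the paper.
TOY_HITS = [
    ("Sm001", "subj1", "Homo sapiens", 210.0),
    ("Sm001", "subj2", "Schistosoma japonicum", 350.0),
    ("Sm002", "subj3", "Drosophila melanogaster", 95.0),
    ("Sm003", "subj4", "Mus musculus", 60.0),
]


def build_db(hits):
    """Load BLAST hits into an in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hit (query_id TEXT, subject_id TEXT, "
        "organism TEXT, score REAL)"
    )
    conn.executemany("INSERT INTO hit VALUES (?, ?, ?, ?)", hits)
    return conn


def coverage(conn, total_queries):
    """Return (fraction of queries with any nr hit, fraction with a model organism hit)."""
    any_hit = conn.execute(
        "SELECT COUNT(DISTINCT query_id) FROM hit"
    ).fetchone()[0]
    model_hit = conn.execute(
        "SELECT COUNT(DISTINCT query_id) FROM hit "
        f"WHERE organism IN ({_PLACEHOLDERS})",
        MODEL_ORGANISMS,
    ).fetchone()[0]
    return any_hit / total_queries, model_hit / total_queries


def best_scores(conn):
    """Per query: best score over all hits and best score among model organisms.

    The second value is None when the query has no model organism hit.
    """
    return conn.execute(
        "SELECT query_id, MAX(score), "
        f"MAX(CASE WHEN organism IN ({_PLACEHOLDERS}) THEN score END) "
        "FROM hit GROUP BY query_id ORDER BY query_id",
        MODEL_ORGANISMS,
    ).fetchall()
```

With the toy rows and four query sequences in total (one of them without any hit), `coverage(build_db(TOY_HITS), 4)` gives the two fractions to compare, and `best_scores` shows, query by query, how much score is lost when only model organism hits are kept.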

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Caenorhabditis elegans / genetics
  • Computer Simulation
  • Databases, Genetic*
  • Drosophila melanogaster / genetics
  • Expressed Sequence Tags
  • Genome
  • Models, Genetic*
  • Proteome
  • Schistosoma mansoni / genetics
  • Sequence Alignment

Substances

  • Proteome