Drosophila genomic sequence annotation using the BLOCKS+ database

Genome Res. 2000 Apr;10(4):543-6. doi: 10.1101/gr.10.4.543.

Abstract

A simple and general homology-based method for gene finding was applied to the 2.9-Mb Drosophila melanogaster Adh region, the target sequence of the Genome Annotation Assessment Project (GASP). Each strand of the entire sequence was used as a query against the BLOCKS+ database of conserved regions of proteins. This led to functional assignments for more than one-third of the genes and two-thirds of the transposons. Considering the enormous size of the query, the fact that only two false-positive matches were reported emphasizes the high selectivity of protein family-based methods for gene finding. We used the search results to improve BLOCKS+ by identifying compositionally biased blocks. Our results confirm that protein family databases can be used effectively in automated sequence annotation efforts.
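
Because BLOCKS+ holds protein blocks, a DNA query has to be conceptually translated on each strand before it can be compared with them. The abstract does not describe the search software, so the following is only a minimal sketch of that preprocessing step, assuming the standard genetic code and three reading frames per strand. The function names (`reverse_complement`, `translate`, `six_frame_translations`) are hypothetical and are not taken from the paper or from any BLOCKS tool.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: nuclear code, no variants).
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): _AMINO[i]
    for i, codon in enumerate(product(_BASES, repeat=3))
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame (offset 0, 1, or 2); unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict:
    """Map frame labels ('+1'..'+3', '-1'..'-3') to translated protein strings."""
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand, offset)
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

In a real annotation run, each translated frame would then be scored against the block database; that scoring step is outside the scope of this sketch.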

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Alcohol Dehydrogenase / genetics
  • Animals
  • Computational Biology
  • Databases, Factual*
  • Drosophila melanogaster / enzymology
  • Drosophila melanogaster / genetics*
  • Genes, Insect / genetics
  • Genome*
  • Sequence Homology, Nucleic Acid
  • Software*

Substances

  • Alcohol Dehydrogenase