An organism-specific method to rank predicted coding regions in Trypanosoma brucei

Nucleic Acids Res. 2003 Oct 15;31(20):5877-85. doi: 10.1093/nar/gkg798.

Abstract

Genome annotation in evolutionarily divergent organisms is challenging because the lack of sequence-based homology limits the ability to determine the function of putative coding regions. To provide an alternative to annotation by sequence homology, we developed a method that exploits unusual trypanosomatid biology and skews in nucleotide composition between coding regions and upstream regions to rank putative open reading frames by their likelihood of being coding. The method is 93% accurate when tested on known genes. We have applied our method to the full complement of open reading frames on Chromosome I of Trypanosoma brucei, and we can predict with high confidence that 226 putative coding regions are likely to be functional. Methods such as this one for discriminating true coding regions are critical for genome annotation when other sources of evidence for function are limited.
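The general idea of ranking ORFs by compositional skew between coding and upstream sequence can be illustrated with a minimal sketch. This is NOT the scoring scheme published in the paper. It is a generic log-likelihood ratio under stated assumptions: a coding model built from per-codon-position nucleotide frequencies of known genes, a background model built from pooled upstream-region frequencies, and a pseudocount of 1. All function names, the toy training sequences, and the ORF names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def base_freqs(seqs, pseudocount=1.0):
    """Pooled nucleotide frequencies over a set of sequences, with a pseudocount."""
    counts = Counter()
    for s in seqs:
        counts.update(b for b in s.upper() if b in BASES)
    total = sum(counts[b] for b in BASES) + pseudocount * len(BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}

def codon_position_freqs(seqs, pseudocount=1.0):
    """Frequencies at codon positions 1, 2 and 3 (assumes in-frame sequences)."""
    return [base_freqs([s[i::3] for s in seqs], pseudocount) for i in range(3)]

def coding_score(orf, coding_model, background):
    """Mean per-base log-likelihood ratio of the coding model vs. the upstream background."""
    total, n = 0.0, 0
    for i, b in enumerate(orf.upper()):
        if b not in BASES:
            continue  # skip ambiguous bases such as N
        total += math.log(coding_model[i % 3][b] / background[b])
        n += 1
    return total / n if n else float("-inf")

def rank_orfs(orfs, coding_model, background):
    """Return (name, score) pairs sorted from most to least coding-like."""
    scored = [(name, coding_score(seq, coding_model, background))
              for name, seq in orfs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy training data: GC-rich "known genes" and AT-rich "upstream regions".
known_coding = ["ATGGCGGCGGCGGCGTAA", "ATGGCCGCAGCGGGCTGA"]
upstream = ["TTTTATATTTTATTTA", "AATTTTATATTTAAAT"]

coding_model = codon_position_freqs(known_coding)
background = base_freqs(upstream)

candidate_orfs = {
    "orf_gc": "ATGGCGGCAGCGGCTTAA",
    "orf_at": "ATGTTTATATTTTATTAA",
}
ranking = rank_orfs(candidate_orfs, coding_model, background)
print(ranking)
```

Averaging the score per base keeps long and short ORFs on a comparable scale. A real application would train on organism-specific sequence sets and validate the ranking against known genes, as the abstract describes.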

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Computational Biology / methods*
  • Databases, Nucleic Acid
  • Genes, Protozoan / genetics
  • Molecular Sequence Data
  • Open Reading Frames / genetics*
  • Protozoan Proteins / genetics
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods
  • Trypanosoma brucei brucei / genetics*

Substances

  • Protozoan Proteins