CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction

Genome Biol. 2007;8(12):R269. doi: 10.1186/gb-2007-8-12-r269.

Abstract

We describe CONTRAST, a gene predictor that directly incorporates information from multiple alignments rather than employing phylogenetic models. This is accomplished through the use of discriminative machine learning techniques, including a novel training algorithm. We use a two-stage approach, in which a set of binary classifiers designed to recognize coding region boundaries is combined with a global model of gene structure. CONTRAST predicts exact coding region structures for 65% more human genes than the previous state-of-the-art method, misses 46% fewer exons, and displays comparable gains in specificity.
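
The two-stage idea described above can be illustrated with a minimal toy sketch. It is not the CONTRAST implementation: the function names (`boundary_score`, `decode`), the two-state labeling (N for noncoding, C for coding), the linear scorer, and all scores are assumptions chosen for illustration. Stage one assigns each position a boundary score from a binary classifier; stage two uses dynamic programming to pick the globally best labeling given those scores.

```python
def boundary_score(features, weights, bias):
    """Stage 1 (toy): linear binary classifier returning a log-odds-style
    score that a position is a coding-region boundary. Hypothetical."""
    return bias + sum(w * f for w, f in zip(weights, features))


def decode(start_scores, stop_scores, coding_scores):
    """Stage 2 (toy): Viterbi decoding over two states, N and C.

    A path earns coding_scores[i] for every position labeled C,
    start_scores[i] when it switches N -> C at position i, and
    stop_scores[i] when it switches C -> N at position i. The path
    starts from a virtual N before the sequence and must end in N.
    """
    neg_inf = float("-inf")
    prev = {"N": 0.0, "C": neg_inf}
    backptrs = []
    for i in range(len(coding_scores)):
        options = {
            "N": [(prev["N"], "N"), (prev["C"] + stop_scores[i], "C")],
            "C": [(prev["C"] + coding_scores[i], "C"),
                  (prev["N"] + start_scores[i] + coding_scores[i], "N")],
        }
        cur, ptr = {}, {}
        for state, candidates in options.items():
            # Keep the best predecessor for each state at this position.
            cur[state], ptr[state] = max(candidates)
        backptrs.append(ptr)
        prev = cur

    # Trace back from the required final state N.
    state, labels = "N", []
    for ptr in reversed(backptrs):
        labels.append(state)
        state = ptr[state]
    labels.reverse()
    return "".join(labels), prev["N"]


# Made-up scores: a strong start signal at position 1, a strong stop at 4.
start_scores = [-5, 2, -5, -5, -5, -5]
stop_scores = [-5, -5, -5, -5, 2, -5]
coding_scores = [-1, 1, 1, 1, -1, -1]
labels, total = decode(start_scores, stop_scores, coding_scores)
```

In this toy setting the boundary scores would come from `boundary_score` applied to per-position features (for example, features derived from alignment columns); here they are hard-coded so the decoding step can be followed by hand.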

MeSH terms

  • Algorithms
  • Animals
  • Artificial Intelligence
  • Base Sequence
  • Exons
  • Expressed Sequence Tags
  • Genome, Human
  • Genomics*
  • Humans
  • Proteins / genetics*
  • Sequence Alignment / methods*
  • Software*

Substances

  • Proteins