Using multiple alignments to improve gene prediction

J Comput Biol. 2006 Mar;13(2):379-93. doi: 10.1089/cmb.2006.13.379.

Abstract

The multiple species de novo gene prediction problem can be stated as follows: given an alignment of genomic sequences from two or more organisms, predict the location and structure of all protein-coding genes in one or more of the sequences. Here, we present a new system, N-SCAN (a.k.a. TWINSCAN 3.0), for addressing this problem. N-SCAN can model the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, and insertions and deletions. An implementation of N-SCAN was created and used to generate predictions for the entire human genome and the genome of the fruit fly Drosophila melanogaster. Analyses of the predictions reveal that N-SCAN's accuracy in both human and fly exceeds that of all previously published whole-genome de novo gene predictors.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • DNA / chemistry
  • DNA / genetics*
  • Databases, Factual
  • Drosophila melanogaster / genetics
  • Exons / genetics*
  • Genome*
  • Humans
  • Models, Genetic
  • Predictive Value of Tests
  • Sequence Alignment* / methods
  • Sequence Alignment* / statistics & numerical data
  • Software*

Substances

  • DNA