Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models

BMC Bioinformatics. 2004 Mar 5;5:23. doi: 10.1186/1471-2105-5-23.

Abstract

Background: Many current gene prediction methods use only one model to represent protein-coding regions in a genome, and so are less likely to predict the location of genes that have an atypical sequence composition. It is likely that future improvements in gene finding will involve the development of methods that can adequately deal with intra-genomic compositional variation.

Results: This work explores a new approach to gene prediction, based on the Self-Organizing Map, which can automatically identify multiple gene models within a genome. The current implementation, named RescueNet, uses relative synonymous codon usage as the indicator of protein-coding potential.
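Relative synonymous codon usage (RSCU) is conventionally defined as the observed count of a codon divided by the mean count of all codons that encode the same amino acid, so a value of 1.0 means that codon is used exactly as often as expected under no bias. The sketch below computes that standard definition for an in-frame coding sequence. It is not RescueNet's code: it assumes the standard genetic code (NCBI translation table 1), and the function name `rscu` and the choice to return 0.0 for amino-acid families absent from the sequence are illustrative assumptions.

```python
from itertools import product
from collections import Counter

# Standard genetic code (assumed: NCBI translation table 1), codons enumerated
# in TCAG order so they line up with the one-letter amino-acid string below.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons into synonymous families (stop codons form their own family "*").
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def rscu(seq):
    """Return an RSCU value for each of the 64 codons of an in-frame sequence.

    RSCU(codon) = count(codon) / (total count of its family / family size).
    Codons containing non-ACGT characters are ignored; a trailing partial
    codon is dropped. Families with no observed codons get 0.0 (assumption).
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    result = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        n = len(codons)
        for c in codons:
            # Observed count divided by the mean count across the family.
            result[c] = counts[c] * n / total if total else 0.0
    return result
```

For example, `rscu("ATGGCTGCAGCAGCGTAA")["GCA"]` gives 2.0, because GCA appears twice among four alanine codons in a four-codon family. Vectors of this kind, one per candidate reading frame, are the sort of input a Self-Organizing Map can cluster; how RescueNet builds, normalizes, or filters its vectors is not described in this abstract.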

Conclusions: While its raw accuracy can be lower than that of other methods, RescueNet consistently identifies some genes that other methods miss, and should therefore be of interest to gene-prediction software developers and genome annotation teams alike. RescueNet is recommended for use in conjunction with, or as a complement to, other gene prediction methods.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosome Mapping / methods*
  • Chromosome Mapping / statistics & numerical data
  • Codon / genetics
  • Computational Biology / methods
  • Computational Biology / statistics & numerical data
  • Deinococcus / genetics
  • GC Rich Sequence / genetics
  • Genes, Archaeal / genetics*
  • Genes, Bacterial / genetics*
  • Genome, Archaeal
  • Genome, Bacterial
  • Gram-Negative Bacteria / genetics
  • Gram-Positive Endospore-Forming Bacteria / genetics
  • Markov Chains
  • Methanococcus / genetics
  • Models, Genetic*
  • Multigene Family / genetics*
  • Predictive Value of Tests
  • Reading Frames / genetics
  • Software

Substances

  • Codon