The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study

Genome Res. 2002 Jan;12(1):198-202. doi: 10.1101/gr.200901.

Abstract

Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for identifying protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that, in the vast majority of coding regions, synonymous substitutions (K(S)) occur much more frequently than nonsynonymous ones (K(A)), and it uses the K(A)/K(S) ratio as the criterion. We show the following: (1) most human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is well suited to the identification of long exons and single-exon genes, which are difficult to predict by current methods; (3) the test has a false-negative rate lower than that of most current gene-prediction methods and a false-positive rate lower than that of all current methods; (4) the test has been automated and can be used in combination with existing gene-prediction methods.
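
The abstract does not state how K(A) and K(S) are estimated, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes two in-frame, gap-free, codon-aligned sequences and uses a simplified Nei-Gojobori-style count with a Jukes-Cantor correction. The function names (syn_sites, jukes_cantor, ka_ks) are hypothetical. Codons that differ at more than one position, contain a stop codon, or contain non-ACGT characters are skipped, and mutations to stop codons are counted as nonsynonymous; both are simplifying assumptions.

```python
import math
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    count = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                count += 1
    # Each position has 3 possible changes, so each one weighs 1/3 of a site.
    return count / 3.0


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites.

    Returns None when p is too large for the correction to be defined.
    """
    if p == 0:
        return 0.0
    if p >= 0.75:
        return None
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(seq1, seq2):
    """Approximate K(A)/K(S) for two codon-aligned sequences.

    Returns None when the ratio is undefined (for example, K(S) == 0).
    """
    syn_diff = nonsyn_diff = 0
    s_sites = n_sites = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # gap or ambiguous base: skip (assumption)
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            continue  # stop codon: skip (assumption)
        ndiff = sum(a != b for a, b in zip(c1, c2))
        if ndiff > 1:
            continue  # multi-step pathways are not modeled in this sketch
        s = (syn_sites(c1) + syn_sites(c2)) / 2.0
        s_sites += s
        n_sites += 3.0 - s
        if ndiff == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diff += 1
            else:
                nonsyn_diff += 1
    if s_sites == 0 or n_sites == 0:
        return None
    ks = jukes_cantor(syn_diff / s_sites)
    ka = jukes_cantor(nonsyn_diff / n_sites)
    if ks is None or ka is None or ks == 0:
        return None
    return ka / ks
```

For example, ka_ks("GGGGGGGGGATGGCT", "GGAGGGGGCATGGAT") compares five codons with two synonymous changes and one nonsynonymous change and returns a ratio below 1, the pattern the test treats as evidence of protein-coding sequence. The abstract gives no decision threshold, so none is assumed here.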

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Computational Biology / methods*
  • Computational Biology / statistics & numerical data
  • Computer Simulation*
  • Exons / genetics
  • False Positive Reactions
  • Genes / genetics
  • Genome*
  • Genome, Human
  • Humans
  • Mice
  • Models, Genetic*
  • Proteins / genetics*
  • Proteome / genetics
  • Sequence Homology, Nucleic Acid

Substances

  • Proteins
  • Proteome