The mutual information theory for the certification of rice coding sequences

Nicolas Carels; Ramon Vidal; Ricardo Mansilla; Diego Frías

doi:10.1016/j.febslet.2004.05.026

FEBS Lett. 2004 Jun 18;568(1-3):155-8. doi: 10.1016/j.febslet.2004.05.026.

Authors

Nicolas Carels¹, Ramon Vidal, Ricardo Mansilla, Diego Frías

Affiliation

¹ Laboratório de Bioinformática, Universidade Estadual de Santa Cruz, Rodovia Ilhéus/Itabuna km. 16, Ilhéus, Bahia, Brazil.

PMID: 15196938
DOI: 10.1016/j.febslet.2004.05.026

Abstract

We report the use of mutual information theory to certify annotated rice coding sequences from both the GenBank and TIGR databases. Restricting the analysis to coding sequences longer than 600 bp, we successfully screened out genes with aberrant compositional features. These represent about 10% of both datasets after redundant genes were removed. Most of the rejected accessions followed a different trend in the GC3% vs GC2% plot than the accessions that have been published in international journals. This suggests a bias in the pattern recognition algorithms used by gene prediction programs.
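
The abstract does not give the formulas used, so the following is only a minimal sketch of the two quantities it names, under stated assumptions: GC2% and GC3% are taken as the percentage of G or C at the second and third codon positions of an in-frame coding sequence, and mutual information is computed between nucleotides separated by k positions. The function names `gc_at_codon_position` and `mutual_information` are hypothetical, and the authors' actual screening criterion may differ.

```python
import math
from collections import Counter


def gc_at_codon_position(cds, pos):
    """Percent G or C at codon position pos (1, 2 or 3).

    Assumes cds is in frame and starts at the first codon position;
    trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    bases = cds[pos - 1:usable:3]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def mutual_information(seq, k):
    """Mutual information (in bits) between nucleotides k positions apart.

    Pairs containing symbols other than A, C, G, T are skipped.
    Illustrative definition only; not necessarily the paper's estimator.
    """
    seq = seq.upper()
    pairs = [(a, b) for a, b in zip(seq, seq[k:])
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)    # marginal counts, first base
    right = Counter(b for _, b in pairs)   # marginal counts, second base
    # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
    return sum((c / n) * math.log2(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())
```

For a single sequence, `gc_at_codon_position(cds, 3)` against `gc_at_codon_position(cds, 2)` gives one point of a GC3% vs GC2% plot, and evaluating `mutual_information(cds, k)` over a range of k gives a profile that could be compared between accessions.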

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Information Theory*
Oryza / genetics*