ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences

C Iseli; C V Jongeneel; P Bucher

Proc Int Conf Intell Syst Mol Biol. 1999:138-48.

Authors

C Iseli¹, C V Jongeneel, P Bucher

Affiliation

¹ Swiss Institute of Bioinformatics, Epalinges, Switzerland.

PMID: 10786296

Abstract

One of the problems associated with the large-scale analysis of unannotated, low-quality EST sequences is the detection of coding regions and the correction of the frameshift errors they often contain. We introduce a new type of hidden Markov model that explicitly accounts for possible errors in the sequence being analyzed and incorporates a method for correcting them. This model was implemented in an efficient and robust program, ESTScan. We show that ESTScan can detect and extract coding regions from low-quality sequences with high selectivity and sensitivity, and that it accurately corrects frameshift errors. In the context of genome sequencing projects, ESTScan could become a very useful tool for gene discovery, for quality control, and for the assembly of contigs representing the coding regions of genes.
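The abstract describes the idea only at a high level: a hidden Markov model whose coding states track codon position, with extra transitions that absorb insertion and deletion errors so the reading frame can be recovered. The following is a minimal Python sketch of that general idea, assuming a hypothetical four-state toy model (one noncoding state `N` and coding states `C0`, `C1`, `C2` for the three codon positions) with invented emission and transition probabilities. It is not ESTScan's model or code, and the names `viterbi` and `correct` are illustrative only. Viterbi decoding picks the most probable state path, and any coding-to-coding step that does not advance the codon position by one is read as a frameshift: a repeated phase is treated as an inserted base and dropped, and a skipped phase as a deleted base and padded with `N`.

```python
import math

# Hypothetical toy model: these are NOT ESTScan's trained parameters.
STATES = ["N", "C0", "C1", "C2"]  # noncoding state + codon positions 0..2
PREFERRED = {"C0": "G", "C1": "C", "C2": "T"}  # invented per-phase base bias


def emit(state: str, base: str) -> float:
    """Log-probability that `state` emits `base` (each state sums to 1)."""
    if state == "N":
        return math.log(0.25)  # noncoding: uniform over A, C, G, T
    return math.log(0.7 if base == PREFERRED[state] else 0.1)


def trans(a: str, b: str) -> float:
    """Log transition probability; off-frame coding steps model indels."""
    if a == "N":
        return math.log(0.97 if b == "N" else 0.01)
    if b == "N":
        return math.log(0.01)  # leave the coding region
    p, q = int(a[1]), int(b[1])
    if q == (p + 1) % 3:
        return math.log(0.98)  # normal advance to the next codon position
    return math.log(0.005)  # same phase (insertion) or skipped phase (deletion)


def viterbi(seq: str) -> list[str]:
    """Most probable state path for `seq` under the toy model (log space)."""
    score = {s: math.log(0.25) + emit(s, seq[0]) for s in STATES}
    back = []  # back[i][b] = best predecessor of state b at position i + 1
    for base in seq[1:]:
        ptr, new = {}, {}
        for b in STATES:
            best = max(STATES, key=lambda a: score[a] + trans(a, b))
            ptr[b] = best
            new[b] = score[best] + trans(best, b) + emit(b, base)
        back.append(ptr)
        score = new
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(back):  # trace the pointers back to position 0
        state = ptr[state]
        path.append(state)
    return path[::-1]


def correct(seq: str, path: list[str]) -> tuple[str, list[int]]:
    """Repair frameshifts: drop inserted bases, pad deletions with 'N'."""
    out, shifts = [seq[0]], []
    for i in range(1, len(seq)):
        a, b = path[i - 1], path[i]
        if a != "N" and b != "N":
            p, q = int(a[1]), int(b[1])
            if q == p:  # phase repeated: treat seq[i] as an inserted base
                shifts.append(i)
                continue
            if q == (p + 2) % 3:  # phase skipped: one base was deleted
                shifts.append(i)
                out.append("N")
        out.append(seq[i])
    return "".join(out), shifts
```

On the toy input `GCTGCTGTGCTGCT`, in which one base of the repeating `GCT` pattern is missing, the decoded path jumps from phase 0 to phase 2 at position 7, and `correct` returns `("GCTGCTGNTGCTGCT", [7])`, restoring a length that is a multiple of three. A practical tool would need emission and transition probabilities estimated from real coding and noncoding sequence rather than the hand-set values used here.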

MeSH terms

Algorithms
Amino Acid Sequence
Base Sequence
DNA, Complementary / genetics
Exons
Expressed Sequence Tags*
Gene Library
Markov Chains
Molecular Sequence Data
Reading Frames
Reproducibility of Results
Sensitivity and Specificity
Sequence Analysis, DNA*
Sequence Homology, Amino Acid
Software*

Substances

DNA, Complementary