ProtEST: protein multiple sequence alignments from expressed sequence tags

J A Cuff; E Birney; M E Clamp; G J Barton

doi:10.1093/bioinformatics/16.2.111

Bioinformatics. 2000 Feb;16(2):111-6. doi: 10.1093/bioinformatics/16.2.111.

Authors

J A Cuff¹, E Birney, M E Clamp, G J Barton

Affiliation

¹ European Molecular Biology Laboratory Outstation, European Bioinformatics Institute, Cambridge, UK.

PMID: 10842731
DOI: 10.1093/bioinformatics/16.2.111

Abstract

Motivation: An automatic sequence searching method (ProtEST) is described which constructs multiple protein sequence alignments from protein sequences and translated expressed sequence tags (ESTs). ProtEST is more effective than a simple TBLASTN search of the query against the EST database, as the sequences are automatically clustered, assembled, made non-redundant, checked for sequence errors, translated into protein and then aligned and displayed.
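
The "translated into protein" step can be illustrated with a plain six-frame translation of a nucleotide sequence. This is a minimal sketch, not the ProtEST implementation: the abstract does not say how translation or error checking is performed, and this naive version does no frameshift or sequencing-error correction. The function names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Map frame labels ('+1'..'+3', '-1'..'-3') to translated peptides."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Calling `six_frame_translations` on an EST string returns one candidate peptide per reading frame; choosing the frame that best matches the protein query is a separate step not shown here.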

Results: When single sequences from 1407 Pfam-A seed alignments were used as probes, a ProtEST search found a non-redundant, translated, error- and length-corrected EST sequence for > 58% of them. The resulting alignments of translated EST sequences contained > 10 sequences on average. In a cross-validated test of protein secondary structure prediction, alignments from the new procedure improved average Q3 prediction accuracy by 3.4% over single sequences.
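
For readers unfamiliar with the Q3 measure, it is conventionally the percentage of residues whose three-state secondary structure (helix, strand, coil) is predicted correctly. The sketch below computes it for one protein and averages it over several; the function names and the per-protein averaging are assumptions for illustration, since the abstract does not state how the average was taken.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def mean_q3(pairs) -> float:
    """Average per-protein Q3 over an iterable of (predicted, observed) pairs."""
    scores = [q3(pred, obs) for pred, obs in pairs]
    if not scores:
        raise ValueError("at least one pair is required")
    return sum(scores) / len(scores)
```

States are compared as single characters, for example `H`, `E` and `C`, so any consistent three-letter alphabet works.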

Availability: The ProtEST method is available as an Internet World Wide Web service at http://barton.ebi.ac.uk/servers/protest.html. The Wise2 package for protein and genomic comparisons and the ProtESTWise script can be found at http://www.sanger.ac.uk/Software/Wise2

Contact: [email protected]

MeSH terms

Amino Acid Sequence
Expressed Sequence Tags*
Molecular Sequence Data
Protein Biosynthesis
Proteins / analysis*
Sequence Alignment / methods*

Substances

Proteins