Predicting reliable regions in protein sequence alignments

Melissa Cline; Richard Hughey; Kevin Karplus

doi:10.1093/bioinformatics/18.2.306

Bioinformatics. 2002 Feb;18(2):306-14. doi: 10.1093/bioinformatics/18.2.306.

Authors

Melissa Cline¹, Richard Hughey, Kevin Karplus

Affiliation

¹ Center for Biomolecular Science and Engineering, Jack Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA.

PMID: 11847078

Abstract

Motivation: Protein sequence alignments have a myriad of applications in bioinformatics, including secondary and tertiary structure prediction, homology modeling, and phylogeny. Unfortunately, all alignment methods make mistakes, and mistakes in alignments often yield mistakes in their application. Thus, a method to identify and remove suspect alignment positions could benefit many areas in protein sequence analysis.

Results: We tested four predictors of alignment position reliability, including near-optimal alignment information, column score, and secondary structural information. We validated each predictor against a large library of alignments, removing positions predicted as unreliable. Near-optimal alignment information was the best predictor, removing 70% of the substantially misaligned positions and 58% of the over-aligned positions, while retaining 86% of those aligned accurately.
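
Illustration: the validation described above (score each aligned position, drop those predicted unreliable, then count what was removed and what was retained in each category) can be sketched as follows. This is a minimal sketch, not the authors' implementation. The `Position` class, the 0-to-1 `reliability` score, the 0.5 threshold, the label names, and the toy data are all hypothetical and chosen only to show how such percentages would be computed.

```python
from dataclasses import dataclass


@dataclass
class Position:
    reliability: float  # hypothetical predictor score in [0, 1]
    label: str          # hypothetical reference label for this aligned position


def filter_positions(positions, threshold):
    """Keep only positions whose predicted reliability meets the threshold."""
    return [p for p in positions if p.reliability >= threshold]


def evaluate(positions, threshold):
    """Per label, report the fraction of positions retained and removed."""
    kept = filter_positions(positions, threshold)
    stats = {}
    for label in ("accurate", "misaligned", "overaligned"):
        total = sum(1 for p in positions if p.label == label)
        if total == 0:
            continue  # no positions of this kind to evaluate
        retained = sum(1 for p in kept if p.label == label) / total
        stats[label] = {"retained": retained, "removed": 1 - retained}
    return stats


# Toy data, invented for illustration only (not from the paper).
positions = [
    Position(0.9, "accurate"),
    Position(0.8, "accurate"),
    Position(0.7, "accurate"),
    Position(0.4, "accurate"),
    Position(0.6, "misaligned"),
    Position(0.2, "misaligned"),
    Position(0.3, "overaligned"),
    Position(0.1, "overaligned"),
]

stats = evaluate(positions, threshold=0.5)
for label, s in stats.items():
    print(f"{label}: retained {s['retained']:.0%}, removed {s['removed']:.0%}")
```

Raising the threshold in a sketch like this removes more misaligned and over-aligned positions but also discards more accurately aligned ones, which is the trade-off the reported percentages summarize.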

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Computational Biology
Neural Networks, Computer
Proteins / genetics*
Sequence Alignment / statistics & numerical data*
Software

Substances

Proteins