Distribution of Indel lengths

B Qian; R A Goldstein

doi:10.1002/prot.1129


Proteins. 2001 Oct 1;45(1):102-4. doi: 10.1002/prot.1129.

Authors

B Qian¹, R A Goldstein

Affiliation

¹ Biophysics Research Division, University of Michigan, Ann Arbor, USA.

PMID: 11536366

Abstract

Protein sequence alignment has become a widely used method in the study of newly sequenced proteins. Most sequence alignment methods use an affine gap penalty to assign scores to insertions and deletions. Although affine gap penalties capture the relative ease of extending a gap compared with opening one, they are still an obvious oversimplification of the real processes that occur during sequence evolution. To improve the efficiency of sequence alignment methods and to obtain a better understanding of the process of sequence evolution, we sought a more accurate model of insertions and deletions in homologous proteins. In this work, we extract the probability of a gap occurrence and the resulting gap length distribution in distantly related proteins (sequence identity < 25%) using alignments based on their common structures. We observe a distribution of gap lengths that can be fitted with a multiexponential with four distinct components. The results suggest new approaches to modeling insertions and deletions in sequence alignments.
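The contrast between an affine gap penalty and a multiexponential gap-length distribution can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' method. It assumes alignment rows mark gaps with '-'. It also treats a gap penalty as the negative log of the normalized length probability, a common log-odds reading. The four (weight, decay) pairs in the usage example are hypothetical placeholders, not the fitted values from the paper. An affine penalty adds a constant cost per extra residue, which corresponds to a single exponential. A mixture of exponentials instead makes each extra residue progressively cheaper.

```python
import math
import re
from collections import Counter

def gap_lengths(aligned_row):
    """Lengths of each run of '-' in one row of a pairwise alignment (assumed gap symbol)."""
    return [len(run) for run in re.findall(r"-+", aligned_row)]

def empirical_length_distribution(rows):
    """Fraction of observed gaps having each length, pooled over all alignment rows."""
    counts = Counter(length for row in rows for length in gap_lengths(row))
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}

def affine_penalty(length, gap_open, gap_extend):
    """Affine gap penalty: one opening cost plus a constant cost per additional residue."""
    if length < 1:
        raise ValueError("gap length must be >= 1")
    return gap_open + gap_extend * (length - 1)

def multiexp_weight(length, components):
    """Unnormalized multiexponential weight: sum of a_i * exp(-lam_i * length)."""
    return sum(a * math.exp(-lam * length) for a, lam in components)

def multiexp_penalty(length, components, max_length=1000):
    """Penalty as -log of the length probability, normalized over lengths 1..max_length."""
    z = sum(multiexp_weight(k, components) for k in range(1, max_length + 1))
    return -math.log(multiexp_weight(length, components) / z)

if __name__ == "__main__":
    # Hypothetical (weight, decay) pairs for four components; illustrative only.
    comps = [(0.5, 1.0), (0.3, 0.3), (0.15, 0.1), (0.05, 0.02)]
    for k in (1, 2, 5, 10, 20):
        step = multiexp_penalty(k + 1, comps) - multiexp_penalty(k, comps)
        print(k, round(affine_penalty(k, 11, 1), 2), round(step, 3))
```

With a single component the per-residue increment of `multiexp_penalty` is constant and equal to that component's decay rate, which is the affine case. With several components the increment shrinks as gaps grow longer and approaches the slowest decay rate.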

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Amino Acid Sequence
Amino Acid Substitution
Computational Biology / methods*
Databases, Factual
Entropy
Evolution, Molecular*
Probability
Proteins / chemistry*
Reproducibility of Results
Sequence Alignment / methods*
Sequence Homology, Amino Acid*
Software

Substances

Proteins

Grants and funding

LM0577/LM/NLM NIH HHS/United States