A Graph-Based Approach for Detecting Sequence Homology in Highly Diverged Repeat Protein Families

Jonathan N Wells; Joseph A Marsh

doi:10.1007/978-1-4939-8736-8_13

Methods Mol Biol. 2019:1851:251-261. doi: 10.1007/978-1-4939-8736-8_13.

Authors

Jonathan N Wells¹, Joseph A Marsh²

Affiliations

¹ MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK.
² MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK.

PMID: 30298401

Abstract

Reconstructing evolutionary relationships in repeat proteins is notoriously difficult due to the high degree of sequence divergence that typically occurs between duplicated repeats. This is complicated further by the fact that proteins with a large number of similar repeats are more likely to produce significant local sequence alignments than proteins with fewer copies of the repeat motif. Furthermore, biologically correct sequence alignments are sometimes impossible to achieve in cases where insertion or translocation events disrupt the order of repeats in one of the sequences being aligned. Combined, these attributes make traditional phylogenetic methods for studying protein families unreliable for repeat proteins, because such methods depend on accurate sequence alignment. We present here a practical solution to this problem, making use of graph clustering combined with the open-source software package HH-suite, which enables highly sensitive detection of sequence relationships. By carrying out multiple rounds of homology searches via alignment of profile hidden Markov models, we generate large sets of related proteins. Representing the relationships between proteins in these sets as graphs and then clustering them with the Markov cluster algorithm enables robust detection of repeat protein subfamilies.
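The clustering step described in the abstract can be illustrated with a minimal sketch of Markov clustering on a toy similarity graph. This is not the authors' pipeline and not the `mcl` program itself. The function name `mcl`, its parameters, the protein labels, and the edge weights are all hypothetical, and the weights merely stand in for scores that would come from profile-HMM alignments. The sketch alternates expansion (matrix squaring) with inflation (element-wise powering followed by column normalisation), then reads clusters off as connected components of the remaining flow.

```python
def mcl(weights, inflation=2.0, iterations=50, self_loop=1.0, threshold=1e-6):
    """Toy Markov clustering of an undirected weighted graph.

    weights: dict mapping (node_a, node_b) -> positive similarity score.
    Returns a sorted list of clusters, each a sorted list of node names.
    """
    nodes = sorted({n for pair in weights for n in pair})
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)

    # Symmetric adjacency matrix with self-loops on the diagonal.
    m = [[0.0] * size for _ in range(size)]
    for (a, b), w in weights.items():
        m[index[a]][index[b]] = w
        m[index[b]][index[a]] = w
    for i in range(size):
        m[i][i] = self_loop

    def normalise(mat):
        # Rescale every column so that it sums to 1 (column-stochastic).
        for j in range(size):
            total = sum(mat[i][j] for i in range(size))
            for i in range(size):
                mat[i][j] /= total
        return mat

    m = normalise(m)
    for _ in range(iterations):
        # Expansion: square the matrix, letting flow spread along paths.
        m = [[sum(m[i][k] * m[k][j] for k in range(size))
              for j in range(size)] for i in range(size)]
        # Inflation: strengthen strong flows and weaken weak ones.
        m = normalise([[x ** inflation for x in row] for row in m])

    # Interpretation: nodes that still exchange flow belong together,
    # so clusters are connected components of the thresholded matrix.
    parent = list(range(size))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(size):
        for j in range(size):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(nodes):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


# Hypothetical graph: two tightly linked groups joined by one weak edge.
toy_edges = {
    ("protA1", "protA2"): 1.0,
    ("protA1", "protA3"): 1.0,
    ("protA2", "protA3"): 1.0,
    ("protB1", "protB2"): 1.0,
    ("protB1", "protB3"): 1.0,
    ("protB2", "protB3"): 1.0,
    ("protA3", "protB1"): 0.1,  # weak cross-family similarity
}
clusters = mcl(toy_edges)
print(clusters)
```

In this sketch the `inflation` parameter controls granularity: higher values tend to split the graph into more, smaller clusters, which is the usual knob to tune when looking for subfamilies.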

Keywords: Evolution; Graph clustering; Profile-HMM alignment; Protein families; Repeat proteins; Sequence homology.

MeSH terms

Algorithms
Amino Acid Sequence
Databases, Protein
Evolution, Molecular
Phylogeny
Proteins / chemistry*
Proteins / classification
Proteins / genetics
Sequence Alignment
Sequence Analysis, Protein
Sequence Homology
Sequence Homology, Amino Acid

Substances

Proteins