Adaptive Local Realignment of Protein Sequences

Dan DeBlasio; John Kececioglu

doi:10.1089/cmb.2018.0045

J Comput Biol. 2018 Jul;25(7):780-793. doi: 10.1089/cmb.2018.0045. Epub 2018 Jun 11.

Authors

Dan DeBlasio¹, John Kececioglu²

Affiliations

¹ Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania.
² Department of Computer Science, The University of Arizona, Tucson, Arizona.

Abstract

While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein's entire length. We present a new approach, called adaptive local realignment, that instead automatically adapts to the diversity of mutation rates along protein sequences. It builds upon a recent technique known as parameter advising, which finds global parameter settings for an aligner, and extends it to adaptively find local settings. In essence, our approach identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment has been implemented within the Opal aligner using the Facet accuracy estimator.
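
The three steps named in the abstract (find low-accuracy regions, build candidate realignments, keep a candidate only if its estimated accuracy is higher) can be illustrated with a minimal Python sketch. This is not the Opal or Facet implementation. The fixed-width windows, the threshold, and every function name here are assumptions made for illustration, and `toy_estimate` and `toy_realign` are hypothetical stand-ins for a real accuracy estimator and a real aligner.

```python
from typing import Callable, List, Sequence, Tuple

Alignment = List[str]  # rows of equal length, '-' marks a gap


def low_accuracy_regions(alignment: Alignment,
                         estimate: Callable[[Alignment], float],
                         window: int,
                         threshold: float) -> List[Tuple[int, int]]:
    """Return merged [start, end) column ranges whose estimate is below threshold.

    Assumption: regions are found with fixed-width, non-overlapping windows.
    """
    ncols = len(alignment[0])
    regions: List[Tuple[int, int]] = []
    for start in range(0, ncols, window):
        end = min(start + window, ncols)
        block = [row[start:end] for row in alignment]
        if estimate(block) < threshold:
            if regions and regions[-1][1] == start:
                regions[-1] = (regions[-1][0], end)  # merge adjacent windows
            else:
                regions.append((start, end))
    return regions


def adaptive_local_realign(alignment: Alignment,
                           estimate: Callable[[Alignment], float],
                           realign: Callable[[List[str], object], Alignment],
                           settings: Sequence[object],
                           window: int = 10,
                           threshold: float = 0.5) -> Alignment:
    """Replace each low-scoring region by its best-scoring candidate realignment."""
    result = list(alignment)
    regions = low_accuracy_regions(alignment, estimate, window, threshold)
    # Work right to left so a width change does not shift earlier regions.
    for start, end in reversed(regions):
        block = [row[start:end] for row in result]
        residues = [row.replace('-', '') for row in block]
        best, best_score = block, estimate(block)
        for params in settings:
            candidate = realign(residues, params)
            score = estimate(candidate)
            if score > best_score:  # keep the original unless strictly better
                best, best_score = candidate, score
        result = [row[:start] + new + row[end:]
                  for row, new in zip(result, best)]
    return result


def toy_estimate(block: Alignment) -> float:
    """Hypothetical estimator: fraction of gap-free columns with one residue type."""
    ncols = len(block[0])
    if ncols == 0:
        return 0.0
    good = sum(1 for c in range(ncols)
               if block[0][c] != '-' and len({row[c] for row in block}) == 1)
    return good / ncols


def toy_realign(seqs: List[str], params: object) -> Alignment:
    """Hypothetical aligner: pad sequences with gaps on the right or the left."""
    width = max(len(s) for s in seqs)
    if params == "left":
        return [s.ljust(width, '-') for s in seqs]
    return [s.rjust(width, '-') for s in seqs]


if __name__ == "__main__":
    original = ["ACD-EF", "ACDE-F"]
    print(adaptive_local_realign(original, toy_estimate, toy_realign,
                                 ["left", "right"], window=3))
```

Passing the estimator and the aligner as callables keeps the loop independent of any particular tool. In this sketch a region is replaced only when a candidate scores strictly higher, so the output's estimated accuracy in each region is never lower than the input's.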

Keywords: alignment accuracy; iterative refinement; local mutation rates; multiple sequence alignment; parameter advising.

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Amino Acid Sequence / genetics
Computational Biology*
Proteins / genetics*
Sequence Alignment
Software*

Substances

Proteins