DiffPaSS - high-performance differentiable pairing of protein sequences using soft scores

Bioinformatics. 2024 Dec 26;41(1):btae738. doi: 10.1093/bioinformatics/btae738.

Abstract

Motivation: Identifying interacting partners from two sets of protein sequences has important applications in computational biology. Interacting partners share similarities across species due to their common evolutionary history, and they show correlations in amino acid usage due to the need to maintain complementary interaction interfaces. Thus, the problem of finding interacting pairs can be formulated as searching for a pairing of sequences that maximizes a sequence similarity or a coevolution score. Several methods have been developed to address this problem, applying different approximate optimization methods to different scores.
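
To make the formulation above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single group of equal-length toy sequences, uses the sum of mutual information over all inter-protein column pairs as the coevolution score, and searches all permutations exhaustively. The function names (`mutual_information`, `pairing_score`, `best_pairing`) and the toy data are hypothetical. Exhaustive search scales factorially with the number of sequences, which is why approximate optimization is needed in practice.

```python
from collections import Counter
from itertools import permutations
from math import log2


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two equally long columns of symbols."""
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    # sum_xy p(x, y) * log2( p(x, y) / (p(x) p(y)) ), written with raw counts
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def pairing_score(seqs_a, seqs_b, perm):
    """Toy coevolution score: MI summed over every (column of A, column of B) pair.

    perm[k] is the index of the B sequence paired with the k-th A sequence.
    """
    paired_b = [seqs_b[j] for j in perm]
    total = 0.0
    for i in range(len(seqs_a[0])):
        col_a = [s[i] for s in seqs_a]
        for j in range(len(paired_b[0])):
            col_b = [s[j] for s in paired_b]
            total += mutual_information(col_a, col_b)
    return total


def best_pairing(seqs_a, seqs_b):
    """Exhaustive search over all pairings (only feasible for tiny inputs)."""
    return max(
        permutations(range(len(seqs_b))),
        key=lambda perm: pairing_score(seqs_a, seqs_b, perm),
    )


# Hypothetical toy data: family A has two columns, family B has one.
seqs_a = ["AK", "AK", "CR", "CR"]
seqs_b = ["E", "D", "D", "E"]

best = best_pairing(seqs_a, seqs_b)
best_score = pairing_score(seqs_a, seqs_b, best)
identity_score = pairing_score(seqs_a, seqs_b, (0, 1, 2, 3))
print(best, best_score, identity_score)
```

In this toy case the unshuffled order carries no mutual information, while the best pairing groups identical B residues with identical A sequences. Several pairings tie for the maximum, which illustrates that a score alone may not single out a unique pairing.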

Results: We introduce Differentiable Pairing using Soft Scores (DiffPaSS), a differentiable framework for flexible, fast, and hyperparameter-free optimization for pairing interacting biological sequences, which can be applied to a wide variety of scores. We apply it to a benchmark prokaryotic dataset, using mutual information and neighbor graph alignment scores. DiffPaSS outperforms existing algorithms for optimizing the same scores. We demonstrate the usefulness of our paired alignments for the prediction of protein complex structure. DiffPaSS does not require sequences to be aligned, and we also apply it to nonaligned sequences from T-cell receptors.
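
As an illustration of what a differentiable relaxation of a pairing can look like, the sketch below relaxes a permutation to a nearly doubly stochastic "soft permutation" using Sinkhorn normalization. This is an assumption made for illustration: the abstract does not state which relaxation DiffPaSS uses, and the names `sinkhorn` and `harden`, the iteration count, and the example logits are hypothetical. A soft permutation of this kind can be plugged into a score in place of a hard pairing, so that the score becomes a smooth function of the underlying logits.

```python
from math import exp


def sinkhorn(logits, n_iters=50):
    """Alternately normalize rows and columns of exp(logits).

    For a square matrix of finite logits this converges towards a doubly
    stochastic matrix, i.e. a soft relaxation of a permutation matrix.
    """
    mat = [[exp(x) for x in row] for row in logits]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        mat = [[v / sum(row) for v in row] for row in mat]
        # Column normalization: each column sums to 1.
        col_sums = [sum(col) for col in zip(*mat)]
        mat = [[v / col_sums[j] for j, v in enumerate(row)] for row in mat]
    return mat


def harden(soft):
    """Row-wise argmax of a soft permutation.

    This is a simplification: in general it is not guaranteed to return a
    valid permutation, and a linear assignment solver would be used instead.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in soft]


# Hypothetical logits favouring the pairing 0 -> 0, 1 -> 2, 2 -> 1.
logits = [
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 1.0, 0.0],
]

soft = sinkhorn(logits)
hard = harden(soft)
row_sums = [sum(row) for row in soft]
col_sums = [sum(col) for col in zip(*soft)]
print(hard)
```

Because every step of `sinkhorn` is a smooth operation, an automatic differentiation library could propagate gradients of a score through it to the logits. The pure-Python version here only shows the forward computation.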

Availability and implementation: A PyTorch implementation and installable Python package are available at https://github.com/Bitbol-Lab/DiffPaSS.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Computational Biology / methods
  • Databases, Protein
  • Proteins* / chemistry
  • Sequence Alignment / methods
  • Sequence Analysis, Protein* / methods
  • Software

Substances

  • Proteins