Attention network for predicting T-cell receptor-peptide binding can associate attention with interpretable protein structural properties

Kyohei Koyama; Kosuke Hashimoto; Chioko Nagao; Kenji Mizuguchi

doi:10.3389/fbinf.2023.1274599

Front Bioinform. 2023 Dec 18;3:1274599. doi: 10.3389/fbinf.2023.1274599. eCollection 2023.

Authors

Kyohei Koyama¹ ² ³, Kosuke Hashimoto¹, Chioko Nagao¹, Kenji Mizuguchi¹ ² ³

Affiliations

¹ Laboratory for Computational Biology, Institute for Protein Research, Osaka University, Osaka, Japan.
² National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan.
³ Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan.

Abstract

Understanding how a T-cell receptor (TCR) recognizes its specific ligand peptide is crucial for gaining insight into biological functions and disease mechanisms. Despite its importance, experimentally determining TCR-peptide-major histocompatibility complex (TCR-pMHC) interactions is expensive and time-consuming. Computational methods have been proposed to address this challenge, but they are typically evaluated only by internal retrospective validation, and few researchers have incorporated an attention layer from language models and tested it against structural information. In this study, therefore, we developed a machine learning model based on a modified version of the Transformer, a source-target attention neural network, to predict the TCR-pMHC interaction solely from the amino acid sequences of the TCR complementarity-determining region (CDR) 3 and the peptide. This model achieved competitive performance on a benchmark dataset of TCR-pMHC interactions, as well as on a truly new external dataset. Additionally, by analyzing the binding predictions, we associated the neural network weights with protein structural properties. By classifying the residues into large- and small-attention groups, we identified statistically significant properties associated with the highly attended residues, such as hydrogen bonds within CDR3. The dataset that we created and the ability of our model to provide interpretable predictions of TCR-peptide binding should increase our knowledge of molecular recognition and pave the way for designing new therapeutics.
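The abstract does not give architectural details, so the following is only a minimal sketch of the general idea of source-target (cross) attention between a peptide and a CDR3 sequence, followed by a split of CDR3 residues into large- and small-attention groups. Several points are assumptions and not taken from the authors' model: peptide residues act as queries and CDR3 residues as keys/values, a single attention head of dimension 8 is used, embeddings and projection matrices are random and untrained, and the "large" group is the top 25% of residues by attention received. All function names are hypothetical, and the sequences are illustrative inputs only.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def make_embedding(dim, seed=0):
    """Hypothetical random per-residue embedding table (a trained model would learn this)."""
    rng = random.Random(seed)
    return {aa: [rng.gauss(0.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}

def random_matrix(rows, cols, rng):
    """Random projection matrix standing in for learned weights (assumption)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(query_seq, key_seq, dim=8, seed=0):
    """Single-head scaled dot-product attention: query_seq residues attend to key_seq residues.

    Returns (weights, context); weights[i][j] is how strongly query residue i
    attends to key residue j, and each row of weights sums to 1.
    """
    emb = make_embedding(dim, seed)
    rng = random.Random(seed + 1)
    w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
    x_q = [emb[aa] for aa in query_seq]
    x_k = [emb[aa] for aa in key_seq]
    q, k, v = matmul(x_q, w_q), matmul(x_k, w_k), matmul(x_k, w_v)
    scale = math.sqrt(dim)
    scores = [[sum(qi[d] * kj[d] for d in range(dim)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    context = matmul(weights, v)  # attention-weighted CDR3 features for each peptide residue
    return weights, context

def split_by_attention(key_seq, weights, top_fraction=0.25):
    """Split key residues into large- and small-attention groups (cutoff is an assumption).

    Attention received by key residue j is the sum over all queries of weights[i][j];
    the top `top_fraction` of residues (at least one) form the large-attention group.
    """
    received = [sum(row[j] for row in weights) for j in range(len(key_seq))]
    n_large = max(1, round(top_fraction * len(key_seq)))
    order = sorted(range(len(key_seq)), key=lambda j: received[j], reverse=True)
    return received, sorted(order[:n_large]), sorted(order[n_large:])

peptide = "GILGFVFTL"     # illustrative peptide sequence
cdr3b = "CASSIRSSYEQYF"   # illustrative CDR3 sequence
weights, context = cross_attention(peptide, cdr3b)
received, large, small = split_by_attention(cdr3b, weights)
print(len(weights), len(weights[0]))  # 9 13
print("large-attention CDR3 positions:", large)
```

With a trained model, the positions in the large-attention group would then be compared with structural annotations (for example, whether a CDR3 residue forms a hydrogen bond) using a statistical test; that comparison is not sketched here. Because every row of `weights` sums to 1, the total attention received across CDR3 residues always equals the peptide length, which makes per-residue sums comparable across complexes with the same peptide length.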

Keywords: T-cell receptor; attention networks; binding prediction; hydrogen bonds; peptide; protein structure; transformer.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was supported in part by Japan Society for the Promotion of Science. Grant Number: 22H03687.