A modular protein language modelling approach to immunogenicity prediction

PLoS Comput Biol. 2024 Nov 11;20(11):e1012511. doi: 10.1371/journal.pcbi.1012511. eCollection 2024 Nov.

Abstract

Neoantigen immunogenicity prediction is a highly challenging problem in the development of personalised medicines. Low reactivity rates in called neoantigens result in a difficult prediction scenario with limited training datasets. Here we describe ImmugenX, a modular protein language modelling approach to immunogenicity prediction for CD8+ reactive epitopes. ImmugenX comprises a pMHC encoding module trained on three pMHC prediction tasks, an optional TCR encoding module and a set of context-specific immunogenicity prediction head modules. Compared with state-of-the-art models for each task, ImmugenX's encoding module performs comparably or better on pMHC binding affinity, eluted ligand and stability prediction tasks. ImmugenX outperforms all compared models on pMHC immunogenicity prediction (area under the receiver operating characteristic curve: 0.619; average precision: 0.514), with a 7% increase in average precision compared to the next best model. Performance on immunogenicity prediction improves further when TCR context information is integrated. ImmugenX's performance is further analysed for interpretability; this analysis locates areas of weakness found across existing immunogenicity models and highlights possible biases in public datasets.
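For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how area under the receiver operating characteristic curve (AUROC) and average precision can be computed from binary labels and model scores. It is not the authors' evaluation code; the function names and the toy labels and scores are hypothetical and chosen only for illustration. Average precision depends on the fraction of positives in the data, which is one reason it is commonly reported alongside AUROC when reactive epitopes are rare.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at each rank where a positive occurs.
    Simplification: tied scores are not given any special treatment."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return precision_sum / hits


# Hypothetical toy data: 1 = immunogenic, 0 = non-immunogenic.
toy_labels = [1, 0, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(f"AUROC: {auroc(toy_labels, toy_scores):.3f}")
print(f"Average precision: {average_precision(toy_labels, toy_scores):.3f}")
```

In practice a library implementation, such as the metrics in scikit-learn, would normally be used instead; the explicit loops here are only meant to make the definitions visible.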

MeSH terms

  • CD8-Positive T-Lymphocytes / immunology
  • Computational Biology* / methods
  • Humans
  • Models, Molecular
  • Receptors, Antigen, T-Cell / immunology

Substances

  • Receptors, Antigen, T-Cell

Grants and funding

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 964998 awarded to S.A.Q., C.S., M.R.M., Y.S. and S.R.H. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.