Single-residue linear and conformational B cell epitopes prediction using random and ESM-2 based projections

Brief Bioinform. 2024 Jan 22;25(2):bbae084. doi: 10.1093/bib/bbae084.

Abstract

B cell epitope prediction methods are divided into linear, sequence-based predictors and conformational predictors that typically use the measured or predicted protein structure. Most linear predictors translate the sequence into biologically based representations and apply machine learning to these representations. Here we present CALIBER (Conformational And LInear B cell Epitopes pRediction) and show that a bidirectional long short-term memory (BiLSTM) network with random projection produces a more accurate prediction (test set AUC = 0.789) than all current linear methods. The same predictor, when combined with an Evolutionary Scale Modeling-2 (ESM-2) projection, also improves on the state of the art for conformational epitopes (AUC = 0.776). Including the graph of 3D distances between residues did not increase prediction accuracy; however, long-range sequence information was essential for high accuracy. While the same model structure was applicable to linear and conformational epitopes, each required separate training. Combining the two slightly increased the linear accuracy (AUC = 0.775 versus 0.768) and reduced the conformational accuracy (AUC = 0.769).
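To make "a BiLSTM with random projection" concrete, here is a minimal, untrained forward-pass sketch in pure Python. It assumes that "random projection" means multiplying each residue's one-hot vector by a fixed random matrix, which amounts to a frozen random lookup table per amino acid. The embedding size, hidden size, weight initialisation and the names `random_projection`, `bilstm_scores`, `dim` and `hid` are all hypothetical illustrations. This is not CALIBER's published architecture, training procedure or ESM-2 variant. The sketch shows why a bidirectional model captures long-range context: the score for every residue depends on the residues both before and after it.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_projection(dim, seed=0):
    """Hypothetical encoding: one-hot (20-dim) times a fixed random 20 x dim matrix.

    A one-hot vector times a matrix just selects one row, so this is a frozen
    lookup table mapping each amino acid to a random Gaussian vector.
    """
    rng = random.Random(seed)
    return {aa: [rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
            for aa in AMINO_ACIDS}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_lstm(in_dim, hid, rng):
    # One weight matrix per gate (i=input, f=forget, g=candidate, o=output),
    # each acting on the concatenation [x_t, h_{t-1}], plus a zero bias.
    scale = 1.0 / math.sqrt(in_dim + hid)
    return {name: ([[rng.uniform(-scale, scale) for _ in range(in_dim + hid)]
                    for _ in range(hid)], [0.0] * hid)
            for name in "ifgo"}


def run_lstm(params, xs, hid):
    """Standard LSTM recurrence over a list of input vectors; returns all h_t."""
    h, c, outs = [0.0] * hid, [0.0] * hid, []
    for x in xs:
        z = x + h  # list concatenation: [x_t, h_{t-1}]
        pre = {name: [a + b for a, b in zip(matvec(W, z), bias)]
               for name, (W, bias) in params.items()}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outs.append(h)
    return outs


def bilstm_scores(seq, dim=8, hid=6, seed=0):
    """Per-residue epitope probabilities from an untrained BiLSTM (illustration only)."""
    rng = random.Random(seed)
    table = random_projection(dim, seed)
    xs = [table[aa] for aa in seq]
    fwd = run_lstm(init_lstm(dim, hid, rng), xs, hid)              # left context
    bwd = run_lstm(init_lstm(dim, hid, rng), xs[::-1], hid)[::-1]  # right context
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hid)]
    # Each residue's score uses the concatenated forward and backward states.
    return [sigmoid(sum(w * v for w, v in zip(w_out, fh + bh)))
            for fh, bh in zip(fwd, bwd)]


scores = bilstm_scores("MKTAYIAKQR")
print(len(scores))  # one score per residue
```

In a real model, the LSTM and output weights would be learned from labelled epitope residues while the random projection stays fixed. Per-residue scores like these are what a residue-level AUC, such as the values reported above, would be computed from.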

Keywords: BiLSTM; embedding; epitope prediction; machine learning; protein structure.

MeSH terms

  • Epitopes, B-Lymphocyte* / chemistry
  • Molecular Conformation

Substances

  • Epitopes, B-Lymphocyte
