AbLEF: antibody language ensemble fusion for thermodynamically empowered property predictions

Bioinformatics. 2024 May 2;40(5):btae268. doi: 10.1093/bioinformatics/btae268.

Abstract

Motivation: Pre-trained protein language and/or structural models are often fine-tuned on drug development properties (i.e. developability properties) to accelerate drug discovery initiatives. However, these models generally rely on a single structural conformation and/or a single sequence as the molecular representation. We present a physics-based model in which representations of a 3D conformational ensemble are fused by a transformer-based architecture and concatenated with a language representation to predict antibody properties. Antibody language ensemble fusion (AbLEF) injects thermodynamic information directly into latent space, which enhances property prediction by explicitly representing the dynamic molecular behavior that occurs during experimental measurement.
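The fusion idea described above can be illustrated with a toy sketch. This is not the AbLEF implementation (see the repository for that); it only assumes that each conformer and the sequence have already been embedded as fixed-length vectors. A single self-attention layer stands in for the transformer-based fusion, and every name, dimension, and the random untrained weights (`fuse_ensemble`, `predict_property`, `D_STRUCT`, `D_MODEL`, `D_LANG`) are hypothetical choices made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear(weights, x):
    """Apply a weight matrix (a list of rows) to the vector x."""
    return [dot(row, x) for row in weights]


def random_matrix(rows, cols, rng):
    """Random, untrained weights; a real model would learn these."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def fuse_ensemble(conformers, w_q, w_k, w_v):
    """Single-head self-attention across conformers, then mean pooling."""
    queries = [linear(w_q, c) for c in conformers]
    keys = [linear(w_k, c) for c in conformers]
    values = [linear(w_v, c) for c in conformers]
    scale = math.sqrt(len(keys[0]))
    width = len(values[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]
        )
    # Average over conformers to obtain one fixed-length ensemble vector.
    return [sum(row[i] for row in attended) / len(attended) for i in range(width)]


def predict_property(conformers, language_embedding, params):
    """Concatenate the fused ensemble vector with the language embedding
    and apply a linear regression head (hypothetical, untrained)."""
    fused = fuse_ensemble(conformers, params["w_q"], params["w_k"], params["w_v"])
    joint = fused + language_embedding  # list concatenation
    return dot(params["w_out"], joint) + params["b_out"]


# Toy inputs: 5 conformers with made-up embedding sizes.
rng = random.Random(0)
D_STRUCT, D_MODEL, D_LANG, N_CONF = 8, 6, 4, 5
conformers = [[rng.gauss(0.0, 1.0) for _ in range(D_STRUCT)] for _ in range(N_CONF)]
language_embedding = [rng.gauss(0.0, 1.0) for _ in range(D_LANG)]
params = {
    "w_q": random_matrix(D_MODEL, D_STRUCT, rng),
    "w_k": random_matrix(D_MODEL, D_STRUCT, rng),
    "w_v": random_matrix(D_MODEL, D_STRUCT, rng),
    "w_out": [rng.gauss(0.0, 1.0) for _ in range(D_MODEL + D_LANG)],
    "b_out": 0.0,
}
fused = fuse_ensemble(conformers, params["w_q"], params["w_k"], params["w_v"])
prediction = predict_property(conformers, language_embedding, params)
```

Because the sketch applies attention without positional encodings and then mean-pools, the fused vector does not depend on the order in which conformers are listed, which suits an ensemble treated as an unordered set of sampled structures. Whether AbLEF itself makes the same choice is not stated in the abstract.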

Results: We showcase the antibody language ensemble fusion model on two developability properties: hydrophobic interaction chromatography retention time and temperature of aggregation (Tagg). We find that (i) 3D conformational ensembles that are generated from molecular simulation can further improve antibody property prediction for small datasets, (ii) the performance benefit from 3D conformational ensembles matches shallow machine learning methods in the small data regime, and (iii) fine-tuned large protein language models can match smaller antibody-specific language models at predicting antibody properties.

Availability and implementation: AbLEF codebase is available at https://github.com/merck/AbLEF.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Antibodies / chemistry
  • Computational Biology / methods
  • Hydrophobic and Hydrophilic Interactions
  • Machine Learning
  • Protein Conformation
  • Software
  • Thermodynamics*

Substances

  • Antibodies
