A machine learning strategy for the identification of key in silico descriptors and prediction models for IgG monoclonal antibody developability properties

MAbs. 2023 Jan-Dec;15(1):2248671. doi: 10.1080/19420862.2023.2248671.

Abstract

Identifying favorable biophysical properties of protein therapeutics during developability assessment is a crucial part of the preclinical development process. Successful prediction of such properties and of bioassay results from calculated in silico features has the potential to reduce the time and cost of delivering clinical-grade material to patients, but it remains an ongoing challenge for the field. Here, we demonstrate an automated and flexible machine learning workflow designed to compare computationally derived physicochemical feature sets, generated from popular commercial software packages, and to identify the most powerful features within them. We implement this workflow with medium-sized datasets of human and humanized IgG molecules to generate predictive regression models for two key developability endpoints, hydrophobicity and poly-specificity. The most important features discovered through the automated workflow corroborate several previous literature reports, and newly discovered features suggest directions for further research and potential model improvement.
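The abstract does not specify the algorithms used, so the following is only a minimal sketch of the general idea of ranking in silico descriptors against a developability endpoint and then fitting a regression model on the top-ranked one. It assumes univariate ranking by absolute Pearson correlation followed by a one-variable least-squares fit, which is a stand-in technique and not the authors' workflow. The descriptor names (`hydrophobic_patch_area`, `net_charge`, `dipole_moment`), the helper functions, and the synthetic data are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Rank descriptors by |Pearson r| with the endpoint, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data only: hypothetical descriptors for 60 hypothetical molecules.
rng = random.Random(0)
n_molecules = 60
features = {
    "hydrophobic_patch_area": [rng.gauss(0, 1) for _ in range(n_molecules)],
    "net_charge": [rng.gauss(0, 1) for _ in range(n_molecules)],
    "dipole_moment": [rng.gauss(0, 1) for _ in range(n_molecules)],
}
# Assumed endpoint: a hydrophobicity-like score driven by one descriptor plus noise.
target = [2.0 * h + rng.gauss(0, 0.3) for h in features["hydrophobic_patch_area"]]

ranking = rank_features(features, target)
top_name = ranking[0][0]
slope, intercept = fit_line(features[top_name], target)
```

A realistic workflow would more likely use cross-validated multivariate models and a model-based importance measure; the univariate ranking here is chosen only because it keeps the sketch dependency-free.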

Keywords: Biophysical; IgG1; computational; descriptors; developability; machine learning.

MeSH terms

  • Antibodies, Monoclonal* / chemistry
  • Humans
  • Immunoglobulin G*
  • Machine Learning

Substances

  • Immunoglobulin G
  • Antibodies, Monoclonal

Grants and funding

The author(s) reported there is no funding associated with the work featured in this article.