Machine Learning Feature Selection for Predicting High Concentration Therapeutic Antibody Aggregation

Pin-Kuang Lai; Amendra Fernando; Theresa K Cloutier; Jonathan S Kingsbury; Yatin Gokarn; Kevin T Halloran; Cesar Calero-Rubio; Bernhardt L Trout

doi:10.1016/j.xphs.2020.12.014

J Pharm Sci. 2021 Apr;110(4):1583-1591. doi: 10.1016/j.xphs.2020.12.014. Epub 2020 Dec 18.

Authors

Pin-Kuang Lai¹, Amendra Fernando¹, Theresa K Cloutier¹, Jonathan S Kingsbury², Yatin Gokarn², Kevin T Halloran², Cesar Calero-Rubio², Bernhardt L Trout³

Affiliations

¹ Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
² Biologics Development, Sanofi, Framingham, MA, USA.
³ Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

PMID: 33346034
DOI: 10.1016/j.xphs.2020.12.014

Abstract

Protein aggregation can hinder the development, safety, and efficacy of therapeutic antibody-based drugs. A predictive model that evaluates aggregation behavior during early-stage development is therefore desirable. Machine learning is a widely used tool for training models that make predictions from data with different attributes. However, most machine learning techniques require more data than is typically available in antibody development. In this work, we describe a rational feature selection framework for developing accurate models with a small number of features. We applied this framework to predict the aggregation behavior of 21 approved monospecific monoclonal antibodies at high concentration (150 mg/mL), obtaining a correlation coefficient of 0.71 on validation tests with a linear model that uses only two features. Nearest-neighbors and support vector regression models further improved performance, with correlation coefficients of 0.86 and 0.80, respectively. This framework can be extended to train other models that predict different physical properties.
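The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one generic way to pick two features for a linear model when about 21 samples are available. It is not the authors' framework: exhaustive search over feature pairs and leave-one-out validation are stand-in assumptions, the data are synthetic, and every name (`pearson`, `fit_linear`, `loo_correlation`, `best_feature_pair`) is hypothetical.

```python
import itertools
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def fit_linear(rows, targets):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * t for x, t in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coef, row):
    """Evaluate intercept + sum(coef_k * feature_k)."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loo_correlation(X, y, cols):
    """Leave-one-out validation: correlate held-out predictions with targets."""
    preds = []
    for i in range(len(y)):
        train_rows = [[X[k][c] for c in cols] for k in range(len(y)) if k != i]
        train_y = [y[k] for k in range(len(y)) if k != i]
        coef = fit_linear(train_rows, train_y)
        preds.append(predict(coef, [X[i][c] for c in cols]))
    return pearson(preds, y)


def best_feature_pair(X, y):
    """Exhaustively score every pair of feature columns (an assumed strategy)."""
    pairs = itertools.combinations(range(len(X[0])), 2)
    return max(pairs, key=lambda cols: loo_correlation(X, y, cols))


# Synthetic stand-in data (NOT from the paper): 21 samples, 6 candidate
# features, with the target driven by features 1 and 4 plus small noise.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(21)]
y = [2.0 * row[1] - 1.5 * row[4] + rng.gauss(0.0, 0.1) for row in X]

best_pair = best_feature_pair(X, y)
best_score = loo_correlation(X, y, best_pair)
print("selected feature columns:", best_pair)
print("leave-one-out correlation:", round(best_score, 3))
```

Note that scoring candidate pairs on the same samples used to report the final correlation gives an optimistic estimate; with real data, the selection step would itself need to sit inside the validation loop or be checked on held-out antibodies.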

Keywords: Antibody aggregations; Feature selections; Machine learning; Molecular dynamics simulations.

MeSH terms

Machine Learning*
Support Vector Machine*