A deep learning model based on the BERT pre-trained model to predict the antiproliferative activity of anti-cancer chemical compounds

M Torabi; I Haririan; A Foroumadi; H Ghanbari; F Ghasemi

doi:10.1080/1062936X.2024.2431486

SAR QSAR Environ Res. 2024 Nov;35(11):971-992. doi: 10.1080/1062936X.2024.2431486. Epub 2024 Nov 28.

Authors

M Torabi¹, I Haririan²,³, A Foroumadi⁴,⁵, H Ghanbari⁶, F Ghasemi⁷,⁸

Affiliations

¹ Biosensor Research Centre, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
² Department of Pharmaceutics, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran.
³ Department of Pharmaceutical Biomaterials and Medical Biomaterials Research Center (MBRC), Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran.
⁴ Department of Medicinal Chemistry, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran.
⁵ Drug Design and Development Research Center, The Institute of Pharmaceutical Sciences (TIPS), Tehran University of Medical Sciences, Tehran, Iran.
⁶ Department of Medical Nanotechnology, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran.
⁷ Department of Bioinformatics and Systems Biology, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
⁸ Bioinformatics Research Center, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, Iran.

PMID: 39605280
DOI: 10.1080/1062936X.2024.2431486

Abstract

Identifying new compounds with minimal side effects to enhance patients' quality of life is the ultimate goal of drug discovery. Because experimental investigations are expensive and time-consuming and data are scarce in traditional QSAR studies, deep transfer learning models such as BERT have recently been suggested. This study evaluated the model's performance in predicting anti-proliferative activity against five cancer cell lines (HeLa, MCF7, MDA-MB231, PC3, and MDA-MB) using over 3,000 synthesized molecules from PubChem. The results indicated that the model could predict the class of designed small molecules with acceptable accuracy for most cell lines, except for PC3 and MDA-MB. The model's performance was further tested on an in-house dataset of approximately 25 small molecules per cell line, with activity classes based on IC50 values. The model predicted the biological activity class for HeLa with an accuracy of $0.77 \pm 0.4$ and demonstrated acceptable performance for MCF7 and MDA-MB231, with accuracy between 0.56 and 0.66. However, the results were less reliable for PC3 and HepG2. In conclusion, the fine-tuned ChemBERTa model shows potential for predicting outcomes on in-house datasets.
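The evaluation described in the abstract (assigning activity classes from IC50 values, then reporting accuracy as mean ± standard deviation) can be illustrated with a minimal sketch. The abstract does not state the IC50 cutoff, the number of runs, or the evaluation code, so the 10 µM threshold, the function names, and the toy values below are all hypothetical and are not data or methods from the study.

```python
from statistics import mean, stdev

# Hypothetical cutoff in micromolar; the abstract does not give the real one.
ACTIVE_THRESHOLD_UM = 10.0


def label_from_ic50(ic50_um, threshold=ACTIVE_THRESHOLD_UM):
    """Return 1 (active) if IC50 is at or below the threshold, else 0."""
    return 1 if ic50_um <= threshold else 0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(accuracies):
    """Mean and sample standard deviation of per-run accuracies."""
    spread = stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean(accuracies), spread


# Toy IC50 values (micromolar) for one cell line; illustrative only.
toy_ic50 = [0.8, 4.2, 12.5, 55.0, 9.9, 30.1]
y_true = [label_from_ic50(v) for v in toy_ic50]

# Toy predicted classes from three hypothetical fine-tuning runs.
toy_runs = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0],
]
run_acc = [accuracy(y_true, run) for run in toy_runs]
mean_acc, sd_acc = summarize(run_acc)
print(f"accuracy = {mean_acc:.2f} ± {sd_acc:.2f}")
```

In practice the predicted classes would come from the fine-tuned classifier rather than hard-coded lists. With only about 25 molecules per cell line, each misclassified compound shifts accuracy by roughly 0.04, which is worth keeping in mind when comparing cell lines.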

Keywords: BERT model; Cancer therapy; antiproliferative activity; deep learning; in-house dataset.

MeSH terms

Antineoplastic Agents* / chemistry
Antineoplastic Agents* / pharmacology
Cell Line, Tumor
Cell Proliferation* / drug effects
Deep Learning*
Drug Discovery / methods
Humans
Quantitative Structure-Activity Relationship*

Substances

Antineoplastic Agents