Identifying High-Quality Leads among Screened Anticancerous Compounds Using SMILES Representations

ACS Omega. 2024 Jun 28;9(28):30645-30653. doi: 10.1021/acsomega.4c02801. eCollection 2024 Jul 16.

Abstract

Cancer is a lethal disease that affects numerous people worldwide. Chemotherapy is one of the most effective treatment regimens for combating cancer. Nevertheless, anticancer drugs have a high failure rate because of safety and efficacy issues. Drug failure could be reduced by identifying drug leads with reduced toxicity and enhanced efficacy. Computer-aided drug discovery supports the identification of drug leads by working with protein and ligand structures or their representations. The simplified molecular input line entry system (SMILES) is a linear notation that represents the structure of a molecule (its atoms, bonds, and connectivity) using symbols and alphanumeric characters. A SMILES string encodes the rings and scaffold structures of the molecule it depicts. Mining ring and scaffold patterns from molecular SMILES would therefore help relate biological properties to molecular patterns. Moreover, the emergence of artificial intelligence (AI) technologies could accelerate the identification of efficient anticancer drug leads. AI algorithms, known for their pattern recognition ability, could be employed to identify molecular patterns in SMILES representations, thereby enabling property prediction. Consequently, we developed a multilayer perceptron (MLP) model for the prediction of anticancer activity using SMILES from NCI-60 cancer growth inhibition data. Furthermore, the top 8 most frequent scaffolds were identified in a preliminary analysis of the cancer growth inhibition data and ChEMBL drugs. The developed MLP model classified anticancer and nonanticancer compounds with a classification accuracy of 0.92. In addition, benchmarking against other machine learning algorithms showed that the MLP model performed better.
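
To illustrate how ring information can be read directly from a SMILES string, the following is a minimal sketch in plain Python. It is not the method used in the paper, whose featurization and scaffold extraction are not described in the abstract. The function name `count_ring_closures` is hypothetical, and the sketch assumes standard SMILES ring-closure syntax: single digits, or `%` followed by two digits, with digits inside bracket atoms ignored because they denote isotopes, hydrogen counts, or charges.

```python
def count_ring_closures(smiles: str) -> int:
    """Count ring-closure bonds in a SMILES string (illustrative sketch only).

    Each ring-closure label appears twice: once to open and once to close a ring bond.
    """
    open_labels = set()  # labels that have been opened but not yet closed
    closures = 0
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip the whole bracket atom, e.g. [13CH4] or [NH3+]
            end = smiles.find("]", i)
            if end == -1:
                raise ValueError("unclosed bracket atom")
            i = end + 1
            continue
        if ch == "%":
            # Two-digit ring-closure label, e.g. %10
            digits = smiles[i + 1:i + 3]
            if len(digits) != 2 or not all(d in "0123456789" for d in digits):
                raise ValueError("malformed two-digit ring-closure label")
            label = int(digits)
            i += 3
        elif ch in "0123456789":
            # Single-digit ring-closure label
            label = int(ch)
            i += 1
        else:
            # Atom symbol, bond symbol, branch parenthesis, etc.
            i += 1
            continue
        if label in open_labels:
            open_labels.remove(label)  # second occurrence closes the ring bond
            closures += 1
        else:
            open_labels.add(label)     # first occurrence opens it
    if open_labels:
        raise ValueError("unmatched ring-closure label")
    return closures


if __name__ == "__main__":
    for name, smi in [("benzene", "c1ccccc1"),
                      ("naphthalene", "c1ccc2ccccc2c1"),
                      ("ethanol", "CCO")]:
        print(name, count_ring_closures(smi))
```

A count like this is only a crude descriptor. Extracting actual scaffolds, or validating that a string is chemically meaningful SMILES, would in practice be done with a cheminformatics toolkit rather than with string parsing.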