Machine learning assisted classification RASAR modeling for the nephrotoxicity potential of a curated set of orally active drugs

Arkaprava Banerjee; Kunal Roy

doi:10.1038/s41598-024-85063-y

Sci Rep. 2025 Jan 4;15(1):808. doi: 10.1038/s41598-024-85063-y.

Authors

Arkaprava Banerjee¹, Kunal Roy²

Affiliations

¹ Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700 032, India.
² Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700 032, India.

Abstract

In the present study, we adopted the classification Read-Across Structure-Activity Relationship (c-RASAR) approach to develop machine-learning (ML) models from a recently reported curated dataset on the nephrotoxicity potential of orally active drugs. We initially developed ML models using nine different algorithms, separately on topological descriptors (referred to simply as "descriptors" in the subsequent sections of the manuscript) and MACCS fingerprints (referred to as "fingerprints" in the subsequent sections of the manuscript), thus generating 18 different ML QSAR models. Using the chemical spaces defined by the modeling descriptors and fingerprints, the similarity and error-based RASAR descriptors were computed, and the most discriminating RASAR descriptors were used to develop another set of 18 different ML c-RASAR models. All 36 models were cross-validated 20 times with a fivefold cross-validation strategy, and their predictivity was checked on the test set data. A multi-criteria decision-making strategy, the Sum of Ranking Differences (SRD) approach, was adopted to identify the best-performing model based on robustness and external validation parameters. This statistical analysis suggested that the c-RASAR models performed well overall, and the best-performing model was itself a c-RASAR model (the LDA c-RASAR model derived from topological descriptors, with MCC values of 0.229 and 0.431 for the training and test sets, respectively). This model was used to screen a true external data set prepared from the known nephrotoxic compounds of DrugBankDB, demonstrating good predictivity.
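
The validation protocol described above (a fivefold cross-validation repeated 20 times, scored with the Matthews correlation coefficient, MCC) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `mcc` and `repeated_kfold`, the 0/1 label encoding, the fixed random seed, and the use of plain (unstratified) shuffling are all assumptions made for illustration only.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def repeated_kfold(n_samples, n_splits=5, n_repeats=20, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repeat.

    Assumption: simple shuffled folds; the source does not say whether
    the folds were stratified by class.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal shuffled indices into n_splits roughly equal folds.
        folds = [idx[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train_idx, test_idx
```

With the default settings, `repeated_kfold` yields 5 x 20 = 100 train/test splits; in a workflow of this kind, a classifier would be fitted on each training split and `mcc` averaged over the held-out folds to gauge robustness.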

Keywords: ARKA; Machine learning; Nephrotoxicity; QSAR; Sum of Ranking Differences (SRD); c-RASAR; t-SNE.

MeSH terms

Administration, Oral
Algorithms
Humans
Kidney / drug effects
Kidney / pathology
Kidney Diseases / chemically induced
Machine Learning*
Pharmaceutical Preparations / chemistry
Pharmaceutical Preparations / classification
Quantitative Structure-Activity Relationship*

Substances

Pharmaceutical Preparations
