Cathepsin K (CatK), a lysosomal cysteine protease, contributes to skeletal abnormalities, heart diseases, lung inflammation, and central nervous system and immune disorders. Current CatK inhibitors are associated with severe adverse effects, which limits their clinical utility. This study explores quantitative structure-activity relationships (QSAR) on a dataset of 1804 CatK inhibitors compiled from the ChEMBL database in order to predict inhibitory activity. After data cleaning and pre-processing, 1568 structures were retained for exploratory data analysis, which characterized the physicochemical properties of the two groups of inhibitors, their distributions, and the statistical significance of the differences between them. PubChem fingerprints were computed and used to build 11 machine-learning classification models. In the comparative analysis, the extra trees (ET) model performed well, with accuracies of 0.999 (training set), 0.970 (cross-validation) and 0.977 (test set), in line with OECD guidelines. To gain structural insight into the origin of CatK inhibition, 15 diverse molecules were selected for molecular docking. CatK inhibitors 1 and 2 exhibited strong binding energies of -8.3 and -7.2 kcal/mol, respectively. MD simulation (300 ns) of the selected complexes showed strong structural stability, flexibility and interactions. The synergy between QSAR, machine learning models, docking and MD simulation strengthens the evidence for developing novel and resilient CatK inhibitors.
Keywords: Cathepsin K; MD simulation; QSAR; machine learning; molecular docking; osteoporosis.
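
The classification workflow summarized in the abstract (fingerprint features, a tree-ensemble classifier, and accuracy reported for training, cross-validation and test sets) can be illustrated with a minimal sketch. It assumes that ET denotes an extremely randomized trees (extra trees) classifier and uses scikit-learn's ExtraTreesClassifier; the 80/20 split, 5-fold cross-validation, number of trees and the randomly generated 881-bit fingerprints with synthetic labels are all illustrative assumptions, not the study's data or settings, so the printed accuracies do not reproduce the reported values.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data (assumption): random 881-bit vectors standing in for
# PubChem fingerprints, with a synthetic active/inactive label.
rng = np.random.default_rng(0)
n_compounds, n_bits = 400, 881
X = rng.integers(0, 2, size=(n_compounds, n_bits))
y = (X[:, :10].sum(axis=1) >= 5).astype(int)  # toy labelling rule, not real activity

# Hold out a stratified test set (the 80/20 ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Extra trees classifier; hyperparameters are illustrative.
model = ExtraTreesClassifier(n_estimators=200, random_state=0)

# 5-fold cross-validated accuracy on the training set only.
cv_acc = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()

# Fit on the full training set, then score training and held-out test data.
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print(f"training accuracy:         {train_acc:.3f}")
print(f"cross-validation accuracy: {cv_acc:.3f}")
print(f"test accuracy:             {test_acc:.3f}")
```

Comparing the three accuracies side by side is a common way to check for overfitting: a training accuracy far above the cross-validation and test accuracies would suggest that the model memorizes the training compounds rather than generalizing.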