Synergizing GA-XGBoost and QSAR modeling: Breaking down activity cliffs in HDAC1 inhibitors

J Mol Graph Model. 2024 Dec 20:135:108915. doi: 10.1016/j.jmgm.2024.108915. Online ahead of print.

Abstract

The present work combines extreme gradient boosting (XGBoost) with Shapley values, a promising combination in the field of explainable artificial intelligence. A genetic algorithm is also used to analyse the HDAC1 inhibitory activity of a broad pool of 1274 molecules experimentally reported as HDAC1 inhibitors. Building on a rigorous investigation of extreme gradient boosting, the proposed method uses the genetic algorithm to identify pharmacophoric features. The extreme gradient boosting model is statistically robust, with R²tr = 0.8797, R²L10% = 0.8831, Q²F1 = 0.9459, Q²F2 = 0.9452, and Q²F3 = 0.9474. These results underpin the final model, which contains nine Py-descriptor descriptors selected by the genetic algorithm. Its interpretation is based on Shapley additive explanations (SHAP), which assign an importance value to each variable in the model. Counterfactual cases are then used to analyse the effect of the identified molecular descriptors on HDAC1 inhibition. Examination of the selected descriptors, including acc_N_3B, fsp2NringC8B, fsp3NC7B, and sp2N_sp3C_3B, reveals the role that nitrogen atoms play in HDAC1 inhibitory activity. The analysis also highlights the significant contribution of sp2- and sp3-hybridized carbon atoms to HDAC1 inhibition. These QSAR results are consistent with previously reported findings, and an activity cliff analysis further supports them. Overall, the genetic algorithm-extreme gradient boosting (GA-XGBoost) model is easy to interpret and makes reasonable predictions. This suggests that explainable AI may prove useful for identifying and exploiting structural features in drug development.
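The Q²F1, Q²F2, and Q²F3 values quoted above are standard external-validation statistics in QSAR modeling. The sketch below computes them from their commonly used definitions. Q²F1 and Q²F2 normalise the test-set prediction error by the spread of the test observations around the training-set mean and the test-set mean, respectively. Q²F3 compares the mean squared test error with the training-set variance. It assumes these textbook definitions match the ones used in the study, which the abstract does not spell out. The function names `r2` and `q2_external` and the pIC50-like values in the example are hypothetical.

```python
from statistics import fmean


def r2(y_obs, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (e.g. R2tr on the training set)."""
    mean_obs = fmean(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def q2_external(y_train, y_test, y_test_pred):
    """Return (Q2F1, Q2F2, Q2F3) for an external test set.

    Q2F1 measures test deviations from the training mean, Q2F2 from the
    test mean, and Q2F3 divides the mean squared test error by the
    training-set variance.
    """
    n_tr, n_ext = len(y_train), len(y_test)
    mean_tr, mean_ext = fmean(y_train), fmean(y_test)
    # PRESS: sum of squared prediction errors on the external set
    press = sum((o - p) ** 2 for o, p in zip(y_test, y_test_pred))
    q2f1 = 1.0 - press / sum((o - mean_tr) ** 2 for o in y_test)
    q2f2 = 1.0 - press / sum((o - mean_ext) ** 2 for o in y_test)
    q2f3 = 1.0 - (press / n_ext) / (sum((o - mean_tr) ** 2 for o in y_train) / n_tr)
    return q2f1, q2f2, q2f3


if __name__ == "__main__":
    # Hypothetical toy activities, not data from the study
    y_train = [5.0, 6.0, 7.0, 8.0]
    y_test = [6.0, 7.0, 8.0]
    y_test_pred = [6.2, 6.9, 7.8]
    q1, q2, q3 = q2_external(y_train, y_test, y_test_pred)
    print(f"{q1:.4f} {q2:.4f} {q3:.4f}")  # prints "0.9673 0.9550 0.9760"
```

Because the toy test set has a higher mean than the training set, Q²F1 and Q²F2 differ. This is why studies often report all three values rather than a single external Q².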

Keywords: Activity cliff; Extreme gradient boosting analysis; Genetic algorithms; HDAC1; Shapley additive explanations.