The long short-term memory (LSTM) algorithm has provided solutions to the limitations of descriptor-based QSAR models in drug design. However, direct applications of LSTM remain scarce. This study investigated the effectiveness of a descriptor-free QSAR model (LSTM-SM) in modeling an FGFR1 inhibitor dataset, using two conventional descriptor-based QSAR models (built on a 126-bit Morgan fingerprint and on 2D descriptors, respectively) as baselines. The validated descriptor-free QSAR model was then used to screen the ChemDiv database for active FGFR1 inhibitors, and the predicted actives were subjected to molecular docking, induced-fit docking, QM-MM optimization, and molecular dynamics simulations to filter for compounds with high binding affinity and to suggest a putative mechanism of inhibition and specificity. The LSTM-SM model outperformed the conventional QSAR models, with accuracy, specificity, and sensitivity of 0.92, a model loss of 0.025, and an AUC of 0.95. Fifteen thousand compounds from the ChemDiv database were predicted as actives, and four compounds were finally selected. Of the four, two showed putatively effective binding interactions with key active-site residues. Molecular dynamics simulations of these compounds in complex with the receptor gave further insight into the conformational dynamics of each compound bound to the receptor. The complexes formed were stable and exhibited a similar degree of compactness. Our findings point to the advent of self-feature-extracting machine learning algorithms for compounds and show the possibility of better predictive model quality that is not necessarily limited by compound descriptors. The putative FGFR1 inhibitors, with their mechanism of inhibition and specificity, were elucidated using this approach. Communicated by Ramaswamy H. Sarma.
Keywords: Descriptor-free QSAR; FGFR1; LSTM; QM-MM optimization; induced-fit docking.
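For readers less familiar with the classification metrics quoted in the abstract, the following is a minimal sketch of how accuracy, sensitivity, and specificity are conventionally computed from binary active/inactive predictions. It is not the authors' evaluation code; the function name `classification_metrics` and the toy labels are assumptions introduced purely for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actives predicted active
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # inactives predicted inactive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # inactives predicted active
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actives predicted inactive
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # true negative rate
    }


# Toy labels, invented for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity measures how many true actives the model recovers, while specificity measures how many inactives it correctly rejects; reporting both alongside accuracy guards against a model that looks accurate only because one class dominates the dataset.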