Accurate prediction of protein-ligand interactions by combining physical energy functions and graph-neural networks

Yiyu Hong; Junsu Ha; Jaemin Sim; Chae Jo Lim; Kwang-Seok Oh; Ramakrishnan Chandrasekaran; Bomin Kim; Jieun Choi; Junsu Ko; Woong-Hee Shin; Juyong Lee

doi:10.1186/s13321-024-00912-2

J Cheminform. 2024 Nov 4;16(1):121. doi: 10.1186/s13321-024-00912-2.

Authors

Yiyu Hong¹, Junsu Ha¹, Jaemin Sim², Chae Jo Lim³, Kwang-Seok Oh³, Ramakrishnan Chandrasekaran¹, Bomin Kim⁴, Jieun Choi⁴, Junsu Ko⁵, Woong-Hee Shin⁶ ⁷, Juyong Lee⁸ ⁹ ¹⁰ ¹¹

Affiliations

¹ Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea.
² Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea.
³ Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Republic of Korea.
⁴ College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea.
⁵ Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea.
⁶ Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea.
⁷ Department of Medicine, Korea University College of Medicine, Seoul, 02841, Republic of Korea.
⁸ Arontier Co., 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea.
⁹ Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea.
¹⁰ Research Institute of Pharmaceutical Science, College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea.
¹¹ College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea.

Abstract

We introduce an advanced model for predicting protein-ligand interactions. Our approach combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine-learning models for protein-ligand binding prediction often fall short in practical virtual screening scenarios, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules, and the scarcity of crystallographic data for protein-ligand complexes. To overcome these limitations, we propose a novel approach that fuses three independent neural network models. One classification model is designed to perform binary prediction of a given protein-ligand complex pose. The other two regression models are trained to predict the binding affinity and the root-mean-square deviation of a ligand conformation from an input complex structure. We trained the model to account for both the deviations between experimental and predicted binding affinities and the uncertainties in pose prediction. By effectively integrating the outputs of the three neural networks with a physics-based scoring function, our model showed significantly improved performance in hit identification. The benchmark results with three independent decoy sets demonstrate that our model outperformed existing models in forward screening. Our model achieved top 1% enrichment factors of 32.7 and 23.1 with the CASF2016 and DUD-E benchmark sets, respectively. The benchmark results using the LIT-PCBA set further confirmed its higher average enrichment factors, emphasizing the model's efficiency and generalizability. The model's efficiency was further validated by identifying 23 active compounds from 63 candidates in experimental screening for autotaxin inhibitors, demonstrating its practical applicability in hit discovery.

Scientific contribution

Our work introduces a novel training strategy for a protein-ligand binding affinity prediction model by integrating the outputs of three independent sub-models and utilizing expertly crafted decoy sets. The model showcases exceptional performance across multiple benchmarks. The high enrichment factors in the LIT-PCBA benchmark demonstrate its potential to accelerate hit discovery.
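
For readers unfamiliar with the metric, the top 1% enrichment factor quoted in the abstract is commonly defined as the fraction of actives among the top-ranked 1% of compounds divided by the fraction of actives in the whole library. The following is a minimal Python sketch of that common definition, not the authors' evaluation code. The function name `enrichment_factor`, the convention that a higher score means "more likely active", and the rounding of the cutoff are all assumptions made for illustration; benchmark suites may differ in how they round the cutoff or break ties.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction of a ranked library.

    scores   : predicted scores, higher = more likely active (assumed convention)
    labels   : 1 for an active compound, 0 for a decoy/inactive
    fraction : top fraction of the ranked list to inspect (0.01 = top 1%)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Number of compounds in the top slice (assumed: rounded, at least one).
    n_top = max(1, int(round(fraction * n_total)))

    # Rank compounds from best to worst predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])

    # Ratio of the hit rate in the top slice to the hit rate in the library.
    return (hits_in_top / n_top) / (n_actives / n_total)
```

Under this definition, a random ranking gives an enrichment factor of about 1, and the maximum attainable value depends on the active-to-decoy ratio of the benchmark set, so values from different benchmarks are not directly comparable.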

Keywords: Deep-learning; Graph neural network; Hit discovery; Physics-based scoring function; Protein–ligand binding pose prediction; Protein–ligand binding prediction; Protein–ligand docking; Virtual screening.
