Improving Generalizability of Drug-Target Binding Prediction by Pre-trained Multi-view Molecular Representations

Xike Ouyang; Yannuo Feng; Chen Cui; Yunhe Li; Li Zhang; Han Wang

doi:10.1093/bioinformatics/btaf002

Bioinformatics. 2025 Jan 7:btaf002. doi: 10.1093/bioinformatics/btaf002. Online ahead of print.

Authors

Xike Ouyang¹, Yannuo Feng¹, Chen Cui², Yunhe Li¹, Li Zhang², Han Wang¹

Affiliations

¹ School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, Jilin 130117, China.
² School of Computer Science and Engineering, Changchun University of Technology, Changchun, Jilin 130051, China.

PMID: 39776159

Abstract

Motivation: Most drugs begin to act in the body by binding to their intended target proteins, which is why considerable effort in drug development has gone into predicting drug-target binding. However, the inherent diversity of molecular properties, coupled with limited training data, limits both the accuracy of these methods and their generalizability beyond the training domain.

Results: In this work, we propose a neural network architecture for highly accurate and generalizable drug-target binding prediction, named Pre-trained Multi-view Molecular Representations (PMMR). The method uses pre-trained models to transfer representations of target proteins and drugs to the domain of drug-target binding prediction, mitigating the poor generalizability that stems from limited data. Two typical representations of drug molecules, graphs and SMILES strings, are then learned by a Graph Neural Network (GNN) and a Transformer, respectively, so that local and global features complement each other. PMMR was evaluated on drug-target affinity and interaction benchmark datasets, where it outperformed peer methods, particularly in generalizability under cold-start scenarios. Furthermore, a case study on cyclin-dependent kinase 2 (CDK2) indicated the method's potential for drug discovery.
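To make the local/global complementarity concrete, the following is a minimal toy sketch and not the PMMR implementation (that is available at the repository listed below). It assumes one-hot atom and token features in place of pre-trained embeddings, a single round of mean-neighbour aggregation in place of a GNN, single-head self-attention with identity projections in place of a Transformer, and simple concatenation as the fusion step; the abstract does not specify how the two views are combined. All function names are hypothetical.

```python
import math

def mean_neighbor_update(features, adjacency):
    """Graph view (local): each atom averages its own vector with those of its bonded neighbours."""
    updated = []
    for atom, neighbours in enumerate(adjacency):
        group = [features[atom]] + [features[n] for n in neighbours]
        dim = len(features[atom])
        updated.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return updated

def self_attention(tokens):
    """Sequence view (global): every token attends to every other token.

    Assumption: identity query/key/value projections, one head, no positional encoding.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    output = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        peak = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        output.append([sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)])
    return output

def mean_pool(vectors):
    """Collapse per-atom or per-token vectors into one molecule-level vector."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def multi_view_embedding(atom_features, adjacency, token_features):
    """Concatenate the pooled graph view and the pooled sequence view (assumed fusion)."""
    graph_view = mean_pool(mean_neighbor_update(atom_features, adjacency))
    sequence_view = mean_pool(self_attention(token_features))
    return graph_view + sequence_view

# Ethanol, SMILES "CCO": atoms C0-C1-O2, toy one-hot features [is_carbon, is_oxygen].
one_hot = {"C": [1.0, 0.0], "O": [0.0, 1.0]}
smiles = "CCO"
atom_features = [one_hot[symbol] for symbol in smiles]   # one atom per character here
adjacency = [[1], [0, 2], [1]]                           # bonded neighbours per atom
token_features = [one_hot[symbol] for symbol in smiles]  # character-level tokens

embedding = multi_view_embedding(atom_features, adjacency, token_features)
print(embedding)
```

The first half of the resulting vector reflects only bonded neighbourhoods, while the second half mixes information across the whole string, which is the sense in which the two views are complementary. In a realistic setting the one-hot features would be replaced by learned or pre-trained embeddings, and the fusion step would typically be learned as well.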

Availability and implementation: https://github.com/NENUBioCompute/PMMR.

Supplementary information: Supplementary data are available at Bioinformatics online.