Search | arXiv e-print repository

ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking

Authors: Yiqiang Yi, Xu Wan, Yatao Bian, Le Ou-Yang, Peilin Zhao

Abstract: Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep learning-based docking approaches usually neglect the 3D spatial information of proteins and ligands, as well as the graph-level features of ligands, which limits their performance. To address these limitations, we propose an equivariant transformer neural network for protein-ligand docking pose prediction. Our approach involves the fusion of ligand graph-level features by feature processing, followed by the learning of ligand and protein representations using our proposed TAMformer module. Additionally, we employ an iterative optimization approach based on the predicted distance matrix to generate refined ligand poses. The experimental results on real datasets show that our model can achieve state-of-the-art performance.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2306.15890 [pdf, other]

A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges

Authors: Ziqiao Meng, Peilin Zhao, Yang Yu, Irwin King

Abstract: Reaction and retrosynthesis prediction are fundamental tasks in computational chemistry that have recently garnered attention from both the machine learning and drug discovery communities. Various deep learning approaches have been proposed to tackle these problems, and some have achieved initial success. In this survey, we conduct a comprehensive investigation of advanced deep learning-based models for reaction and retrosynthesis prediction. We summarize the design mechanisms, strengths, and weaknesses of state-of-the-art approaches. Then, we discuss the limitations of current solutions and open challenges in the problem itself. Finally, we present promising directions to facilitate future research. To our knowledge, this paper is the first comprehensive and systematic survey that seeks to provide a unified understanding of reaction and retrosynthesis prediction.

Submitted 27 June, 2023; originally announced June 2023.

Comments: Accepted as IJCAI 2023 Survey

arXiv:2306.11950 [pdf, other]

Mitigating Communication Costs in Neural Networks: The Role of Dendritic Nonlinearity

Authors: Xundong Wu, Pengfei Zhao, Zilin Yu, Lei Ma, Ka-Wa Yip, Huajin Tang, Gang Pan, Tiejun Huang

Abstract: Our comprehension of biological neuronal networks has profoundly influenced the evolution of artificial neural networks (ANNs). However, the neurons employed in ANNs exhibit remarkable deviations from their biological analogs, mainly due to the absence of complex dendritic trees encompassing local nonlinearity. Despite such disparities, previous investigations have demonstrated that point neurons can functionally substitute dendritic neurons in executing computational tasks. In this study, we scrutinized the importance of nonlinear dendrites within neural networks. By employing machine-learning methodologies, we assessed the impact of dendritic structure nonlinearity on neural network performance. Our findings reveal that integrating dendritic structures can substantially enhance model capacity and performance while keeping signal communication costs effectively restrained. This investigation offers pivotal insights that hold considerable implications for the development of future neural network accelerators.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2305.15156 [pdf, other]

SyNDock: N Rigid Protein Docking via Learnable Group Synchronization

Authors: Yuanfeng Ji, Yatao Bian, Guoji Fu, Peilin Zhao, Ping Luo

Abstract: The regulation of various cellular processes heavily relies on the protein complexes within a living cell, necessitating a comprehensive understanding of their three-dimensional structures to elucidate the underlying mechanisms. While neural docking techniques have exhibited promising outcomes in binary protein docking, the application of advanced neural architectures to multimeric protein docking remains uncertain. This study introduces SyNDock, an automated framework that swiftly assembles precise multimeric complexes within seconds, showcasing performance that can potentially surpass or be on par with recent advanced approaches. SyNDock possesses several appealing advantages not present in previous approaches. Firstly, SyNDock formulates multimeric protein docking as a problem of learning global transformations to holistically depict the placement of chain units of a complex, enabling a learning-centric solution. Secondly, SyNDock proposes a trainable two-step SE(3) algorithm, involving initial pairwise transformation and confidence estimation, followed by global transformation synchronization. This enables effective learning for assembling the complex in a globally consistent manner. Lastly, extensive experiments conducted on our proposed benchmark dataset demonstrate that SyNDock outperforms existing docking software in crucial performance metrics, including accuracy and runtime. For instance, it achieves a 4.5% improvement in performance and a remarkable millionfold acceleration in speed.

Submitted 24 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2302.00104 [pdf]

NetMoST: A network-based machine learning approach for subtyping schizophrenia using polygenic SNP allele biomarkers

Authors: Xinru Wei, Shuai Dong, Zhao Su, Lili Tang, Pengfei Zhao, Chunyu Pan, Fei Wang, Yanqing Tang, Weixiong Zhang, Xizhe Zhang

Abstract: Subtyping neuropsychiatric disorders like schizophrenia is essential for improving the diagnosis and treatment of complex diseases. Subtyping schizophrenia is challenging because it is polygenic and genetically heterogeneous, rendering the standard symptom-based diagnosis often unreliable and unrepeatable. We developed a novel network-based machine-learning approach, netMoST, for subtyping psychiatric disorders. NetMoST identifies polygenic risk SNP-allele modules from genome-wide genotyping data as polygenic haplotype biomarkers (PHBs) for disease subtyping. We applied netMoST to subtype a cohort of schizophrenia subjects into three distinct biotypes with differentiable genetic, neuroimaging and functional characteristics. The PHBs of the first biotype (36.9% of all patients) were related to neurodevelopment and cognition, the PHBs of the second biotype (28.4%) were enriched for neuroimmune functions, and the PHBs of the third biotype (34.7%) were associated with the transport of calcium ions and neurotransmitters. Neuroimaging patterns provided additional support to the new biotypes, with unique regional homogeneity (ReHo) patterns observed in the brains of each biotype compared with healthy controls. Our findings demonstrated netMoST's capability for uncovering novel biotypes of complex diseases such as schizophrenia. The results also showed the power of exploring polygenic allelic patterns that transcend the conventional GWAS approaches.

Submitted 10 March, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

Comments: 21 pages, 4 figures

arXiv:2210.16098 [pdf, other]

Predicting Protein-Ligand Binding Affinity with Equivariant Line Graph Network

Authors: Yiqiang Yi, Xu Wan, Kangfei Zhao, Le Ou-Yang, Peilin Zhao

Abstract: Binding affinity prediction of three-dimensional (3D) protein-ligand complexes is critical for drug repositioning and virtual drug screening. Existing approaches transform a 3D protein-ligand complex to a two-dimensional (2D) graph, and then use graph neural networks (GNNs) to predict its binding affinity. However, the node and edge features of the 2D graph are extracted based on invariant local coordinate systems of the 3D complex. As a result, these methods cannot fully learn the global information of the complex, such as the physical symmetry and the topological information of bonds. To address these issues, we propose a novel Equivariant Line Graph Network (ELGN) for affinity prediction of 3D protein-ligand complexes. The proposed ELGN first adds a super node to the 3D complex, and then builds a line graph based on the 3D complex. After that, ELGN uses a new E(3)-equivariant network layer to pass the messages between nodes and edges based on the global coordinate system of the 3D complex. Experimental results on two real datasets demonstrate the effectiveness of ELGN over several state-of-the-art baselines.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2209.15315 [pdf, other]

FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning

Authors: Songtao Liu, Zhengkai Tu, Minkai Xu, Zuobai Zhang, Lu Lin, Rex Ying, Jian Tang, Peilin Zhao, Dinghao Wu

Abstract: Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule. Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms, taking only the product as the input to predict the reactants for each planning step and ignoring valuable context information along the synthetic route. In this work, we propose a novel framework that utilizes context information for improved retrosynthetic planning. We view synthetic routes as reaction graphs and propose to incorporate context through three principled steps: encode molecules into embeddings, aggregate information over routes, and readout to predict reactants. Our approach is the first attempt to utilize in-context learning for retrosynthesis prediction in retrosynthetic planning. The entire framework can be efficiently optimized in an end-to-end fashion and produce more practical and accurate predictions. Comprehensive experiments demonstrate that by fusing in the context information over routes, our model significantly improves the performance of retrosynthetic planning over baselines that are not context-aware, especially for long synthetic routes. Code is available at https://github.com/SongtaoLiu0823/FusionRetro.

Submitted 31 May, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: Accepted by ICML 2023

arXiv:2209.07921 [pdf, other]

ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug Discovery

Authors: Lanqing Li, Liang Zeng, Ziqi Gao, Shen Yuan, Yatao Bian, Bingzhe Wu, Hengtong Zhang, Yang Yu, Chan Lu, Zhipeng Zhou, Hongteng Xu, Jia Li, Peilin Zhao, Pheng-Ann Heng

Abstract: The last decade has witnessed a prosperous development of computational methods and dataset curation for AI-aided drug discovery (AIDD). However, real-world pharmaceutical datasets often exhibit highly imbalanced distribution, which is overlooked by the current literature but may severely compromise the fairness and generalization of machine learning applications. Motivated by this observation, we introduce ImDrug, a comprehensive benchmark with an open-source Python library which consists of 4 imbalance settings, 11 AI-ready datasets, 54 learning tasks and 16 baseline algorithms tailored for imbalanced learning. It provides an accessible and customizable testbed for problems and solutions spanning a broad spectrum of the drug discovery pipeline such as molecular modeling, drug-target interaction and retrosynthesis. We conduct extensive empirical studies with novel evaluation metrics, to demonstrate that the existing algorithms fall short of solving medicinal and pharmaceutical challenges in the data imbalance scenario. We believe that ImDrug opens up avenues for future research and development, on real-world challenges at the intersection of AIDD and deep imbalanced learning.

Submitted 17 October, 2022; v1 submitted 16 September, 2022; originally announced September 2022.

Comments: 29 pages, 7 figures, 8 tables, a machine learning benchmark submission

arXiv:2201.09637 [pdf, other]

DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations

Authors: Yuanfeng Ji, Lu Zhang, Jiaxiang Wu, Bingzhe Wu, Long-Kai Huang, Tingyang Xu, Yu Rong, Lanqing Li, Jie Ren, Ding Xue, Houtim Lai, Shaoyong Xu, Jing Feng, Wei Liu, Ping Luo, Shuigeng Zhou, Junzhou Huang, Peilin Zhao, Yatao Bian

Abstract: AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with noise, which is inevitable in real world AIDD applications. In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both macromolecule (protein target) and small-molecule (drug compound). In contrast to only providing fixed datasets, DrugOOD offers automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies have shown a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that can allow for OOD generalization under noise for AIDD.

Submitted 24 January, 2022; originally announced January 2022.

Comments: 54 pages, 11 figures

arXiv:2112.11225 [pdf, other]

doi 10.3390/biom12091325

RetroComposer: Composing Templates for Template-Based Retrosynthesis Prediction

Authors: Chaochao Yan, Peilin Zhao, Chan Lu, Yang Yu, Junzhou Huang

Abstract: The main target of retrosynthesis is to recursively decompose desired molecules into available building blocks. Existing template-based retrosynthesis methods follow a template selection stereotype and suffer from limited training templates, which prevents them from discovering novel reactions. To overcome this limitation, we propose an innovative retrosynthesis prediction framework that can compose novel templates beyond training templates. As far as we know, this is the first method that uses machine learning to compose reaction templates for retrosynthesis prediction. Besides, we propose an effective reactant candidate scoring model that can capture atom-level transformations, which helps our method outperform previous methods on the USPTO-50K dataset. Experimental results show that our method can produce novel templates for 15 USPTO-50K test reactions that are not covered by training templates. We have released our source implementation.

Submitted 22 December, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

Comments: 15 pages; Accepted by the journal Biomolecules

arXiv:2011.02893 [pdf, other]

RetroXpert: Decompose Retrosynthesis Prediction like a Chemist

Authors: Chaochao Yan, Qianggang Ding, Peilin Zhao, Shuangjia Zheng, Jinyu Yang, Yang Yu, Junzhou Huang

Abstract: Retrosynthesis is the process of recursively decomposing target molecules into available building blocks. It plays an important role in solving problems in organic synthesis planning. To automate or assist in the retrosynthesis analysis, various retrosynthesis prediction algorithms have been proposed. However, most of them are cumbersome and lack interpretability about their predictions. In this paper, we devise a novel template-free algorithm for automatic retrosynthetic expansion inspired by how chemists approach retrosynthesis prediction. Our method disassembles retrosynthesis into two steps: i) identify the potential reaction center of the target molecule through a novel graph neural network and generate intermediate synthons, and ii) generate the reactants associated with synthons via a robust reactant generation model. While outperforming the state-of-the-art baselines by a significant margin, our model also provides chemically reasonable interpretation.

Submitted 3 November, 2020; originally announced November 2020.

Comments: 17 pages, to appear in NeurIPS 2020

arXiv:2005.10248 [pdf]

Statistical Issues and Recommendations for Clinical Trials Conducted During the COVID-19 Pandemic

Authors: R. Daniel Meyer, Bohdana Ratitch, Marcel Wolbers, Olga Marchenko, Hui Quan, Daniel Li, Chrissie Fletcher, Xin Li, David Wright, Yue Shentu, Stefan Englert, Wei Shen, Jyotirmoy Dey, Thomas Liu, Ming Zhou, Norman Bohidar, Peng-Liang Zhao, Michael Hale

Abstract: The COVID-19 pandemic has had and continues to have major impacts on planned and ongoing clinical trials. Its effects on trial data create multiple potential statistical issues. The scale of impact is unprecedented, but when viewed individually, many of the issues are well defined and feasible to address. A number of strategies and recommendations are put forward to assess and address issues related to estimands, missing data, validity and modifications of statistical analysis methods, need for additional analyses, ability to meet objectives and overall trial interpretability.

Submitted 20 May, 2020; originally announced May 2020.

Comments: Accepted for publication in Statistics in Biopharmaceutical Research. 40 pages

arXiv:2001.02132 [pdf]

Mitochondria in higher plants possess H2 evolving activity which is closely related to complex I

Authors: Xin Zhang, Zhao Zhang, Yanan Wei, Muhan Li, Pengxiang Zhao, Yao Mawulikplimi Adzavon, Mengyu Liu, Xiaokang Zhang, Fei Xie, Andong Wang, Jihong Sun, Yunlong Shao, Xiayan Wang, Xuejun Sun, Xuemei Ma

Abstract: Hydrogenases occupy a central place in the energy metabolism of anaerobic bacteria. Although the structure of mitochondrial complex I is similar to that of hydrogenase, whether it has hydrogen metabolic activity remains unclear. Here, we show that an H2 evolving activity exists in higher plant mitochondria and is closely related to complex I, especially around the ubiquinone binding site. The H2 production could be inhibited by rotenone and ubiquinone. Hypoxia could simultaneously promote H2 evolution and succinate accumulation. The redox properties of the quinone pool, adjusted by NADH or succinate according to oxygen concentration, act as a valve to control the flow of protons and electrons and the production of H2. The coupling of H2 evolving activity of mitochondrial complex I with metabolic regulation reveals a more effective redox homeostasis regulation mechanism. Considering the ubiquity of mitochondria in eukaryotes, H2 metabolism might be the innate function of higher organisms. This may serve to explain, at least in part, the broad physiological effects of H2.

Submitted 7 January, 2020; originally announced January 2020.

arXiv:1512.06557 [pdf]

doi 10.1016/j.atmosenv.2015.12.051

Conversion of the chemical concentration of odorous mixtures into odour concentration and odour intensity: a comparison of methods

Authors: C. Wu, J. Liu, P. Zhao, M. Piringer, G. Schauberger

Abstract: Continuous odour measurements both of emissions as well as ambient concentrations are seldom realised, mainly because of their high costs. They are therefore often substituted by concentration measurements of odorous substances. Then a conversion of the chemical concentrations C (mg m⁻³) into odour concentrations COD (ouE m⁻³) and odour intensities OI is necessary. Four methods to convert the concentrations of single substances to the odour concentrations and odour intensities of an odorous mixture are investigated: (1) direct use of measured concentrations, (2) the sum of the odour activity value SOAV, (3) the sum of the odour intensities SOI, and (4) the equivalent odour concentration EOC, as a new method. The methods are evaluated with olfactometric measurements of seven substances as well as their mixtures. The results indicate that the SOI and EOC conversion methods deliver reliable values. These methods use not only the odour threshold concentration but also the slope of the Weber-Fechner law to include the sensitivity of the odour perception of the individual substances. They fulfil the criteria of an objective conversion without the need of a further calibration by additional olfactometric measurements.

Submitted 21 December, 2015; originally announced December 2015.

Comments: accepted for publication on Dec. 20, 2015, Atmospheric Environment (2016)

Showing 1–14 of 14 results for author: Zhao, P