-
ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking
Authors:
Yiqiang Yi,
Xu Wan,
Yatao Bian,
Le Ou-Yang,
Peilin Zhao
Abstract:
Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep learning-based docking approaches usually neglect the 3D spatial information of proteins and ligands, as well as the graph-level features of ligands, which limits their performance. To address these limitations, we propose an equivariant transformer neural network for protein-ligand docking pose prediction. Our approach involves the fusion of ligand graph-level features by feature processing, followed by the learning of ligand and protein representations using our proposed TAMformer module. Additionally, we employ an iterative optimization approach based on the predicted distance matrix to generate refined ligand poses. The experimental results on real datasets show that our model can achieve state-of-the-art performance.
Submitted 12 October, 2023;
originally announced October 2023.
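The ETDock abstract above mentions an iterative optimization based on a predicted distance matrix. A minimal sketch of that general idea, not the authors' procedure: plain gradient descent moves ligand atoms until their distances to fixed protein atoms match a target matrix. The function names, the squared-error loss, the step size and the toy coordinates are all assumptions made for illustration.

```python
import math

def pose_loss(ligand, protein, target):
    """Sum of squared gaps between current and target ligand-protein distances."""
    return sum(
        (math.dist(atom, p) - target[i][j]) ** 2
        for i, atom in enumerate(ligand)
        for j, p in enumerate(protein)
    )

def refine_pose(ligand, protein, target, steps=500, lr=0.05):
    """Hypothetical refinement loop: gradient descent on pose_loss.

    Protein atoms stay fixed; only ligand coordinates are updated.
    """
    pose = [list(atom) for atom in ligand]
    for _ in range(steps):
        for i, atom in enumerate(pose):
            grad = [0.0, 0.0, 0.0]
            for j, p in enumerate(protein):
                d = math.dist(atom, p) or 1e-9  # avoid division by zero
                scale = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    grad[k] += scale * (atom[k] - p[k])
            for k in range(3):
                atom[k] -= lr * grad[k]
    return pose

# Toy data (invented): four protein atoms, two ligand atoms.
protein = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
true_pose = [(1.0, 1.0, 1.0), (2.0, 1.0, 1.5)]
# Stand-in for a predicted distance matrix: distances measured on the true pose.
target = [[math.dist(a, p) for p in protein] for a in true_pose]

start = [(1.4, 0.7, 1.3), (2.3, 1.4, 1.1)]
refined = refine_pose(start, protein, target)
start_loss = pose_loss(start, protein, target)
refined_loss = pose_loss(refined, protein, target)
```

In a real pipeline the target matrix would come from the model rather than from a known pose, and the optimizer would typically also respect ligand bond geometry, which this sketch ignores.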
-
A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges
Authors:
Ziqiao Meng,
Peilin Zhao,
Yang Yu,
Irwin King
Abstract:
Reaction and retrosynthesis prediction are fundamental tasks in computational chemistry that have recently garnered attention from both the machine learning and drug discovery communities. Various deep learning approaches have been proposed to tackle these problems, and some have achieved initial success. In this survey, we conduct a comprehensive investigation of advanced deep learning-based models for reaction and retrosynthesis prediction. We summarize the design mechanisms, strengths, and weaknesses of state-of-the-art approaches. Then, we discuss the limitations of current solutions and open challenges in the problem itself. Finally, we present promising directions to facilitate future research. To our knowledge, this paper is the first comprehensive and systematic survey that seeks to provide a unified understanding of reaction and retrosynthesis prediction.
Submitted 27 June, 2023;
originally announced June 2023.
-
Mitigating Communication Costs in Neural Networks: The Role of Dendritic Nonlinearity
Authors:
Xundong Wu,
Pengfei Zhao,
Zilin Yu,
Lei Ma,
Ka-Wa Yip,
Huajin Tang,
Gang Pan,
Tiejun Huang
Abstract:
Our comprehension of biological neuronal networks has profoundly influenced the evolution of artificial neural networks (ANNs). However, the neurons employed in ANNs exhibit remarkable deviations from their biological analogs, mainly due to the absence of complex dendritic trees encompassing local nonlinearity. Despite such disparities, previous investigations have demonstrated that point neurons can functionally substitute dendritic neurons in executing computational tasks. In this study, we scrutinized the importance of nonlinear dendrites within neural networks. By employing machine-learning methodologies, we assessed the impact of dendritic structure nonlinearity on neural network performance. Our findings reveal that integrating dendritic structures can substantially enhance model capacity and performance while keeping signal communication costs effectively restrained. This investigation offers pivotal insights that hold considerable implications for the development of future neural network accelerators.
Submitted 20 June, 2023;
originally announced June 2023.
-
SyNDock: N Rigid Protein Docking via Learnable Group Synchronization
Authors:
Yuanfeng Ji,
Yatao Bian,
Guoji Fu,
Peilin Zhao,
Ping Luo
Abstract:
The regulation of various cellular processes heavily relies on the protein complexes within a living cell, necessitating a comprehensive understanding of their three-dimensional structures to elucidate the underlying mechanisms. While neural docking techniques have exhibited promising outcomes in binary protein docking, the application of advanced neural architectures to multimeric protein docking remains uncertain. This study introduces SyNDock, an automated framework that swiftly assembles precise multimeric complexes within seconds, showcasing performance that can potentially surpass or be on par with recent advanced approaches. SyNDock possesses several appealing advantages not present in previous approaches. Firstly, SyNDock formulates multimeric protein docking as a problem of learning global transformations to holistically depict the placement of chain units of a complex, enabling a learning-centric solution. Secondly, SyNDock proposes a trainable two-step SE(3) algorithm, involving initial pairwise transformation and confidence estimation, followed by global transformation synchronization. This enables effective learning for assembling the complex in a globally consistent manner. Lastly, extensive experiments conducted on our proposed benchmark dataset demonstrate that SyNDock outperforms existing docking software in crucial performance metrics, including accuracy and runtime. For instance, it achieves a 4.5% improvement in performance and a remarkable millionfold acceleration in speed.
Submitted 24 May, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
NetMoST: A network-based machine learning approach for subtyping schizophrenia using polygenic SNP allele biomarkers
Authors:
Xinru Wei,
Shuai Dong,
Zhao Su,
Lili Tang,
Pengfei Zhao,
Chunyu Pan,
Fei Wang,
Yanqing Tang,
Weixiong Zhang,
Xizhe Zhang
Abstract:
Subtyping neuropsychiatric disorders like schizophrenia is essential for improving the diagnosis and treatment of complex diseases. Subtyping schizophrenia is challenging because it is polygenic and genetically heterogeneous, rendering the standard symptom-based diagnosis often unreliable and unrepeatable. We developed a novel network-based machine-learning approach, netMoST, for subtyping psychiatric disorders. NetMoST identifies polygenic risk SNP-allele modules from genome-wide genotyping data as polygenic haplotype biomarkers (PHBs) for disease subtyping. We applied netMoST to subtype a cohort of schizophrenia subjects into three distinct biotypes with differentiable genetic, neuroimaging and functional characteristics. The PHBs of the first biotype (36.9% of all patients) were related to neurodevelopment and cognition, the PHBs of the second biotype (28.4%) were enriched for neuroimmune functions, and the PHBs of the third biotype (34.7%) were associated with the transport of calcium ions and neurotransmitters. Neuroimaging patterns provided additional support for the new biotypes, with unique regional homogeneity (ReHo) patterns observed in the brains of each biotype compared with healthy controls. Our findings demonstrated netMoST's capability for uncovering novel biotypes of complex diseases such as schizophrenia. The results also showed the power of exploring polygenic allelic patterns that transcend the conventional GWAS approaches.
Submitted 10 March, 2023; v1 submitted 31 January, 2023;
originally announced February 2023.
-
Predicting Protein-Ligand Binding Affinity with Equivariant Line Graph Network
Authors:
Yiqiang Yi,
Xu Wan,
Kangfei Zhao,
Le Ou-Yang,
Peilin Zhao
Abstract:
Binding affinity prediction of three-dimensional (3D) protein-ligand complexes is critical for drug repositioning and virtual drug screening. Existing approaches transform a 3D protein-ligand complex into a two-dimensional (2D) graph, and then use graph neural networks (GNNs) to predict its binding affinity. However, the node and edge features of the 2D graph are extracted based on invariant local coordinate systems of the 3D complex. As a result, these methods cannot fully learn the global information of the complex, such as the physical symmetry and the topological information of bonds. To address these issues, we propose a novel Equivariant Line Graph Network (ELGN) for affinity prediction of 3D protein-ligand complexes. The proposed ELGN first adds a super node to the 3D complex, and then builds a line graph based on the 3D complex. After that, ELGN uses a new E(3)-equivariant network layer to pass the messages between nodes and edges based on the global coordinate system of the 3D complex. Experimental results on two real datasets demonstrate the effectiveness of ELGN over several state-of-the-art baselines.
Submitted 26 October, 2022;
originally announced October 2022.
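The ELGN abstract above describes adding a super node and then building a line graph. As a rough illustration of the line-graph construction only: each edge of the original graph becomes a node, and two such nodes are linked when the original edges share an endpoint. How ELGN actually wires its super node is not stated in the abstract; joining it to every atom, as done here, is an assumption, and the function name and toy graph are invented.

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph of an undirected graph given as (u, v) edges.

    Returns (nodes, links): nodes are the original edges, and links join
    two original edges that share an endpoint.
    """
    nodes = [tuple(sorted(edge)) for edge in edges]
    links = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if set(a) & set(b)  # shared endpoint
    ]
    return nodes, links

# Toy "complex": a three-atom path 0-1-2.
bonds = [(0, 1), (1, 2)]
# Assumed wiring: hypothetical super node 3 connected to every atom.
super_edges = [(3, atom) for atom in range(3)]

nodes, links = line_graph(bonds + super_edges)
```

Message passing on `links` lets information flow between bonds directly, which is the usual motivation for working on a line graph.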
-
FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning
Authors:
Songtao Liu,
Zhengkai Tu,
Minkai Xu,
Zuobai Zhang,
Lu Lin,
Rex Ying,
Jian Tang,
Peilin Zhao,
Dinghao Wu
Abstract:
Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule. Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms, taking only the product as the input to predict the reactants for each planning step and ignoring valuable context information along the synthetic route. In this work, we propose a novel framework that utilizes context information for improved retrosynthetic planning. We view synthetic routes as reaction graphs and propose to incorporate context through three principled steps: encode molecules into embeddings, aggregate information over routes, and readout to predict reactants. Our approach is the first attempt to utilize in-context learning for retrosynthesis prediction in retrosynthetic planning. The entire framework can be efficiently optimized in an end-to-end fashion and produce more practical and accurate predictions. Comprehensive experiments demonstrate that by fusing in the context information over routes, our model significantly improves the performance of retrosynthetic planning over baselines that are not context-aware, especially for long synthetic routes. Code is available at https://github.com/SongtaoLiu0823/FusionRetro.
Submitted 31 May, 2023; v1 submitted 30 September, 2022;
originally announced September 2022.
-
ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug Discovery
Authors:
Lanqing Li,
Liang Zeng,
Ziqi Gao,
Shen Yuan,
Yatao Bian,
Bingzhe Wu,
Hengtong Zhang,
Yang Yu,
Chan Lu,
Zhipeng Zhou,
Hongteng Xu,
Jia Li,
Peilin Zhao,
Pheng-Ann Heng
Abstract:
The last decade has witnessed a prosperous development of computational methods and dataset curation for AI-aided drug discovery (AIDD). However, real-world pharmaceutical datasets often exhibit highly imbalanced distribution, which is overlooked by the current literature but may severely compromise the fairness and generalization of machine learning applications. Motivated by this observation, we introduce ImDrug, a comprehensive benchmark with an open-source Python library which consists of 4 imbalance settings, 11 AI-ready datasets, 54 learning tasks and 16 baseline algorithms tailored for imbalanced learning. It provides an accessible and customizable testbed for problems and solutions spanning a broad spectrum of the drug discovery pipeline such as molecular modeling, drug-target interaction and retrosynthesis. We conduct extensive empirical studies with novel evaluation metrics, to demonstrate that the existing algorithms fall short of solving medicinal and pharmaceutical challenges in the data imbalance scenario. We believe that ImDrug opens up avenues for future research and development, on real-world challenges at the intersection of AIDD and deep imbalanced learning.
Submitted 17 October, 2022; v1 submitted 16 September, 2022;
originally announced September 2022.
-
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations
Authors:
Yuanfeng Ji,
Lu Zhang,
Jiaxiang Wu,
Bingzhe Wu,
Long-Kai Huang,
Tingyang Xu,
Yu Rong,
Lanqing Li,
Jie Ren,
Ding Xue,
Houtim Lai,
Shaoyong Xu,
Jing Feng,
Wei Liu,
Ping Luo,
Shuigeng Zhou,
Junzhou Huang,
Peilin Zhao,
Yatao Bian
Abstract:
AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with noise, which is inevitable in real-world AIDD applications.
In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both a macromolecule (protein target) and a small molecule (drug compound). In contrast to only providing fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies have shown a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that can allow for OOD generalization under noise for AIDD.
Submitted 24 January, 2022;
originally announced January 2022.
-
RetroComposer: Composing Templates for Template-Based Retrosynthesis Prediction
Authors:
Chaochao Yan,
Peilin Zhao,
Chan Lu,
Yang Yu,
Junzhou Huang
Abstract:
The main target of retrosynthesis is to recursively decompose desired molecules into available building blocks. Existing template-based retrosynthesis methods follow a template selection stereotype and suffer from limited training templates, which prevents them from discovering novel reactions. To overcome this limitation, we propose an innovative retrosynthesis prediction framework that can compose novel templates beyond training templates. As far as we know, this is the first method that uses machine learning to compose reaction templates for retrosynthesis prediction. Besides, we propose an effective reactant candidate scoring model that can capture atom-level transformations, which helps our method outperform previous methods on the USPTO-50K dataset. Experimental results show that our method can produce novel templates for 15 USPTO-50K test reactions that are not covered by training templates. We have released our source implementation.
Submitted 22 December, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
RetroXpert: Decompose Retrosynthesis Prediction like a Chemist
Authors:
Chaochao Yan,
Qianggang Ding,
Peilin Zhao,
Shuangjia Zheng,
Jinyu Yang,
Yang Yu,
Junzhou Huang
Abstract:
Retrosynthesis is the process of recursively decomposing target molecules into available building blocks. It plays an important role in solving problems in organic synthesis planning. To automate or assist in the retrosynthesis analysis, various retrosynthesis prediction algorithms have been proposed. However, most of them are cumbersome and lack interpretability about their predictions. In this paper, we devise a novel template-free algorithm for automatic retrosynthetic expansion inspired by how chemists approach retrosynthesis prediction. Our method disassembles retrosynthesis into two steps: i) identify the potential reaction center of the target molecule through a novel graph neural network and generate intermediate synthons, and ii) generate the reactants associated with synthons via a robust reactant generation model. While outperforming the state-of-the-art baselines by a significant margin, our model also provides chemically reasonable interpretation.
Submitted 3 November, 2020;
originally announced November 2020.
-
Statistical Issues and Recommendations for Clinical Trials Conducted During the COVID-19 Pandemic
Authors:
R. Daniel Meyer,
Bohdana Ratitch,
Marcel Wolbers,
Olga Marchenko,
Hui Quan,
Daniel Li,
Chrissie Fletcher,
Xin Li,
David Wright,
Yue Shentu,
Stefan Englert,
Wei Shen,
Jyotirmoy Dey,
Thomas Liu,
Ming Zhou,
Norman Bohidar,
Peng-Liang Zhao,
Michael Hale
Abstract:
The COVID-19 pandemic has had and continues to have major impacts on planned and ongoing clinical trials. Its effects on trial data create multiple potential statistical issues. The scale of impact is unprecedented, but when viewed individually, many of the issues are well defined and feasible to address. A number of strategies and recommendations are put forward to assess and address issues related to estimands, missing data, validity and modifications of statistical analysis methods, need for additional analyses, ability to meet objectives and overall trial interpretability.
Submitted 20 May, 2020;
originally announced May 2020.
-
Mitochondria in higher plants possess H2 evolving activity which is closely related to complex I
Authors:
Xin Zhang,
Zhao Zhang,
Yanan Wei,
Muhan Li,
Pengxiang Zhao,
Yao Mawulikplimi Adzavon,
Mengyu Liu,
Xiaokang Zhang,
Fei Xie,
Andong Wang,
Jihong Sun,
Yunlong Shao,
Xiayan Wang,
Xuejun Sun,
Xuemei Ma
Abstract:
Hydrogenases occupy a central place in the energy metabolism of anaerobic bacteria. Although the structure of mitochondrial complex I is similar to that of hydrogenase, whether it has hydrogen metabolic activity remains unclear. Here, we show that an H2 evolving activity exists in the mitochondria of higher plants and is closely related to complex I, especially around the ubiquinone binding site. The H2 production could be inhibited by rotenone and ubiquinone. Hypoxia could simultaneously promote H2 evolution and succinate accumulation. The redox properties of the quinone pool, adjusted by NADH or succinate according to oxygen concentration, act as a valve to control the flow of protons and electrons and the production of H2. The coupling of the H2 evolving activity of mitochondrial complex I with metabolic regulation reveals a more effective redox homeostasis regulation mechanism. Considering the ubiquity of mitochondria in eukaryotes, H2 metabolism might be an innate function of higher organisms. This may serve to explain, at least in part, the broad physiological effects of H2.
Submitted 7 January, 2020;
originally announced January 2020.
-
Conversion of the chemical concentration of odorous mixtures into odour concentration and odour intensity: a comparison of methods
Authors:
C. Wu,
J. Liu,
P. Zhao,
M. Piringer,
G. Schauberger
Abstract:
Continuous odour measurements of both emissions and ambient concentrations are seldom realised, mainly because of their high costs. They are therefore often substituted by concentration measurements of odorous substances. A conversion of the chemical concentrations C (mg m⁻³) into odour concentrations C_OD (ou_E m⁻³) and odour intensities OI is then necessary. Four methods to convert the concentrations of single substances to the odour concentrations and odour intensities of an odorous mixture are investigated: (1) direct use of measured concentrations, (2) the sum of the odour activity values SOAV, (3) the sum of the odour intensities SOI, and (4) the equivalent odour concentration EOC, as a new method. The methods are evaluated with olfactometric measurements of seven substances as well as their mixtures. The results indicate that the SOI and EOC conversion methods deliver reliable values. These methods use not only the odour threshold concentration but also the slope of the Weber-Fechner law to include the sensitivity of the odour perception of the individual substances. They fulfil the criteria of an objective conversion without the need for further calibration by additional olfactometric measurements.
Submitted 21 December, 2015;
originally announced December 2015.
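Two of the conversion methods named in the abstract above, SOAV and SOI, can be sketched in a few lines. This assumes the common textbook forms: the odour activity value is the concentration divided by the odour threshold concentration, and the Weber-Fechner intensity is a substance-specific slope times the base-10 logarithm of that ratio. Flooring the intensity at zero below the threshold, the dictionary layout and the sample numbers are all assumptions for illustration; the EOC method is not sketched because the abstract does not define it.

```python
import math

def odour_activity_value(conc, threshold):
    """OAV: concentration relative to the odour threshold (same units)."""
    return conc / threshold

def odour_intensity(conc, threshold, slope):
    """Assumed Weber-Fechner form: OI = slope * log10(C / C_threshold).

    Returns 0.0 at or below the threshold (an assumption of this sketch).
    """
    oav = odour_activity_value(conc, threshold)
    return slope * math.log10(oav) if oav > 1.0 else 0.0

def soav(substances):
    """Sum of odour activity values over all substances in the mixture."""
    return sum(odour_activity_value(s["conc"], s["threshold"]) for s in substances)

def soi(substances):
    """Sum of odour intensities over all substances in the mixture."""
    return sum(
        odour_intensity(s["conc"], s["threshold"], s["slope"]) for s in substances
    )

# Invented two-substance mixture; concentrations and thresholds in mg m^-3.
mixture = [
    {"conc": 10.0, "threshold": 1.0, "slope": 2.0},
    {"conc": 0.5, "threshold": 0.5, "slope": 1.5},
]

mixture_soav = soav(mixture)
mixture_soi = soi(mixture)
```

The second substance sits exactly at its threshold, so it contributes to SOAV but not to SOI, which shows how the slope-based method weights substances differently from a plain activity-value sum.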