-
Enhancing Ligand Pose Sampling for Molecular Docking
Authors:
Patricia Suriana,
Ron O. Dror
Abstract:
Deep learning promises to dramatically improve scoring functions for molecular docking, leading to substantial advances in binding pose prediction and virtual screening. To train scoring functions, and to perform molecular docking, one must generate a set of candidate ligand binding poses. Unfortunately, the sampling protocols currently used to generate candidate poses frequently fail to produce any poses close to the correct, experimentally determined pose, unless information about the correct pose is provided. This limits the accuracy of learned scoring functions and molecular docking. Here, we describe two improved protocols for pose sampling: GLOW (auGmented sampLing with sOftened vdW potential) and a novel technique named IVES (IteratiVe Ensemble Sampling). Our benchmarking results demonstrate the effectiveness of our methods in improving the likelihood of sampling accurate poses, especially for binding pockets whose shape changes substantially when different ligands bind. This improvement is observed across both experimentally determined and AlphaFold-generated protein structures. Additionally, we present datasets of candidate ligand poses generated using our methods for each of around 5,000 protein-ligand cross-docking pairs, for training and testing scoring functions. To benefit the research community, we provide these cross-docking datasets and an open-source Python implementation of GLOW and IVES at https://github.com/drorlab/GLOW_IVES .
Submitted 30 November, 2023;
originally announced December 2023.
-
A Comparative Analysis Between the Additive and the Multiplicative Extended Kalman Filter for Satellite Attitude Determination
Authors:
Hamza A. Hassan,
William Tolstrup,
Johanes P. Suriana,
Ibrahim D. Kiziloklu
Abstract:
The general consensus, based on a wealth of theoretical evidence, is that the Multiplicative Extended Kalman Filter (MEKF) is superior to the Additive Extended Kalman Filter (AEKF). This paper presents a practical comparison of the two filters in simulation, with the goal of verifying whether the earlier theoretical claims hold. The AEKF and MEKF are two variants of the Extended Kalman Filter that differ in their approach to linearizing the system dynamics. The AEKF uses an additive correction term to update the state estimate, while the MEKF uses a multiplicative correction term. The two also differ in the state they use: the AEKF uses the quaternion as its state, while the MEKF uses the Gibbs vector. The results show that the MEKF consistently outperforms the AEKF in terms of estimation accuracy with lower uncertainty. The AEKF is more computationally efficient, but the difference is almost negligible and has no effect on a real-time application. Overall, the results suggest that the MEKF is a better choice for satellite attitude estimation due to its superior estimation accuracy and lower uncertainty, which agrees with previous work.
Submitted 13 July, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
-
FlexVDW: A machine learning approach to account for protein flexibility in ligand docking
Authors:
Patricia Suriana,
Joseph M. Paggi,
Ron O. Dror
Abstract:
Most widely used ligand docking methods assume a rigid protein structure. This leads to problems when the structure of the target protein deforms upon ligand binding. In particular, the ligand's true binding pose is often scored very unfavorably due to apparent clashes between ligand and protein atoms, which lead to extremely high values of the calculated van der Waals energy term. Traditionally, this problem has been addressed by explicitly searching for receptor conformations to account for the flexibility of the receptor in ligand binding. Here we present a deep learning model trained to take receptor flexibility into account implicitly when predicting van der Waals energy. We show that incorporating this machine-learned energy term into a state-of-the-art physics-based scoring function improves small molecule ligand pose prediction results in cases with substantial protein deformation, without degrading performance in cases with minimal protein deformation. This work demonstrates the feasibility of learning effects of protein flexibility on ligand binding without explicitly modeling changes in protein structure.
Submitted 20 March, 2023;
originally announced March 2023.
-
ATOM3D: Tasks On Molecules in Three Dimensions
Authors:
Raphael J. L. Townshend,
Martin Vögele,
Patricia Suriana,
Alexander Derry,
Alexander Powers,
Yianni Laloudakis,
Sidhika Balachandar,
Bowen Jing,
Brandon Anderson,
Stephan Eismann,
Risi Kondor,
Russ B. Altman,
Ron O. Dror
Abstract:
Computational methods that operate on three-dimensional molecular structure have the potential to solve important questions in biology and chemistry. In particular, deep neural networks have gained significant attention, but their widespread adoption in the biomolecular domain has been limited by a lack of either systematic performance benchmarks or a unified toolkit for interacting with molecular data. To address this, we present ATOM3D, a collection of both novel and existing benchmark datasets spanning several key classes of biomolecules. We implement several classes of three-dimensional molecular learning methods for each of these tasks and show that they consistently improve performance relative to methods based on one- and two-dimensional representations. The specific choice of architecture proves to be critical for performance, with three-dimensional convolutional networks excelling at tasks involving complex geometries, graph networks performing well on systems requiring detailed positional information, and the more recently developed equivariant networks showing significant promise. Our results indicate that many molecular problems stand to gain from three-dimensional molecular learning, and that there is potential for improvement on many tasks which remain underexplored. To lower the barrier to entry and facilitate further developments in the field, we also provide a comprehensive suite of tools for dataset processing, model training, and evaluation in our open-source atom3d Python package. All datasets are available for download from https://www.atom3d.ai .
Submitted 15 January, 2022; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Protein model quality assessment using rotation-equivariant, hierarchical neural networks
Authors:
Stephan Eismann,
Patricia Suriana,
Bowen Jing,
Raphael J. L. Townshend,
Ron O. Dror
Abstract:
Proteins are miniature machines whose function depends on their three-dimensional (3D) structure. Determining this structure computationally remains an unsolved grand challenge. A major bottleneck involves selecting the most accurate structural model among a large pool of candidates, a task addressed in model quality assessment. Here, we present a novel deep learning approach to assess the quality of a protein model. Our network builds on a point-based representation of the atomic structure and rotation-equivariant convolutions at different levels of structural resolution. These combined aspects allow the network to learn end-to-end from entire protein structures. Our method achieves state-of-the-art results in scoring protein models submitted to recent rounds of CASP, a blind prediction community experiment. Particularly striking is that our method does not use physics-inspired energy terms and does not rely on the availability of additional information (beyond the atomic structure of the individual protein model), such as sequence alignments of multiple proteins.
Submitted 27 November, 2020;
originally announced November 2020.
-
Learning from Protein Structure with Geometric Vector Perceptrons
Authors:
Bowen Jing,
Stephan Eismann,
Patricia Suriana,
Raphael J. L. Townshend,
Ron Dror
Abstract:
Learning on 3D structures of large biomolecules is emerging as a distinct area in machine learning, but there has yet to emerge a unifying network architecture that simultaneously leverages the graph-structured and geometric aspects of the problem domain. To address this gap, we introduce geometric vector perceptrons, which extend standard dense layers to operate on collections of Euclidean vectors. Graph neural networks equipped with such layers are able to perform both geometric and relational reasoning on efficient and natural representations of macromolecular structure. We demonstrate our approach on two important problems in learning from protein structure: model quality assessment and computational protein design. Our approach improves over existing classes of architectures, including state-of-the-art graph-based and voxel-based methods. We release our code at https://github.com/drorlab/gvp.
Submitted 15 May, 2021; v1 submitted 2 September, 2020;
originally announced September 2020.
-
End-to-End Learning on 3D Protein Structure for Interface Prediction
Authors:
Raphael J. L. Townshend,
Rishi Bedi,
Patricia A. Suriana,
Ron O. Dror
Abstract:
Despite an explosion in the number of experimentally determined, atomically detailed structures of biomolecules, many critical tasks in structural biology remain data-limited. Whether performance in such tasks can be improved by using large repositories of tangentially related structural data remains an open question. To address this question, we focused on a central problem in biology: predicting how proteins interact with one another, that is, which surfaces of one protein bind to those of another protein. We built a training dataset, the Database of Interacting Protein Structures (DIPS), that contains biases but is two orders of magnitude larger than those used previously. We found that these biases significantly degrade the performance of existing methods on gold-standard data. Hypothesizing that assumptions baked into the hand-crafted features on which these methods depend were the source of the problem, we developed the first end-to-end learning model for protein interface prediction, the Siamese Atomic Surfacelet Network (SASNet). Using only spatial coordinates and identities of atoms, SASNet outperforms state-of-the-art methods trained on gold-standard structural data, even when trained on only 3% of our new dataset. Code and data available at https://github.com/drorlab/DIPS.
Submitted 26 December, 2019; v1 submitted 3 July, 2018;
originally announced July 2018.
-
Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code
Authors:
Riyadh Baghdadi,
Jessica Ray,
Malek Ben Romdhane,
Emanuele Del Sozzo,
Abdurrahman Akkas,
Yunming Zhang,
Patricia Suriana,
Shoaib Kamil,
Saman Amarasinghe
Abstract:
This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control of optimizations. Tiramisu uses a four-level intermediate representation that allows full separation between the algorithms, loop transformations, data layouts, and communication. This separation simplifies targeting multiple hardware architectures with the same algorithm. We evaluate Tiramisu by writing a set of image processing, deep learning, and linear algebra benchmarks and compare them with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu matches or outperforms existing compilers and libraries on different hardware architectures, including multicore CPUs, GPUs, and distributed machines.
Submitted 20 December, 2018; v1 submitted 27 April, 2018;
originally announced April 2018.
-
Technical Report about Tiramisu: a Three-Layered Abstraction for Hiding Hardware Complexity from DSL Compilers
Authors:
Riyadh Baghdadi,
Jessica Ray,
Malek Ben Romdhane,
Emanuele Del Sozzo,
Patricia Suriana,
Shoaib Kamil,
Saman Amarasinghe
Abstract:
High-performance DSL developers work hard to take advantage of modern hardware. The DSL compilers have to build their own complex middle-ends before they can target a common back-end such as LLVM, which only handles single instruction streams with SIMD instructions. We introduce Tiramisu, a common middle-end that can generate efficient code for modern processors and accelerators such as multicores, GPUs, FPGAs and distributed clusters. Tiramisu introduces a novel three-level IR that separates the algorithm, how that algorithm is executed, and where intermediate data are stored. This separation simplifies optimization and makes targeting multiple hardware architectures from the same algorithm easier. As a result, DSL compilers can be made considerably less complex with no loss of performance while immediately targeting multiple hardware or hardware combinations such as distributed nodes with both CPUs and GPUs. We evaluated Tiramisu by creating a new middle-end for the Halide and Julia compilers. We show that Tiramisu extends Halide and Julia with many new capabilities including the ability to: express new algorithms (such as recurrent filters and non-rectangular iteration spaces), perform new complex loop nest transformations (such as wavefront parallelization, loop shifting and loop fusion) and generate efficient code for more architectures (such as combinations of distributed clusters, multicores, GPUs and FPGAs). Finally, we demonstrate that Tiramisu can generate very efficient code that matches the highly optimized Intel MKL gemm (generalized matrix multiplication) implementation. We also show speedups reaching 4X in Halide and 16X in Julia due to optimizations enabled by Tiramisu.
Submitted 28 May, 2018; v1 submitted 28 February, 2018;
originally announced March 2018.