Showing 1–22 of 22 results for author: Clyde, A

  1. arXiv:2310.04610  [pdf, other]

    cs.AI cs.LG

    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

    Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, et al. (67 additional authors not shown)

    Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present the DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  2. arXiv:2308.01921  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats

    Authors: Wei Chen, Yihui Ren, Ai Kagawa, Matthew R. Carbone, Samuel Yen-Chi Chen, Xiaohui Qu, Shinjae Yoo, Austin Clyde, Arvind Ramanathan, Rick L. Stevens, Hubertus J. J. van Dam, Deyu Lu

    Abstract: Fast screening of drug molecules based on the ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprint is a promising method for developing molecular docking surrogates with high throughput and great fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we…

    Submitted 14 September, 2023; v1 submitted 17 July, 2023; originally announced August 2023.

    Comments: 8 pages, 5 figures, 2 tables, accepted by ICLMA2023

    ACM Class: I.2.1

  3. arXiv:2211.10442  [pdf, other]

    q-bio.QM cs.LG

    Deep learning methods for drug response prediction in cancer: predominant and emerging trends

    Authors: Alexander Partin, Thomas S. Brettin, Yitan Zhu, Oleksandr Narykov, Austin Clyde, Jamie Overbeek, Rick L. Stevens

    Abstract: Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients.…

    Submitted 17 November, 2022; originally announced November 2022.

  4. arXiv:2211.02720  [pdf, other]

    cs.LG

    Deep Surrogate Docking: Accelerating Automated Drug Discovery with Graph Neural Networks

    Authors: Ryien Hosseini, Filippo Simini, Austin Clyde, Arvind Ramanathan

    Abstract: The process of screening molecules for desirable properties is a key step in several applications, ranging from drug discovery to material design. During the process of drug discovery specifically, protein-ligand docking, or chemical docking, is a standard in-silico scoring technique that estimates the binding affinity of molecules with a specific protein target. Recently, however, as the number o…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: Published as workshop paper at NeurIPS 2022 (AI for Science)

  5. arXiv:2109.05012  [pdf, other]

    q-bio.QM q-bio.BM

    Scaffold-Induced Molecular Graph (SIMG): Effective Graph Sampling Methods for High-Throughput Computational Drug Discovery

    Authors: Austin Clyde, Ashka Shah, Max Zvyagin, Arvind Ramanathan, Rick Stevens

    Abstract: Scaffold based drug discovery (SBDD) is a technique for drug discovery which pins chemical scaffolds as the framework of design. Scaffolds, or molecular frameworks, organize the design of compounds into local neighborhoods. We formalize scaffold based drug discovery into a network design. Utilizing docking data from SARS-CoV-2 virtual screening studies and JAK2 kinase assay data, we showcase how a…

    Submitted 10 September, 2021; originally announced September 2021.

  6. arXiv:2106.07036  [pdf, other]

    q-bio.BM cs.LG

    Protein-Ligand Docking Surrogate Models: A SARS-CoV-2 Benchmark for Deep Learning Accelerated Virtual Screening

    Authors: Austin Clyde, Thomas Brettin, Alexander Partin, Hyunseung Yoo, Yadu Babuji, Ben Blaiszik, Andre Merzky, Matteo Turilli, Shantenu Jha, Arvind Ramanathan, Rick Stevens

    Abstract: We propose a benchmark to study surrogate model accuracy for protein-ligand docking. We share a dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. Our work shows surrogate docking models have six orders of magnitude more throughput than standa…

    Submitted 30 June, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

  7. arXiv:2106.02190  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Spatial Graph Attention and Curiosity-driven Policy for Antiviral Drug Discovery

    Authors: Yulun Wu, Mikaela Cashman, Nicholas Choma, Érica T. Prates, Verónica G. Melesse Vergara, Manesh Shah, Andrew Chen, Austin Clyde, Thomas S. Brettin, Wibe A. de Jong, Neeraj Kumar, Martha S. Head, Rick L. Stevens, Peter Nugent, Daniel A. Jacobson, James B. Brown

    Abstract: We developed Distilled Graph Attention Policy Network (DGAPN), a reinforcement learning model to generate novel graph-structured chemical representations that optimize user-defined objectives by efficiently navigating a physically constrained domain. The framework is examined on the task of generating molecules that are designed to bind, noncovalently, to functional sites of SARS-CoV-2 proteins. W…

    Submitted 11 May, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

  8. arXiv:2104.08961  [pdf, other]

    q-bio.QM

    A cross-study analysis of drug response prediction in cancer cell lines

    Authors: Fangfang Xia, Jonathan Allen, Prasanna Balaprakash, Thomas Brettin, Cristina Garcia-Cardona, Austin Clyde, Judith Cohn, James Doroshow, Xiaotian Duan, Veronika Dubinkina, Yvonne Evrard, Ya Ju Fan, Jason Gans, Stewart He, Pinyi Lu, Sergei Maslov, Alexander Partin, Maulik Shukla, Eric Stahlberg, Justin M. Wozniak, Hyunseung Yoo, George Zaki, Yitan Zhu, Rick Stevens

    Abstract: To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross validation within a single study to assess model accuracy. While an essential first step, cross validation within a biological data set typically provides an overly optimistic estimat…

    Submitted 13 August, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: Accepted by Briefings in Bioinformatics

  9. arXiv:2103.06867  [pdf, other]

    cs.LG

    Scaffold Embeddings: Learning the Structure Spanned by Chemical Fragments, Scaffolds and Compounds

    Authors: Austin Clyde, Arvind Ramanathan, Rick Stevens

    Abstract: Molecules have seemed like a natural fit to deep learning's tendency to handle a complex structure through representation learning, given enough data. However, this often continuous representation is not natural for understanding chemical space as a domain and is particular to samples and their differences. We focus on exploring a natural structure for representing chemical space as a structured d…

    Submitted 11 March, 2021; originally announced March 2021.

  10. arXiv:2103.02843  [pdf]

    cs.DC cs.CE cs.LG physics.bio-ph q-bio.QM

    Pandemic Drugs at Pandemic Speed: Infrastructure for Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers

    Authors: Agastya P. Bhati, Shunzhou Wan, Dario Alfè, Austin R. Clyde, Mathis Bode, Li Tan, Mikhail Titov, Andre Merzky, Matteo Turilli, Shantenu Jha, Roger R. Highfield, Walter Rocchia, Nicola Scafuri, Sauro Succi, Dieter Kranzlmüller, Gerald Mathias, David Wifling, Yann Donon, Alberto Di Meglio, Sofia Vallecorsa, Heng Ma, Anda Trifan, Arvind Ramanathan, Tom Brettin, Alexander Partin, et al. (4 additional authors not shown)

    Abstract: The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods…

    Submitted 4 September, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

    Journal ref: Interface Focus. 2021. 11 (6): 20210018

  11. arXiv:2011.12466  [pdf, other]

    q-bio.QM cs.LG

    Learning Curves for Drug Response Prediction in Cancer Cell Lines

    Authors: Alexander Partin, Thomas Brettin, Yvonne A. Evrard, Yitan Zhu, Hyunseung Yoo, Fangfang Xia, Songhao Jiang, Austin Clyde, Maulik Shukla, Michael Fonstein, James H. Doroshow, Rick Stevens

    Abstract: Motivated by the size of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating data, a common question is whether the proposed predictors can further improve the generalization performance with more training data. We utilize empirical learning curves fo…

    Submitted 24 November, 2020; originally announced November 2020.

    Comments: 14 pages, 7 figures

  12. arXiv:2010.10517  [pdf, other]

    cs.DC cs.CE

    Scalable HPC and AI Infrastructure for COVID-19 Therapeutics

    Authors: Hyungro Lee, Andre Merzky, Li Tan, Mikhail Titov, Matteo Turilli, Dario Alfe, Agastya Bhati, Alex Brace, Austin Clyde, Peter Coveney, Heng Ma, Arvind Ramanathan, Rick Stevens, Anda Trifan, Hubertus Van Dam, Shunzhou Wan, Sean Wilkinson, Shantenu Jha

    Abstract: COVID-19 has claimed more than 1 million lives and resulted in over 40 million infections. There is an urgent need to identify drugs that can inhibit SARS-CoV-2. In response, the DOE recently established the Medical Therapeutics project as part of the National Virtual Biotechnology Laboratory, and tasked it with creating the computational infrastructure and methods necessary to advance therapeutics dev…

    Submitted 20 October, 2020; originally announced October 2020.

  13. arXiv:2010.06574  [pdf, other]

    cs.DC cs.CE q-bio.QM

    IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

    Authors: Aymen Al Saadi, Dario Alfe, Yadu Babuji, Agastya Bhati, Ben Blaiszik, Thomas Brettin, Kyle Chard, Ryan Chard, Peter Coveney, Anda Trifan, Alex Brace, Austin Clyde, Ian Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Thorsten Kurth, Dieter Kranzlmüller, Hyungro Lee, Zhuozhao Li, Heng Ma, Andre Merzky, Gerald Mathias, Alexander Partin, Junqi Yin, et al. (11 additional authors not shown)

    Abstract: The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved to better select lead compounds that can proceed to later stages of the drug discovery protocol accelerating…

    Submitted 13 October, 2020; originally announced October 2020.

  14. arXiv:2006.02431  [pdf, other]

    q-bio.BM cs.LG q-bio.QM stat.ML

    Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

    Authors: Yadu Babuji, Ben Blaiszik, Tom Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Ian Foster, Zhi Hong, Shantenu Jha, Zhuozhao Li, Xuefeng Liu, Arvind Ramanathan, Yi Ren, Nicholaus Saint, Marcus Schwarting, Rick Stevens, Hubertus van Dam, Rick Wagner

    Abstract: Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort,…

    Submitted 27 May, 2020; originally announced June 2020.

    Comments: 11 pages, 5 figures

  15. arXiv:2006.01171  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Regression Enrichment Surfaces: a Simple Analysis Technique for Virtual Drug Screening Models

    Authors: Austin Clyde, Xiaotian Duan, Rick Stevens

    Abstract: We present a new method for understanding the performance of a model in virtual drug screening tasks. While most virtual screening problems present as a mix between ranking and classification, the models are typically trained as regression models presenting a problem requiring either a choice of a cutoff or ranking measure. Our method, regression enrichment surfaces (RES), is based on the goal of…

    Submitted 1 June, 2020; originally announced June 2020.

  16. arXiv:2005.00095  [pdf, other]

    cs.LG q-bio.GN q-bio.QM

    A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning

    Authors: Austin Clyde, Tom Brettin, Alexander Partin, Maulik Shaulik, Hyunseung Yoo, Yvonne Evrard, Yitan Zhu, Fangfang Xia, Rick Stevens

    Abstract: By combining various cancer cell line (CCL) drug screening panels, the size of the data has grown significantly to begin understanding how advances in deep learning can advance drug response predictions. In this paper we train >35,000 neural network models, sweeping over common featurization techniques. We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 f…

    Submitted 4 May, 2020; v1 submitted 30 April, 2020; originally announced May 2020.

  17. Mixtures of g-priors in Generalized Linear Models

    Authors: Yingbo Li, Merlise A. Clyde

    Abstract: Mixtures of Zellner's g-priors have been studied extensively in linear models and have been shown to have numerous desirable properties for Bayesian variable selection and model averaging. Several extensions of g-priors to Generalized Linear Models (GLMs) have been proposed in the literature; however, the choice of prior distribution of g and resulting properties for inference have received consid…

    Submitted 4 May, 2018; v1 submitted 24 March, 2015; originally announced March 2015.

  18. Stochastic expansions using continuous dictionaries: Lévy adaptive regression kernels

    Authors: Robert L. Wolpert, Merlise A. Clyde, Chong Tu

    Abstract: This article describes a new class of prior distributions for nonparametric function estimation. The unknown function is modeled as a limit of weighted sums of kernels or generator functions indexed by continuous parameters that control local and global features such as their translation, dilation, modulation and shape. Lévy random fields and their stochastic integrals are employed to induce prior…

    Submitted 14 December, 2011; originally announced December 2011.

    Comments: Published in at http://dx.doi.org/10.1214/11-AOS889 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS889

    Journal ref: Annals of Statistics 2011, Vol. 39, No. 4, 1916-1962

  19. arXiv:1108.0020  [pdf, other]

    astro-ph.IM astro-ph.EP stat.AP stat.CO

    Bayesian Methods for Analysis and Adaptive Scheduling of Exoplanet Observations

    Authors: Thomas J. Loredo, James O. Berger, David F. Chernoff, Merlise A. Clyde, Bin Liu

    Abstract: We describe work in progress by a collaboration of astronomers and statisticians developing a suite of Bayesian data analysis tools for extrasolar planet (exoplanet) detection, planetary orbit estimation, and adaptive scheduling of observations. Our work addresses analysis of stellar reflex motion data, where a planet is detected by observing the "wobble" of its host star as it responds to the gra…

    Submitted 10 May, 2018; v1 submitted 29 July, 2011; originally announced August 2011.

    Comments: 29 pages, 11 figures. An abridged version is accepted for publication in Statistical Methodology for a special issue on astrostatistics, with selected (refereed) papers presented at the Astronomical Data Analysis Conference (ADA VI) held in Monastir, Tunisia, in May 2010. Update corrects equation (3)

    Journal ref: Statistical Methodology 9 (2012) 101-114

  20. Bayesian nonparametric models for peak identification in MALDI-TOF mass spectroscopy

    Authors: Leanna L. House, Merlise A. Clyde, Robert L. Wolpert

    Abstract: We present a novel nonparametric Bayesian approach based on Lévy Adaptive Regression Kernels (LARK) to model spectral data arising from MALDI-TOF (Matrix Assisted Laser Desorption Ionization Time-of-Flight) mass spectrometry. This model-based approach provides identification and quantification of proteins through model parameters that are directly interpretable as the number of proteins, mass and…

    Submitted 27 July, 2011; originally announced July 2011.

    Comments: Published in at http://dx.doi.org/10.1214/10-AOAS450 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS450

    Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 2B, 1488-1511

  21. Statistical methods for automated drug susceptibility testing: Bayesian minimum inhibitory concentration prediction from growth curves

    Authors: Xi Kathy Zhou, Merlise A. Clyde, James Garrett, Viridiana Lourdes, Michael O'Connell, Giovanni Parmigiani, David J. Turner, Tim Wiles

    Abstract: Determination of the minimum inhibitory concentration (MIC) of a drug that prevents microbial growth is an important step for managing patients with infections. In this paper we present a novel probabilistic approach that accurately estimates MICs based on a panel of multiple curves reflecting features of bacterial growth. We develop a probabilistic model for determining whether a given dilution…

    Submitted 20 August, 2009; originally announced August 2009.

    Comments: Published in at http://dx.doi.org/10.1214/08-AOAS217 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS217

    Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 2, 710-730

  22. Bayesian model search and multilevel inference for SNP association studies

    Authors: Melanie A. Wilson, Edwin S. Iversen, Merlise A. Clyde, Scott C. Schmidler, Joellen M. Schildkraut

    Abstract: Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the…

    Submitted 12 November, 2010; v1 submitted 7 August, 2009; originally announced August 2009.

    Comments: Published in at http://dx.doi.org/10.1214/09-AOAS322 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS322

    Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 3, 1342-1364