ExaWorks Software Development Kit: A Robust and Scalable Collection of Interoperable Workflow Technologies

Authors: Matteo Turilli, Mihael Hategan-Marandiuc, Mikhail Titov, Ketan Maheshwari, Aymen Alsaadi, Andre Merzky, Ramon Arambula, Mikhail Zakharchanka, Matt Cowan, Justin M. Wozniak, Andreas Wilke, Ozgur Ozan Kilic, Kyle Chard, Rafael Ferreira da Silva, Shantenu Jha, Daniel Laney

Abstract: Scientific discovery increasingly requires executing heterogeneous scientific workflows on high-performance computing (HPC) platforms. Heterogeneous workflows contain different types of tasks (e.g., simulation, analysis, and learning) that need to be mapped, scheduled, and launched on different computing resources. That requires a software stack that enables users to code their workflows and automate resource management and workflow execution. Currently, there are many workflow technologies with diverse levels of robustness and capabilities, and users face difficult choices of software that can effectively and efficiently support their use cases on HPC machines, especially when considering the latest exascale platforms. We contributed to addressing this issue by developing the ExaWorks Software Development Kit (SDK). The SDK is a curated collection of workflow technologies engineered following current best practices and specifically designed to work on HPC platforms. We present our experience with (1) curating those technologies, (2) integrating them to provide users with new capabilities, (3) developing a continuous integration platform to test the SDK on DOE HPC platforms, (4) designing a dashboard to publish the results of those tests, and (5) devising an innovative documentation platform to help users to use those technologies. Our experience details the requirements and the best practices needed to curate workflow technologies, and it also serves as a blueprint for the capabilities and services that DOE will have to offer to support a variety of scientific heterogeneous workflows on the newly available exascale HPC platforms.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2403.18073 [pdf, other]

Workflow Mini-Apps: Portable, Scalable, Tunable & Faithful Representations of Scientific Workflows

Authors: Ozgur Ozan Kilic, Tianle Wang, Matteo Turilli, Mikhail Titov, Andre Merzky, Line Pouchard, Shantenu Jha

Abstract: Workflows are critical for scientific discovery. However, the sophistication, heterogeneity, and scale of workflows make building, testing, and optimizing them increasingly challenging. Furthermore, their complexity and heterogeneity make performance reproducibility hard. In this paper, we propose workflow mini-apps as a tool to address the challenges in building and testing workflows while controlling the fidelity of representing real-world workflows. Workflow mini-apps are deployed and run on various HPC systems and architectures without workflow-specific constraints. We offer insight into their design and implementation, providing an analysis of their performance and reproducibility. Workflow mini-apps thus advance the science of workflows by providing simple, portable, and managed (fidelity) representations of otherwise complex and difficult-to-control real workflows.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.15721 [pdf, other]

Design and Implementation of an Analysis Pipeline for Heterogeneous Data

Authors: Arup Kumar Sarker, Aymen Alsaadi, Niranda Perera, Mills Staylor, Gregor von Laszewski, Matteo Turilli, Ozgur Ozan Kilic, Mikhail Titov, Andre Merzky, Shantenu Jha, Geoffrey Fox

Abstract: Managing and preparing complex data for deep learning, a prevalent approach in large-scale data science, can be challenging. Data transfer for model training also presents difficulties, impacting scientific fields like genomics, climate modeling, and astronomy. A large-scale solution like Google Pathways with a distributed execution environment for deep learning models exists but is proprietary. Integrating existing open-source, scalable runtime tools and data frameworks on high-performance computing (HPC) platforms is crucial to address these challenges. Our objective is to establish a smooth and unified method of combining data engineering and deep learning frameworks with diverse execution capabilities that can be deployed on various high-performance computing platforms, including cloud and supercomputers. We aim to support heterogeneous systems with accelerators, where Cylon and other data engineering and deep learning frameworks can utilize heterogeneous execution. To achieve this, we propose Radical-Cylon, a heterogeneous runtime system with a parallel and distributed data framework to execute Cylon as a task of Radical Pilot. We thoroughly explain Radical-Cylon's design and development and the execution process of Cylon tasks using Radical Pilot. This approach enables the use of heterogeneous MPI-communicators across multiple nodes. Radical-Cylon achieves better performance than Bare-Metal Cylon with minimal and constant overhead. Radical-Cylon achieves (4~15)% faster execution time than batch execution while performing similar join and sort operations with 35 million and 3.5 billion rows with the same resources. The approach aims to excel in both scientific and engineering research HPC systems while demonstrating robust performance on cloud infrastructures. This dual capability fosters collaboration and innovation within the open-source scientific research community.

Submitted 7 April, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

Comments: 14 pages, 16 figures, 2 tables

ACM Class: H.2.4; D.2.7; D.2.2

arXiv:2307.07895 [pdf, other]

doi 10.1109/e-Science58273.2023.10254912

PSI/J: A Portable Interface for Submitting, Monitoring, and Managing Jobs

Authors: Mihael Hategan-Marandiuc, Andre Merzky, Nicholson Collier, Ketan Maheshwari, Jonathan Ozik, Matteo Turilli, Andreas Wilke, Justin M. Wozniak, Kyle Chard, Ian Foster, Rafael Ferreira da Silva, Shantenu Jha, Daniel Laney

Abstract: It is generally desirable for high-performance computing (HPC) applications to be portable between HPC systems, for example to make use of more performant hardware, make effective use of allocations, and to co-locate compute jobs with large datasets. Unfortunately, moving scientific applications between HPC systems is challenging for various reasons, most notably that HPC systems have different HPC schedulers. We introduce PSI/J, a job management abstraction API intended to simplify the construction of software components and applications that are portable over various HPC scheduler implementations. We argue that such a system is both necessary and that no viable alternative currently exists. We analyze similar notable APIs and attempt to determine the factors that influenced their evolution and adoption by the HPC community. We base the design of PSI/J on that analysis. We describe how PSI/J has been integrated in three workflow systems and one application, and also show via experiments that PSI/J imposes minimal overhead.
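The idea of a job management abstraction over several schedulers, as described in the abstract above, can be sketched in a few lines of Python. This is an illustrative toy only and is not PSI/J's actual API: the names `JobSpec`, `LocalBackend`, `SlurmDryRunBackend`, and `get_backend` are hypothetical. The Slurm backend only renders an `sbatch` command line (using the `--ntasks` and `--wrap` options) and never submits anything.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Scheduler-independent job description (hypothetical type)."""
    executable: str
    arguments: list = field(default_factory=list)
    ntasks: int = 1


class LocalBackend:
    """Runs the job as a child process on the current machine."""

    def submit(self, spec: JobSpec) -> str:
        cmd = [spec.executable, *spec.arguments]
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return done.stdout


class SlurmDryRunBackend:
    """Only renders an sbatch command line; nothing is submitted."""

    def submit(self, spec: JobSpec) -> str:
        wrapped = shlex.join([spec.executable, *spec.arguments])
        return shlex.join(["sbatch", f"--ntasks={spec.ntasks}", "--wrap", wrapped])


# Backends are looked up by name, so application code does not change
# when the target scheduler changes.
BACKENDS = {"local": LocalBackend, "slurm-dry-run": SlurmDryRunBackend}


def get_backend(name: str):
    """Return a backend instance for the given (hypothetical) name."""
    return BACKENDS[name]()


# The same job description is handed to two different backends.
spec = JobSpec(sys.executable, ["-c", "print('hello')"])
local_output = get_backend("local").submit(spec)
slurm_command = get_backend("slurm-dry-run").submit(spec)
```

The design point is that the job description carries no scheduler-specific detail; only the backend knows how to turn it into a process or a submission command.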

Submitted 20 September, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

arXiv:2209.00114 [pdf, other]

doi 10.1109/CCGrid54584.2022.00069

RAPTOR: Ravenous Throughput Computing

Authors: Andre Merzky, Matteo Turilli, Shantenu Jha

Abstract: We describe the design, implementation and performance of the RADICAL-Pilot task overlay (RAPTOR). RAPTOR enables the execution of heterogeneous tasks -- i.e., functions and executables with arbitrary duration -- on HPC platforms, providing high throughput and high resource utilization. RAPTOR supports the high throughput virtual screening requirements of DOE's National Virtual Biotechnology Laboratory effort to find therapeutic solutions for COVID-19. RAPTOR has been used on $>8000$ compute nodes to sustain 144M/hour docking hits, and to screen $\sim$10$^{11}$ ligands. To the best of our knowledge, both the throughput rate and aggregated number of executed tasks are a factor of two greater than previously reported in literature. RAPTOR represents important progress towards improvement of computational drug discovery, in terms of size of libraries screened, and for the possibility of generating training data fast enough to serve the last generation of docking surrogate models.

Submitted 31 August, 2022; originally announced September 2022.

Comments: 10 pages, 9 figures. 22nd International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2022)

arXiv:2208.00056 [pdf]

Pipeline for Automating Compliance-based Elimination and Extension (PACE2): A Systematic Framework for High-throughput Biomolecular Material Simulation Workflows

Authors: Srinivas C. Mushnoori, Ethan Zang, Akash Banerjee, Mason Hooten, Andre Merzky, Matteo Turilli, Shantenu Jha, Meenakshi Dutt

Abstract: The formation of biomolecular materials via dynamical interfacial processes such as self-assembly and fusion, for diverse compositions and external conditions, can be efficiently probed using ensemble Molecular Dynamics. However, this approach requires a large number of simulations when investigating a large composition phase space. In addition, there is difficulty in predicting whether each simulation is yielding biomolecular materials with the desired properties or outcomes and how long each simulation will run for. These difficulties can be overcome by rules-based management systems which include intermittent inspection, variable sampling, premature termination and extension of the individual Molecular Dynamics simulations. The automation of such a management system can significantly reduce the overhead of managing large ensembles of Molecular Dynamics simulations. To this end, a high-throughput workflows-based computational framework, Pipeline for Automating Compliance-based Elimination and Extension (PACE2), for biomolecular materials simulations is proposed. The PACE2 framework encompasses Simulation-Analysis Pipelines. Each Pipeline includes temporally separated simulation and analysis tasks. When a Molecular Dynamics simulation completes, an analysis task is triggered which evaluates the Molecular Dynamics trajectory for compliance. Compliant Molecular Dynamics simulations are extended to the next Molecular Dynamics phase with a suitable sample rate to allow additional, detailed analysis. Non-compliant Molecular Dynamics simulations are eliminated, and their computational resources are either reallocated or released. The framework is designed to run on local desktop computers and high performance computing resources. In the future, the framework will be extended to address generalized workflows and investigate other classes of materials.
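The simulate, analyze, then extend-or-eliminate loop described in the abstract above can be illustrated with a minimal Python sketch. It is not the PACE2 implementation: the `Simulation` class, the random stand-in for a Molecular Dynamics phase, and the threshold-based compliance rule are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Simulation:
    """One ensemble member (hypothetical type)."""
    name: str
    phase: int = 0
    alive: bool = True


def run_phase(sim, rng):
    """Stand-in for one MD phase; returns a fake scalar observable."""
    sim.phase += 1
    return rng.random()


def is_compliant(observable, threshold=0.3):
    """Hypothetical compliance rule applied by the analysis task."""
    return observable >= threshold


def run_ensemble(names, max_phases=3, seed=0):
    """Run each pipeline phase by phase, eliminating non-compliant members."""
    rng = random.Random(seed)
    sims = [Simulation(n) for n in names]
    for _ in range(max_phases):
        for sim in sims:
            if not sim.alive:
                continue  # eliminated earlier; consumes no further resources
            observable = run_phase(sim, rng)      # simulation task
            if not is_compliant(observable):      # analysis task
                sim.alive = False                 # eliminate instead of extending
    return sims


sims = run_ensemble(["a", "b", "c", "d"], max_phases=3, seed=1)
survivors = [s.name for s in sims if s.alive]
```

Members that stay compliant reach the final phase, while eliminated members stop at the phase in which they failed the check.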

Submitted 29 July, 2022; originally announced August 2022.

Comments: 25 pages, 9 figures, 4 tables

arXiv:2201.06962 [pdf, other]

A Scalable Solution for Running Ensemble Simulations for Photovoltaic Energy

Authors: Weiming Hu, Guido Cervone, Matteo Turilli, Andre Merzky, Shantenu Jha

Abstract: This chapter proposes and provides an in-depth discussion of a scalable solution for running ensemble simulation for solar energy production. Generating a forecast ensemble is computationally expensive. But with the help of Analog Ensemble, forecast ensembles can be generated with a single deterministic run of a weather forecast model. Weather ensembles are then used to simulate 11 10 KW photovoltaic solar power systems to study the simulation uncertainty under a wide range of panel configuration and weather conditions. This computational workflow has been deployed onto the NCAR supercomputer, Cheyenne, with more than 7,000 cores. Results show that spring and summer are typically associated with a larger simulation uncertainty. Optimizing the panel configuration based on their individual performance under changing weather conditions can improve the simulation accuracy by more than 12%. This work also shows how panel configuration can be optimized based on geographic locations.

Submitted 10 January, 2022; originally announced January 2022.

arXiv:2108.13521 [pdf, other]

ExaWorks: Workflows for Exascale

Authors: Aymen Al-Saadi, Dong H. Ahn, Yadu Babuji, Kyle Chard, James Corbett, Mihael Hategan, Stephen Herbein, Shantenu Jha, Daniel Laney, Andre Merzky, Todd Munson, Michael Salim, Mikhail Titov, Matteo Turilli, Justin M. Wozniak

Abstract: Exascale computers will offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. These software combinations and integrations, however, are difficult to achieve due to challenges of coordination and deployment of heterogeneous software components on diverse and massive platforms. We present the ExaWorks project, which can address many of these challenges: ExaWorks is leading a co-design process to create a workflow software development Toolkit (SDK) consisting of a wide range of workflow management tools that can be composed and interoperate through common interfaces. We describe the initial set of tools and interfaces supported by the SDK, efforts to make them easier to apply to complex science challenges, and examples of their application to exemplar cases. Furthermore, we discuss how our project is working with the workflows community, large computing facilities as well as HPC platform vendors to sustainably address the requirements of workflows at the exascale.

Submitted 30 August, 2021; originally announced August 2021.

arXiv:2106.07036 [pdf, other]

Protein-Ligand Docking Surrogate Models: A SARS-CoV-2 Benchmark for Deep Learning Accelerated Virtual Screening

Authors: Austin Clyde, Thomas Brettin, Alexander Partin, Hyunseung Yoo, Yadu Babuji, Ben Blaiszik, Andre Merzky, Matteo Turilli, Shantenu Jha, Arvind Ramanathan, Rick Stevens

Abstract: We propose a benchmark to study surrogate model accuracy for protein-ligand docking. We share a dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. Our work shows surrogate docking models have six orders of magnitude more throughput than standard docking protocols on the same supercomputer node types. We demonstrate the power of high-speed surrogate models by running each target against 1 billion molecules in under a day (50k predictions per GPU second). We showcase a workflow for docking utilizing surrogate ML models as a pre-filter. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate less than 0.01% of detecting the underlying best scoring 0.1% of compounds. Our analysis of the speedup explains that to screen more molecules under a docking paradigm, another order of magnitude speedup must come from model accuracy rather than computing speed (which, if increased, will not anymore alter our throughput to screen molecules). We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100x or even 1000x faster than current techniques.
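The pre-filter workflow mentioned in the abstract above can be sketched as follows. This is a toy illustration, not the authors' pipeline: molecules are plain integers, both scoring functions are made up, and the convention that a lower score is better is an assumption.

```python
def prefilter_then_dock(molecules, surrogate_score, dock_score, keep_fraction):
    """Rank with the cheap surrogate, then dock only the top fraction."""
    ranked = sorted(molecules, key=surrogate_score)   # lower is better (assumed)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    shortlist = ranked[:n_keep]
    return {m: dock_score(m) for m in shortlist}


# Toy stand-ins: the "expensive" docking score and a noisy, cheap surrogate.
dock_calls = []


def toy_dock(m):
    dock_calls.append(m)          # record how often the expensive step runs
    return abs(m - 42)


def toy_surrogate(m):
    return abs(m - 42) + (m % 3)  # true score plus a small deterministic error


results = prefilter_then_dock(range(100), toy_surrogate, toy_dock, keep_fraction=0.1)
best = min(results, key=results.get)
```

Only a tenth of the library reaches the expensive docking step, so the end-to-end quality depends on whether the surrogate ranks the truly best candidates inside the shortlist, which matches the abstract's emphasis on surrogate accuracy.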

Submitted 30 June, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

arXiv:2106.05177 [pdf, other]

doi 10.5281/zenodo.4915801

Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development

Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Tainã Coleman, Dan Laney, Dong Ahn, Shantenu Jha, Dorran Howell, Stian Soiland-Reys, Ilkay Altintas, Douglas Thain, Rosa Filgueira, Yadu Babuji, Rosa M. Badia, Bartosz Balis, Silvina Caino-Lores, Scott Callaghan, Frederik Coppens, Michael R. Crusoe, Kaushik De, Frank Di Natale, Tu M. A. Do, Bjoern Enders, Thomas Fahringer, Anne Fouilloux , et al. (33 additional authors not shown)

Abstract: Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role in the data-oriented and post-Moore's computing landscape as they democratize the application of cutting-edge research techniques, computationally intensive methods, and use of new computing platforms. As workflows continue to be adopted by scientific projects and user communities, they are becoming more complex. Workflows are increasingly composed of tasks that perform computations such as short machine learning inference, multi-node simulations, long-running machine learning model training, amongst others, and thus increasingly rely on heterogeneous architectures that include CPUs but also GPUs and accelerators. The workflow management system (WMS) technology landscape is currently segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Another fundamental problem is that there are conflicting theoretical bases and abstractions for a WMS. Systems that use the same underlying abstractions can likely be translated between, which is not the case for systems that use different abstractions. More information: https://workflowsri.org/summits/technical

Submitted 9 June, 2021; originally announced June 2021.

arXiv:2105.13185 [pdf, other]

RADICAL-Pilot and Parsl: Executing Heterogeneous Workflows on HPC Platforms

Authors: Aymen Alsaadi, Logan Ward, Andre Merzky, Kyle Chard, Ian Foster, Shantenu Jha, Matteo Turilli

Abstract: Workflow applications are becoming increasingly important to support scientific discovery. That is leading to a proliferation of workflow management systems and, thus, to a fragmented software ecosystem. Integration among existing workflow tools can improve development efficiency and, ultimately, increase the sustainability of scientific workflow software. We describe our experience with integrating RADICAL-Pilot (RP) and Parsl as a way to enable users to develop and execute workflow applications with heterogeneous tasks on heterogeneous high-performance computing resources. We describe our approach to the integration of the two systems and detail the development of RPEX, a Parsl executor which uses RP as its workload manager. We develop an RP executor that executes heterogeneous MPI Python functions on CPU cores and GPUs. We measure the weak and strong scaling of RPEX, RP, and Parsl when providing new capabilities to two paradigmatic use cases: Colmena and Ice Wedge Polygons.

Submitted 30 August, 2022; v1 submitted 27 May, 2021; originally announced May 2021.

arXiv:2103.02843 [pdf]

doi 10.1098/rsfs.2021.0018

Pandemic Drugs at Pandemic Speed: Infrastructure for Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers

Authors: Agastya P. Bhati, Shunzhou Wan, Dario Alfè, Austin R. Clyde, Mathis Bode, Li Tan, Mikhail Titov, Andre Merzky, Matteo Turilli, Shantenu Jha, Roger R. Highfield, Walter Rocchia, Nicola Scafuri, Sauro Succi, Dieter Kranzlmüller, Gerald Mathias, David Wifling, Yann Donon, Alberto Di Meglio, Sofia Vallecorsa, Heng Ma, Anda Trifan, Arvind Ramanathan, Tom Brettin, Alexander Partin , et al. (4 additional authors not shown)

Abstract: The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it is dependent on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead antiviral compounds through repurposing on a variety of supercomputers.

Submitted 4 September, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Journal ref: Interface Focus. 2021. 11 (6): 20210018

arXiv:2103.00091 [pdf, other]

Design and Performance Characterization of RADICAL-Pilot on Leadership-class Platforms

Authors: Andre Merzky, Matteo Turilli, Mikhail Titov, Aymen Al-Saadi, Shantenu Jha

Abstract: Many extreme scale scientific applications have workloads comprised of a large number of individual high-performance tasks. The Pilot abstraction decouples workload specification, resource management, and task execution via job placeholders and late-binding. As such, suitable implementations of the Pilot abstraction can support the collective execution of a large number of tasks on supercomputers. We introduce RADICAL-Pilot (RP) as a portable, modular and extensible pilot-enabled runtime system. We describe RP's design, architecture and implementation. We characterize its performance and show its ability to scalably execute workloads comprised of tens of thousands of heterogeneous tasks on DOE and NSF leadership-class HPC platforms. Specifically, we investigate RP's weak/strong scaling with CPU/GPU, single/multi core, (non)MPI tasks and Python functions when using most of ORNL Summit and TACC Frontera. RADICAL-Pilot can be used stand-alone, as well as the runtime for third-party workflow systems.
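The late-binding idea in the abstract above (a placeholder job holds resources, and tasks are bound to them only when capacity is free) can be illustrated with a small discrete-time simulation. This is a toy sketch, not RADICAL-Pilot code or its API: the `Task` type, the `run_on_pilot` function, and the first-fit binding policy are assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    """A unit of work with a core requirement (hypothetical type)."""
    name: str
    cores: int
    duration: int  # simulated time units


def run_on_pilot(total_cores, tasks):
    """Simulate late binding of tasks to the cores held by one pilot.

    Returns a dict mapping task name to its simulated completion time.
    """
    free = total_cores
    clock = 0
    running = []            # min-heap of (finish_time, name, cores)
    pending = list(tasks)
    finished = {}
    while pending or running:
        # Bind every pending task that fits into the currently free cores.
        still_pending = []
        for task in pending:
            if task.cores <= free:
                free -= task.cores
                heapq.heappush(running, (clock + task.duration, task.name, task.cores))
            else:
                still_pending.append(task)
        pending = still_pending
        if not running:
            raise ValueError("a task needs more cores than the pilot holds")
        # Advance the clock to the next completion and release its cores.
        clock, name, cores = heapq.heappop(running)
        free += cores
        finished[name] = clock
    return finished


finished = run_on_pilot(
    4,
    [Task("a", 2, 10), Task("b", 2, 5), Task("c", 4, 3), Task("d", 1, 1)],
)
```

In this run the 4-core task `c` waits until both 2-core tasks have released their cores, while the small task `d` is bound as soon as `b` finishes; no task is tied to specific cores before it starts.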

Submitted 2 November, 2021; v1 submitted 26 February, 2021; originally announced March 2021.

Comments: arXiv admin note: text overlap with arXiv:1801.01843

arXiv:2010.10517 [pdf, other]

Scalable HPC and AI Infrastructure for COVID-19 Therapeutics

Authors: Hyungro Lee, Andre Merzky, Li Tan, Mikhail Titov, Matteo Turilli, Dario Alfe, Agastya Bhati, Alex Brace, Austin Clyde, Peter Coveney, Heng Ma, Arvind Ramanathan, Rick Stevens, Anda Trifan, Hubertus Van Dam, Shunzhou Wan, Sean Wilkinson, Shantenu Jha

Abstract: COVID-19 has claimed more than 1 million lives and resulted in over 40 million infections. There is an urgent need to identify drugs that can inhibit SARS-CoV-2. In response, the DOE recently established the Medical Therapeutics project as part of the National Virtual Biotechnology Laboratory, and tasked it with creating the computational infrastructure and methods necessary to advance therapeutics development. We discuss innovations in computational infrastructure and methods that are accelerating and advancing drug design. Specifically, we describe several methods that integrate artificial intelligence and simulation-based approaches, and the design of computational infrastructure to support these methods at scale. We discuss their implementation and characterize their performance, and highlight science advances that these capabilities have enabled.

Submitted 20 October, 2020; originally announced October 2020.

arXiv:2010.06574 [pdf, other]

IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

Authors: Aymen Al Saadi, Dario Alfe, Yadu Babuji, Agastya Bhati, Ben Blaiszik, Thomas Brettin, Kyle Chard, Ryan Chard, Peter Coveney, Anda Trifan, Alex Brace, Austin Clyde, Ian Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Thorsten Kurth, Dieter Kranzlmüller, Hyungro Lee, Zhuozhao Li, Heng Ma, Andre Merzky, Gerald Mathias, Alexander Partin, Junqi Yin , et al. (11 additional authors not shown)

Abstract: The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved to better select lead compounds that can proceed to later stages of the drug discovery protocol accelerating the entire process. No single methodological approach can achieve the necessary accuracy with required efficiency. Here we describe multiple algorithmic innovations to overcome this fundamental limitation, development and deployment of computational infrastructure at scale integrates multiple artificial intelligence and simulation-based approaches. Three measures of performance are: (i) throughput, the number of ligands per unit time; (ii) scientific performance, the number of effective ligands sampled per unit time and (iii) peak performance, in flop/s. The capabilities outlined here have been used in production for several months as the workhorse of the computational infrastructure to support the capabilities of the US-DOE National Virtual Biotechnology Laboratory in combination with resources from the EU Centre of Excellence in Computational Biomedicine.

Submitted 13 October, 2020; originally announced October 2020.

arXiv:1909.03057 [pdf, other]

Characterizing the Performance of Executing Many-tasks on Summit

Authors: Matteo Turilli, Andre Merzky, Thomas Naughton, Wael Elwasif, Shantenu Jha

Abstract: Many scientific workloads are comprised of many tasks, where each task is an independent simulation or analysis of data. The execution of millions of tasks on heterogeneous HPC platforms requires scalable dynamic resource management and multi-level scheduling. RADICAL-Pilot (RP) -- an implementation of the Pilot abstraction, addresses these challenges and serves as an effective runtime system to execute workloads comprised of many tasks. In this paper, we characterize the performance of executing many tasks using RP when interfaced with JSM and PRRTE on Summit: RP is responsible for resource management and task scheduling on acquired resources; JSM or PRRTE enact the placement and launching of scheduled tasks. Our experiments provide lower bounds on the performance of RP when integrated with JSM and PRRTE. Specifically, for workloads comprised of homogeneous single-core, 15 minutes-long tasks we find that: PRRTE scales better than JSM for > O(1000) tasks; PRRTE overheads are negligible; and PRRTE supports optimizations that lower the impact of overheads and enable resource utilization of 63% when executing O(16K), 1-core tasks over 404 compute nodes.

Submitted 8 September, 2019; originally announced September 2019.

arXiv:1904.03085 [pdf, other]

RADICAL-Cybertools: Middleware Building Blocks for Scalable Science

Authors: Vivek Balasubramanian, Shantenu Jha, Andre Merzky, Matteo Turilli

Abstract: RADICAL-Cybertools (RCT) are a set of software systems that serve as middleware to develop efficient and effective tools for scientific computing. Specifically, RCT enable executing many-task applications at extreme scale and on a variety of computing infrastructures. RCT are building blocks, designed to work as stand-alone systems, integrated among themselves or integrated with third-party systems. RCT enable innovative science in multiple domains, including but not limited to biophysics, climate science and particle physics, consuming hundreds of millions of core hours. This paper provides an overview of RCT systems, their impact, and the architectural principles and software engineering underlying RCT.

Submitted 5 April, 2019; originally announced April 2019.

arXiv:1903.10057 [pdf, other]

doi 10.1109/MCSE.2019.2920048

Middleware Building Blocks for Workflow Systems

Authors: Matteo Turilli, Vivek Balasubramanian, Andre Merzky, Ioannis Paraskevakos, Shantenu Jha

Abstract: This paper describes a building blocks approach to the design of scientific workflow systems. We discuss RADICAL-Cybertools as one implementation of the building blocks concept, showing how they are designed and developed in accordance with this approach. This paper offers three main contributions: (i) showing the relevance of the design principles underlying the building blocks approach to support scientific workflows on high performance computing platforms; (ii) illustrating a set of building blocks that enable multiple points of integration, "unifying" conceptual reasoning across otherwise very different tools and systems; and (iii) case studies discussing how RADICAL-Cybertools are integrated with existing workflow, workload, and general purpose computing systems and used to develop domain-specific workflow systems.

Submitted 27 June, 2019; v1 submitted 24 March, 2019; originally announced March 2019.

arXiv:1808.00684 [pdf, other]

doi 10.1016/j.jocs.2018.06.012

Synapse: Synthetic Application Profiler and Emulator

Authors: Andre Merzky, Ming Tai Ha, Matteo Turilli, Shantenu Jha

Abstract: Motivated by the need to emulate workload execution characteristics on high-performance and distributed heterogeneous resources, we introduce Synapse. Synapse is used as a proxy application (or "representative application") for real workloads, with the advantage that it can be tuned in different ways and dimensions, and also at levels of granularity that are not possible with real applications. Synapse has a platform-independent application profiler, and has the ability to emulate profiled workloads on a variety of resources. Experiments show that the automated profiling performed using Synapse captures an application's characteristics with high fidelity. The emulation of an application using Synapse can reproduce the application's execution behavior in the original run-time environment, and can also reproduce those behaviors on different run-time environments.

Submitted 2 August, 2018; originally announced August 2018.

Comments: Large portions of this work originally appeared as arXiv:1506.00272, which was subsequently published as a workshop paper. This is an extended version published in the "Journal of Computational Science"

Report number: 01

Journal ref: Journal of Computational Science, 27C (2018) pp. 329-344

arXiv:1801.02651 [pdf, other]

Towards General Distributed Resource Selection

Authors: Ming Tai Ha, Matteo Turilli, Andre Merzky, Shantenu Jha

Abstract: The advantages of distributing workloads and utilizing multiple distributed resources are now well established. The type and degree of heterogeneity of distributed resources is increasing, and thus determining how to distribute the workloads becomes increasingly difficult, in particular with respect to the selection of suitable resources. We formulate and investigate the resource selection problem in a way that is agnostic of specific task and resource properties, and which is generalizable to a range of metrics. Specifically, we developed a model to describe the requirements of tasks and to estimate the cost of running that task on an arbitrary resource using baseline measurements from a reference machine. We integrated our cost model with the Condor matchmaking algorithm to enable resource selection. Experimental validation of our model shows that it provides execution time estimates with 157-171% error on XSEDE resources and 18-31% on OSG resources. We use the task execution cost model to select resources for a bag-of-tasks of up to 1024 GROMACS MD simulations across the target resources. Experiments show that using the model's estimates reduces the workload's time-to-completion by up to ~85% when compared to the random distribution of workload across the same resources.

Submitted 8 January, 2018; originally announced January 2018.

arXiv:1801.01843 [pdf, other]

Design and Performance Characterization of RADICAL-Pilot on Titan

Authors: Andre Merzky, Matteo Turilli, Manuel Maldonado, Shantenu Jha

Abstract: Many extreme scale scientific applications have workloads comprised of a large number of individual high-performance tasks. The Pilot abstraction decouples workload specification, resource management, and task execution via job placeholders and late-binding. As such, suitable implementations of the Pilot abstraction can support the collective execution of a large number of tasks on supercomputers. We introduce RADICAL-Pilot (RP) as a portable, modular and extensible Python-based Pilot system. We describe RP's design, architecture and implementation. We characterize its performance and show its ability to scalably execute workloads comprised of thousands of MPI tasks on Titan, a DOE leadership-class facility. Specifically, we investigate RP's weak (strong) scaling properties up to 131K (65K) cores and 4096 (16384) 32-core tasks. RADICAL-Pilot can be used stand-alone, as well as integrated with other tools as a runtime system.

Submitted 5 January, 2018; originally announced January 2018.

arXiv:1609.03484 [pdf, other]

Designing Workflow Systems Using Building Blocks

Authors: Matteo Turilli, Andre Merzky, Vivek Balasubramanian, Manuel Maldonado, Shantenu Jha

Abstract: We suggest there is a need for a fresh perspective on the design and development of workflow systems and argue for a building blocks approach. We outline a description of this approach and define the properties of software building blocks. We discuss RADICAL-Cybertools as one implementation of the building blocks concept, showing how they have been designed and developed in accordance with this approach. Four case studies are presented, covering a dozen science problems. We discuss how RADICAL-Cybertools have been used to develop new workflow systems capabilities and integrated to enhance existing ones, illustrating the applicability and potential of software building blocks. In doing so, we have begun an investigation of an alternative approach to thinking about the design and implementation of workflow systems.

Submitted 8 April, 2019; v1 submitted 12 September, 2016; originally announced September 2016.

arXiv:1605.09513 [pdf, other]

doi 10.1109/eScience.2017.41

Evaluating Distributed Execution of Workloads

Authors: Matteo Turilli, Yadu Nand Babuji, Andre Merzky, Ming Tai Ha, Michael Wilde, Daniel S. Katz, Shantenu Jha

Abstract: Resource selection and task placement for distributed execution pose conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the methods are ad hoc rather than being based on models. Consequently, partial and non-interoperable implementations proliferate. We address both the conceptual and implementation difficulties by experimentally characterizing diverse modalities of resource selection and task placement. We compare the architectures and capabilities of two systems: the AIMES middleware and the Swift workflow scripting language and runtime. We integrate these systems to enable the distributed execution of Swift workflows on Pilot-Jobs managed by the AIMES middleware. Our experiments characterize and compare alternative execution strategies by measuring the time to completion of heterogeneous uncoupled workloads executed at diverse scale and on multiple resources. We measure the adverse effects of pilot fragmentation and early binding of tasks to resources and the benefits of backfill scheduling across pilots on multiple resources. We then use this insight to execute a multi-stage workflow across five production-grade resources. We discuss the importance and implications for other tools and workflow systems.

Submitted 2 November, 2021; v1 submitted 31 May, 2016; originally announced May 2016.

arXiv:1601.05439 [pdf, other]

RepEx: A Flexible Framework for Scalable Replica Exchange Molecular Dynamics Simulations

Authors: Antons Treikalis, Andre Merzky, Haoyuan Chen, Tai-Sung Lee, Darrin M. York, Shantenu Jha

Abstract: Replica Exchange (RE) simulations have emerged as an important algorithmic tool for the molecular sciences. RE simulations involve the concurrent execution of independent simulations which infrequently interact and exchange information. The next set of simulation parameters is based upon the outcome of the exchanges. Typically RE functionality is integrated into the molecular simulation software package. A primary motivation of the tight integration of RE functionality with simulation codes has been performance. This is limiting at multiple levels. First, advances in the RE methodology are tied to the molecular simulation code. Consequently these advances remain confined to the molecular simulation code for which they were developed. Second, it is difficult to extend or experiment with novel RE algorithms, since expertise in the molecular simulation code is typically required. In this paper, we propose the RepEx framework, which addresses these shortcomings of existing approaches while striking a balance between flexibility (any RE scheme) and scalability (tens of thousands of replicas) over a diverse range of platforms. RepEx is designed to use a pilot-job based runtime system and support diverse RE Patterns and Execution Modes. RE Patterns are concerned with synchronization mechanisms in RE simulation, and Execution Modes with spatial and temporal mapping of workload to the CPU cores.
We discuss how the design and implementation yield the following primary contributions of the RepEx framework: (i) the ability to support different RE schemes independent of molecular simulation codes, (ii) the ability to execute different exchange schemes and replica counts independent of the specific availability of resources, (iii) a runtime system that has first-class support for task-level parallelism, and (iv) the required scalability along multiple dimensions.

Submitted 20 January, 2016; originally announced January 2016.

Comments: 12 pages, 13 figures

arXiv:1512.08194 [pdf, other]

Using Pilot Systems to Execute Many Task Workloads on Supercomputers

Authors: Andre Merzky, Matteo Turilli, Manuel Maldonado, Mark Santcroos, Shantenu Jha

Abstract: High performance computing systems have historically been designed to support applications comprised of mostly monolithic, single-job workloads. Pilot systems decouple workload specification, resource selection, and task execution via job placeholders and late-binding. Pilot systems help to satisfy the resource requirements of workloads comprised of multiple tasks. RADICAL-Pilot (RP) is a modular and extensible Python-based pilot system. In this paper we describe RP's design, architecture and implementation, and characterize its performance. RP is capable of spawning more than 100 tasks/second and supports the steady-state execution of up to 16K concurrent tasks. RP can be used stand-alone, as well as integrated with other application-level tools as a runtime system.

Submitted 30 July, 2018; v1 submitted 27 December, 2015; originally announced December 2015.

arXiv:1506.00272 [pdf, other]

Synapse: Synthetic Application Profiler and Emulator

Authors: Andre Merzky, Shantenu Jha

Abstract: We introduce Synapse, motivated by the need to estimate and emulate workload execution characteristics on high-performance and distributed heterogeneous resources. Synapse has a platform-independent application profiler, and the ability to emulate profiled workloads on a variety of heterogeneous resources. Synapse is used as a proxy application (or "representative application") for real workloads, with the added advantage that it can be tuned at arbitrary levels of granularity in ways that are simply not possible using real applications. Experiments show that automated profiling using Synapse represents application characteristics with high fidelity. Emulation using Synapse can reproduce the application behavior in the original run-time environment, and can also reproduce its properties in different run-time environments.

Submitted 15 February, 2016; v1 submitted 31 May, 2015; originally announced June 2015.

Journal ref: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, Chicago, IL, USA, May 23-27, 2016

arXiv:1504.04720 [pdf, other]

Integrating Abstractions to Enhance the Execution of Distributed Applications

Authors: Matteo Turilli, Feng Liu, Zhao Zhang, Andre Merzky, Michael Wilde, Jon Weissman, Daniel S. Katz, Shantenu Jha

Abstract: One of the factors that limits the scale, performance, and sophistication of distributed applications is the difficulty of concurrently executing them on multiple distributed computing resources. In part, this is due to a poor understanding of the general properties and performance of the coupling between applications and dynamic resources. This paper addresses this issue by integrating abstractions representing distributed applications, resources, and execution processes into a pilot-based middleware. The middleware provides a platform that can specify distributed applications, execute them on multiple resources and for different configurations, and is instrumented to support investigative analysis. We analyzed the execution of distributed applications using experiments that measure the benefits of using multiple resources, the late-binding of scheduling decisions, and the use of backfill scheduling.

Submitted 18 February, 2016; v1 submitted 18 April, 2015; originally announced April 2015.

arXiv:1210.3271 [pdf]

Grid Computing: The Next Decade -- Report and Summary

Authors: Jarek Nabrzyski, Krzysztof Kurowski, Daniel S. Katz, Andre Merzky

Abstract: The evolution of the global scientific cyberinfrastructure (CI) has, over the last 10+ years, led to a large diversity of CI instances. While specialized, competing and alternative CI building blocks are inherent to a healthy ecosystem, it also becomes apparent that the increasing degree of fragmentation is hindering interoperation, and thus limiting collaboration, which is essential for modern science communities often spanning international groups and multiple disciplines (but even 'small sciences', with smaller and localized communities, are often embedded into the larger scientific ecosystem, and are increasingly dependent on the availability of CI). There are different reasons why fragmentation occurs, on both technical and social levels. It is also apparent that the current funding model for creating CI components largely fails to aid the transition from research to production, by mixing CS research and IT engineering challenges into the same funding strategies. The 10th anniversary of the EU funded project 'Grid Lab' (which was an early and ambitious attempt at providing a consolidated and science oriented cyberinfrastructure software stack to a specific science community) was taken as an opportunity to invite international leaders and early stage researchers in grid computing and e-Science from Europe, America and Asia, and, together with representatives of the EU and US funding agencies, to discuss the fundamental aspects of CI evolution, and to contemplate the options for a more coherent, more coordinated approach to the global evolution of CI.
This open document represents the results of that workshop, including a draft of a mission statement and a proposal for a blueprint process, to inform the wider community as well as to encourage external experts to provide their feedback and comments.

Submitted 11 October, 2012; originally announced October 2012.

Comments: 17 pages, 1 figure

arXiv:1207.6644 [pdf, other]

P*: A Model of Pilot-Abstractions

Authors: Andre Luckow, Mark Santcroos, Ole Weidner, Andre Merzky, Pradeep Mantha, Shantenu Jha

Abstract: Pilot-Jobs support effective distributed resource utilization, and are arguably one of the most widely-used distributed computing abstractions, as measured by the number and types of applications that use them, as well as the number of production distributed cyberinfrastructures that support them. In spite of broad uptake, there does not exist a well-defined, unifying conceptual model of Pilot-Jobs which can be used to define, compare and contrast different implementations. Often Pilot-Job implementations are strongly coupled to the distributed cyberinfrastructure they were originally designed for. These factors present a barrier to extensibility and interoperability. This paper is an attempt to (i) provide a minimal but complete model (P*) of Pilot-Jobs, (ii) establish the generality of the P* Model by mapping various existing and well known Pilot-Job frameworks such as Condor and DIANE to P*, (iii) derive an interoperable and extensible API for the P* Model (Pilot-API), (iv) validate the implementation of the Pilot-API by concurrently using multiple distinct Pilot-Job frameworks on distinct production distributed cyberinfrastructures, and (v) apply the P* Model to Pilot-Data.

Submitted 27 July, 2012; originally announced July 2012.

Comments: 10 pages

Showing 1–29 of 29 results for author: Merzky, A