-
Accelerating Time-to-Science by Streaming Detector Data Directly into Perlmutter Compute Nodes
Authors:
Samuel S. Welborn,
Bjoern Enders,
Chris Harris,
Peter Ercius,
Deborah J. Bard
Abstract:
Recent advancements in detector technology have significantly increased the size and complexity of experimental data, and high-performance computing (HPC) provides a path towards more efficient and timely data processing. However, movement of large data sets from acquisition systems to HPC centers introduces bottlenecks owing to storage I/O at both ends. This manuscript introduces a streaming work…
▽ More
Recent advancements in detector technology have significantly increased the size and complexity of experimental data, and high-performance computing (HPC) provides a path towards more efficient and timely data processing. However, movement of large data sets from acquisition systems to HPC centers introduces bottlenecks owing to storage I/O at both ends. This manuscript introduces a streaming workflow designed for an high data rate electron detector that streams data directly to compute node memory at the National Energy Research Scientific Computing Center (NERSC), thereby avoiding storage I/O. The new workflow deploys ZeroMQ-based services for data production, aggregation, and distribution for on-the-fly processing, all coordinated through a distributed key-value store. The system is integrated with the detector's science gateway and utilizes the NERSC Superfacility API to initiate streaming jobs through a web-based frontend. Our approach achieves up to a 14-fold increase in data throughput and enhances predictability and reliability compared to a I/O-heavy file-based transfer workflow. Our work highlights the transformative potential of streaming workflows to expedite data analysis for time-sensitive experiments.
△ Less
Submitted 2 July, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Workflows Community Summit 2022: A Roadmap Revolution
Authors:
Rafael Ferreira da Silva,
Rosa M. Badia,
Venkat Bala,
Debbie Bard,
Peer-Timo Bremer,
Ian Buckley,
Silvina Caino-Lores,
Kyle Chard,
Carole Goble,
Shantenu Jha,
Daniel S. Katz,
Daniel Laney,
Manish Parashar,
Frederic Suter,
Nick Tyler,
Thomas Uram,
Ilkay Altintas,
Stefan Andersson,
William Arndt,
Juan Aznar,
Jonathan Bader,
Bartosz Balis,
Chris Blanton,
Kelly Rosa Braghetto,
Aharon Brodutch
, et al. (80 additional authors not shown)
Abstract:
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and t…
▽ More
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, need for processing large scale datasets produced by instruments at the edge, intensification of near real-time data processing, support for long-term experiment campaigns, and emergence of quantum computing as an adjunct to HPC, have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge-to-cloud-to-HPC enable the management of many small-sized files, allow data reduction while ensuring high accuracy, orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities, among others. Further, to accelerate science, it is also necessary that these systems implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, as well as enabling the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit that took place on November 29 and 30, 2022.
△ Less
Submitted 31 March, 2023;
originally announced April 2023.
-
The LBNL Superfacility Project Report
Authors:
Deborah Bard,
Cory Snavely,
Lisa Gerhardt,
Jason Lee,
Becci Totzke,
Katie Antypas,
William Arndt,
Johannes Blaschke,
Suren Byna,
Ravi Cheema,
Shreyas Cholia,
Mark Day,
Bjoern Enders,
Aditi Gaur,
Annette Greiner,
Taylor Groves,
Mariam Kiran,
Quincey Koziol,
Tom Lehman,
Kelly Rowland,
Chris Samuel,
Ashwin Selvarajan,
Alex Sim,
David Skinner,
Laurie Stephey
, et al. (2 additional authors not shown)
Abstract:
The Superfacility model is designed to leverage HPC for experimental science. It is more than simply a model of connected experiment, network, and HPC facilities; it encompasses the full ecosystem of infrastructure, software, tools, and expertise needed to make connected facilities easy to use. The three-year Lawrence Berkeley National Laboratory (LBNL) Superfacility project was initiated in 2019…
▽ More
The Superfacility model is designed to leverage HPC for experimental science. It is more than simply a model of connected experiment, network, and HPC facilities; it encompasses the full ecosystem of infrastructure, software, tools, and expertise needed to make connected facilities easy to use. The three-year Lawrence Berkeley National Laboratory (LBNL) Superfacility project was initiated in 2019 to coordinate work being performed at LBNL to support this model, and to provide a coherent and comprehensive set of science requirements to drive existing and new work.
A key component of the project was the in-depth engagements with eight science teams that represent challenging use cases across the DOE Office of Science. By the close of the project, we met our project goal by enabling our science application engagements to demonstrate automated pipelines that analyze data from remote facilities at large scale, without routine human intervention. In several cases, we have gone beyond demonstrations and now provide production-level services. To achieve this goal, the Superfacility team developed tools, infrastructure, and policies for near-real-time computing support, dynamic high-performance networking, data management and movement tools, API-driven automation, HPC-scale notebooks via Jupyter, authentication using Federated Identity and container-based edge services supported.
The lessons we learned during this project provide a valuable model for future large, complex, cross-disciplinary collaborations. There is a pressing need for a coherent computing infrastructure across national facilities, and LBNL's Superfacility project is a unique model for success in tackling the challenges that will be faced in hardware, software, policies, and services across multiple science domains.
△ Less
Submitted 27 June, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
-
Accelerating X-Ray Tracing for Exascale Systems using Kokkos
Authors:
Felix Wittwer,
Nicholas K. Sauter,
Derek Mendez,
Billy K. Poon,
Aaron S. Brewster,
James M. Holton,
Michael E. Wall,
William E. Hart,
Deborah J. Bard,
Johannes P. Blaschke
Abstract:
The upcoming exascale computing systems Frontier and Aurora will draw much of their computing power from GPU accelerators. The hardware for these systems will be provided by AMD and Intel, respectively, each supporting their own GPU programming model. The challenge for applications that harness one of these exascale systems will be to avoid lock-in and to preserve performance portability.
We rep…
▽ More
The upcoming exascale computing systems Frontier and Aurora will draw much of their computing power from GPU accelerators. The hardware for these systems will be provided by AMD and Intel, respectively, each supporting their own GPU programming model. The challenge for applications that harness one of these exascale systems will be to avoid lock-in and to preserve performance portability.
We report here on our results of using Kokkos to accelerate a real-world application on NERSC's Perlmutter Phase 1 (using NVIDIA A100 accelerators) and the testbed system for OLCF's Frontier (using AMD MI250X). By porting to Kokkos, we were able to successfully run the same X-ray tracing code on both systems and achieved speed-ups between 13% and 66% compared to the original CUDA code. These results are a highly encouraging demonstration of using Kokkos to accelerate production science code.
△ Less
Submitted 16 May, 2022;
originally announced May 2022.
-
Real-Time XFEL Data Analysis at SLAC and NERSC: a Trial Run of Nascent Exascale Experimental Data Analysis
Authors:
Johannes P. Blaschke,
Aaron S. Brewster,
Daniel W. Paley,
Derek Mendez,
Asmit Bhowmick,
Nicholas K. Sauter,
Wilko Kröger,
Murali Shankar,
Bjoern Enders,
Deborah Bard
Abstract:
X-ray scattering experiments using Free Electron Lasers (XFELs) are a powerful tool to determine the molecular structure and function of unknown samples (such as COVID-19 viral proteins). XFEL experiments are a challenge to computing in two ways: i) due to the high cost of running XFELs, a fast turnaround time from data acquisition to data analysis is essential to make informed decisions on experi…
▽ More
X-ray scattering experiments using Free Electron Lasers (XFELs) are a powerful tool to determine the molecular structure and function of unknown samples (such as COVID-19 viral proteins). XFEL experiments are a challenge to computing in two ways: i) due to the high cost of running XFELs, a fast turnaround time from data acquisition to data analysis is essential to make informed decisions on experimental protocols; ii) data collection rates are growing exponentially, requiring new scalable algorithms. Here we report our experiences analyzing data from two experiments at the Linac Coherent Light Source (LCLS) during September 2020. Raw data were analyzed on NERSC's Cori XC40 system, using the Superfacility paradigm: our workflow automatically moves raw data between LCLS and NERSC, where it is analyzed using the software package CCTBX. We achieved real time data analysis with a turnaround time from data acquisition to full molecular reconstruction in as little as 10 min -- sufficient time for the experiment's operators to make informed decisions. By hosting the data analysis on Cori, and by automating LCLS-NERSC interoperability, we achieved a data analysis rate which matches the data acquisition rate. Completing data analysis with 10 mins is a first for XFEL experiments and an important milestone if we are to keep up with data collection trends.
△ Less
Submitted 31 December, 2023; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Quantum Advantage on Proof of Work
Authors:
Dan A. Bard,
Joseph J. Kearney,
Carlos A. Perez-Delgado
Abstract:
Proof-of-Work (PoW) is a fundamental underlying technology behind most major blockchain cryptocurrencies. It has been previously pointed out that quantum devices provide a computational advantage in performing PoW in the context of Bitcoin. Here we make the case that this quantum advantage extends not only to all existing PoW mechanisms, but to any possible PoW as well. This has strong consequence…
▽ More
Proof-of-Work (PoW) is a fundamental underlying technology behind most major blockchain cryptocurrencies. It has been previously pointed out that quantum devices provide a computational advantage in performing PoW in the context of Bitcoin. Here we make the case that this quantum advantage extends not only to all existing PoW mechanisms, but to any possible PoW as well. This has strong consequences regarding both quantum-based attacks on the integrity of the entirety of the blockchain, as well as more legitimate uses of quantum computation for the purpose of mining Bitcoin and other cryptocurrencies. For the first case, we estimate when these quantum attacks will become feasible, for various cryptocurrencies, and discuss the impact of such attacks. For the latter, we derive a precise formula to calculate the economic incentive for switching to quantum-based cryptocurrency miners. Using this formula, we analyze several test scenarios, and conclude that investing in quantum hardware for cryptocurrency mining has the potential to pay off immensely.
△ Less
Submitted 4 May, 2021;
originally announced May 2021.
-
CosmoFlow: Using Deep Learning to Learn the Universe at Scale
Authors:
Amrita Mathuriya,
Deborah Bard,
Peter Mendygral,
Lawrence Meadows,
James Arnemann,
Lei Shao,
Siyu He,
Tuomas Karna,
Daina Moise,
Simon J. Pennycook,
Kristyn Maschoff,
Jason Sewall,
Nalini Kumar,
Shirley Ho,
Mike Ringenburg,
Prabhat,
Victor Lee
Abstract:
Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many el…
▽ More
Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel(C) Xeon Phi(TM) processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully-synchronous training. These enhancements enable us to process large 3D dark matter distribution and predict the cosmological parameters $Ω_M$, $σ_8$ and n$_s$ with unprecedented accuracy.
△ Less
Submitted 9 November, 2018; v1 submitted 14 August, 2018;
originally announced August 2018.
-
Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer
Authors:
Amrita Mathuriya,
Thorsten Kurth,
Vivek Rane,
Mustafa Mustafa,
Lei Shao,
Debbie Bard,
Prabhat,
Victor W Lee
Abstract:
We explore scaling of the standard distributed Tensorflow with GRPC primitives on up to 512 Intel Xeon Phi (KNL) nodes of Cori supercomputer with synchronous stochastic gradient descent (SGD), and identify causes of scaling inefficiency at higher node counts. To our knowledge, this is the first exploration of distributed GRPC Tensorflow scalability on a HPC supercomputer at such large scale with s…
▽ More
We explore scaling of the standard distributed Tensorflow with GRPC primitives on up to 512 Intel Xeon Phi (KNL) nodes of Cori supercomputer with synchronous stochastic gradient descent (SGD), and identify causes of scaling inefficiency at higher node counts. To our knowledge, this is the first exploration of distributed GRPC Tensorflow scalability on a HPC supercomputer at such large scale with synchronous SGD. We studied scaling of two convolution neural networks - ResNet-50, a state-of-the-art deep network for classification with roughly 25.5 million parameters, and HEP-CNN, a shallow topology with less than 1 million parameters for common scientific usages. For ResNet-50, we achieve >80% scaling efficiency on up to 128 workers, using 32 parameter servers (PS tasks) with a steep decline down to 23% for 512 workers using 64 PS tasks. Our analysis of the efficiency drop points to low network bandwidth utilization due to combined effect of three factors. (a) Heterogeneous distributed parallelization algorithm which uses PS tasks as centralized servers for gradient averaging is suboptimal for utilizing interconnect bandwidth. (b) Load imbalance among PS tasks hinders their efficient scaling. (c) Underlying communication primitive GRPC is currently inefficient on Cori high-speed interconnect. The HEP-CNN demands less interconnect bandwidth, and shows >80% weak scaling efficiency for up to 256 nodes with only 1 PS task. Our findings are applicable to other deep learning networks. Big networks with millions of parameters stumble upon the issues discussed here. Shallower networks like HEP-CNN with relatively lower number of parameters can efficiently enjoy weak scaling even with a single parameter server.
△ Less
Submitted 26 December, 2017;
originally announced December 2017.
-
Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies
Authors:
Brian Friesen,
Md. Mostofa Ali Patwary,
Brian Austin,
Nadathur Satish,
Zachary Slepian,
Narayanan Sundaram,
Deborah Bard,
Daniel J Eisenstein,
Jack Deslippe,
Pradeep Dubey,
Prabhat
Abstract:
The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application to astronomical surveys comprising millions to bill…
▽ More
The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application to astronomical surveys comprising millions to billions of galaxies. We present Galactos, a high-performance implementation of a novel, O(N^2) algorithm that uses a load-balanced k-d tree and spherical harmonic expansions to compute the anisotropic 3PCF. Our implementation is optimized for the Intel Xeon Phi architecture, exploiting SIMD parallelism, instruction and thread concurrency, and significant L1 and L2 cache reuse, reaching 39% of peak performance on a single node. Galactos scales to the full Cori system, achieving 9.8PF (peak) and 5.06PF (sustained) across 9636 nodes, making the 3PCF easily computable for all galaxies in the observable universe.
△ Less
Submitted 31 August, 2017;
originally announced September 2017.
-
CosmoGAN: creating high-fidelity weak lensing convergence maps using Generative Adversarial Networks
Authors:
Mustafa Mustafa,
Deborah Bard,
Wahid Bhimji,
Zarija Lukić,
Rami Al-Rfou,
Jan M. Kratochvil
Abstract:
Inferring model parameters from experimental data is a grand challenge in many sciences, including cosmology. This often relies critically on high fidelity numerical simulations, which are prohibitively computationally expensive. The application of deep learning techniques to generative modeling is renewing interest in using high dimensional density estimators as computationally inexpensive emulat…
▽ More
Inferring model parameters from experimental data is a grand challenge in many sciences, including cosmology. This often relies critically on high fidelity numerical simulations, which are prohibitively computationally expensive. The application of deep learning techniques to generative modeling is renewing interest in using high dimensional density estimators as computationally inexpensive emulators of fully-fledged simulations. These generative models have the potential to make a dramatic shift in the field of scientific simulations, but for that shift to happen we need to study the performance of such generators in the precision regime needed for science applications. To this end, in this work we apply Generative Adversarial Networks to the problem of generating weak lensing convergence maps. We show that our generator network produces maps that are described by, with high statistical confidence, the same summary statistics as the fully simulated maps.
△ Less
Submitted 22 May, 2019; v1 submitted 7 June, 2017;
originally announced June 2017.