Search | arXiv e-print repository

Understanding Data Movement in Tightly Coupled Heterogeneous Systems: A Case Study with the Grace Hopper Superchip

Authors: Luigi Fusco, Mikhail Khalilov, Marcin Chrapek, Giridhar Chukkapalli, Thomas Schulthess, Torsten Hoefler

Abstract: Heterogeneous supercomputers have become the standard in HPC. GPUs in particular have dominated the accelerator landscape, offering unprecedented performance in parallel workloads and unlocking new possibilities in fields like AI and climate modeling. With many workloads becoming memory-bound, improving the communication latency and bandwidth within the system has become a main driver in the development of new architectures. The Grace Hopper Superchip (GH200) is a significant step in the direction of tightly coupled heterogeneous systems, in which all CPUs and GPUs share a unified address space and support transparent fine-grained access to all main memory on the system. We characterize both intra- and inter-node memory operations on the Quad GH200 nodes of the new Swiss National Supercomputing Centre Alps supercomputer, and show the importance of careful memory placement on example workloads, highlighting tradeoffs and opportunities.

Submitted 26 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

arXiv:2405.13043 [pdf]

Towards Specialized Supercomputers for Climate Sciences: Computational Requirements of the Icosahedral Nonhydrostatic Weather and Climate Model

Authors: Torsten Hoefler, Alexandru Calotoiu, Anurag Dipankar, Thomas Schulthess, Xavier Lapillonne, Oliver Fuhrer

Abstract: We discuss the computational challenges and requirements for high-resolution climate simulations using the Icosahedral Nonhydrostatic Weather and Climate Model (ICON). We define a detailed requirements model for ICON which emphasizes the need for specialized supercomputers to accurately predict climate change impacts and extreme weather events. Based on the requirements model, we outline computational demands for km-scale simulations and suggest machine learning techniques to enhance model accuracy and efficiency. Our findings aim to guide the design of future supercomputers for advanced climate science.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2401.04552 [pdf, other]

XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing

Authors: Torsten Hoefler, Marcin Copik, Pete Beckman, Andrew Jones, Ian Foster, Manish Parashar, Daniel Reed, Matthias Troyer, Thomas Schulthess, Dan Ernst, Jack Dongarra

Abstract: HPC and Cloud have evolved independently, specializing their innovations into performance or productivity. Acceleration as a Service (XaaS) is a recipe to empower both fields with a shared execution platform that provides transparent access to computing resources, regardless of the underlying cloud or HPC service provider. Bridging HPC and cloud advancements, XaaS presents a unified architecture built on performance-portable containers. Our converged model concentrates on low-overhead, high-performance communication and computing, targeting resource-intensive workloads from climate simulations to machine learning. XaaS lifts the restricted allocation model of Function-as-a-Service (FaaS), allowing users to benefit from the flexibility and efficient resource utilization of serverless while supporting long-running and performance-sensitive workloads from HPC.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2311.08322 [pdf, other]

GT4Py: High Performance Stencils for Weather and Climate Applications using Python

Authors: Enrique G. Paredes, Linus Groner, Stefano Ubbiali, Hannes Vogt, Alberto Madonna, Kean Mariotti, Felipe Cruz, Lucas Benedicic, Mauro Bianco, Joost VandeVondele, Thomas C. Schulthess

Abstract: All major weather and climate applications are currently developed using languages such as Fortran or C++. This is typical in the domain of high performance computing (HPC), where efficient execution is an important concern. Unfortunately, this approach leads to implementations that intermix optimizations for specific hardware architectures with the high-level numerical methods that are typical for the domain. This leads to code that is verbose, difficult to extend and maintain, and difficult to port to different hardware architectures. Here, we propose a different strategy based on GT4Py (GridTools for Python). GT4Py is a Python framework to write weather and climate applications that includes a high-level embedded domain specific language (DSL) to write stencil computations. The toolchain integrated in GT4Py enables automatic code generation to obtain the performance of state-of-the-art C++ and CUDA implementations. The separation of concerns between the mathematical definitions and the actual implementations allows for performance portability of the computations on a wide range of computing architectures, while being embedded in Python allows easy access to the tools of the Python ecosystem to enhance the productivity of the scientists and facilitate integration in complex workflows. Here, the initial release of GT4Py is described, providing an overview of the current state of the framework and performance results showing how GT4Py can outperform pure Python implementations by orders of magnitude.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 12 pages

MSC Class: 68; ACM Class: I.6.5

arXiv:2309.09002 [pdf]

Earth Virtualization Engines -- A Technical Perspective

Authors: Torsten Hoefler, Bjorn Stevens, Andreas F. Prein, Johanna Baehr, Thomas Schulthess, Thomas F. Stocker, John Taylor, Daniel Klocke, Pekka Manninen, Piers M. Forster, Tobias Kölling, Nicolas Gruber, Hartwig Anzt, Claudia Frauen, Florian Ziemen, Milan Klöwer, Karthik Kashinath, Christoph Schär, Oliver Fuhrer, Bryan N. Lawrence

Abstract: Participants of the Berlin Summit on Earth Virtualization Engines (EVEs) discussed ideas and concepts to improve our ability to cope with climate change. EVEs aim to provide interactive and accessible climate simulations and data for a wide range of users. They combine high-resolution physics-based models with machine learning techniques to improve the fidelity, efficiency, and interpretability of climate projections. At their core, EVEs offer a federated data layer that enables simple and fast access to exabyte-sized climate data through simple interfaces. In this article, we summarize the technical challenges and opportunities for developing EVEs, and argue that they are essential for addressing the consequences of climate change.

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2205.04148 [pdf, other]

Productive Performance Engineering for Weather and Climate Modeling with Python

Authors: Tal Ben-Nun, Linus Groner, Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver D. Elbert, Rhea George, Jeremy McGibbon, Lukas Trümper, Elynn Wu, Oliver Fuhrer, Thomas Schulthess, Torsten Hoefler

Abstract: Earth system models are developed with a tight coupling to target hardware, often containing specialized code predicated on processor characteristics. This coupling stems from using imperative languages that hard-code computation schedules and layout. We present a detailed account of optimizing the Finite Volume Cubed-Sphere Dynamical Core (FV3), improving productivity and performance. By using a declarative Python-embedded stencil domain-specific language and data-centric optimization, we abstract hardware-specific details and define a semi-automated workflow for analyzing and optimizing weather and climate applications. The workflow utilizes both local and full-program optimization, as well as user-guided fine-tuning. To prune the infeasible global optimization space, we automatically utilize repeating code motifs via a novel transfer tuning approach. On the Piz Daint supercomputer, we scale to 2,400 GPUs, achieving speedups of up to 3.92x over the tuned production implementation at a fraction of the original code.

Submitted 25 August, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

arXiv:1902.03154 [pdf, other]

SimFS: A Simulation Data Virtualizing File System Interface

Authors: Salvatore Di Girolamo, Pirmin Schmid, Thomas Schulthess, Torsten Hoefler

Abstract: Nowadays, simulations can produce petabytes of data to be stored in parallel filesystems or large-scale databases. This data is accessed over the course of decades, often by thousands of analysts and scientists. However, storing these volumes of data for long periods of time is not cost effective and, in some cases, practically impossible. We propose to transparently virtualize the simulation data, relaxing the storage requirements by not storing the full output and re-simulating the missing data on demand. We develop SimFS, a file system interface that exposes a virtualized view of the simulation output to the analysis applications and manages the re-simulations. SimFS monitors the access patterns of the analysis applications in order to (1) decide which data to keep stored for faster access and (2) employ prefetching strategies to reduce the access time of missing data. Virtualizing simulation data allows us to trade storage for computation: depending on the storage resources assigned to SimFS, this paradigm becomes similar to traditional on-disk analysis (all data is stored) or to in situ analysis (no data is stored). Overall, by exploiting the growing computing power and relaxing the storage capacity requirements, SimFS offers a viable path towards exascale simulations.

Submitted 24 January, 2019; originally announced February 2019.

arXiv:1408.2657 [pdf, other]

First Experiences With Validating and Using the Cray Power Management Database Tool

Authors: Gilles Fourestey, Ben Cumming, Ladina Gilly, Thomas C. Schulthess

Abstract: In October 2013 CSCS installed the first hybrid Cray XC-30 system, dubbed Piz Daint. This system features the power management database (PMDB), which was recently introduced by Cray to collect detailed power consumption information in a non-intrusive manner. Power measurements are taken on each node, with additional measurements for the Aries network and blowers, and recorded in a database. This enables fine-grained reporting of power consumption that is not possible with external power meters, and is useful to both application developers and facility operators. This paper will show how benchmarks of representative applications at CSCS were used to validate the PMDB on Piz Daint. Furthermore, with the well-known HPL benchmark serving as a prototypical application, we will elaborate on how the PMDB streamlines tuning for optimal power efficiency in production, which led to Piz Daint being recognised as the most energy-efficient petascale supercomputer presently in operation.

Submitted 12 August, 2014; originally announced August 2014.

Comments: This paper was presented at the 2014 Cray User Group (CUG) user meeting in Lugano, Switzerland. First Experiences With Validating and Using the Cray Power Management Database Tool, Gilles Fourestey, Ben Cumming and Ladina Gilly, Proceedings of the CUG meeting, 2014

Showing 1–8 of 8 results for author: Schulthess, T