-
FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs
Authors:
Philipp Schaad,
Timo Schneider,
Tal Ben-Nun,
Alexandru Calotoiu,
Alexandros Nikolaos Ziogas,
Torsten Hoefler
Abstract:
The current hardware landscape and application scale is driving performance engineers towards writing bespoke optimizations. Verifying such optimizations, and generating minimal failing cases, is important for robustness in the face of changing program conditions, such as inputs and sizes. However, isolation of minimal test-cases from existing applications and generating new configurations are oft…
▽ More
The current hardware landscape and application scale is driving performance engineers towards writing bespoke optimizations. Verifying such optimizations, and generating minimal failing cases, is important for robustness in the face of changing program conditions, such as inputs and sizes. However, isolation of minimal test-cases from existing applications and generating new configurations are often difficult due to side effects on the system state, mostly related to dataflow. This paper introduces FuzzyFlow: a fault localization and test case extraction framework designed to test program optimizations. We leverage dataflow program representations to capture a fully reproducible system state and area-of-effect for optimizations to enable fast checking for semantic equivalence. To reduce testing time, we design an algorithm for minimizing test inputs, trading off memory for recomputation. We demonstrate FuzzyFlow on example use cases in real-world applications where the approach provides up to 528 times faster optimization testing and debugging compared to traditional approaches.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization
Authors:
Lukas Trümper,
Tal Ben-Nun,
Philipp Schaad,
Alexandru Calotoiu,
Torsten Hoefler
Abstract:
Performance optimization is an increasingly challenging but often repetitive task. While each platform has its quirks, the underlying code transformations rely on data movement and computational characteristics that recur across applications. This paper proposes to leverage those similarities by constructing an embedding space for subprograms. The continuous space captures both static and dynamic…
▽ More
Performance optimization is an increasingly challenging but often repetitive task. While each platform has its quirks, the underlying code transformations rely on data movement and computational characteristics that recur across applications. This paper proposes to leverage those similarities by constructing an embedding space for subprograms. The continuous space captures both static and dynamic properties of loop nests via symbolic code analysis and performance profiling, respectively. Performance embeddings enable direct knowledge transfer of performance tuning between applications, which can result from autotuning or tailored improvements. We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils. Transfer tuning reduces the search complexity by up to four orders of magnitude and outperforms the MKL library in sparse-dense matrix multiplication. The results exhibit clear correspondences between program characteristics and optimizations, outperforming prior specialized state-of-the-art approaches and generalizing beyond their capabilities.
△ Less
Submitted 14 March, 2023;
originally announced March 2023.
-
Medical Diffusion: Denoising Diffusion Probabilistic Models for 3D Medical Image Generation
Authors:
Firas Khader,
Gustav Mueller-Franzes,
Soroosh Tayebi Arasteh,
Tianyu Han,
Christoph Haarburger,
Maximilian Schulze-Hagen,
Philipp Schad,
Sandy Engelhardt,
Bettina Baessler,
Sebastian Foersch,
Johannes Stegmaier,
Christiane Kuhl,
Sven Nebelung,
Jakob Nikolas Kather,
Daniel Truhn
Abstract:
Recent advances in computer vision have shown promising results in image generation. Diffusion probabilistic models in particular have generated realistic images from textual input, as demonstrated by DALL-E 2, Imagen and Stable Diffusion. However, their use in medicine, where image data typically comprises three-dimensional volumes, has not been systematically evaluated. Synthetic images may play…
▽ More
Recent advances in computer vision have shown promising results in image generation. Diffusion probabilistic models in particular have generated realistic images from textual input, as demonstrated by DALL-E 2, Imagen and Stable Diffusion. However, their use in medicine, where image data typically comprises three-dimensional volumes, has not been systematically evaluated. Synthetic images may play a crucial role in privacy preserving artificial intelligence and can also be used to augment small datasets. Here we show that diffusion probabilistic models can synthesize high quality medical imaging data, which we show for Magnetic Resonance Images (MRI) and Computed Tomography (CT) images. We provide quantitative measurements of their performance through a reader study with two medical experts who rated the quality of the synthesized images in three categories: Realistic image appearance, anatomical correctness and consistency between slices. Furthermore, we demonstrate that synthetic images can be used in a self-supervised pre-training and improve the performance of breast segmentation models when data is scarce (dice score 0.91 vs. 0.95 without vs. with synthetic data). The code is publicly available on GitHub: https://github.com/FirasGit/medicaldiffusion.
△ Less
Submitted 3 January, 2023; v1 submitted 7 November, 2022;
originally announced November 2022.
-
Boosting Performance Optimization with Interactive Data Movement Visualization
Authors:
Philipp Schaad,
Tal Ben-Nun,
Torsten Hoefler
Abstract:
Optimizing application performance in today's hardware architecture landscape is an important, but increasingly complex task, often requiring detailed performance analyses. In particular, data movement and reuse play a crucial role in optimization and are often hard to improve without detailed program inspection. Performance visualizations can assist in the diagnosis of performance problems, but g…
▽ More
Optimizing application performance in today's hardware architecture landscape is an important, but increasingly complex task, often requiring detailed performance analyses. In particular, data movement and reuse play a crucial role in optimization and are often hard to improve without detailed program inspection. Performance visualizations can assist in the diagnosis of performance problems, but generally rely on data gathered through lengthy program executions. In this paper, we present a performance visualization geared towards analyzing data movement and reuse to inform impactful optimization decisions, without requiring program execution. We propose an approach that combines static dataflow analysis with parameterized program simulations to analyze both global data movement and fine-grained data access and reuse behavior, and visualize insights in-situ on the program representation. Case studies analyzing and optimizing real-world applications demonstrate our tool's effectiveness in guiding optimization decisions and making the performance tuning process more interactive.
△ Less
Submitted 24 August, 2022; v1 submitted 21 June, 2022;
originally announced July 2022.
-
Lifting C Semantics for Dataflow Optimization
Authors:
Alexandru Calotoiu,
Tal Ben-Nun,
Grzegorz Kwasniewski,
Johannes de Fine Licht,
Timo Schneider,
Philipp Schaad,
Torsten Hoefler
Abstract:
C is the lingua franca of programming and almost any device can be programmed using C. However, programming mod-ern heterogeneous architectures such as multi-core CPUs and GPUs requires explicitly expressing parallelism as well as device-specific properties such as memory hierarchies. The resulting code is often hard to understand, debug, and modify for different architectures. We propose to lift…
▽ More
C is the lingua franca of programming and almost any device can be programmed using C. However, programming mod-ern heterogeneous architectures such as multi-core CPUs and GPUs requires explicitly expressing parallelism as well as device-specific properties such as memory hierarchies. The resulting code is often hard to understand, debug, and modify for different architectures. We propose to lift C programs to a parametric dataflow representation that lends itself to static data-centric analysis and enables automatic high-performance code generation. We separate writing code from optimizing for different hardware: simple, portable C source code is used to generate efficient specialized versions with a click of a button. Our approach can identify parallelism when no other compiler can, and outperforms a bespoke parallelized version of a scientific proxy application by up to 21%.
△ Less
Submitted 24 May, 2022; v1 submitted 22 December, 2021;
originally announced December 2021.