-
Preparing Reproducible Scientific Artifacts using Docker
Authors:
Michael Canesche,
Roland Leissa,
Fernando Magno Quintão Pereira
Abstract:
The pursuit of scientific knowledge strongly depends on the ability to reproduce and validate research results. It is a well-known fact that the scientific community faces challenges related to transparency, reliability, and the reproducibility of empirical published results. Consequently, the design and preparation of reproducible artifacts has a fundamental role in the development of science. Reproducible artifacts comprise comprehensive documentation, data, and code that enable replication and validation of research findings by others. In this work, we discuss a methodology to construct reproducible artifacts based on Docker. Our presentation centers around the preparation of an artifact to be submitted to scientific venues that encourage or require this process. This report's primary audience is scientists working in empirical computer science; however, we believe that the presented methodology can be extended to other technology-oriented empirical disciplines.
Submitted 27 August, 2023;
originally announced August 2023.
-
AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs
Authors:
André Müller,
Bertil Schmidt,
Richard Membarth,
Roland Leißa,
Sebastian Hack
Abstract:
In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignments in bioinformatics. However, current state-of-the-art approaches are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance.
We present AnySeq/GPU, a sequence alignment library that augments the AnySeq1 library with a novel approach for accelerating dynamic programming (DP) alignment on GPUs by minimizing memory accesses using warp shuffles and half-precision arithmetic. Our implementation is based on the AnyDSL compiler framework, which allows for convenient zero-cost abstractions through guaranteed partial evaluation. We show that our approach achieves over 80% of the peak performance on both NVIDIA and AMD GPUs, thereby outperforming the GPU-based alignment libraries AnySeq1, GASAL2, ADEPT, and NVBIO by a factor of at least 3.6 while achieving a median speedup of 19.2x over these tools across different alignment scenarios and sequence lengths when running on the same hardware.
This leads to throughputs of up to 1.7 TCUPS (tera cell updates per second) on an NVIDIA GV100, up to 3.3 TCUPS with half-precision arithmetic on a single NVIDIA A100, and up to 3.8 TCUPS on an AMD MI100.
Submitted 16 May, 2022;
originally announced May 2022.
-
FLOWER: A comprehensive dataflow compiler for high-level synthesis
Authors:
Puya Amiri,
Arsène Pérard-Gayot,
Richard Membarth,
Philipp Slusallek,
Roland Leißa,
Sebastian Hack
Abstract:
FPGAs have found their way into data centers as accelerator cards, making reconfigurable computing more accessible for high-performance applications. At the same time, new high-level synthesis compilers like Xilinx Vitis and runtime libraries such as XRT attract software programmers into the reconfigurable domain. While software programmers are familiar with task-level and data-parallel programming, FPGAs often require different types of parallelism. For example, data-driven parallelism is mandatory to obtain satisfactory hardware designs for pipelined dataflow architectures. However, software programmers are often not acquainted with dataflow architectures, which results in poor hardware designs.
In this work, we present FLOWER, a comprehensive compiler infrastructure that provides automatic canonical transformations for high-level synthesis from a domain-specific library. This allows programmers to focus on algorithm implementations rather than low-level optimizations for dataflow architectures. We show that FLOWER enables the synthesis of efficient implementations for high-performance streaming applications targeting System-on-Chip and FPGA accelerator cards, in the context of image processing and computer vision.
Submitted 14 December, 2021;
originally announced December 2021.
-
tinyMD: A Portable and Scalable Implementation for Pairwise Interactions Simulations
Authors:
Rafael Ravedutti L. Machado,
Jonas Schmitt,
Sebastian Eibl,
Jan Eitzinger,
Roland Leißa,
Sebastian Hack,
Arsène Pérard-Gayot,
Richard Membarth,
Harald Köstler
Abstract:
This paper investigates the suitability of the AnyDSL partial evaluation framework to implement tinyMD: an efficient, scalable, and portable simulation of pairwise interactions among particles. We compare tinyMD with the miniMD proxy application that scales very well on parallel supercomputers. We discuss the differences between both implementations and contrast miniMD's performance for single-node CPU and GPU targets, as well as its scalability on SuperMUC-NG and Piz Daint supercomputers. Additionally, we demonstrate tinyMD's flexibility by coupling it with the waLBerla multi-physics framework. This allows us to execute tinyMD simulations using the load-balancing mechanism implemented in waLBerla.
Submitted 15 September, 2020;
originally announced September 2020.
-
AnyHLS: High-Level Synthesis with Partial Evaluation
Authors:
M. Akif Özkan,
Arsène Pérard-Gayot,
Richard Membarth,
Philipp Slusallek,
Roland Leissa,
Sebastian Hack,
Jürgen Teich,
Frank Hannig
Abstract:
FPGAs excel in low-power and high-throughput computations, but they are challenging to program. Traditionally, developers rely on hardware description languages like Verilog or VHDL to specify the hardware behavior at the register-transfer level. High-Level Synthesis (HLS) raises the level of abstraction, but still requires FPGA design knowledge. Programmers usually write pragma-annotated C/C++ programs to define the hardware architecture of an application. However, each hardware vendor extends its own C dialect using its own vendor-specific set of pragmas. This prevents portability across different vendors. Furthermore, pragmas are not first-class citizens in the language. This makes it hard to use them in a modular way or design proper abstractions. In this paper, we present AnyHLS, an approach to synthesize FPGA designs in a modular and abstract way. AnyHLS is able to raise the abstraction level of existing HLS tools by resorting to programming language features such as types and higher-order functions as follows: It relies on partial evaluation to specialize and to optimize the user application based on a library of abstractions. Then, vendor-specific HLS code is generated for Intel and Xilinx FPGAs. Portability is obtained by avoiding any vendor-specific pragmas in the source code. In order to validate achievable gains in productivity, a library for the domain of image processing is introduced as a case study, and its synthesis results are compared with several state-of-the-art Domain-Specific Language (DSL) approaches for this domain.
Submitted 21 July, 2020; v1 submitted 13 February, 2020;
originally announced February 2020.
-
AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation
Authors:
André Müller,
Bertil Schmidt,
Andreas Hildebrandt,
Richard Membarth,
Roland Leißa,
Matthis Kruse,
Sebastian Hack
Abstract:
Sequence alignments are fundamental to bioinformatics, which has resulted in a variety of optimized implementations. Unfortunately, the vast majority of them are hand-tuned and specific to certain architectures and execution models. This not only makes them challenging to understand and extend, but also difficult to port to other platforms. We present AnySeq, a novel library for computing different types of pairwise alignments of DNA sequences. Our approach combines high performance with an intuitively understandable implementation, which is achieved through the concept of partial evaluation. Using the AnyDSL compiler framework, AnySeq enables the compilation of algorithmic variants that are highly optimized for specific usage scenarios and hardware targets with a single, uniform codebase. The resulting domain-specific library thus allows the variation of alignment parameters (such as alignment type, scoring scheme, and traceback vs. plain score) by simple function composition rather than metaprogramming techniques, which are often hard to understand. Our implementation supports multithreading and SIMD vectorization on CPUs, CUDA-enabled GPUs, and FPGAs. AnySeq is at most 7% slower and in many cases faster (up to 12%) than state-of-the-art manually optimized alignment libraries on CPUs (SeqAn) and on GPUs (NVBio).
Submitted 11 February, 2020;
originally announced February 2020.