
Showing 1–5 of 5 results for author: Rolinger, T B

  1. arXiv:2312.05639  [pdf, other]

    cs.DC cs.PF cs.PL

    JITSPMM: Just-in-Time Instruction Generation for Accelerated Sparse Matrix-Matrix Multiplication

    Authors: Qiang Fu, Thomas B. Rolinger, H. Howie Huang

    Abstract: Achieving high performance for Sparse Matrix-Matrix Multiplication (SpMM) has received increasing research attention, especially on multi-core CPUs, due to the large input data size in applications such as graph neural networks (GNNs). Most existing solutions for SpMM computation follow the ahead-of-time (AOT) compilation approach, which compiles a program entirely before it is executed. AOT compila…

    Submitted 9 December, 2023; originally announced December 2023.

  2. arXiv:2303.13954  [pdf, other]

    cs.DC cs.PL

    Compiler Optimization for Irregular Memory Access Patterns in PGAS Programs

    Authors: Thomas B. Rolinger, Christopher D. Krieger, Alan Sussman

    Abstract: Irregular memory access patterns pose performance and user productivity challenges on distributed-memory systems. They can lead to fine-grained remote communication and the data access patterns are often not known until runtime. The Partitioned Global Address Space (PGAS) programming model addresses these challenges by providing users with a view of a distributed-memory system that resembles a sin…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to the 35th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2022)

  3. An Empirical Evaluation of Allgatherv on Multi-GPU Systems

    Authors: Thomas B. Rolinger, Tyler A. Simon, Christopher D. Krieger

    Abstract: Applications for deep learning and big data analytics have compute and memory requirements that exceed the limits of a single GPU. However, effectively scaling out an application to multiple GPUs is challenging due to the complexities of communication between the GPUs, particularly for collective communication with irregular message sizes. In this work, we provide a performance evaluation of the A…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

  4. Parallel Sparse Tensor Decomposition in Chapel

    Authors: Thomas B. Rolinger, Tyler A. Simon, Christopher D. Krieger

    Abstract: In big-data analytics, using tensor decomposition to extract patterns from large, sparse multivariate data is a popular technique. Many challenges exist for designing parallel, high performance tensor decomposition algorithms due to irregular data accesses and the growing size of tensors that are processed. There have been many efforts at implementing shared-memory algorithms for tensor decomposit…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 5th Annual Chapel Implementers and Users Workshop (CHIUW 2018)

  5. arXiv:1812.05955  [pdf, other]

    cs.DC cs.PF

    Impact of Traditional Sparse Optimizations on a Migratory Thread Architecture

    Authors: Thomas B. Rolinger, Christopher D. Krieger

    Abstract: Achieving high performance for sparse applications is challenging due to irregular access patterns and weak locality. These properties preclude many static optimizations and degrade cache performance on traditional systems. To address these challenges, novel systems such as the Emu architecture have been proposed. The Emu design uses light-weight migratory threads, narrow memory, and near-memory p…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: 8th Workshop on Irregular Applications: Architectures and Algorithms (IA^3) 2018