Showing 1–13 of 13 results for author: Sinclair, M D

Searching in archive cs.
  1. arXiv:2408.11919  [pdf, other]

    cs.DC

    PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters

    Authors: Rutwik Jain, Brandon Tran, Keting Chen, Matthew D. Sinclair, Shivaram Venkataraman

    Abstract: Large-scale computing systems are increasingly using accelerators such as GPUs to enable peta- and exa-scale levels of compute to meet the needs of Machine Learning (ML) and scientific computing applications. Given the widespread and growing use of ML, including in some scientific applications, optimizing these clusters for ML workloads is particularly important. However, recent work has demonstra…

    Submitted 21 August, 2024; originally announced August 2024.

  2. arXiv:2401.16677  [pdf, other]

    cs.AR cs.DC cs.LG

    T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives

    Authors: Suchita Pati, Shaizeen Aga, Mahzabeen Islam, Nuwan Jayasena, Matthew D. Sinclair

    Abstract: Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices which can reduce scaling efficiency as the number of devices increases. While some distributed techniques can overlap, and thus, hide this communication with independent computations, techniques such as Tensor Parallelism (TP) inherently serializ…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: To appear at the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2024

    ACM Class: C.2.4; C.1.2

  3. arXiv:2306.03964  [pdf]

    cs.AR

    Fifty Years of ISCA: A data-driven retrospective on key trends

    Authors: Gaurang Upasani, Matthew D. Sinclair, Adrian Sampson, Parthasarathy Ranganathan, David Patterson, Shaan Shah, Nidhi Parthasarathy, Rutwik Jain

    Abstract: Computer Architecture, broadly, involves optimizing hardware and software for current and future processing systems. Although there are several other top venues to publish Computer Architecture research, including ASPLOS, HPCA, and MICRO, ISCA (the International Symposium on Computer Architecture) is one of the oldest, longest running, and most prestigious venues for publishing Computer Architectu…

    Submitted 18 November, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 34 pages, 16 figures

  4. arXiv:2304.11136  [pdf]

    cs.AR cs.PF

    Integrating Per-Stream Stat Tracking into Accel-Sim

    Authors: Shichen Qiao, Xin Su, Matthew D. Sinclair

    Abstract: Accel-Sim is a widely used computer architecture simulator that models the behavior of modern NVIDIA GPUs in great detail. However, although Accel-Sim and the underlying GPGPU-Sim model many of the features of real GPUs, thus far it has not been able to track statistics separately per stream. Instead, Accel-Sim combines statistics (e.g., cycles and cache hits/misses) across all simultaneously runn…

    Submitted 4 September, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: 13 pages

  5. arXiv:2302.02825  [pdf]

    cs.AR cs.DC

    Computation vs. Communication Scaling for Future Transformers on Future Hardware

    Authors: Suchita Pati, Shaizeen Aga, Mahzabeen Islam, Nuwan Jayasena, Matthew D. Sinclair

    Abstract: Scaling neural network models has delivered dramatic quality gains across ML problems. However, this scaling has increased the reliance on efficient distributed training techniques. Accordingly, as with other distributed computing scenarios, it is important to understand how compute and communication will scale relative to one another as models scale and hardware evolves. A careful study which ans…

    Submitted 2 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    ACM Class: C.4; C.2.4

  6. arXiv:2208.11035  [pdf, other]

    cs.DC

    Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems

    Authors: Prasoon Sinha, Akhil Guliani, Rutwik Jain, Brandon Tran, Matthew D. Sinclair, Shivaram Venkataraman

    Abstract: Scientists are increasingly exploring and utilizing the massive parallelism of general-purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters, hyperscalers, national computing centers, and supercomputers have procured hardware to support this evolving application paradigm. These systems contain hundreds to tens of thousands of accelerators, enabling peta- and exa-…

    Submitted 8 November, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: 14 pages, 18 figures, to appear at The 34th International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '22)

  7. arXiv:2104.11678  [pdf, other]

    cs.AR

    A Case for Fine-grain Coherence Specialization in Heterogeneous Systems

    Authors: Johnathan Alsop, Weon Taek Na, Matthew D. Sinclair, Samuel Grayson, Sarita V. Adve

    Abstract: Hardware specialization is becoming a key enabler of energy-efficient performance. Future systems will be increasingly heterogeneous, integrating multiple specialized and programmable accelerators, each with different memory demands. Traditionally, communication between accelerators has been inefficient, typically orchestrated through explicit DMA transfers between different address spaces. More re…

    Submitted 23 April, 2021; originally announced April 2021.

  8. arXiv:2104.08335  [pdf]

    cs.AR cs.DC cs.LG

    Demystifying BERT: Implications for Accelerator Design

    Authors: Suchita Pati, Shaizeen Aga, Nuwan Jayasena, Matthew D. Sinclair

    Abstract: Transfer learning in natural language processing (NLP), as realized using models like BERT (Bi-directional Encoder Representation from Transformer), has significantly improved language representation with models that can tackle challenging language problems. Consequently, these applications are driving the requirements of future systems. Thus, we focus on BERT, one of the most popular NLP transfer…

    Submitted 13 April, 2021; originally announced April 2021.

    ACM Class: C.3; C.4

  9. arXiv:2007.10459  [pdf]

    cs.DC

    SeqPoint: Identifying Representative Iterations of Sequence-based Neural Networks

    Authors: Suchita Pati, Shaizeen Aga, Matthew D. Sinclair, Nuwan Jayasena

    Abstract: The ubiquity of deep neural networks (DNNs) continues to rise, making them a crucial application class for hardware optimizations. However, detailed profiling and characterization of DNN training remains difficult as these applications often run for hours to days on real hardware. Prior works exploit the iterative nature of DNNs to profile a few training iterations. While such a strategy is sound…

    Submitted 20 July, 2020; originally announced July 2020.

    Comments: To appear in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2020)

    ACM Class: C.4

  10. arXiv:2007.03152  [pdf, other]

    cs.AR

    The gem5 Simulator: Version 20.0+

    Authors: Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adrià Armejach, Nils Asmussen, Brad Beckmann, Srikant Bharadwaj, Gabe Black, Gedare Bloom, Bobby R. Bruce, Daniel Rodrigues Carvalho, Jeronimo Castrillon, Lizhong Chen, Nicolas Derumigny, Stephan Diestelhorst, Wendy Elsasser, Carlos Escuin, Marjan Fariborz, Amin Farmahini-Farahani, Pouya Fotouhi, Ryan Gambord, Jayneel Gandhi, et al. (53 additional authors not shown)

    Abstract: The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 si…

    Submitted 29 September, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: Source, comments, and feedback: https://github.com/darchr/gem5-20-paper

  11. arXiv:2002.10245  [pdf, other]

    cs.DC

    Specializing Coherence, Consistency, and Push/Pull for GPU Graph Analytics

    Authors: Giordano Salvador, Wesley H. Darvin, Muhammad Huzaifa, Johnathan Alsop, Matthew D. Sinclair, Sarita V. Adve

    Abstract: This work provides the first study to explore the interaction of update propagation with and without fine-grained synchronization (push vs. pull), emerging coherence protocols (GPU vs. DeNovo coherence), and software-centric consistency models (DRF0, DRF1, and DRFrlx) for graph workloads on emerging integrated GPU-CPU systems with native unified shared memory. We study 6 graph applications with 6…

    Submitted 25 February, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

  12. arXiv:1910.00134  [pdf]

    cs.AR

    Optimizing GPU Cache Policies for MI Workloads

    Authors: Johnathan Alsop, Matthew D. Sinclair, Srikant Bharadwaj, Alexandru Dutu, Anthony Gutierrez, Onur Kayiran, Michael LeBeane, Sooraj Puthoor, Xianwei Zhang, Tsung Tai Yeh, Bradford M. Beckmann

    Abstract: In recent years, machine intelligence (MI) applications have emerged as a major driver for the computing industry. Optimizing these workloads is important but complicated. As memory demands grow and data movement overheads increasingly limit performance, determining the best GPU caching policy to use for a diverse range of MI workloads represents one important challenge. To study this, we evaluate…

    Submitted 30 September, 2019; originally announced October 2019.

    Comments: Extended version of short paper published in the 2019 IEEE International Symposium on Workload Characterization

  13. arXiv:1811.08933  [pdf, other]

    cs.DC

    Analyzing Machine Learning Workloads Using a Detailed GPU Simulator

    Authors: Jonathan Lew, Deval Shah, Suchita Pati, Shaylin Cattell, Mengchi Zhang, Amruth Sandhupatla, Christopher Ng, Negar Goli, Matthew D. Sinclair, Timothy G. Rogers, Tor Aamodt

    Abstract: Most deep neural networks deployed today are trained using GPUs via high-level frameworks such as TensorFlow and PyTorch. This paper describes changes we made to the GPGPU-Sim simulator to enable it to run PyTorch by running PTX kernels included in NVIDIA's cuDNN library. We use the resulting modified simulator, which has been made available publicly with this paper, to study some simple deep lear…

    Submitted 26 January, 2019; v1 submitted 18 November, 2018; originally announced November 2018.

    Comments: Source code available at: https://github.com/gpgpu-sim/gpgpu-sim_distribution/tree/dev