Showing 1–21 of 21 results for author: Bhatele, A

Searching in archive cs.
  1. arXiv:2406.10328  [pdf, other]

    cs.CV cs.CL cs.LG

    From Pixels to Prose: A Large Dataset of Dense Image Captions

    Authors: Vasu Singla, Kaiyu Yue, Sukriti Paul, Reza Shirkavand, Mayuka Jayawardhana, Alireza Ganjdanesh, Heng Huang, Abhinav Bhatele, Gowthami Somepalli, Tom Goldstein

    Abstract: Training large vision-language models requires extensive, high-quality image-text pairs. Existing web-scraped datasets, however, are noisy and lack detailed image descriptions. To bridge this gap, we introduce PixelProse, a comprehensive dataset of over 16M (million) synthetically generated captions, leveraging cutting-edge vision-language models for detailed and accurate descriptions. To ensure d…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: pixelprose 16M dataset

  2. arXiv:2406.10209  [pdf, other]

    cs.CL

    Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

    Authors: Abhimanyu Hans, Yuxin Wen, Neel Jain, John Kirchenbauer, Hamid Kazemi, Prajwal Singhania, Siddharth Singh, Gowthami Somepalli, Jonas Geiping, Abhinav Bhatele, Tom Goldstein

    Abstract: Large language models can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens are excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verba…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9.5 pages, 8 figures, and 1 table in the main body. Code available at https://github.com/ahans30/goldfish-loss

  3. arXiv:2406.02542  [pdf, other]

    cs.LG

    Loki: Low-Rank Keys for Efficient Sparse Attention

    Authors: Prajwal Singhania, Siddharth Singh, Shwai He, Soheil Feizi, Abhinav Bhatele

    Abstract: Inference on large language models can be expensive in terms of the compute and memory costs involved, especially when long sequence lengths are used. In particular, the self-attention mechanism used in such models contributes significantly to these costs, which has resulted in several recent works that propose sparse attention approximations for inference. In this work, we propose to approximate…

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2405.17399  [pdf, other]

    cs.LG cs.AI

    Transformers Can Do Arithmetic with the Right Embeddings

    Authors: Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein

    Abstract: The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix ena…

    Submitted 27 May, 2024; originally announced May 2024.

  5. arXiv:2404.18864  [pdf, other]

    cs.DC cs.AI cs.SE

    Performance-Aligned LLMs for Generating Fast Code

    Authors: Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, Todd Gamblin, Abhinav Bhatele

    Abstract: Optimizing scientific software is a difficult task because codebases are often large and complex, and performance can depend upon several factors including the algorithm, its implementation, and hardware among others. Causes of poor performance can originate from disparate sources and be difficult to diagnose. Recent years have seen a multitude of work that use large language models (LLMs) to assi…

    Submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2402.08950  [pdf, other]

    cs.DC cs.PF

    Taking GPU Programming Models to Task for Performance Portability

    Authors: Joshua H. Davis, Pranav Sivaraman, Joy Kitson, Konstantinos Parasyris, Harshitha Menon, Isaac Minn, Giorgis Georgakoudis, Abhinav Bhatele

    Abstract: Portability is critical to ensuring high productivity in developing and maintaining scientific software as the diversity in on-node hardware architectures increases. While several programming models provide portability for diverse GPU platforms, they don't make any guarantees about performance portability. In this work, we explore several programming models -- CUDA, HIP, Kokkos, RAJA, OpenMP, Open…

    Submitted 21 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 12 pages, 4 figures

  7. arXiv:2401.13150  [pdf, other]

    cs.DC cs.PF

    Automated Programmatic Performance Analysis of Parallel Programs

    Authors: Onur Cankur, Aditya Tomar, Daniel Nichols, Connor Scully-Allison, Katherine E. Isaacs, Abhinav Bhatele

    Abstract: Developing efficient parallel applications is critical to advancing scientific development but requires significant performance analysis and optimization. Performance analysis tools help developers manage the increasing complexity and scale of performance data, but often rely on the user to manually explore low-level data and are rigid in how the data can be manipulated. We propose a Python-based…

    Submitted 23 January, 2024; originally announced January 2024.

  8. Can Large Language Models Write Parallel Code?

    Authors: Daniel Nichols, Joshua H. Davis, Zhaojun Xie, Arjun Rajaram, Abhinav Bhatele

    Abstract: Large language models are increasingly becoming a popular tool for software development. Their ability to model and generate source code has been demonstrated in a variety of contexts, including code completion, summarization, translation, and lookup. However, they often struggle to generate code for complex programs. In this paper, we study the capabilities of state-of-the-art language models to…

    Submitted 14 May, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Journal ref: The 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC '24), June 3-7, 2024, Pisa, Italy. ACM, New York, NY, USA, 14 pages

  9. arXiv:2401.08124  [pdf, other]

    cs.DC

    A Large-Scale Epidemic Simulation Framework for Realistic Social Contact Networks

    Authors: Joy Kitson, Ian Costello, Jiangzhuo Chen, Diego Jiménez, Stefan Hoops, Henning Mortveit, Esteban Meneses, Jae-Seung Yeom, Madhav V. Marathe, Abhinav Bhatele

    Abstract: Global pandemics can wreak havoc and lead to significant social, economic, and personal losses. Preventing the spread of infectious diseases requires implementing interventions at different levels of government, and evaluating the potential impact and efficacy of those preemptive measures. Agent-based modeling can be used for detailed studies of epidemic diffusion and possible interventions. We pr…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 13 pages (including references), 9 figures

  10. arXiv:2312.06131  [pdf, other]

    cs.DC

    ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems

    Authors: Yiheng Xu, Pranav Sivaraman, Hariharan Devarajan, Kathryn Mohror, Abhinav Bhatele

    Abstract: Parallel applications can spend a significant amount of time performing I/O on large-scale supercomputers. Fast near-compute storage accelerators called burst buffers can reduce the time a processor spends performing I/O and mitigate I/O bottlenecks. However, determining if a given application could be accelerated using burst buffers is not straightforward even for storage experts. The relationshi…

    Submitted 11 January, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  11. arXiv:2310.12298  [pdf, other]

    cs.LG cs.AI cs.DC

    Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization

    Authors: Siddharth Singh, Zachary Sating, Abhinav Bhatele

    Abstract: Despite their better convergence properties compared to first-order optimizers, second-order optimizers for deep learning have been less popular due to their significant computational costs. The primary efficiency bottleneck in such optimizers is matrix inverse calculations in the preconditioning step, which are expensive to compute on GPUs. In this paper, we introduce Jorge, a second-order optimi…

    Submitted 26 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

  12. HPC-Coder: Modeling Parallel Programs using Large Language Models

    Authors: Daniel Nichols, Aniruddha Marathe, Harshitha Menon, Todd Gamblin, Abhinav Bhatele

    Abstract: Parallel programs in high performance computing (HPC) continue to grow in complexity and scale in the exascale era. The diversity in hardware and parallel programming models make developing, optimizing, and maintaining parallel software even more burdensome for developers. One way to alleviate some of these burdens is with automated development and analysis tools. Such tools can perform complex an…

    Submitted 14 May, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

    Journal ref: ISC High Performance 2024 Research Paper Proceedings (39th International Conference), Hamburg, Germany, 2024, pp. 1-12

  13. arXiv:2306.11177  [pdf, other]

    cs.DC cs.PF

    Pipit: Scripting the analysis of parallel execution traces

    Authors: Abhinav Bhatele, Rakrish Dhakal, Alexander Movsesyan, Aditya K. Ranjan, Onur Cankur

    Abstract: Performance analysis is a critical step in the oft-repeated, iterative process of performance tuning of parallel programs. Per-process, per-thread traces (detailed logs of events with timestamps) enable in-depth analysis of parallel program execution to identify different kinds of performance issues. Often times, trace collection tools provide a graphical tool to analyze the trace output. However,…

    Submitted 14 May, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

  14. arXiv:2305.13525  [pdf, other]

    cs.LG cs.AI cs.DC cs.PF

    A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs

    Authors: Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, Zack Sating, Abhinav Bhatele

    Abstract: Heavy communication, in particular, collective operations, can become a critical performance bottleneck in scaling the training of billion-parameter neural networks to large-scale parallel systems. This paper introduces a four-dimensional (4D) approach to optimize communication in parallel training. This 4D approach is a hybrid of 3D tensor and data parallelism, and is implemented in the AxoNN fra…

    Submitted 14 May, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

  15. arXiv:2303.06318  [pdf, other]

    cs.LG cs.AI cs.DC cs.PF

    A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training

    Authors: Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele

    Abstract: Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert blocks to a base model, increasing the number of parameters without impacting computational costs. However, current distributed deep learning frameworks are limited in their ability to train high-quality MoE models with large base models. In this work, we present DeepSpeed-TED, a novel, three-dimensional,…

    Submitted 13 May, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

  16. arXiv:2302.05045  [pdf, other]

    cs.LG cs.AI cs.DC cs.PF

    Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training

    Authors: Siddharth Singh, Abhinav Bhatele

    Abstract: Parallel training of neural networks at scale is challenging due to significant overheads arising from communication. Recently, deep learning researchers have developed a variety of pruning algorithms that are capable of pruning (i.e. setting to zero) 80-90% of the parameters in a neural network to yield sparse subnetworks that equal the accuracy of the unpruned parent network. In this work, we pr…

    Submitted 14 May, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

  17. arXiv:2205.04557  [pdf, other]

    cs.HC

    Designing an Interactive, Notebook-Embedded, Tree Visualization to Support Exploratory Performance Analysis

    Authors: Connor Scully-Allison, Ian Lumsden, Katy Williams, Jesse Bartels, Michela Taufer, Stephanie Brink, Abhinav Bhatele, Olga Pearce, Katherine E. Isaacs

    Abstract: Interactive visualization via direct manipulation has inherent design trade-offs in flexibility, discoverability, and ease-of-use. Scripting languages can support a vast range of user queries and tasks, but may be more cumbersome for free-form exploration. Embedding interactive visualization in a scripting environment, such as a computational notebook, provides an opportunity for leveraging the st…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: Submitted to IEEE VIS 2022

  18. arXiv:2111.04949  [pdf, other]

    cs.LG cs.AI cs.DC

    A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks

    Authors: Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele

    Abstract: The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields. This phenomenon has spurred the development of algorithms for distributed training of neural networks over a larger number of hardware accelerators. In this paper…

    Submitted 30 June, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

  19. arXiv:2110.13005  [pdf, other]

    cs.LG cs.AI cs.CL cs.DC cs.PF

    AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning

    Authors: Siddharth Singh, Abhinav Bhatele

    Abstract: In the last few years, the memory requirements to train state-of-the-art neural networks have far exceeded the DRAM capacities of modern hardware accelerators. This has necessitated the development of efficient algorithms to train these neural networks in parallel on large-scale GPU-based clusters. Since computation is relatively inexpensive on modern GPUs, designing and implementing extremely eff…

    Submitted 14 May, 2023; v1 submitted 25 October, 2021; originally announced October 2021.

    Comments: Proceedings of the IEEE International Parallel & Distributed Processing Symposium (IPDPS). IEEE Computer Society, May 2022

  20. arXiv:2007.03451  [pdf, other]

    cs.DC cs.LG cs.PF

    Analytics of Longitudinal System Monitoring Data for Performance Prediction

    Authors: Ian J. Costello, Abhinav Bhatele

    Abstract: In recent years, several HPC facilities have started continuous monitoring of their systems and jobs to collect performance-related data for understanding performance and operational efficiency. Such data can be used to optimize the performance of individual jobs and the overall system by creating data-driven models that can predict the performance of jobs waiting in the scheduler queue. In this p…

    Submitted 2 July, 2024; v1 submitted 7 July, 2020; originally announced July 2020.

  21. arXiv:2007.01395  [pdf, other]

    cs.DC cs.PF

    Scalable Comparative Visualization of Ensembles of Call Graphs

    Authors: Suraj P. Kesavan, Harsh Bhatia, Abhinav Bhatele, Todd Gamblin, Peer-Timo Bremer, Kwan-Liu Ma

    Abstract: Optimizing the performance of large-scale parallel codes is critical for efficient utilization of computing resources. Code developers often explore various execution parameters, such as hardware configurations, system software choices, and application parameters, and are interested in detecting and understanding bottlenecks in different executions. They often collect hierarchical performance prof…

    Submitted 30 June, 2020; originally announced July 2020.

    Comments: 12 pages, 6 figures, Submitted to IEEE VIS 2020