Showing 1–41 of 41 results for author: Patwary, M

Searching in archive cs.
  1. arXiv:2408.11796

    cs.CL cs.AI cs.LG

    LLM Pruning and Distillation in Practice: The Minitron Approach

    Authors: Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov

    Abstract: We present a comprehensive report on compressing the Llama 3.1 8B and Mistral NeMo 12B models to 4B and 8B parameters, respectively, using pruning and distillation. We explore two distinct pruning strategies: (1) depth pruning and (2) joint hidden/attention/MLP (width) pruning, and evaluate the results on common benchmarks from the LM Evaluation Harness. The models are then aligned with NeMo Align…

    Submitted 26 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: v2: Added missing references. Cleaned up runtime performance section

  2. arXiv:2407.14679

    cs.CL cs.AI cs.LG

    Compact Language Models via Pruning and Knowledge Distillation

    Authors: Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov

    Abstract: Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch; this is extremely compute-intensive. In this paper, we investigate if pruning an existing LLM and then re-training it with a fraction (<3%) of the original training data can be a suitable alternative to repeated, full retraining. To this end, we develop a set o…

    Submitted 19 July, 2024; originally announced July 2024.

  3. arXiv:2407.07263

    cs.CL

    Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models

    Authors: Jupinder Parmar, Sanjev Satheesh, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: As language models have scaled both their number of parameters and pretraining dataset sizes, the computational cost for pretraining has become intractable except for the most well-resourced teams. This increasing cost makes it ever more important to be able to reuse a model after it has completed pretraining; allowing for a model's abilities to further improve without needing to train from scratc…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Preprint. Under review

  4. arXiv:2407.06380

    cs.CL

    Data, Data Everywhere: A Guide for Pretraining Dataset Construction

    Authors: Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Bo Liu, Aastha Jhunjhunwala, Zhilin Wang, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: The impressive capabilities of recent language models can be largely attributed to the multi-trillion token pretraining datasets that they are trained on. However, model developers fail to disclose their construction methodology, which has led to a lack of open information on how to develop effective pretraining sets. To address this issue, we perform the first systematic study across the entire p…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Preprint. Under review

  5. arXiv:2406.11704

    cs.CL cs.AI cs.LG

    Nemotron-4 340B Technical Report

    Authors: Nvidia: Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, et al. (58 additional authors not shown)

    Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be…

    Submitted 6 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  6. arXiv:2406.09524

    cs.SE

    Structure Editor for Building Software Models

    Authors: Mohammad Nurullah Patwary, Ana Jovanovic, Allison Sullivan

    Abstract: Alloy is a well-known declarative modeling language. A key strength of Alloy is its scenario finding toolset, the Analyzer, which allows users to explore all valid scenarios that adhere to the model's constraints up to a user-provided scope. Despite the Analyzer, Alloy is still difficult for novice users to learn and use. A recent empirical study of over 93,000 new user models reveals that users h…

    Submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2402.19173

    cs.SE cs.AI

    StarCoder 2 and The Stack v2: The Next Generation

    Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, et al. (41 additional authors not shown)

    Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data…

    Submitted 29 February, 2024; originally announced February 2024.

  8. arXiv:2402.16819

    cs.CL cs.AI cs.LG

    Nemotron-4 15B Technical Report

    Authors: Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick LeGresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, et al. (2 additional authors not shown)

    Abstract: We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remai…

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  9. arXiv:2401.10659

    cs.CV

    BadODD: Bangladeshi Autonomous Driving Object Detection Dataset

    Authors: Mirza Nihal Baig, Rony Hajong, Mahdi Murshed Patwary, Mohammad Shahidur Rahman, Husne Ara Chowdhury

    Abstract: We propose a comprehensive dataset for object detection in diverse driving environments across 9 districts in Bangladesh. The dataset, collected exclusively from smartphone cameras, provided a realistic representation of real-world scenarios, including day and night conditions. Most existing datasets lack suitable classes for autonomous navigation on Bangladeshi roads, making it challenging for re…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 7 pages

  10. arXiv:2302.07388

    cs.CL cs.AI

    Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

    Authors: Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks. However, safely deploying them in real world applications is challenging because they generate toxic content. To address this challenge, we propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising its utility. Our tw…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: This paper will be presented at EACL 2023

  11. arXiv:2210.13673

    cs.CL

    Evaluating Parameter Efficient Learning for Generation

    Authors: Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Parameter efficient learning methods (PERMs) have recently gained significant attention as they provide an efficient way for pre-trained language models (PLMs) to adapt to a downstream task. However, these conclusions are mostly drawn from in-domain evaluations over the full training set. In this paper, we present comparisons between PERMs and finetuning from three new perspectives: (1) the effect…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022 main conference

  12. arXiv:2210.06349

    cs.CL cs.AI

    Context Generation Improves Open Domain Question Answering

    Authors: Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro

    Abstract: Closed-book question answering (QA) requires a model to directly answer an open-domain question without access to any external knowledge. Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to leverage the stored knowledge. However, they do not fully exploit the parameterized knowledge. To address this issue, we propose a two-stage, closed-book QA fra…

    Submitted 27 April, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: 8 pages; Accepted at EACL2023

  13. arXiv:2206.04624

    cs.CL cs.AI cs.CY cs.LG

    Factuality Enhanced Language Models for Open-Ended Text Generation

    Authors: Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Pascale Fung, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Pretrained language models (LMs) are susceptible to generate text with nonfactual information. In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation. We design the FactualityPrompts test set and metrics to measure the factuality of LM generations. Based on that, we study the factual accuracy of LMs with parameter sizes ranging from 126M to 530B…

    Submitted 2 March, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  14. arXiv:2203.08745

    cs.CL cs.AI

    Multi-Stage Prompting for Knowledgeable Dialogue Generation

    Authors: Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Existing knowledge-grounded dialogue systems typically use finetuned versions of a pretrained language model (LM) and large-scale knowledge bases. These models typically fail to generalize on topics outside of the knowledge base, and require maintaining separate potentially large checkpoints each time finetuning is needed. In this paper, we aim to address these limitations by leveraging the inhere…

    Submitted 16 March, 2022; originally announced March 2022.

  15. arXiv:2202.04173

    cs.CL cs.AI cs.CY cs.LG

    Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

    Authors: Boxin Wang, Wei Ping, Chaowei Xiao, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bo Li, Anima Anandkumar, Bryan Catanzaro

    Abstract: Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we propose to leverage the generative power of LMs and generate nontoxic datasets for doma…

    Submitted 21 October, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022

  16. arXiv:2201.11990

    cs.CL

    Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

    Authors: Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

    Abstract: Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot and fine-tuning techniques. Because of their success, the size of these models has increased rapidly, requiring high-performance hardware, software, and algorithmic techniques to enable training such large models.…

    Submitted 4 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Shaden Smith and Mostofa Patwary contributed equally

  17. arXiv:2110.15669

    cs.DC

    SDP: Scalable Real-time Dynamic Graph Partitioner

    Authors: Md Anwarul Kaium Patwary, Saurabh Garg, Sudheer Kumar Battula, Byeong Kang

    Abstract: Time-evolving large graphs have received attention due to their participation in real-world applications such as social networks and PageRank calculation. It is necessary to partition a large-scale dynamic graph in a streaming manner to overcome the memory bottleneck while partitioning the computational load. Reducing network communication and balancing the load between the partitions are the criter…

    Submitted 29 October, 2021; originally announced October 2021.

  18. arXiv:2104.04473

    cs.CL cs.DC

    Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

    Authors: Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

    Abstract: Large language models have led to state-of-the-art accuracies across a range of tasks. However, training these models efficiently is challenging for two reasons: a) GPU memory capacity is limited, making it impossible to fit large models on even a multi-GPU server, and b) the number of compute operations required to train these models can result in unrealistically long training times. Consequently…

    Submitted 23 August, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted to SC 2021

  19. arXiv:2101.00408

    cs.CL cs.AI

    End-to-End Training of Neural Retrievers for Open-Domain Question Answering

    Authors: Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei Ping, William L Hamilton, Bryan Catanzaro

    Abstract: Recent work on training neural retrievers for open-domain question answering (OpenQA) has employed both supervised and unsupervised approaches. However, it remains unclear how unsupervised and supervised methods can be used most effectively for neural retrievers. In this work, we systematically study retriever pre-training. We first propose an approach of unsupervised pre-training with the Inverse…

    Submitted 1 June, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: ACL 2021

  20. arXiv:2010.10150

    cs.CL cs.AI cs.HC cs.LG

    Local Knowledge Powered Conversational Agents

    Authors: Sashank Santhanam, Wei Ping, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

    Abstract: State-of-the-art conversational agents have advanced significantly in conjunction with the use of large transformer-based language models. However, even with these advancements, conversational agents still lack the ability to produce responses that are informative and coherent with the local context. In this work, we propose a dialog framework that incorporates both local knowledge as well as user…

    Submitted 20 October, 2020; originally announced October 2020.

  21. arXiv:2010.06060

    cs.CL

    BioMegatron: Larger Biomedical Domain Language Model

    Authors: Hoo-Chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, Raghav Mani

    Abstract: There has been an influx of biomedical domain-specific language models, showing language models pre-trained on biomedical text perform better on biomedical domain benchmarks than those trained on general domain text corpora such as Wikipedia and Books. Yet, most works do not study the factors affecting each domain language application deeply. Additionally, the study of model size on domain-specifi…

    Submitted 13 October, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: Accepted for publication at EMNLP 2020

  22. arXiv:2010.00840

    cs.CL

    MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

    Authors: Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung, Anima Anandkumar, Bryan Catanzaro

    Abstract: Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: Accepted in EMNLP 2020 main conference

  23. arXiv:2005.06114

    cs.CL

    Large Scale Multi-Actor Generative Dialog Modeling

    Authors: Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

    Abstract: Non-goal oriented dialog agents (i.e. chatbots) aim to produce varying and engaging conversations with a user; however, they typically exhibit either inconsistent personality across conversations or the average personality of all users. This paper addresses these issues by controlling an agent's persona upon generation via conditioning on prior conversations of a target actor. In doing so, we are…

    Submitted 12 May, 2020; originally announced May 2020.

  24. arXiv:2003.00395

    cs.CR cs.DC

    Authentication, Access Control, Privacy, Threats and Trust Management Towards Securing Fog Computing Environments: A Review

    Authors: Abdullah Al-Noman Patwary, Anmin Fu, Ranesh Kumar Naha, Sudheer Kumar Battula, Saurabh Garg, Md Anwarul Kaium Patwary, Erfan Aghasian

    Abstract: Fog computing is an emerging computing paradigm that has come into consideration for the deployment of IoT applications amongst researchers and technology industries over the last few years. Fog is highly distributed and consists of a wide number of autonomous end devices, which contribute to the processing. However, the variety of devices offered across different users are not audited. Hence, the…

    Submitted 29 February, 2020; originally announced March 2020.

    Comments: 34 pages, 9 figures

  25. arXiv:2002.09599

    cs.CL cs.AI

    Training Question Answering Models From Synthetic Data

    Authors: Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Question and answer generation is a data augmentation method that aims to improve question answering (QA) models given the limited amount of human labeled data. However, a considerable gap remains between synthetic and human-generated question-answer pairs. This work aims to narrow this gap by taking advantage of large language models and explores several factors such as model size, quality of pre…

    Submitted 21 February, 2020; originally announced February 2020.

  26. arXiv:1909.11822

    physics.comp-ph cs.LG cs.PF

    DisCo: Physics-Based Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems

    Authors: Adam Rupe, Nalini Kumar, Vladislav Epifanov, Karthik Kashinath, Oleksandr Pavlyk, Frank Schlimbach, Mostofa Patwary, Sergey Maidanov, Victor Lee, Prabhat, James P. Crutchfield

    Abstract: Extracting actionable insight from complex unlabeled scientific data is an open challenge and key to unlocking data-driven discovery in science. Complementary and alternative to supervised machine learning approaches, unsupervised physics-based methods based on behavior-driven theories hold great promise. Due to computational limitations, practical application on real-world domain science problems…

    Submitted 25 September, 2019; originally announced September 2019.

  27. arXiv:1909.10576

    cs.NI

    The Potential Short- and Long-Term Disruptions and Transformative Impacts of 5G and Beyond Wireless Networks: Lessons Learnt from the Development of a 5G Testbed Environment

    Authors: Mohmammad N. Patwary, Syed Junaid Nawaz, Md. Abdur Rahman, Shree Krishna Sharma, Md Mamunur Rashid, Stuart J. Barnes

    Abstract: The anticipated deployment cost of 5G communication networks in the UK is predicted to be between £30bn and £50bn, whereas the current annual capital expenditure of the mobile network operators (MNOs) is £2.5bn. This prospect has vastly impacted and has become one of the major delaying factors for building the 5G physical infrastructure, whereas other areas of 5G developments are progressing at th…

    Submitted 31 May, 2020; v1 submitted 23 September, 2019; originally announced September 2019.

    Comments: 22 pages, 9 figures, 11 tables

  28. arXiv:1909.08053

    cs.CL

    Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

    Authors: Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro

    Abstract: Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach t…

    Submitted 13 March, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

  29. arXiv:1902.10162

    cs.AI cs.DM cs.LG

    Coloring Big Graphs with AlphaGoZero

    Authors: Jiayi Huang, Mostofa Patwary, Gregory Diamos

    Abstract: We show that recent innovations in deep reinforcement learning can effectively color very large graphs -- a well-known NP-hard problem with clear commercial applications. Because the Monte Carlo Tree Search with Upper Confidence Bound algorithm used in AlphaGoZero can improve the performance of a given heuristic, our approach allows deep neural networks trained using high performance computing (HP…

    Submitted 8 November, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

  30. Window-based Streaming Graph Partitioning Algorithm

    Authors: Md Anwarul kaium Patwary, Saurabh Garg, Byeong Kang

    Abstract: In recent years, the scale of graph datasets has increased to such a degree that a single machine is not capable of efficiently processing large graphs. Therefore, efficient graph partitioning is necessary for those large graph applications. Traditional graph partitioning generally loads the whole graph data into the memory before performing partitioning; this is not only a time consuming task b…

    Submitted 4 February, 2019; originally announced February 2019.

  31. arXiv:1810.10045

    cs.CL

    Language Modeling at Scale

    Authors: Mostofa Patwary, Milind Chabbi, Heewoo Jun, Jiaji Huang, Gregory Diamos, Kenneth Church

    Abstract: We show how Zipf's Law can be used to scale up language modeling (LM) to take advantage of more training data and more GPUs. LM plays a key role in many important natural language applications such as speech recognition and machine translation. Scaling up LM is important since it is widely accepted by the community that there is no data like more data. Eventually, we would like to train on terabyt…

    Submitted 23 October, 2018; originally announced October 2018.

  32. arXiv:1712.00409

    cs.LG stat.ML

    Deep Learning Scaling is Predictable, Empirically

    Authors: Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

    Abstract: Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, comput…

    Submitted 1 December, 2017; originally announced December 2017.

    Comments: 19 pages, 11 figures

  33. arXiv:1709.00086

    astro-ph.CO cs.CE cs.PF

    Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies

    Authors: Brian Friesen, Md. Mostofa Ali Patwary, Brian Austin, Nadathur Satish, Zachary Slepian, Narayanan Sundaram, Deborah Bard, Daniel J Eisenstein, Jack Deslippe, Pradeep Dubey, Prabhat

    Abstract: The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application to astronomical surveys comprising millions to bill…

    Submitted 31 August, 2017; originally announced September 2017.

    Comments: 11 pages, 7 figures, accepted to SuperComputing 2017

  34. arXiv:1708.05256

    cs.PF cs.CV cs.LG

    Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data

    Authors: Thorsten Kurth, Jian Zhang, Nadathur Satish, Ioannis Mitliagkas, Evan Racah, Mostofa Ali Patwary, Tareq Malas, Narayanan Sundaram, Wahid Bhimji, Mikhail Smorkalov, Jack Deslippe, Mikhail Shiryaev, Srinivas Sridharan, Prabhat, Pradeep Dubey

    Abstract: This paper presents the first, 15-PetaFLOP Deep Learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our Intelcaffe-based implementation…

    Submitted 17 August, 2017; originally announced August 2017.

    Comments: 12 pages, 9 figures

  35. PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

    Authors: Md. Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, Pradeep Dubey

    Abstract: Computing $k$-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based $O(\log n)$ algorithms have been proposed for computing KNN, due to its inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of data points, with limited s…

    Submitted 27 July, 2016; originally announced July 2016.

    Comments: 11 pages. In: PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures, Md. Mostofa Ali Patwary et al., IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016

  36. arXiv:1606.05973

    cs.DS cs.CV cs.PF

    A New Parallel Algorithm for Two-Pass Connected Component Labeling

    Authors: Siddharth Gupta, Diana Palsetia, Md. Mostofa Ali Patwary, Ankit Agrawal, Alok Choudhary

    Abstract: Connected Component Labeling (CCL) is an important step in pattern recognition and image processing. It assigns labels to the pixels such that adjacent pixels sharing the same features are assigned the same label. Typically, CCL requires several passes over the data. We focus on two-pass technique where each pixel is given a provisional label in the first pass whereas an actual label is assigned i…

    Submitted 20 June, 2016; originally announced June 2016.

    Comments: Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014

  37. arXiv:1503.07241

    cs.PF cs.DB cs.DC

    GraphMat: High performance graph analytics made productive

    Authors: Narayanan Sundaram, Nadathur Rajagopalan Satish, Md Mostofa Ali Patwary, Subramanya R Dulloor, Satya Gautam Vadlamudi, Dipankar Das, Pradeep Dubey

    Abstract: Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly graph analytics framework and native, hand-optimized code. GraphMat functions by taking vertex programs and mapping them to high performance sparse matrix operat…

    Submitted 24 March, 2015; originally announced March 2015.

  38. Fast Algorithms for the Maximum Clique Problem on Massive Graphs with Applications to Overlapping Community Detection

    Authors: Bharath Pattabiraman, Md. Mostofa Ali Patwary, Assefaw H. Gebremedhin, Wei-keng Liao, Alok Choudhary

    Abstract: The maximum clique problem is a well known NP-Hard problem with applications in data mining, network analysis, information retrieval and many other areas related to the World Wide Web. There exist several algorithms for the problem with acceptable runtimes for certain classes of graphs, but many of them are infeasible for massive graphs. We present a new exact algorithm that employs novel pruning…

    Submitted 26 November, 2014; originally announced November 2014.

    Comments: 28 pages, 7 figures, 10 tables, 2 algorithms. arXiv admin note: substantial text overlap with arXiv:1209.5818

    Journal ref: Internet Mathematics 2014, Special Issue (WAW'13)

  39. arXiv:1302.6256

    cs.SI cs.DC cs.DM cs.DS physics.soc-ph

    Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage

    Authors: Ryan A. Rossi, David F. Gleich, Assefaw H. Gebremedhin, Md. Mostofa Ali Patwary

    Abstract: We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method…

    Submitted 25 December, 2013; v1 submitted 25 February, 2013; originally announced February 2013.

    Comments: 11 pages

    MSC Class: 05C69 ACM Class: G.2.2

  40. arXiv:1210.5802

    cs.SI cs.DC cs.DM physics.soc-ph

    What if CLIQUE were fast? Maximum Cliques in Information Networks and Strong Components in Temporal Networks

    Authors: Ryan A. Rossi, David F. Gleich, Assefaw H. Gebremedhin, Md. Mostofa Ali Patwary

    Abstract: Exact maximum clique finders have progressed to the point where we can investigate cliques in million-node social and information networks, as well as find strongly connected components in temporal networks. We use one such finder to study a large collection of modern networks emanating from biological, social, and technological domains. We show inter-relationships between maximum cliques and seve…

    Submitted 30 October, 2012; v1 submitted 22 October, 2012; originally announced October 2012.

    MSC Class: 05C69; 05C85; 91D30 ACM Class: G.2.2; H.2.8

  41. arXiv:1209.5818

    cs.DS cs.IR

    Fast Algorithms for the Maximum Clique Problem on Massive Sparse Graphs

    Authors: Bharath Pattabiraman, Md. Mostofa Ali Patwary, Assefaw H. Gebremedhin, Wei-keng Liao, Alok Choudhary

    Abstract: The maximum clique problem is a well known NP-Hard problem with applications in data mining, network analysis, informatics, and many other areas. Although there exist several algorithms with acceptable runtimes for certain classes of graphs, many of them are infeasible for massive graphs. We present a new exact algorithm that employs novel pruning techniques to very quickly find maximum cliques in…

    Submitted 14 November, 2012; v1 submitted 25 September, 2012; originally announced September 2012.

    Comments: 15 pages (including 2-page appendix), 5 tables, 4 figures

    ACM Class: G.2.2