Zum Hauptinhalt springen

Showing 1–8 of 8 results for author: LeGresley, P

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.11704  [pdf, other

    cs.CL cs.AI cs.LG

    Nemotron-4 340B Technical Report

    Authors: Nvidia, :, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek , et al. (58 additional authors not shown)

    Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be… ▽ More

    Submitted 6 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2402.16819  [pdf, other

    cs.CL cs.AI cs.LG

    Nemotron-4 15B Technical Report

    Authors: Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick LeGresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi , et al. (2 additional authors not shown)

    Abstract: We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remai… ▽ More

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  3. arXiv:2201.11990  [pdf, other

    cs.CL

    Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

    Authors: Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

    Abstract: Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot and fine-tuning techniques. Because of their success, the size of these models has increased rapidly, requiring high-performance hardware, software, and algorithmic techniques to enable training such large models.… ▽ More

    Submitted 4 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Shaden Smith and Mostofa Patwary contributed equally

  4. arXiv:2104.04473  [pdf, other

    cs.CL cs.DC

    Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

    Authors: Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

    Abstract: Large language models have led to state-of-the-art accuracies across a range of tasks. However, training these models efficiently is challenging for two reasons: a) GPU memory capacity is limited, making it impossible to fit large models on even a multi-GPU server, and b) the number of compute operations required to train these models can result in unrealistically long training times. Consequently… ▽ More

    Submitted 23 August, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted to SC 2021

  5. arXiv:1912.11683  [pdf, other

    cs.CV cs.LG eess.IV

    Neural ODEs for Image Segmentation with Level Sets

    Authors: Rafael Valle, Fitsum Reda, Mohammad Shoeybi, Patrick Legresley, Andrew Tao, Bryan Catanzaro

    Abstract: We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method. Our approach parametrizes the evolution of an initial contour with a NODE that implicitly learns from data a speed function describing the evolution. In addition, for cases where an initial contour is not available and to alleviate the need for careful choice or… ▽ More

    Submitted 25 December, 2019; originally announced December 2019.

  6. arXiv:1909.08053  [pdf, other

    cs.CL

    Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

    Authors: Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro

    Abstract: Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach t… ▽ More

    Submitted 13 March, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

  7. arXiv:1512.02595  [pdf, other

    cs.CL

    Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

    Authors: Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh , et al. (9 additional authors not shown)

    Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our app… ▽ More

    Submitted 8 December, 2015; originally announced December 2015.

  8. arXiv:1310.7556  [pdf, other

    physics.comp-ph cs.DC hep-ex

    First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

    Authors: V. Halyo, P. LeGresley, P. Lujan, V. Karpusenko, A. Vladimirov

    Abstract: Recent innovations focused around {\em parallel} processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Grap… ▽ More

    Submitted 3 February, 2014; v1 submitted 28 October, 2013; originally announced October 2013.

    Comments: 13 pages, 4 figures, Accepted to JINST