Showing 1–5 of 5 results for author: McAfee, L

Searching in archive cs.
  1. arXiv:2310.07713  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

    Authors: Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Pretraining auto-regressive large language models (LLMs) with retrieval demonstrates better perplexity and factual accuracy by leveraging external databases. However, the size of existing pretrained retrieval-augmented LLM is still limited (e.g., Retro has 7.5B parameters), which limits the effectiveness of instruction tuning and zero-shot generalization. In this work, we introduce Retro 48B, the…

    Submitted 29 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  2. arXiv:2310.03025  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Retrieval meets Long Context Large Language Models

    Authors: Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Extending the context window of large language models (LLMs) is getting popular recently, while the solution of augmenting LLMs with retrieval has existed for years. The natural questions are: i) Retrieval-augmentation versus long context window, which one is better for downstream tasks? ii) Can both methods be combined to get the best of both worlds? In this work, we answer these questions by stu…

    Submitted 23 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024

  3. arXiv:2304.06762  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study

    Authors: Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro

    Abstract: Large decoder-only language models (LMs) can be largely improved in terms of perplexity by retrieval (e.g., RETRO), but its impact on text generation quality and downstream task accuracy is unclear. Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval? To answer it, we perform a comprehensive study on a scalable pre-trained retrieval-augmented LM (i.e., RET…

    Submitted 20 December, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: EMNLP 2023

  4. arXiv:2205.05198  [pdf, other]

    cs.LG cs.CL

    Reducing Activation Recomputation in Large Transformer Models

    Authors: Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Training large transformer models is one of the most important computational challenges of modern AI. In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation. Activation recomputation is commonly used to work around memory capacity constraints. Rather than storing activations for backpropagation, they are traditionally recomp…

    Submitted 10 May, 2022; originally announced May 2022.

  5. arXiv:1206.6466  [pdf]

    cs.NE cs.MS cs.PL

    Utilizing Static Analysis and Code Generation to Accelerate Neural Networks

    Authors: Lawrence McAfee, Kunle Olukotun

    Abstract: As datasets continue to grow, neural network (NN) applications are becoming increasingly limited by both the amount of available computational power and the ease of developing high-performance applications. Researchers often must have expert systems knowledge to make their algorithms run efficiently. Although available computing power increases rapidly each year, algorithm efficiency is not able t…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)