
Showing 1–5 of 5 results for author: Youn, S

Searching in archive cs.
  1. arXiv:2401.14112  [pdf, other]

    cs.LG cs.AI cs.AR

    FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

    Authors: Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song

    Abstract: Six-bit quantization (FP6) can effectively reduce the size of large language models (LLMs) and preserve the model quality consistently across varied applications. However, existing systems do not provide Tensor Core support for FP6 quantization and struggle to achieve practical performance improvements during LLM inference. It is challenging to support FP6 quantization on GPUs due to (1) unfriendl…

    Submitted 3 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: Adding URL link of the source code

  2. arXiv:2312.08583  [pdf, other]

    cs.CL stat.ML

    ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

    Authors: Xiaoxia Wu, Haojun Xia, Stephen Youn, Zhen Zheng, Shiyang Chen, Arash Bakhtiari, Michael Wyatt, Reza Yazdani Aminabadi, Yuxiong He, Olatunji Ruwase, Leon Song, Zhewei Yao

    Abstract: This study examines 4-bit quantization methods like GPTQ in large language models (LLMs), highlighting GPTQ's overfitting and limited enhancement in zero-shot tasks. While prior works focus merely on zero-shot measurement, we extend the task scope to more generative categories such as code generation and abstractive summarization, in which we found that INT4 quantization can significantly underperf…

    Submitted 18 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

  3. arXiv:2310.17723  [pdf, other]

    cs.LG cs.CL

    ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers

    Authors: Zhewei Yao, Reza Yazdani Aminabadi, Stephen Youn, Xiaoxia Wu, Elton Zheng, Yuxiong He

    Abstract: Quantization techniques are pivotal in reducing the memory and computational demands of deep neural network inference. Existing solutions, such as ZeroQuant, offer dynamic quantization for models like BERT and GPT but overlook crucial memory-bounded operators and the complexities of per-token quantization. Addressing these gaps, we present a novel, fully hardware-enhanced robust optimized post-tra…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 8 pages, 2 figures

  4. arXiv:2303.08302  [pdf, other]

    cs.LG cs.AI cs.CL

    ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

    Authors: Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He

    Abstract: Post-training quantization (PTQ) has emerged as a promising technique for mitigating memory consumption and computational costs in large language models (LLMs). However, a systematic examination of various quantization schemes, model families, and quantization bit precision has been absent from the literature. In this paper, we conduct a comprehensive analysis of these factors by investigating the…

    Submitted 25 May, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: 25 pages, 4 figures

  5. arXiv:1510.05700  [pdf, other]

    cs.SI cs.CY

    Dawn of the Selfie Era: The Whos, Wheres, and Hows of Selfies on Instagram

    Authors: Flávio Souza, Diego de Las Casas, Vinícius Flores, SunBum Youn, Meeyoung Cha, Daniele Quercia, Virgílio Almeida

    Abstract: Online interactions are increasingly involving images, especially those containing human faces, which are naturally attention grabbing and more effective at conveying feelings than text. To understand this new convention of digital culture, we study the collective behavior of sharing selfies on Instagram and present how people appear in selfies and which patterns emerge from such interactions. Ana…

    Submitted 19 October, 2015; originally announced October 2015.

    Comments: ACM Conference on Online Social Networks 2015, Stanford University, California, USA

    ACM Class: J.4; H.3.5