Showing 1–7 of 7 results for author: Merth, T

Searching in archive cs.

  1. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2407.14057  [pdf, other]

    cs.CL cs.AI cs.LG

    LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

    Authors: Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi

    Abstract: The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens. For long prompts, the KV cache must be computed for all tokens during the prefilling stage, which can significantly increase the time needed to generate the first tok…

    Submitted 19 July, 2024; originally announced July 2024.

  3. arXiv:2404.06910  [pdf, other]

    cs.CL cs.AI cs.LG

    Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation

    Authors: Thomas Merth, Qichen Fu, Mohammad Rastegari, Mahyar Najibi

    Abstract: Despite the successes of large language models (LLMs), they exhibit significant drawbacks, particularly when processing long contexts. Their inference cost scales quadratically with respect to sequence length, making it expensive for deployment in some real-world text processing applications, such as retrieval-augmented generation (RAG). Additionally, LLMs also exhibit the "distraction phenomenon"…

    Submitted 19 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  4. arXiv:2312.11537  [pdf, other]

    cs.CV cs.GR

    FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline

    Authors: Chien-Yu Lin, Qichen Fu, Thomas Merth, Karren Yang, Anurag Ranjan

    Abstract: Super-resolution (SR) techniques have recently been proposed to upscale the outputs of neural radiance fields (NeRF) and generate high-quality images with enhanced inference speeds. However, existing NeRF+SR methods increase training overhead by using extra input features, loss functions, and/or expensive training procedures such as knowledge distillation. In this paper, we aim to leverage SR for…

    Submitted 20 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: WACV 2024 (Oral)

  5. arXiv:2310.00867  [pdf, other]

    cs.CL cs.AI

    Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications

    Authors: Duc N. M Hoang, Minsik Cho, Thomas Merth, Mohammad Rastegari, Zhangyang Wang

    Abstract: Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks. In this work, we dive into how compression damages LLMs' inherent knowledge and the possible remedies. We start by proposing two conjectures on the nature of the damage: one is certain knowledge being forgotten (or erased) after LLM compression, hence necessitating the compressed…

    Submitted 16 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

  6. arXiv:2309.04502  [pdf, other]

    cs.CV

    On the Efficacy of Multi-scale Data Samplers for Vision Applications

    Authors: Elvis Nunez, Thomas Merth, Anish Prabhu, Mehrdad Farajtabar, Mohammad Rastegari, Sachin Mehta, Maxwell Horton

    Abstract: Multi-scale resolution training has seen an increased adoption across multiple vision tasks, including classification and detection. Training with smaller resolutions enables faster training at the expense of a drop in accuracy. Conversely, training with larger resolutions has been shown to improve performance, but memory constraints often make this infeasible. In this paper, we empirically study…

    Submitted 8 September, 2023; originally announced September 2023.

  7. arXiv:2207.10237  [pdf, other]

    cs.CV

    SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks

    Authors: Chien-Yu Lin, Anish Prabhu, Thomas Merth, Sachin Mehta, Anurag Ranjan, Maxwell Horton, Mohammad Rastegari

    Abstract: Recent isotropic networks, such as ConvMixer and vision transformers, have found significant success across visual recognition tasks, matching or outperforming non-isotropic convolutional neural networks (CNNs). Isotropic architectures are particularly well-suited to cross-layer weight sharing, an effective neural network compression technique. In this paper, we perform an empirical evaluation on…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022