
Showing 1–3 of 3 results for author: Panferov, A

Searching in archive cs.
  1. arXiv:2407.10994 [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Panza: A Personalized Text Writing Assistant via Data Playback and Local Fine-Tuning

    Authors: Armand Nicolicioiu, Eugenia Iofinova, Eldar Kurtic, Mahdi Nikdan, Andrei Panferov, Ilia Markov, Nir Shavit, Dan Alistarh

    Abstract: The availability of powerful open-source large language models (LLMs) opens exciting use cases, such as automated personal assistants that adapt to the user's unique data and demands. Two key desiderata for such assistants are personalization, in the sense that the assistant should reflect the user's own style, and privacy, in the sense that users may prefer to always store their personal data locall…

    Submitted 24 June, 2024; originally announced July 2024.

    Comments: Panza is available at https://github.com/IST-DASLab/PanzaMail

  2. arXiv:2401.06118 [pdf, other]

    cs.LG cs.CL

    Extreme Compression of Large Language Models via Additive Quantization

    Authors: Vage Egiazarian, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, Dan Alistarh

    Abstract: The emergence of accurate open large language models (LLMs) has led to a race towards performant quantization techniques which can enable their execution on end-user devices. In this paper, we revisit the problem of "extreme" LLM compression, defined as targeting extremely low bit counts, such as 2 to 3 bits per parameter, from the point of view of classic methods in Multi-Codebook Quantizat…

    Submitted 8 June, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: ICML, 2024

  3. arXiv:2401.05518 [pdf, other]

    cs.LG cs.AI cs.DC math.OC

    Correlated Quantization for Faster Nonconvex Distributed Optimization

    Authors: Andrei Panferov, Yury Demidovich, Ahmad Rammal, Peter Richtárik

    Abstract: Quantization (Alistarh et al., 2017) is an important (stochastic) compression technique that reduces the volume of transmitted bits during each communication round in distributed model training. Suresh et al. (2022) introduce correlated quantizers and show their advantages over independent counterparts by analyzing distributed SGD communication complexity. We analyze the forefront distributed non-…

    Submitted 10 January, 2024; originally announced January 2024.
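    The third abstract describes quantization as a stochastic compression technique for distributed training. As a rough illustration only, the sketch below implements a generic unbiased stochastic quantizer in the QSGD style associated with Alistarh et al. (2017). It is not the correlated quantizer analyzed in the paper, and the function name `stochastic_quantize` and the level count `s` are hypothetical choices for this example. Each coordinate is rounded to one of `s + 1` levels at random so that the expected output equals the input. A worker then only needs to transmit the norm, the signs, and small integer levels.

```python
import math
import random


def stochastic_quantize(x, s, rng):
    """Hypothetical QSGD-style unbiased stochastic quantizer (illustrative sketch).

    Each |x_i| / ||x||_2 is placed on the grid {0, 1/s, ..., 1}. It rounds up
    with probability equal to its distance from the lower grid point, so
    E[output] == x. Only the norm, the signs, and the integer levels would
    need to be sent over the network.
    """
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)
    out = []
    for v in x:
        scaled = abs(v) / norm * s          # position on the grid, in [0, s]
        lower = math.floor(scaled)
        # Round up with probability (scaled - lower); this keeps the estimate unbiased.
        level = lower + (1 if rng.random() < scaled - lower else 0)
        out.append(math.copysign(norm * level / s, v))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    print(stochastic_quantize([0.3, -1.2, 0.5, 2.0], s=4, rng=rng))
```

    Averaging many independent quantizations of the same vector recovers it approximately, because the rounding is unbiased. Correlated schemes such as the one by Suresh et al. (2022) instead coordinate the randomness across workers. This sketch does not model that.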