Showing 1–24 of 24 results for author: Ryabinin, M

  1. arXiv:2406.02532

    cs.CL

    SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

    Authors: Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin

    Abstract: As large language models gain widespread adoption, running them efficiently becomes crucial. Recent works on LLM inference use speculative decoding to achieve extreme speedups. However, most of these works implicitly design their algorithms for high-end datacenter hardware. In this work, we ask the opposite question: how fast can we run LLMs on consumer machines? Consumer GPUs can no longer fit th…

    Submitted 25 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: preprint

  2. arXiv:2404.05904

    cs.CL

    The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

    Authors: Giwon Hong, Aryo Pradipta Gema, Rohit Saxena, Xiaotang Du, Ping Nie, Yu Zhao, Laura Perez-Beltrachini, Max Ryabinin, Xuanli He, Clémentine Fourrier, Pasquale Minervini

    Abstract: Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text. However, these models are prone to "hallucinations": outputs that do not align with factual reality or the input context. This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and com…

    Submitted 17 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  3. arXiv:2402.12374

    cs.CL

    Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

    Authors: Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen

    Abstract: As the usage of large language models (LLMs) grows, performing efficient inference with these models becomes increasingly important. While speculative decoding has recently emerged as a promising direction for speeding up inference, existing methods are limited in their ability to scale to larger speculation budgets, and adapt to different hyperparameters and hardware. This paper introduces Sequoi…

    Submitted 29 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  4. arXiv:2401.06766

    cs.CL

    Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements

    Authors: Anton Voronov, Lena Wolf, Max Ryabinin

    Abstract: Large language models demonstrate a remarkable capability for learning to solve new tasks from a few examples. The prompt template, or the way the input examples are formatted to obtain the prompt, is an important yet often overlooked aspect of in-context learning. In this work, we conduct a comprehensive study of the template format's influence on the in-context learning performance. We evaluate…

    Submitted 6 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted to Findings of ACL 2024. 24 pages, 10 figures. Code: https://github.com/yandex-research/mind-your-format

  5. arXiv:2312.08361

    cs.LG cs.DC

    Distributed Inference and Fine-tuning of Large Language Models Over The Internet

    Authors: Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, Colin Raffel

    Abstract: Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them inaccessible to most researchers. In this work, we investigate methods for cost-efficient inference and fine-tuning of LLMs, comparing local and distributed strategie…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2023. 20 pages, 3 figures

  6. arXiv:2310.09247

    cs.CV cs.CL cs.LG

    Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy

    Authors: Anton Baryshnikov, Max Ryabinin

    Abstract: Text-to-image synthesis has recently attracted widespread attention due to rapidly improving quality and numerous practical applications. However, the language understanding capabilities of text-to-image models are still poorly understood, which makes it difficult to reason about prompt formulations that a given model would understand well. In this work, we measure the capability of popular text-t…

    Submitted 13 October, 2023; originally announced October 2023.

  7. arXiv:2303.06865

    cs.LG cs.AI cs.PF

    FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

    Authors: Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

    Abstract: The high computational and memory requirements of large language model (LLM) inference make it feasible only with multiple high-end accelerators. Motivated by the emerging demand for latency-insensitive tasks with batched processing, this paper initiates the study of high-throughput LLM inference using limited resources, such as a single commodity GPU. We present FlexGen, a high-throughput generat…

    Submitted 12 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

  8. arXiv:2302.04841

    cs.CV cs.LG

    Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics

    Authors: Anton Voronov, Mikhail Khoroshikh, Artem Babenko, Max Ryabinin

    Abstract: Text-to-image generation models represent the next step of evolution in image synthesis, offering a natural way to achieve flexible yet fine-grained control over the result. One emerging area of research is the fast adaptation of large text-to-image models to smaller datasets or new visual concepts. However, many efficient methods of adaptation have a long training time, which limits their practic…

    Submitted 1 November, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2023. 20 pages, 15 figures. Code: https://github.com/yandex-research/DVAR

  9. arXiv:2301.11913

    cs.DC cs.LG

    SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

    Authors: Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov

    Abstract: Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel…

    Submitted 29 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: Accepted to International Conference on Machine Learning (ICML) 2023. 25 pages, 8 figures

  10. arXiv:2211.05100

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  11. RuCoLA: Russian Corpus of Linguistic Acceptability

    Authors: Vladislav Mikhailov, Tatiana Shamardina, Max Ryabinin, Alena Pestova, Ivan Smurov, Ekaterina Artemova

    Abstract: Linguistic acceptability (LA) attracts the attention of the research community due to its many uses, such as testing the grammatical knowledge of language models and filtering implausible texts with acceptability classifiers. However, the application scope of LA in languages other than English is limited due to the lack of high-quality resources. To this end, we introduce the Russian Corpus of Lin…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted to the EMNLP 2022 main conference

  12. arXiv:2209.01188

    cs.LG cs.DC

    Petals: Collaborative Inference and Fine-tuning of Large Models

    Authors: Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, Colin Raffel

    Abstract: Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B, everyone can download pretrained models of this scale. Still, using these models requires high-end hardware unavailable to many researchers. In some cases, LLMs can be used more affordably via RAM offloading or hosted APIs. However, these tec…

    Submitted 2 March, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

    Comments: 10 pages, 4 figures. Version 2 updates the benchmarks and the description of the chat application. Source code and docs: https://petals.ml

  13. arXiv:2207.03481

    cs.LG cs.DC

    Training Transformers Together

    Authors: Alexander Borzunov, Max Ryabinin, Tim Dettmers, Quentin Lhoest, Lucile Saulnier, Michael Diskin, Yacine Jernite, Thomas Wolf

    Abstract: The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, w…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted to NeurIPS 2021 Demonstration Track. 10 pages, 2 figures. Link: https://training-transformers-together.github.io

  14. arXiv:2110.03313

    cs.LG stat.ML

    Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees

    Authors: Aleksandr Beznosikov, Peter Richtárik, Michael Diskin, Max Ryabinin, Alexander Gasnikov

    Abstract: Variational inequalities in general and saddle point problems in particular are increasingly relevant in machine learning applications, including adversarial learning, GANs, transport and robust optimization. With increasing data and problem sizes necessary to train high performing models across various applications, we need to rely on parallel and distributed computing. However, in distributed tr…

    Submitted 2 April, 2023; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Appears in: Advances in Neural Information Processing Systems 35 (NeurIPS 2022). Minor modifications with respect to the NeurIPS version. 73 pages, 9 algorithms, 2 figures, 2 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2022/hash/5ac1428c23b5da5e66d029646ea3206d-Abstract-Conference.html

  15. arXiv:2106.12066

    cs.CL cs.LG

    It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning

    Authors: Alexey Tikhonov, Max Ryabinin

    Abstract: Commonsense reasoning is one of the key problems in natural language processing, but the relative scarcity of labeled data holds back the progress for languages other than English. Pretrained cross-lingual models are a source of powerful language-agnostic representations, yet their inherent reasoning capabilities are still actively studied. In this work, we design a simple approach to commonsense…

    Submitted 30 November, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: Accepted to Findings of ACL 2021. 13 pages, 4 figures. Code: https://github.com/yandex-research/crosslingual_winograd

  16. arXiv:2106.11257

    cs.LG cs.DC math.OC

    Secure Distributed Training at Scale

    Authors: Eduard Gorbunov, Alexander Borzunov, Michael Diskin, Max Ryabinin

    Abstract: Many areas of deep learning benefit from using increasingly larger neural networks trained on public data, as is the case for pre-trained models for NLP and computer vision. Training such models requires a lot of computational resources (e.g., HPC clusters) that are not available to small research groups and independent researchers. One way to address it is for several smaller groups to pool their…

    Submitted 1 January, 2023; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted to International Conference on Machine Learning (ICML) 2022. 61 pages, 10 figures. Version 4 fixes inaccuracies in the proofs of Lemmas E.2 and E.4. Code: https://github.com/yandex-research/btard

  17. arXiv:2106.10207

    cs.LG cs.DC

    Distributed Deep Learning in Open Collaborations

    Authors: Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier, Quentin Lhoest, Anton Sinitsin, Dmitry Popov, Dmitry Pyrkin, Maxim Kashirin, Alexander Borzunov, Albert Villanova del Moral, Denis Mazur, Ilia Kobelev, Yacine Jernite, Thomas Wolf, Gennady Pekhimenko

    Abstract: Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High-Performance Computing clusters, whose construction and maintenance are both environmentally costly and well beyond the budget of most organizations. As a result, some research directions become the exclusive domain of a…

    Submitted 8 November, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2021. 32 pages, 10 figures. Code: https://github.com/yandex-research/DeDLOC

  18. arXiv:2105.06987

    cs.LG cs.AI stat.ML

    Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

    Authors: Max Ryabinin, Andrey Malinin, Mark Gales

    Abstract: Ensembles of machine learning models yield improved system performance as well as robust and interpretable uncertainty estimates; however, their inference costs may often be prohibitively high. Ensemble Distribution Distillation is an approach that allows a single model to efficiently capture both the predictive performance and uncertainty estimates of an ensemble. For classification, this…

    Submitted 14 May, 2021; originally announced May 2021.

  19. arXiv:2103.03239

    cs.LG cs.DC math.OC

    Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices

    Authors: Max Ryabinin, Eduard Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko

    Abstract: Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes. This approach, known as distributed training, can utilize hundreds of computers via specialized message-passing protocols such as Ring All-Reduce. However, running these protocols at scale requires reliable high-speed networking that is only available in dedicated clusters. In contrast, many r…

    Submitted 11 January, 2022; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2021. Code: https://github.com/yandex-research/moshpit-sgd

  20. arXiv:2010.02598

    cs.CL cs.LG

    Embedding Words in Non-Vector Space with Unsupervised Graph Learning

    Authors: Max Ryabinin, Sergei Popov, Liudmila Prokhorenkova, Elena Voita

    Abstract: It has become a de-facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for language: words form a graph with a latent hierarchical structure, and this structure has to be revealed and encoded by word embeddings. We introduce GraphGlove: unsupervised graph word representations which are learned end-to-end. In our sett…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: Accepted as a long paper for EMNLP 2020. 15 pages, 6 figures

  21. arXiv:2002.04013

    cs.DC cs.LG stat.ML

    Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts

    Authors: Max Ryabinin, Anton Gusev

    Abstract: Many recent breakthroughs in deep learning were achieved by training increasingly larger models on massive datasets. However, training such models can be prohibitively expensive. For instance, the cluster used to train GPT-3 costs over \…

    Submitted 21 October, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: Advances in Neural Information Processing Systems, 2020. Code URL: https://github.com/mryab/learning-at-home. 16 pages, 6 figures

    Journal ref: Advances in Neural Information Processing Systems 33 (2020) 3659-3672

  22. arXiv:1901.00213

    stat.ML cs.LG

    A weighted random survival forest

    Authors: Lev V. Utkin, Andrei V. Konstantinov, Viacheslav S. Chukanov, Mikhail V. Kots, Mikhail A. Ryabinin, Anna A. Meldo

    Abstract: A weighted random survival forest is presented in the paper. It can be regarded as a modification of the random forest improving its performance. The main idea underlying the proposed model is to replace the standard procedure of averaging used for estimation of the random survival forest hazard function by weighted averaging where the weights are assigned to every tree and can be viewed as traini…

    Submitted 1 January, 2019; originally announced January 2019.

  23. arXiv:1705.09620

    stat.ML cs.LG

    Discriminative Metric Learning with Deep Forest

    Authors: Lev V. Utkin, Mikhail A. Ryabinin

    Abstract: A Discriminative Deep Forest (DisDF) as a metric learning algorithm is proposed in the paper. It is based on the Deep Forest or gcForest proposed by Zhou and Feng and can be viewed as a gcForest modification. The case of the fully supervised learning is studied when the class labels of individual training examples are known. The main idea underlying the algorithm is to assign weights to decision t…

    Submitted 25 May, 2017; originally announced May 2017.

    Comments: arXiv admin note: substantial text overlap with arXiv:1704.08715

    MSC Class: 68T10

  24. arXiv:1704.08715

    stat.ML cs.LG

    A Siamese Deep Forest

    Authors: Lev V. Utkin, Mikhail A. Ryabinin

    Abstract: A Siamese Deep Forest (SDF) is proposed in the paper. It is based on the Deep Forest or gcForest proposed by Zhou and Feng and can be viewed as a gcForest modification. It can be also regarded as an alternative to the well-known Siamese neural networks. The SDF uses a modified training set consisting of concatenated pairs of vectors. Moreover, it defines the class distributions in the deep forest…

    Submitted 27 April, 2017; originally announced April 2017.

    MSC Class: 68T10