Showing 1–29 of 29 results for author: Severyn, A

  1. arXiv:2408.00118  [pdf, other]

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, et al. (172 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  2. arXiv:2407.14622  [pdf, other]

    cs.LG cs.AI cs.CL

    BOND: Aligning LLMs with Best-of-N Distillation

    Authors: Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Nino Vieillard, Alexandre Ramé, Bobak Shariari, Sarah Perrin, Abe Friesen, Geoffrey Cideron, Sertan Girgin, Piotr Stanczyk, Andrea Michi, Danila Sinopalnikov, Sabela Ramos, Amélie Héliou, Aliaksei Severyn, Matt Hoffman, Nikola Momchev, Olivier Bachem

    Abstract: Reinforcement learning from human feedback (RLHF) is a key driver of quality and safety in state-of-the-art large language models. Yet, a surprisingly simple and strong inference-time strategy is Best-of-N sampling that selects the best generation among N candidates. In this paper, we propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its sign…

    Submitted 19 July, 2024; originally announced July 2024.

  3. arXiv:2405.19107  [pdf, ps, other]

    cs.LG cs.AI

    Offline Regularised Reinforcement Learning for Large Language Models Alignment

    Authors: Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Avila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Remi Munos, Bilal Piot

    Abstract: The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is a quadruplet composed of a prompt, two independent responses (completions of the prompt) and a human preference between the two independent responses…

    Submitted 29 May, 2024; originally announced May 2024.

  4. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  5. arXiv:2401.12086  [pdf, other]

    cs.CL cs.AI cs.LG

    West-of-N: Synthetic Preference Generation for Improved Reward Modeling

    Authors: Alizée Pace, Jonathan Mallinson, Eric Malmi, Sebastian Krause, Aliaksei Severyn

    Abstract: The success of reinforcement learning from human feedback (RLHF) in language model alignment is strongly dependent on the quality of the underlying reward model. In this paper, we present a novel approach to improve reward model quality by generating synthetic preference data, thereby augmenting the training dataset with on-policy, high-quality preference pairs. Motivated by the promising results…

    Submitted 22 January, 2024; originally announced January 2024.

  6. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  7. arXiv:2305.13514  [pdf, other]

    cs.CL cs.LG

    Small Language Models Improve Giants by Rewriting Their Outputs

    Authors: Giorgos Vernikos, Arthur Bražinskas, Jakub Adamek, Jonathan Mallinson, Aliaksei Severyn, Eric Malmi

    Abstract: Despite the impressive performance of large language models (LLMs), they often lag behind specialized models in various tasks. LLMs only use a fraction of the existing training data for in-context learning, while task-specific models harness the full dataset for fine-tuning. In this work, we tackle the problem of leveraging training data to improve the performance of LLMs without fine-tuning. Our…

    Submitted 1 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at EACL 2024

  8. arXiv:2212.08410  [pdf, other]

    cs.CL cs.LG

    Teaching Small Language Models to Reason

    Authors: Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn

    Abstract: Chain-of-thought prompting successfully improves the reasoning capabilities of large language models, achieving state-of-the-art results on a range of datasets. However, these reasoning capabilities only appear to emerge in models with a size of over 100 billion parameters. In this paper, we explore the transfer of such reasoning capabilities to models with less than 100 billion parameters via kno…

    Submitted 1 June, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

  9. arXiv:2206.07043  [pdf, other]

    cs.CL

    Text Generation with Text-Editing Models

    Authors: Eric Malmi, Yue Dong, Jonathan Mallinson, Aleksandr Chuklin, Jakub Adamek, Daniil Mirylenka, Felix Stahlberg, Sebastian Krause, Shankar Kumar, Aliaksei Severyn

    Abstract: Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, simplification, and style transfer. These tasks share a common trait - they exhibit a large amount of textual overlap between the source and target texts. Text-editing models take advantage of this observation and learn to generate the outpu…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted as a tutorial at NAACL 2022

  10. arXiv:2205.12209  [pdf, other]

    cs.CL

    EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start

    Authors: Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn

    Abstract: We present EdiT5 - a novel semi-autoregressive text-editing model designed to combine the strengths of non-autoregressive text-editing and autoregressive decoding. EdiT5 is faster during inference than conventional sequence-to-sequence (seq2seq) models, while being capable of modelling flexible input-output transformations. This is achieved by decomposing the generation process into three sub-ta…

    Submitted 26 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: To be published in Findings of EMNLP 2022

  11. arXiv:2108.01850  [pdf, other]

    cs.CL

    Controlled Text Generation as Continuous Optimization with Multiple Constraints

    Authors: Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov

    Abstract: As large-scale language model pretraining pushes the state-of-the-art in text generation, recent work has turned to controlling attributes of the text such models generate. While modifying the pretrained models via fine-tuning remains the popular approach, it incurs a significant computational cost and can be infeasible due to lack of appropriate data. As an alternative, we propose MuCoCO -- a fle…

    Submitted 4 August, 2021; originally announced August 2021.

  12. arXiv:2106.03830  [pdf, other]

    cs.CL

    A Simple Recipe for Multilingual Grammatical Error Correction

    Authors: Sascha Rothe, Jonathan Mallinson, Eric Malmi, Sebastian Krause, Aliaksei Severyn

    Abstract: This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets we surpass the previ…

    Submitted 9 August, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

  13. arXiv:2010.01054  [pdf, other]

    cs.CL

    Unsupervised Text Style Transfer with Padded Masked Language Models

    Authors: Eric Malmi, Aliaksei Severyn, Sascha Rothe

    Abstract: We propose Masker, an unsupervised text-editing method for style transfer. To tackle cases when no parallel source-target pairs are available, we train masked language models (MLMs) for both the source and the target domain. Then we find the text spans where the two models disagree the most in terms of likelihood. This allows us to identify the source tokens to delete to transform the source text…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  14. arXiv:2003.10687  [pdf, other]

    cs.CL

    Felix: Flexible Text Editing Through Tagging and Insertion

    Authors: Jonathan Mallinson, Aliaksei Severyn, Eric Malmi, Guillermo Garrido

    Abstract: We present Felix, a flexible text-editing approach for generation, designed to derive the maximum benefit from the ideas of decoding with bi-directional contexts and self-supervised pre-training. In contrast to conventional sequence-to-sequence (seq2seq) models, Felix is efficient in low-resource settings and fast at inference time, while being capable of modeling flexible input-output transfor…

    Submitted 24 March, 2020; originally announced March 2020.

  15. arXiv:1909.01187  [pdf, other]

    cs.CL

    Encode, Tag, Realize: High-Precision Text Editing

    Authors: Eric Malmi, Sebastian Krause, Sascha Rothe, Daniil Mirylenka, Aliaksei Severyn

    Abstract: We propose LaserTagger - a sequence tagging approach that casts text generation as a text editing task. Target texts are reconstructed from the inputs using three main edit operations: keeping a token, deleting it, and adding a phrase before the token. To predict the edit operations, we propose a novel model, which combines a BERT encoder with an autoregressive Transformer decoder. This approach i…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  16. Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

    Authors: Sascha Rothe, Shashi Narayan, Aliaksei Severyn

    Abstract: Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language Understanding tasks. In this paper, we demonstrate the e…

    Submitted 16 April, 2020; v1 submitted 29 July, 2019; originally announced July 2019.

    Comments: To be published in Transactions of the Association for Computational Linguistics (TACL)

  17. arXiv:1902.08077  [pdf, other]

    cs.LG stat.ML

    Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities

    Authors: Octavian-Eugen Ganea, Sylvain Gelly, Gary Bécigneul, Aliaksei Severyn

    Abstract: The Softmax function on top of a final linear layer is the de facto method to output probability distributions in neural networks. In many applications such as language models or text generation, this model has to produce distributions over large output vocabularies. Recently, this has been shown to have limited representational capacity due to its connection with the rank bottleneck in matrix fac…

    Submitted 13 May, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

    Journal ref: ICML 2019

  18. arXiv:1808.04736  [pdf, other]

    cs.CL

    Adversarial Neural Networks for Cross-lingual Sequence Tagging

    Authors: Heike Adel, Anton Bryl, David Weiss, Aliaksei Severyn

    Abstract: We study cross-lingual sequence tagging with little or no labeled data in the target language. Adversarial training has previously been shown to be effective for training cross-lingual sentence classifiers. However, it is not clear if language-agnostic representations enforced by an adversarial language discriminator will also enable effective transfer for token-level prediction tasks. Therefore,…

    Submitted 14 August, 2018; originally announced August 2018.

  19. arXiv:1806.04936  [pdf, other]

    cs.CL

    On Accurate Evaluation of GANs for Language Generation

    Authors: Stanislau Semeniuta, Aliaksei Severyn, Sylvain Gelly

    Abstract: Generative Adversarial Networks (GANs) are a promising approach to language generation. The latest works introducing novel GAN models for language generation use n-gram based metrics for evaluation and only report single scores of the best run. In this paper, we argue that this often misrepresents the true picture and does not tell the full story, as GAN models can be extremely sensitive to the ra…

    Submitted 18 July, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

  20. Prosody Modifications for Question-Answering in Voice-Only Settings

    Authors: Aleksandr Chuklin, Aliaksei Severyn, Johanne Trippas, Enrique Alfonseca, Hanna Silen, Damiano Spina

    Abstract: Many popular form factors of digital assistants (such as Amazon Echo, Apple Homepod, or Google Home) enable the user to hold a conversation with these systems based only on the speech modality. The lack of a screen presents unique challenges. To satisfy the information need of a user, the presentation of the answer needs to be optimized for such voice-only interactions. In this paper, we propose…

    Submitted 2 October, 2019; v1 submitted 11 June, 2018; originally announced June 2018.

    Comments: Shorter version of this paper was accepted to CLEF'2019, Lugano, Switzerland. The final authenticated version is available online at https://doi.org/10.1007/978-3-030-28577-7_12

    ACM Class: H.3.3; H.5.2

    Journal ref: Lecture Notes in Computer Science, vol 11696 CLEF 2019

  21. arXiv:1804.07972  [pdf, other]

    cs.CL

    Eval all, trust a few, do wrong to none: Comparing sentence generation models

    Authors: Ondřej Cífka, Aliaksei Severyn, Enrique Alfonseca, Katja Filippova

    Abstract: In this paper, we study recent neural generative models for text generation related to variational autoencoders. Previous works have employed various techniques to control the prior distribution of the latent codes in these models, which is important for sampling performance, but little attention has been paid to reconstruction error. In our study, we follow a rigorous evaluation protocol using a…

    Submitted 30 October, 2018; v1 submitted 21 April, 2018; originally announced April 2018.

    Comments: 12 pages (3 page appendix); v2: added hyperparameter settings, clarifications

  22. arXiv:1711.11383  [pdf, other]

    stat.ML cs.AI cs.CL cs.LG

    Learning to Learn from Weak Supervision by Full Supervision

    Authors: Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

    Abstract: In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network, the learner and a confidence network, the meta-learner. The target network is optimized to perform a given task and is trained using a large set of unlabeled data that ar…

    Submitted 30 November, 2017; originally announced November 2017.

    Comments: Accepted at NIPS Workshop on Meta-Learning (MetaLearn 2017), Long Beach, CA, USA

  23. arXiv:1711.00313  [pdf, other]

    cs.LG cs.CL cs.NE stat.ML

    Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

    Authors: Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

    Abstract: Training deep neural networks requires massive amounts of training data, but for many tasks only limited labeled data is available. This makes weak supervision attractive, using weak or noisy signals like the output of heuristic methods or user click-through data for training. In a semi-supervised setting, we can use a large set of data with weak labels to pretrain a neural network and then fine-t…

    Submitted 7 December, 2017; v1 submitted 1 November, 2017; originally announced November 2017.

  24. arXiv:1704.08803  [pdf, other]

    cs.IR cs.CL cs.LG

    Neural Ranking Models with Weak Supervision

    Authors: Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft

    Abstract: Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the ranking problem, as it is not obvious how to learn from queries and documents when no supervised signal is available. Hence, in this paper, we propose to train a…

    Submitted 29 May, 2017; v1 submitted 28 April, 2017; originally announced April 2017.

    Comments: In proceedings of The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2017)

  25. arXiv:1703.02504  [pdf, other]

    cs.CL cs.IR cs.LG

    Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification

    Authors: Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi

    Abstract: This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require to establish a correspondence to English for which powerful classifiers are already available. In contrast, our method does not requir…

    Submitted 7 March, 2017; originally announced March 2017.

    Comments: appearing at WWW 2017 - 26th International World Wide Web Conference

    ACM Class: I.2.7

  26. arXiv:1702.02390  [pdf, other]

    cs.CL

    A Hybrid Convolutional Variational Autoencoder for Text Generation

    Authors: Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth

    Abstract: In this paper we explore the effect of architectural choices on learning a Variational Autoencoder (VAE) for text generation. In contrast to the previously introduced VAE model for text where both the encoder and decoder are RNNs, we propose a novel hybrid architecture that blends fully feed-forward convolutional and deconvolutional components with a recurrent language model. Our architecture exhi…

    Submitted 8 February, 2017; originally announced February 2017.

  27. arXiv:1604.01178  [pdf, other]

    cs.CL

    Modeling Relational Information in Question-Answer Pairs with Convolutional Neural Networks

    Authors: Aliaksei Severyn, Alessandro Moschitti

    Abstract: In this paper, we propose convolutional neural networks for learning an optimal representation of question and answer sentences. Their main aspect is the use of relational information given by the matches between words from the two members of the pair. The matches are encoded as embeddings with additional parameters (dimensions), which are tuned by the network. This allows for better capturing in…

    Submitted 5 April, 2016; originally announced April 2016.

  28. arXiv:1603.06042  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    Globally Normalized Transition-Based Neural Networks

    Authors: Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, Michael Collins

    Abstract: We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models. We discuss the importance of global as opposed to…

    Submitted 8 June, 2016; v1 submitted 18 March, 2016; originally announced March 2016.

  29. arXiv:1603.05118  [pdf, ps, other]

    cs.CL

    Recurrent Dropout without Memory Loss

    Authors: Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth

    Abstract: This paper presents a novel approach to recurrent neural network (RNN) regularization. Differently from the widely adopted dropout method, which is applied to forward connections of feed-forward architectures or RNNs, we propose to drop neurons directly in recurrent connections in a way that does not cause loss of long-term memory. Our approach is as easy to implement and apply a…

    Submitted 5 August, 2016; v1 submitted 16 March, 2016; originally announced March 2016.