-
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Authors:
Pierre Harvey Richemond,
Yunhao Tang,
Daniel Guo,
Daniele Calandriello,
Mohammad Gheshlaghi Azar,
Rafael Rafailov,
Bernardo Avila Pires,
Eugene Tarassov,
Lucas Spangher,
Will Ellsworth,
Aliaksei Severyn,
Jonathan Mallinson,
Lior Shani,
Gil Shamir,
Rishabh Joshi,
Tianqi Liu,
Remi Munos,
Bilal Piot
Abstract:
The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is a quadruplet composed of a prompt, two independent responses (completions of the prompt) and a human preference between the two independent responses…
▽ More
The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is a quadruplet composed of a prompt, two independent responses (completions of the prompt) and a human preference between the two independent responses, yielding a preferred and a dis-preferred response. Such data is typically scarce and expensive to collect. On the other hand, \emph{single-trajectory} datasets where each element is a triplet composed of a prompt, a response and a human feedback is naturally more abundant. The canonical element of such datasets is for instance an LLM's response to a user's prompt followed by a user's feedback such as a thumbs-up/down. Consequently, in this work, we propose DRO, or \emph{Direct Reward Optimisation}, as a framework and associated algorithms that do not require pairwise preferences. DRO uses a simple mean-squared objective that can be implemented in various ways. We validate our findings empirically, using T5 encoder-decoder language models, and show DRO's performance over selected baselines such as Kahneman-Tversky Optimization (KTO). Thus, we confirm that DRO is a simple and empirically compelling method for single-trajectory policy optimisation.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
West-of-N: Synthetic Preference Generation for Improved Reward Modeling
Authors:
Alizée Pace,
Jonathan Mallinson,
Eric Malmi,
Sebastian Krause,
Aliaksei Severyn
Abstract:
The success of reinforcement learning from human feedback (RLHF) in language model alignment is strongly dependent on the quality of the underlying reward model. In this paper, we present a novel approach to improve reward model quality by generating synthetic preference data, thereby augmenting the training dataset with on-policy, high-quality preference pairs. Motivated by the promising results…
▽ More
The success of reinforcement learning from human feedback (RLHF) in language model alignment is strongly dependent on the quality of the underlying reward model. In this paper, we present a novel approach to improve reward model quality by generating synthetic preference data, thereby augmenting the training dataset with on-policy, high-quality preference pairs. Motivated by the promising results of Best-of-N sampling strategies in language model training, we extend their application to reward model training. This results in a self-training strategy to generate preference pairs by selecting the best and worst candidates in a pool of responses to a given query. Empirically, we find that this approach improves the performance of any reward model, with an effect comparable to the addition of a similar quantity of human preference data. This work opens up new avenues of research for improving RLHF for language model alignment, by offering synthetic preference generation as a solution to reward modeling challenges.
△ Less
Submitted 22 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Small Language Models Improve Giants by Rewriting Their Outputs
Authors:
Giorgos Vernikos,
Arthur Bražinskas,
Jakub Adamek,
Jonathan Mallinson,
Aliaksei Severyn,
Eric Malmi
Abstract:
Despite the impressive performance of large language models (LLMs), they often lag behind specialized models in various tasks. LLMs only use a fraction of the existing training data for in-context learning, while task-specific models harness the full dataset for fine-tuning. In this work, we tackle the problem of leveraging training data to improve the performance of LLMs without fine-tuning. Our…
▽ More
Despite the impressive performance of large language models (LLMs), they often lag behind specialized models in various tasks. LLMs only use a fraction of the existing training data for in-context learning, while task-specific models harness the full dataset for fine-tuning. In this work, we tackle the problem of leveraging training data to improve the performance of LLMs without fine-tuning. Our approach directly targets LLM predictions without requiring access to their weights. We create a pool of candidates from the LLM through few-shot prompting and we employ a compact model, the LM-corrector (LMCor), specifically trained to merge these candidates to produce an enhanced output. Our experiments on four natural language generation tasks demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning. Furthermore, we illustrate the robustness of LMCor against different prompts, thereby minimizing the need for extensive prompt engineering. Finally, we show that LMCor can be seamlessly integrated with different LLMs at inference, serving as a plug-and-play module to improve their performance.
△ Less
Submitted 1 February, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Teaching Small Language Models to Reason
Authors:
Lucie Charlotte Magister,
Jonathan Mallinson,
Jakub Adamek,
Eric Malmi,
Aliaksei Severyn
Abstract:
Chain of thought prompting successfully improves the reasoning capabilities of large language models, achieving state of the art results on a range of datasets. However, these reasoning capabilities only appear to emerge in models with a size of over 100 billion parameters. In this paper, we explore the transfer of such reasoning capabilities to models with less than 100 billion parameters via kno…
▽ More
Chain of thought prompting successfully improves the reasoning capabilities of large language models, achieving state of the art results on a range of datasets. However, these reasoning capabilities only appear to emerge in models with a size of over 100 billion parameters. In this paper, we explore the transfer of such reasoning capabilities to models with less than 100 billion parameters via knowledge distillation. Specifically, we finetune a student model on the chain of thought outputs generated by a larger teacher model. Our experiments show that the proposed method improves task performance across arithmetic, commonsense and symbolic reasoning datasets. For example, the accuracy of T5 XXL on GSM8K improves from 8.11% to 21.99% when finetuned on PaLM-540B generated chains of thought.
△ Less
Submitted 1 June, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Reservoir Computing with 3D Nanowire Networks
Authors:
R. K. Daniels,
J. B. Mallinson,
Z. E. Heywood,
P. J. Bones,
M. D. Arnold,
S. A. Brown
Abstract:
Networks of nanowires are currently being explored for a range of applications in brain-like (or neuromorphic) computing, and especially in reservoir computing (RC). Fabrication of real-world computing devices requires that the nanowires are deposited sequentially, leading to stacking of the wires on top of each other. However, most simulations of computational tasks using these systems treat the…
▽ More
Networks of nanowires are currently being explored for a range of applications in brain-like (or neuromorphic) computing, and especially in reservoir computing (RC). Fabrication of real-world computing devices requires that the nanowires are deposited sequentially, leading to stacking of the wires on top of each other. However, most simulations of computational tasks using these systems treat the nanowires as 1D objects lying in a perfectly 2D plane - the effect of stacking on RC performance has not yet been established. Here we use detailed simulations to compare the performance of perfectly 2D and quasi-3D (stacked) networks of nanowires in two tasks: memory capacity and nonlinear transformation. We also show that our model of the junctions between nanowires is general enough to describe a wide range of memristive networks, and consider the impact of physically realistic electrode configurations on performance. We show that the various networks and configurations have a strikingly similar performance in RC tasks, which is surprising given their radically different topologies. Our results show that networks with an experimentally achievable number of electrodes perform close to the upper bounds achievable when using the information from every wire. However, we also show important differences, in particular that the quasi-3D networks are more resilient to changes in the input parameters, generalizing better to noisy training data. Since previous literature suggests that topology plays an important role in computing performance, these results may have important implications for future applications of nanowire networks in neuromorphic computing.
△ Less
Submitted 6 July, 2022;
originally announced July 2022.
-
Text Generation with Text-Editing Models
Authors:
Eric Malmi,
Yue Dong,
Jonathan Mallinson,
Aleksandr Chuklin,
Jakub Adamek,
Daniil Mirylenka,
Felix Stahlberg,
Sebastian Krause,
Shankar Kumar,
Aliaksei Severyn
Abstract:
Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, simplification, and style transfer. These tasks share a common trait - they exhibit a large amount of textual overlap between the source and target texts. Text-editing models take advantage of this observation and learn to generate the outpu…
▽ More
Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, simplification, and style transfer. These tasks share a common trait - they exhibit a large amount of textual overlap between the source and target texts. Text-editing models take advantage of this observation and learn to generate the output by predicting edit operations applied to the source sequence. In contrast, seq2seq models generate outputs word-by-word from scratch thus making them slow at inference time. Text-editing models provide several benefits over seq2seq models including faster inference speed, higher sample efficiency, and better control and interpretability of the outputs. This tutorial provides a comprehensive overview of text-editing models and current state-of-the-art approaches, and analyzes their pros and cons. We discuss challenges related to productionization and how these models can be used to mitigate hallucination and bias, both pressing challenges in the field of text generation.
△ Less
Submitted 14 June, 2022;
originally announced June 2022.
-
EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start
Authors:
Jonathan Mallinson,
Jakub Adamek,
Eric Malmi,
Aliaksei Severyn
Abstract:
We present EdiT5 - a novel semi-autoregressive text-editing model designed to combine the strengths of non-autoregressive text-editing and autoregressive decoding. EdiT5 is faster during inference than conventional sequence-to-sequence (seq2seq) models, while being capable of modelling flexible input-output transformations.
This is achieved by decomposing the generation process into three sub-ta…
▽ More
We present EdiT5 - a novel semi-autoregressive text-editing model designed to combine the strengths of non-autoregressive text-editing and autoregressive decoding. EdiT5 is faster during inference than conventional sequence-to-sequence (seq2seq) models, while being capable of modelling flexible input-output transformations.
This is achieved by decomposing the generation process into three sub-tasks: (1) tagging to decide on the subset of input tokens to be preserved in the output, (2) re-ordering to define their order in the output text, and (3) insertion to infill the missing tokens that are not present in the input. The tagging and re-ordering steps, which are responsible for generating the largest portion of the output, are non-autoregressive, while the insertion step uses an autoregressive decoder.
Depending on the task, EdiT5 on average requires significantly fewer autoregressive steps, demonstrating speedups of up to 25x when compared to seq2seq models. Quality-wise, EdiT5 is initialized with a pre-trained T5 checkpoint yielding comparable performance to T5 in high-resource settings when evaluated on three NLG tasks: Sentence Fusion, Grammatical Error Correction, and Decontextualization while clearly outperforming T5 in low-resource settings.
△ Less
Submitted 26 October, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
RED-ACE: Robust Error Detection for ASR using Confidence Embeddings
Authors:
Zorik Gekhman,
Dina Zverinski,
Jonathan Mallinson,
Genady Beryozkin
Abstract:
ASR Error Detection (AED) models aim to post-process the output of Automatic Speech Recognition (ASR) systems, in order to detect transcription errors. Modern approaches usually use text-based input, comprised solely of the ASR transcription hypothesis, disregarding additional signals from the ASR model. Instead, we propose to utilize the ASR system's word-level confidence scores for improving AED…
▽ More
ASR Error Detection (AED) models aim to post-process the output of Automatic Speech Recognition (ASR) systems, in order to detect transcription errors. Modern approaches usually use text-based input, comprised solely of the ASR transcription hypothesis, disregarding additional signals from the ASR model. Instead, we propose to utilize the ASR system's word-level confidence scores for improving AED performance. Specifically, we add an ASR Confidence Embedding (ACE) layer to the AED model's encoder, allowing us to jointly encode the confidence scores and the transcribed text into a contextualized representation. Our experiments show the benefits of ASR confidence scores for AED, their complementary effect over the textual signal, as well as the effectiveness and robustness of ACE for combining these signals. To foster further research, we publish a novel AED dataset consisting of ASR outputs on the LibriSpeech corpus with annotated transcription errors.
△ Less
Submitted 26 October, 2022; v1 submitted 14 March, 2022;
originally announced March 2022.
-
A Simple Recipe for Multilingual Grammatical Error Correction
Authors:
Sascha Rothe,
Jonathan Mallinson,
Eric Malmi,
Sebastian Krause,
Aliaksei Severyn
Abstract:
This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets we surpass the previ…
▽ More
This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets we surpass the previous state-of-the-art results on GEC benchmarks in four languages: English, Czech, German and Russian. Having established a new set of baselines for GEC, we make our results easily reproducible and accessible by releasing a cLang-8 dataset. It is produced by using our best model, which we call gT5, to clean the targets of a widely used yet noisy lang-8 dataset. cLang-8 greatly simplifies typical GEC training pipelines composed of multiple fine-tuning stages -- we demonstrate that performing a single fine-tuning step on cLang-8 with the off-the-shelf language models yields further accuracy improvements over an already top-performing gT5 model for English.
△ Less
Submitted 9 August, 2022; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Felix: Flexible Text Editing Through Tagging and Insertion
Authors:
Jonathan Mallinson,
Aliaksei Severyn,
Eric Malmi,
Guillermo Garrido
Abstract:
We present Felix --- a flexible text-editing approach for generation, designed to derive the maximum benefit from the ideas of decoding with bi-directional contexts and self-supervised pre-training. In contrast to conventional sequence-to-sequence (seq2seq) models, Felix is efficient in low-resource settings and fast at inference time, while being capable of modeling flexible input-output transfor…
▽ More
We present Felix --- a flexible text-editing approach for generation, designed to derive the maximum benefit from the ideas of decoding with bi-directional contexts and self-supervised pre-training. In contrast to conventional sequence-to-sequence (seq2seq) models, Felix is efficient in low-resource settings and fast at inference time, while being capable of modeling flexible input-output transformations. We achieve this by decomposing the text-editing task into two sub-tasks: tagging to decide on the subset of input tokens and their order in the output text and insertion to in-fill the missing tokens in the output not present in the input. The tagging model employs a novel Pointer mechanism, while the insertion model is based on a Masked Language Model. Both of these models are chosen to be non-autoregressive to guarantee faster inference. Felix performs favourably when compared to recent text-editing methods and strong seq2seq baselines when evaluated on four NLG tasks: Sentence Fusion, Machine Translation Automatic Post-Editing, Summarization, and Text Simplification.
△ Less
Submitted 24 March, 2020;
originally announced March 2020.
-
Controllable Sentence Simplification: Employing Syntactic and Lexical Constraints
Authors:
Jonathan Mallinson,
Mirella Lapata
Abstract:
Sentence simplification aims to make sentences easier to read and understand. Recent approaches have shown promising results with sequence-to-sequence models which have been developed assuming homogeneous target audiences. In this paper we argue that different users have different simplification needs (e.g. dyslexics vs. non-native speakers), and propose CROSS, ContROllable Sentence Simplification…
▽ More
Sentence simplification aims to make sentences easier to read and understand. Recent approaches have shown promising results with sequence-to-sequence models which have been developed assuming homogeneous target audiences. In this paper we argue that different users have different simplification needs (e.g. dyslexics vs. non-native speakers), and propose CROSS, ContROllable Sentence Simplification model, which allows to control both the level of simplicity and the type of the simplification. We achieve this by enriching a Transformer-based architecture with syntactic and lexical constraints (which can be set or learned from data). Empirical results on two benchmark datasets show that constraints are key to successful simplification, offering flexible generation output.
△ Less
Submitted 10 October, 2019;
originally announced October 2019.
-
Stable Self-Assembled Atomic-Switch Networks for Neuromorphic Applications
Authors:
Saurabh K. Bose,
Joshua B. Mallinson,
Rodrigo M. Gazoni,
Simon A. Brown
Abstract:
Nature inspired neuromorphic architectures are being explored as an alternative to imminent limitations of conventional complementary metal-oxide semiconductor (CMOS) architectures. Utilization of such architectures for practical applications like advanced pattern recognition tasks will require synaptic connections that are both reconfigurable and stable. Here, we report realization of stable atom…
▽ More
Nature inspired neuromorphic architectures are being explored as an alternative to imminent limitations of conventional complementary metal-oxide semiconductor (CMOS) architectures. Utilization of such architectures for practical applications like advanced pattern recognition tasks will require synaptic connections that are both reconfigurable and stable. Here, we report realization of stable atomic-switch networks (ASN), with inherent complex connectivity, self-assembled from percolating metal nanoparticles (NPs). The device conductance reflects the configuration of synapses which can be modulated via voltage stimulus. By controlling Relative Humidity (RH) and oxygen partial-pressure during NP deposition we obtain stochastic conductance switching that is stable over several months. Detailed characterization reveals signatures of electric-field induced atomic-wire formation within the tunnel-gaps of the oxidized percolating network. Finally we show that the synaptic structure can be reconfigured by stimulating at different repetition rates, which can be utilized as short-term to long-term memory conversion. This demonstration of stable stochastic switching in ASNs provides a promising route to hardware implementation of biological neuronal models and, as an example, we highlight possible applications in Reservoir Computing (RC).
△ Less
Submitted 27 December, 2017;
originally announced December 2017.
-
Learning to Paraphrase for Question Answering
Authors:
Li Dong,
Jonathan Mallinson,
Siva Reddy,
Mirella Lapata
Abstract:
Question answering (QA) systems are sensitive to the many different ways natural language expresses the same information need. In this paper we turn to paraphrases as a means of capturing this knowledge and present a general framework which learns felicitous paraphrases for various QA tasks. Our method is trained end-to-end using question-answer pairs as a supervision signal. A question and its pa…
▽ More
Question answering (QA) systems are sensitive to the many different ways natural language expresses the same information need. In this paper we turn to paraphrases as a means of capturing this knowledge and present a general framework which learns felicitous paraphrases for various QA tasks. Our method is trained end-to-end using question-answer pairs as a supervision signal. A question and its paraphrases serve as input to a neural scoring model which assigns higher weights to linguistic expressions most likely to yield correct answers. We evaluate our approach on QA over Freebase and answer sentence selection. Experimental results on three datasets show that our framework consistently improves performance, achieving competitive results despite the use of simple QA models.
△ Less
Submitted 20 August, 2017;
originally announced August 2017.
-
Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext
Authors:
John Wieting,
Jonathan Mallinson,
Kevin Gimpel
Abstract:
We consider the problem of learning general-purpose, paraphrastic sentence embeddings in the setting of Wieting et al. (2016b). We use neural machine translation to generate sentential paraphrases via back-translation of bilingual sentence pairs. We evaluate the paraphrase pairs by their ability to serve as training data for learning paraphrastic sentence embeddings. We find that the data quality…
▽ More
We consider the problem of learning general-purpose, paraphrastic sentence embeddings in the setting of Wieting et al. (2016b). We use neural machine translation to generate sentential paraphrases via back-translation of bilingual sentence pairs. We evaluate the paraphrase pairs by their ability to serve as training data for learning paraphrastic sentence embeddings. We find that the data quality is stronger than prior work based on bitext and on par with manually-written English paraphrase pairs, with the advantage that our approach can scale up to generate large training sets for many languages and domains. We experiment with several language pairs and data sources, and develop a variety of data filtering techniques. In the process, we explore how neural machine translation output differs from human-written sentences, finding clear differences in length, the amount of repetition, and the use of rare words.
△ Less
Submitted 6 June, 2017;
originally announced June 2017.