Skip to main content

Showing 1–15 of 15 results for author: Mesnard, T

Searching in archive cs. Search in all archives.
.
  1. arXiv:2404.07839  [pdf, other

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti , et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned var… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  2. arXiv:2403.08295  [pdf, other

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari , et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge… ▽ More

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  3. arXiv:2402.04792  [pdf, other

    cs.AI cs.CL cs.HC

    Direct Language Model Alignment from Online AI Feedback

    Authors: Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, Felipe Llinares, Alexandre Rame, Thomas Mesnard, Yao Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel

    Abstract: Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF), that do not require a separate reward model. However, the preference datasets used in DAP methods are usually collected ahead of training and never updated, thus the feedback is purely offline. Moreover, responses in these datasets are… ▽ More

    Submitted 29 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 18 pages, 9 figures, 4 tables

  4. arXiv:2312.01072  [pdf, other

    cs.LG cs.AI

    A Survey of Temporal Credit Assignment in Deep Reinforcement Learning

    Authors: Eduardo Pignatelli, Johan Ferret, Matthieu Geist, Thomas Mesnard, Hado van Hasselt, Olivier Pietquin, Laura Toni

    Abstract: The Credit Assignment Problem (CAP) refers to the longstanding challenge of Reinforcement Learning (RL) agents to associate actions with their long-term consequences. Solving the CAP is a crucial step towards the successful deployment of RL in the real world since most decision problems provide feedback that is noisy, delayed, and with little or no information about the causes. These conditions ma… ▽ More

    Submitted 4 July, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: 56 pages, 2 figures, 4 tables

  5. arXiv:2312.00886  [pdf, other

    stat.ML cs.AI cs.GT cs.LG cs.MA

    Nash Learning from Human Feedback

    Authors: Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, often expressed as preferences between pairs of text generations produced by a pre-trained LLM. Subsequently, the LLM's policy is fine-tuned by optimizing it to… ▽ More

    Submitted 11 June, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  6. arXiv:2309.00267  [pdf, other

    cs.CL cs.AI cs.LG

    RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

    Authors: Harrison Lee, Samrat Phatale, Hassan Mansoor, Thomas Mesnard, Johan Ferret, Kellie Lu, Colton Bishop, Ethan Hall, Victor Carbune, Abhinav Rastogi, Sushant Prakash

    Abstract: Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models (LLMs) with human preferences. However, gathering high-quality human preference labels can be a time-consuming and expensive endeavor. RL from AI Feedback (RLAIF), introduced by Bai et al., offers a promising alternative that leverages a powerful off-the-shelf LLM to generate preferences in lie… ▽ More

    Submitted 30 November, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: Added two more tasks and many more experiments and analyses (e.g. same-size RLAIF, direct RLAIF, cost analysis)

  7. arXiv:2211.10515  [pdf, other

    stat.ML cs.LG

    Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments

    Authors: Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Rémi Munos, Michal Valko

    Abstract: Consider the problem of exploration in sparse-reward or reward-free environments, such as in Montezuma's Revenge. In the curiosity-driven paradigm, the agent is rewarded for how much each realized outcome differs from their predicted outcome. But using predictive error as intrinsic motivation is fragile in stochastic environments, as the agent may become trapped by high-entropy areas of the state-… ▽ More

    Submitted 14 July, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Journal ref: In Proc. 40th International Conference on Machine Learning (ICML 2023)

  8. arXiv:2101.02055  [pdf, other

    cs.LG

    Geometric Entropic Exploration

    Authors: Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos

    Abstract: Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum State-Visitation Entropy (MSVE) formulates the exploration problem as a well-defined policy optimization problem whose solution aims at visiting all states as uniformly as possible. This is in contrast to standard uncertainty-based approaches where exploration is transient and eventually vanishes. However, exis… ▽ More

    Submitted 7 January, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

  9. arXiv:2011.09464  [pdf, other

    cs.LG

    Counterfactual Credit Assignment in Model-Free Reinforcement Learning

    Authors: Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos

    Abstract: Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to… ▽ More

    Submitted 14 December, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

  10. arXiv:1912.02503  [pdf, other

    cs.LG stat.ML

    Hindsight Credit Assignment

    Authors: Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos

    Abstract: We consider the problem of efficient credit assignment in reinforcement learning. In order to efficiently and meaningfully utilize new data, we propose to explicitly assign credit to past decisions based on the likelihood of them having led to the observed outcome. This approach uses new information in hindsight, rather than employing foresight. Somewhat surprisingly, we show that value functions… ▽ More

    Submitted 5 December, 2019; originally announced December 2019.

    Comments: NeurIPS 2019

  11. arXiv:1911.08585  [pdf, other

    q-bio.NC cs.LG stat.ML

    Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks

    Authors: Thomas Mesnard, Gaetan Vignoud, Joao Sacramento, Walter Senn, Yoshua Bengio

    Abstract: In the past few years, deep learning has transformed artificial intelligence research and led to impressive performance in various difficult tasks. However, it is still unclear how the brain can perform credit assignment across many areas as efficiently as backpropagation does in deep neural networks. In this paper, we introduce a model that relies on a new role for a neuronal inhibitory machinery… ▽ More

    Submitted 15 November, 2019; originally announced November 2019.

  12. arXiv:1808.04873  [pdf, other

    cs.LG stat.ML

    Generalization of Equilibrium Propagation to Vector Field Dynamics

    Authors: Benjamin Scellier, Anirudh Goyal, Jonathan Binas, Thomas Mesnard, Yoshua Bengio

    Abstract: The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists. Two major reasons are that neurons would need to send two different types of signal in the forward and backward phases, and that pairs of neurons would need to communicate through symmetric bidirectional connections. We present a simple two-phase learning procedure for fixed point recurrent netwo… ▽ More

    Submitted 14 August, 2018; originally announced August 2018.

  13. arXiv:1612.03214  [pdf, other

    cs.LG cs.NE q-bio.NC

    Towards deep learning with spiking neurons in energy based models with contrastive Hebbian plasticity

    Authors: Thomas Mesnard, Wulfram Gerstner, Johanni Brea

    Abstract: In machine learning, error back-propagation in multi-layer neural networks (deep learning) has been impressively successful in supervised and reinforcement learning tasks. As a model for learning in the brain, however, deep learning has long been regarded as implausible, since it relies in its basic form on a non-local plasticity rule. To overcome this problem, energy-based models with local contr… ▽ More

    Submitted 9 December, 2016; originally announced December 2016.

  14. arXiv:1509.05936  [pdf, other

    cs.NE cs.LG q-bio.NC

    STDP as presynaptic activity times rate of change of postsynaptic activity

    Authors: Yoshua Bengio, Thomas Mesnard, Asja Fischer, Saizheng Zhang, Yuhuai Wu

    Abstract: We introduce a weight update formula that is expressed only in terms of firing rates and their derivatives and that results in changes consistent with those associated with spike-timing dependent plasticity (STDP) rules and biological observations, even though the explicit timing of spikes is not needed. The new rule changes a synaptic weight in proportion to the product of the presynaptic firing… ▽ More

    Submitted 21 March, 2016; v1 submitted 19 September, 2015; originally announced September 2015.

  15. arXiv:1502.04156  [pdf, other

    cs.LG

    Towards Biologically Plausible Deep Learning

    Authors: Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, Thomas Mesnard, Zhouhan Lin

    Abstract: Neuroscientists have long criticised deep learning algorithms as incompatible with current knowledge of neurobiology. We explore more biologically plausible versions of deep representation learning, focusing here mostly on unsupervised learning but developing a learning mechanism that could account for supervised, unsupervised and reinforcement learning. The starting point is that the basic learni… ▽ More

    Submitted 8 August, 2016; v1 submitted 13 February, 2015; originally announced February 2015.