Skip to main content

Showing 1–50 of 86 results for author: Pietquin, O

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.19188  [pdf, other

    cs.LG

    Averaging log-likelihoods in direct alignment

    Authors: Nathan Grinsztajn, Yannis Flet-Berliac, Mohammad Gheshlaghi Azar, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist

    Abstract: To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a fine-tuned model directly from a preference dataset without computing a proxy reward function. These methods are built upon contrastive losses involvin… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19185  [pdf, other

    cs.LG

    Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

    Authors: Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist

    Abstract: Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more stable, and computationally lighter, can more directly achieve this. However, these approaches cannot optimize arbitrary rewards, and the preference-… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.01660  [pdf, other

    cs.LG cs.AI stat.ML

    Self-Improving Robust Preference Optimization

    Authors: Eugene Choi, Arash Ahmadian, Matthieu Geist, Oilvier Pietquin, Mohammad Gheshlaghi Azar

    Abstract: Both online and offline RLHF methods such as PPO and DPO have been extremely successful in aligning AI with human preferences. Despite their success, the existing methods suffer from a fundamental problem that their optimal solution is highly task-dependent (i.e., not robust to out-of-distribution (OOD) tasks). Here we address this challenge by proposing Self-Improving Robust Preference Optimizati… ▽ More

    Submitted 7 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  4. arXiv:2404.19409  [pdf, other

    cs.CL

    Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

    Authors: Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin

    Abstract: While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, requiring computationally expensive hyperparameter tuning. Additionally, KL regularization focuses solely on regularizing the language policy, neglecting a potential source of regularization:… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  5. arXiv:2403.11958  [pdf, other

    cs.CL cs.MA

    Language Evolution with Deep Learning

    Authors: Mathieu Rita, Paul Michel, Rahma Chaabouni, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

    Abstract: Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language within a simulated controlled environment. Several methods have been used to investigate the origin of our language, including agent-based systems, Bayesian agents, genetic algorithms, and rule-based s… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: to appear in the Oxford Handbook of Approaches to Language Evolution

  6. arXiv:2403.03552  [pdf, other

    cs.GT cs.LG cs.MA eess.SY

    Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

    Authors: Zida Wu, Mathieu Lauriere, Samuel Jia Cong Chua, Matthieu Geist, Olivier Pietquin, Ankur Mehta

    Abstract: Mean Field Games (MFGs) have the ability to handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task. In this paper, we propose a deep reinforcement learning (DRL) algorithm that achieves population-dependent Nash equilibrium without the need for averaging or sampling from history, inspired by Munchausen RL and Online Mirror Descent. Through the desig… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  7. arXiv:2402.14740  [pdf, other

    cs.LG

    Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

    Authors: Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

    Abstract: AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models. Proximal Policy Optimization (PPO) has been positioned by recent literature as the canonical method for the RL part of RLHF. However, it involves both high computational cost and sensitive hyperparameter tuning. We posit that mos… ▽ More

    Submitted 26 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 27 pages, 7 figures, 2 tables

    ACM Class: I.2.7

  8. arXiv:2402.04229  [pdf, other

    cs.LG cs.SD eess.AS

    MusicRL: Aligning Music Generation to Human Preferences

    Authors: Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, Andrea Agostinelli

    Abstract: We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are user-dependent (e.g. a caption such as "upbeat work-out music" can map to a retro guitar solo or a techno pop beat). Not only this makes supervised training of such… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

  9. arXiv:2312.10787  [pdf, other

    cs.GT cs.LG cs.MA math.OC

    Learning Discrete-Time Major-Minor Mean Field Games

    Authors: Kai Cui, Gökçe Dayanıklı, Mathieu Laurière, Matthieu Geist, Olivier Pietquin, Heinz Koeppl

    Abstract: Recent techniques based on Mean Field Games (MFGs) allow the scalable analysis of multi-player games with many similar, rational agents. However, standard MFGs remain limited to homogeneous players that weakly influence each other, and cannot model major players that strongly influence other players, severely limiting the class of problems that can be handled. We propose a novel discrete time vers… ▽ More

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  10. arXiv:2312.01072  [pdf, other

    cs.LG cs.AI

    A Survey of Temporal Credit Assignment in Deep Reinforcement Learning

    Authors: Eduardo Pignatelli, Johan Ferret, Matthieu Geist, Thomas Mesnard, Hado van Hasselt, Olivier Pietquin, Laura Toni

    Abstract: The Credit Assignment Problem (CAP) refers to the longstanding challenge of Reinforcement Learning (RL) agents to associate actions with their long-term consequences. Solving the CAP is a crucial step towards the successful deployment of RL in the real world since most decision problems provide feedback that is noisy, delayed, and with little or no information about the causes. These conditions ma… ▽ More

    Submitted 4 July, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: 56 pages, 2 figures, 4 tables

  11. arXiv:2306.14799  [pdf, other

    cs.LG cs.GT

    On Imitation in Mean-field Games

    Authors: Giorgia Ramponi, Pavel Kolev, Olivier Pietquin, Niao He, Mathieu Laurière, Matthieu Geist

    Abstract: We explore the problem of imitation learning (IL) in the context of mean-field games (MFGs), where the goal is to imitate the behavior of a population of agents following a Nash equilibrium policy according to some unknown payoff function. IL in MFGs presents new challenges compared to single-agent IL, particularly when both the reward function and the transition kernel depend on the population di… ▽ More

    Submitted 26 June, 2023; originally announced June 2023.

  12. arXiv:2306.00186  [pdf, other

    cs.CL

    Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

    Authors: Paul Roit, Johan Ferret, Lior Shani, Roee Aharoni, Geoffrey Cideron, Robert Dadashi, Matthieu Geist, Sertan Girgin, Léonard Hussenot, Orgad Keller, Nikola Momchev, Sabela Ramos, Piotr Stanczyk, Nino Vieillard, Olivier Bachem, Gal Elidan, Avinatan Hassidim, Olivier Pietquin, Idan Szpektor

    Abstract: Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work, we leverage recent progress on textual entailment models to directly address this p… ▽ More

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: ACL 2023

  13. arXiv:2305.13185  [pdf, other

    cs.LG

    Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

    Authors: Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo

    Abstract: Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear fu… ▽ More

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: ICML 2023 accepted

  14. arXiv:2305.01400  [pdf, other

    cs.RO cs.AI cs.LG eess.SY

    Get Back Here: Robust Imitation by Return-to-Distribution Planning

    Authors: Geoffrey Cideron, Baruch Tabanpour, Sebastian Curi, Sertan Girgin, Leonard Hussenot, Gabriel Dulac-Arnold, Matthieu Geist, Olivier Pietquin, Robert Dadashi

    Abstract: We consider the Imitation Learning (IL) setup where expert data are not collected on the actual deployment environment but on a different version. To address the resulting distribution shift, we combine behavior cloning (BC) with a planner that is tasked to bring the agent back to states visited by the expert whenever the agent deviates from the demonstration distribution. The resulting algorithm,… ▽ More

    Submitted 2 May, 2023; originally announced May 2023.

  15. arXiv:2302.03540  [pdf, other

    cs.SD eess.AS

    Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision

    Authors: Eugene Kharitonov, Damien Vincent, Zalán Borsos, Raphaël Marinier, Sertan Girgin, Olivier Pietquin, Matt Sharifi, Marco Tagliasacchi, Neil Zeghidour

    Abstract: We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained with minimal supervision. By combining two types of discrete speech representations, we cast TTS as a composition of two sequence-to-sequence tasks: from text to high-level semantic tokens (akin to "reading") and from semantic tokens to low-level acoustic tokens ("speaking"). Decoupling these two tasks enables… ▽ More

    Submitted 7 February, 2023; originally announced February 2023.

  16. arXiv:2301.12662  [pdf, other

    cs.SD cs.AI cs.LG cs.MM eess.AS

    SingSong: Generating musical accompaniments from singing

    Authors: Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

    Abstract: We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. To accomplish this, we build on recent developments in musical source separation and audio generation. Specifically, we apply a state-of-the-art source separation algorithm to a large corpus… ▽ More

    Submitted 29 January, 2023; originally announced January 2023.

  17. arXiv:2211.03521  [pdf, other

    cs.AI

    On the importance of data collection for training general goal-reaching policies

    Authors: Alexis Jacq, Manu Orsini, Gabriel Dulac-Arnold, Olivier Pietquin, Matthieu Geist, Olivier Bachem

    Abstract: Recent advances in ML suggest that the quantity of data available to a model is one of the primary bottlenecks to high performance. Although for language-based tasks there exist almost unlimited amounts of reasonably coherent data to train from, this is generally not the case for Reinforcement Learning, especially when dealing with a novel environment. In effect, even a relatively trivial continuo… ▽ More

    Submitted 20 February, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

  18. arXiv:2209.15342  [pdf, other

    cs.MA cs.CL cs.IT

    Emergent Communication: Generalization and Overfitting in Lewis Games

    Authors: Mathieu Rita, Corentin Tallec, Paul Michel, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

    Abstract: Lewis signaling games are a class of simple communication games for simulating the emergence of language. In these games, two agents must agree on a communication protocol in order to solve a cooperative task. Previous work has shown that agents trained to play this game with reinforcement learning tend to develop languages that display undesirable properties from a linguistic point of view (lack… ▽ More

    Submitted 15 October, 2022; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  19. arXiv:2209.06792  [pdf, other

    cs.CL cs.LG

    vec2text with Round-Trip Translations

    Authors: Geoffrey Cideron, Sertan Girgin, Anton Raichuk, Olivier Pietquin, Olivier Bachem, Léonard Hussenot

    Abstract: We investigate models that can generate arbitrary natural language text (e.g. all English sentences) from a bounded, convex and well-behaved control space. We call them universal vec2text models. Such models would allow making semantic decisions in the vector space (e.g. via reinforcement learning) while the natural language generation is handled by the vec2text model. We propose four desired prop… ▽ More

    Submitted 14 September, 2022; originally announced September 2022.

  20. arXiv:2209.03143  [pdf, other

    cs.SD cs.LG eess.AS

    AudioLM: a Language Modeling Approach to Audio Generation

    Authors: Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour

    Abstract: We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenizati… ▽ More

    Submitted 25 July, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

  21. arXiv:2208.10138  [pdf, other

    cs.GT stat.ML

    Learning Correlated Equilibria in Mean-Field Games

    Authors: Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, Karl Tuyls

    Abstract: The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-… ▽ More

    Submitted 22 August, 2022; originally announced August 2022.

  22. arXiv:2205.14211  [pdf, other

    cs.LG cs.AI stat.ML

    KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

    Authors: Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári

    Abstract: In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model. Particularly, we analyze mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses the Kullback-Leibler divergence and entropy regularization in its value and policy updates. Our analysis shows that it is nearly minimax-optimal for fi… ▽ More

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 29 pages, 6 figures

  23. arXiv:2205.12944  [pdf, other

    cs.LG cs.AI cs.GT math.OC

    Learning in Mean Field Games: A Survey

    Authors: Mathieu Laurière, Sarah Perrin, Julien Pérolat, Sertan Girgin, Paul Muller, Romuald Élie, Matthieu Geist, Olivier Pietquin

    Abstract: Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases. Introduced by Lasry and Lions, and Huang, Caines and Malhamé, Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity. Traditional methods for solving these games generally rely… ▽ More

    Submitted 20 February, 2024; v1 submitted 25 May, 2022; originally announced May 2022.

  24. arXiv:2204.12982  [pdf, other

    cs.MA

    On the role of population heterogeneity in emergent communication

    Authors: Mathieu Rita, Florian Strub, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux

    Abstract: Populations have often been perceived as a structuring component for language to emerge and evolve: the larger the population, the more structured the language. While this observation is widespread in the sociolinguistic literature, it has not been consistently reproduced in computer simulations with neural agents. In this paper, we thus aim to clarify this apparent contradiction. We explore emerg… ▽ More

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: International Conference on Learning Representations (ICLR) 2022

  25. arXiv:2203.11973  [pdf, other

    cs.LG math.OC stat.ML

    Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

    Authors: Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Élie, Olivier Pietquin, Matthieu Geist

    Abstract: Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quant… ▽ More

    Submitted 17 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

  26. arXiv:2203.08542  [pdf, other

    cs.LG cs.AI

    Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act

    Authors: Alexis Jacq, Johan Ferret, Olivier Pietquin, Matthieu Geist

    Abstract: Traditionally, Reinforcement Learning (RL) aims at deciding how to act optimally for an artificial agent. We argue that deciding when to act is equally important. As humans, we drift from default, instinctive or memorized behaviors to focused, thought-out behaviors when required by the situation. To enhance RL agents with this aptitude, we propose to augment the standard Markov Decision Process an… ▽ More

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: AAMAS 2022 (14 pages extended version, added Sec. 7.4 and appendix K)

    Journal ref: Autonomous Agents and Multi-Agent Systems (2022)

  27. arXiv:2111.08350  [pdf, other

    cs.GT cs.MA

    Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO

    Authors: Paul Muller, Mark Rowland, Romuald Elie, Georgios Piliouras, Julien Perolat, Mathieu Lauriere, Raphael Marinier, Olivier Pietquin, Karl Tuyls

    Abstract: Recent advances in multiagent learning have seen the introduction ofa family of algorithms that revolve around the population-based trainingmethod PSRO, showing convergence to Nash, correlated and coarse corre-lated equilibria. Notably, when the number of agents increases, learningbest-responses becomes exponentially more difficult, and as such ham-pers PSRO training methods. The paradigm of mean-… ▽ More

    Submitted 29 August, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: AAMAS

  28. arXiv:2111.02767  [pdf, other

    cs.LG

    RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning

    Authors: Sabela Ramos, Sertan Girgin, Léonard Hussenot, Damien Vincent, Hanna Yakubovich, Daniel Toyama, Anita Gergely, Piotr Stanczyk, Raphael Marinier, Jeremiah Harmsen, Olivier Pietquin, Nikola Momchev

    Abstract: We introduce RLDS (Reinforcement Learning Datasets), an ecosystem for recording, replaying, manipulating, annotating and sharing data in the context of Sequential Decision Making (SDM) including Reinforcement Learning (RL), Learning from Demonstrations, Offline RL or Imitation Learning. RLDS enables not only reproducibility of existing research and easy generation of new datasets, but also acceler… ▽ More

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: https://github.com/google-research/rlds

  29. arXiv:2110.11943  [pdf, other

    math.DS cs.MA cs.NI eess.SY math.OC

    Solving N-player dynamic routing games with congestion: a mean field approach

    Authors: Theophile Cabannes, Mathieu Lauriere, Julien Perolat, Raphael Marinier, Sertan Girgin, Sarah Perrin, Olivier Pietquin, Alexandre M. Bayen, Eric Goubault, Romuald Elie

    Abstract: The recent emergence of navigational tools has changed traffic patterns and has now enabled new types of congestion-aware routing control like dynamic road pricing. Using the fundamental diagram of traffic flows - applied in macroscopic and mesoscopic traffic modeling - the article introduces a new N-player dynamic routing game with explicit congestion dynamics. The model is well-posed and can rep… ▽ More

    Submitted 27 October, 2021; v1 submitted 22 October, 2021; originally announced October 2021.

  30. arXiv:2110.10149  [pdf, other

    cs.LG cs.AI cs.RO

    Continuous Control with Action Quantization from Demonstrations

    Authors: Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin, Anton Raichuk, Matthieu Geist, Olivier Pietquin

    Abstract: In this paper, we propose a novel Reinforcement Learning (RL) framework for problems with continuous action spaces: Action Quantization from Demonstrations (AQuaDem). The proposed approach consists in learning a discretization of continuous action spaces from human demonstrations. This discretization returns a set of plausible actions (in light of the demonstrations) for each input state, thus cap… ▽ More

    Submitted 3 June, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted to ICML 2022

  31. arXiv:2109.09717  [pdf, other

    cs.LG cs.GT cs.MA math.OC

    Generalization in Mean Field Games by Learning Master Policies

    Authors: Sarah Perrin, Mathieu Laurière, Julien Pérolat, Romuald Élie, Matthieu Geist, Olivier Pietquin

    Abstract: Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents. Yet, most of the literature assumes a single initial distribution for the agents, which limits the practical applications of MFGs. Machine Learning has the potential to solve a wider diversity of MFG problems thanks to generalizations capacities. We study how to leverage these generalization… ▽ More

    Submitted 20 September, 2021; originally announced September 2021.

  32. arXiv:2109.09371  [pdf, other

    cs.AI cs.CL cs.NE stat.ML

    Learning Natural Language Generation from Scratch

    Authors: Alice Martin Donati, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin

    Abstract: This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original ap-proach to train conditional language models from scratch by only using reinforcement learning (RL). AsRL methods unsuccessfully scale to large action spaces, we dynamically truncate the vocabulary spaceusing a generic language model. TrufLL thus enables to train a language agent by solely interacting withi… ▽ More

    Submitted 20 September, 2021; originally announced September 2021.

  33. arXiv:2108.07041  [pdf, other

    cs.LG

    Implicitly Regularized RL with Implicit Q-Values

    Authors: Nino Vieillard, Marcin Andrychowicz, Anton Raichuk, Olivier Pietquin, Matthieu Geist

    Abstract: The $Q$-function is a central quantity in many Reinforcement Learning (RL) algorithms for which RL agents behave following a (soft)-greedy policy w.r.t. to $Q$. It is a powerful tool that allows action selection without a model of the environment and even without explicitly modeling the policy. Yet, this scheme can only be used in discrete action tasks, with small numbers of actions, as the softma… ▽ More

    Submitted 31 May, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: AISTATS 2022

  34. arXiv:2106.06431  [pdf, other

    cs.LG

    Offline Reinforcement Learning as Anti-Exploration

    Authors: Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, Matthieu Geist

    Abstract: Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. An agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data. This is the converse of exploration in RL, which favors such actions. We thus take inspiration from the literature on bonus-based exploration to design a new… ▽ More

    Submitted 11 June, 2021; originally announced June 2021.

  35. arXiv:2106.04480  [pdf, other

    cs.LG cs.AI

    There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

    Authors: Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

    Abstract: We propose to learn to distinguish reversible from irreversible actions for better informed decision-making in Reinforcement Learning (RL). From theoretical considerations, we show that approximate reversibility can be learned through a simple surrogate task: ranking randomly sampled trajectory events in chronological order. Intuitively, pairs of events that are always observed in the same order a… ▽ More

    Submitted 29 October, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  36. arXiv:2106.03787  [pdf, other

    cs.LG cs.MA

    Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint

    Authors: Matthieu Geist, Julien Pérolat, Mathieu Laurière, Romuald Elie, Sarah Perrin, Olivier Bachem, Rémi Munos, Olivier Pietquin

    Abstract: Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of m… ▽ More

    Submitted 16 February, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: AAMAS 2022

  37. arXiv:2106.00672  [pdf, other

    cs.LG cs.AI cs.NE

    What Matters for Adversarial Imitation Learning?

    Authors: Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz

    Abstract: Adversarial imitation learning has become a popular framework for imitation in continuous control. Over the years, several variations of its components were proposed to enhance the performance of the learned policies as well as the sample complexity of the algorithm. In practice, these choices are rarely tested all together in rigorous empirical studies. It is therefore difficult to discuss and un… ▽ More

    Submitted 1 June, 2021; originally announced June 2021.

  38. arXiv:2105.12034  [pdf, other

    cs.LG

    Hyperparameter Selection for Imitation Learning

    Authors: Leonard Hussenot, Marcin Andrychowicz, Damien Vincent, Robert Dadashi, Anton Raichuk, Lukasz Stafiniak, Sertan Girgin, Raphael Marinier, Nikola Momchev, Sabela Ramos, Manu Orsini, Olivier Bachem, Matthieu Geist, Olivier Pietquin

    Abstract: We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms in the context of continuous-control, when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature in imitation learning mostly considers this reward function to be available for HP selection, but this is not a realistic setting. Indeed, would this reward fu… ▽ More

    Submitted 25 May, 2021; originally announced May 2021.

    Comments: ICML 2021

  39. arXiv:2105.09992  [pdf, other

    cs.LG

    Don't Do What Doesn't Matter: Intrinsic Motivation with Action Usefulness

    Authors: Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

    Abstract: Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation guidances have thus been developed toward alleviating the resulting exploration problem. They usually incentivize agents to look for new states through novelty signals. Yet, such methods encourage exhaustive exploration of the state space rather than focusing on the… ▽ More

    Submitted 31 May, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: Accepted at Internationnal Joint Conference on Artificial Intelligence (IJCAI'21) and Self-Supervision for Reinforcement Learning Workshop (SSL-RL @ICLR'21)

  40. arXiv:2105.07933  [pdf, other

    cs.MA cs.AI

    Mean Field Games Flock! The Reinforcement Learning Way

    Authors: Sarah Perrin, Mathieu Laurière, Julien Pérolat, Matthieu Geist, Romuald Élie, Olivier Pietquin

    Abstract: We present a method enabling a large number of agents to learn how to flock, which is a natural behavior observed in large populations of animals. This problem has drawn a lot of interest but requires many structural assumptions and is tractable only in small dimensions. We phrase this problem as a Mean Field Game (MFG), where each individual chooses its acceleration depending on the population be… ▽ More

    Submitted 17 May, 2021; originally announced May 2021.

  41. arXiv:2103.01948  [pdf, other

    cs.LG

    Offline Reinforcement Learning with Pseudometric Learning

    Authors: Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, Matthieu Geist

    Abstract: Offline Reinforcement Learning methods seek to learn a policy from logged transitions of an environment, without any interaction. In the presence of function approximation, and under the assumption of limited coverage of the state-action space of the environment, it is necessary to enforce the policy to visit state-action pairs close to the support of logged transitions. In this work, we propose a… ▽ More

    Submitted 2 June, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: ICML 2021

  42. arXiv:2103.00623  [pdf, other

    cs.AI

    Scaling up Mean Field Games with Online Mirror Descent

    Authors: Julien Perolat, Sarah Perrin, Romuald Elie, Mathieu Laurière, Georgios Piliouras, Matthieu Geist, Karl Tuyls, Olivier Pietquin

    Abstract: We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on vari… ▽ More

    Submitted 28 February, 2021; originally announced March 2021.

  43. arXiv:2102.04376  [pdf, other

    cs.LG cs.AI stat.ML

    Adversarially Guided Actor-Critic

    Authors: Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

    Abstract: Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These methods consider a policy (the actor) and a value function (the critic) whose respective losses are built using different motivations and approaches. This paper intro… ▽ More

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted at ICLR 2021

  44. arXiv:2012.11989  [pdf, other

    cs.LG

    Self-Imitation Advantage Learning

    Authors: Johan Ferret, Olivier Pietquin, Matthieu Geist

    Abstract: Self-imitation learning is a Reinforcement Learning (RL) method that encourages actions whose returns were higher than expected, which helps in hard exploration and sparse reward problems. It was shown to improve the performance of on-policy actor-critic methods in several discrete control tasks. Nevertheless, applying self-imitation to the mostly action-value based off-policy RL methods is not st… ▽ More

    Submitted 22 December, 2020; originally announced December 2020.

    Comments: AAMAS 2021

  45. arXiv:2010.13694  [pdf, other

    eess.SP cs.LG

    Learning from Heterogeneous EEG Signals with Differentiable Channel Reordering

    Authors: Aaqib Saeed, David Grangier, Olivier Pietquin, Neil Zeghidour

    Abstract: We propose CHARM, a method for training a single neural network across inconsistent input channels. Our work is motivated by Electroencephalography (EEG), where data collection protocols from different headsets result in varying channel ordering and number, which limits the feasibility of transferring trained systems across datasets. Our approach builds upon attention mechanisms to estimate a late… ▽ More

    Submitted 21 October, 2020; originally announced October 2020.

  46. arXiv:2010.02975  [pdf, other

    cs.CL

    Supervised Seeded Iterated Learning for Interactive Language Learning

    Authors: Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

    Abstract: Language drift has been one of the major obstacles to train language models through interaction. When word-based conversational agents are trained towards completing a task, they tend to invent their language rather than leveraging natural language. In recent literature, two general methods partially counter this phenomenon: Supervised Selfplay (S2P) and Seeded Iterated Learning (SIL). While S2P j… ▽ More

    Submitted 6 October, 2020; originally announced October 2020.

  47. arXiv:2008.03127  [pdf, other

    eess.AS cs.LG cs.SD

    A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning

    Authors: Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

    Abstract: Speaker recognition is a well known and studied task in the speech processing domain. It has many applications, either for security or speaker adaptation of personal devices. In this paper, we present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system aims to incrementally build a representation of the speak… ▽ More

    Submitted 7 August, 2020; originally announced August 2020.

  48. arXiv:2007.14430  [pdf, other

    cs.LG stat.ML

    Munchausen Reinforcement Learning

    Authors: Nino Vieillard, Olivier Pietquin, Matthieu Geist

    Abstract: Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms, based on temporal differences, replace the true value of a transiting state by their current estimate of this value. Yet, another estimate could be leveraged to bootstrap RL: the current policy. Our core contribution stands in a very simple idea: adding the scaled log-policy to the immediate reward. We show that sli… ▽ More

    Submitted 4 November, 2020; v1 submitted 28 July, 2020; originally announced July 2020.

    Comments: NeurIPS 2020. Code: https://github.com/google-research/google-research/tree/master/munchausen_rl

  49. arXiv:2007.08620  [pdf, other

    cs.LG cs.AI stat.ML

    The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

    Authors: Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin

    Abstract: This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the observations distribution in a transformer architecture. The keys, queries, values and attention vectors of the network are considered as the unobserved stochastic states of its hidden structure. This generative model is such that at each time step the received observation is a random fun… ▽ More

    Submitted 15 December, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

  50. arXiv:2007.03458  [pdf, other

    math.OC cs.AI

    Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications

    Authors: Sarah Perrin, Julien Perolat, Mathieu Laurière, Matthieu Geist, Romuald Elie, Olivier Pietquin

    Abstract: In this paper, we deepen the analysis of continuous time Fictitious Play learning algorithm to the consideration of various finite state Mean Field Game settings (finite horizon, $γ$-discounted), allowing in particular for the introduction of an additional common noise. We first present a theoretical convergence analysis of the continuous time Fictitious Play process and prove that the induced e… ▽ More

    Submitted 26 October, 2020; v1 submitted 5 July, 2020; originally announced July 2020.