Showing 51–86 of 86 results for author: Pietquin, O

  1. arXiv:2006.12917  [pdf, other]

    cs.LG stat.ML

    Show me the Way: Intrinsic Motivation from Demonstrations

    Authors: Léonard Hussenot, Robert Dadashi, Matthieu Geist, Olivier Pietquin

    Abstract: The study of exploration in the domain of decision making has a long history but remains actively debated. From the vast literature that addressed this topic for decades under various points of view (e.g., developmental psychology, experimental design, artificial intelligence), intrinsic motivation emerged as a concept that can practically be transferred to artificial agents. Especially, in the re…

    Submitted 13 January, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: AAMAS 2021

  2. arXiv:2006.05990  [pdf, other]

    cs.LG stat.ML

    What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study

    Authors: Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphael Marinier, Léonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

    Abstract: In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literatur…

    Submitted 10 June, 2020; originally announced June 2020.

  3. arXiv:2006.04678  [pdf, other]

    cs.LG stat.ML

    Primal Wasserstein Imitation Learning

    Authors: Robert Dadashi, Léonard Hussenot, Matthieu Geist, Olivier Pietquin

    Abstract: Imitation Learning (IL) methods seek to match the behavior of an agent with that of an expert. In the present work, we propose a new IL method based on a conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL), which ties to the primal form of the Wasserstein distance between the expert and the agent state-action distributions. We present a reward function which is derived offl…

    Submitted 17 March, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: Published in International Conference on Learning Representations (ICLR 2021)

  4. arXiv:2006.00979  [pdf, other]

    cs.LG cs.AI

    Acme: A Research Framework for Distributed Reinforcement Learning

    Authors: Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, et al. (14 additional authors not shown)

    Abstract: Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce publishe…

    Submitted 20 September, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme

  5. arXiv:2005.14419  [pdf, ps, other]

    cs.LG stat.ML

    Reinforcement Learning

    Authors: Olivier Buffet, Olivier Pietquin, Paul Weng

    Abstract: Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains, e.g., board games, video games or autonomous vehicles. In such problems, an agent faces a sequential decision-making problem where, at every time step, it observes its state, performs an action, receives a reward and moves to a new state. An RL agent learns by trial and error…

    Submitted 13 June, 2020; v1 submitted 29 May, 2020; originally announced May 2020.

    Comments: Chapter in "A Guided Tour of Artificial Intelligence Research", Springer

  6. arXiv:2003.14089  [pdf, other]

    cs.LG stat.ML

    Leverage the Average: an Analysis of KL Regularization in RL

    Authors: Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

    Abstract: Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance. Yet, only little is understood theoretically about why KL regularization helps, so far. We study KL regularization within an approximate value iteration scheme and show that it implicitly averages q-values. Leveraging this insight, we provide a ve…

    Submitted 6 January, 2021; v1 submitted 31 March, 2020; originally announced March 2020.

    Comments: NeurIPS 2020

  7. arXiv:2003.12694  [pdf, other]

    cs.AI cs.CL

    Countering Language Drift with Seeded Iterated Learning

    Authors: Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

    Abstract: Pretraining on human corpus and then finetuning in a simulator has become a standard pipeline for training a goal-oriented dialogue agent. Nevertheless, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose syntactic and semantic properties of language as they only focus on solving the task. In this paper, we pro…

    Submitted 24 August, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

  8. arXiv:1910.09451  [pdf, other]

    cs.LG cs.CL stat.ML

    HIGhER : Improving instruction following with Hindsight Generation for Experience Replay

    Authors: Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin

    Abstract: Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. While these characterizations may foster instructing, conditioning or structuring interactive agent behavior, it remains an open-problem to correctly relate language understanding and reinforcement learning in even simple instruction following scenarios…

    Submitted 10 December, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: Accepted at ADPRL'20

  9. arXiv:1910.09322  [pdf, other]

    cs.LG stat.ML

    Momentum in Reinforcement Learning

    Authors: Nino Vieillard, Bruno Scherrer, Olivier Pietquin, Matthieu Geist

    Abstract: We adapt the optimization's concept of momentum to reinforcement learning. Seeing the state-action value functions as an analog to the gradients in optimization, we interpret momentum as an average of consecutive $q$-functions. We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that incorporates this momentum idea. Our analysis shows that this allows MoVI to average errors o…

    Submitted 31 March, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: AISTATS 2020

  10. arXiv:1910.08476  [pdf, ps, other]

    cs.LG math.OC stat.ML

    On Connections between Constrained Optimization and Reinforcement Learning

    Authors: Nino Vieillard, Olivier Pietquin, Matthieu Geist

    Abstract: Dynamic Programming (DP) provides standard algorithms to solve Markov Decision Processes. However, these algorithms generally do not optimize a scalar objective function. In this paper, we draw connections between DP and (constrained) convex optimization. Specifically, we show clear links in the algorithmic structure between three DP schemes and optimization algorithms. We link Conservative Policy…

    Submitted 29 October, 2019; v1 submitted 18 October, 2019; originally announced October 2019.

    Comments: Optimization Foundations of Reinforcement Learning Workshop at NeurIPS 2019

  11. arXiv:1910.02078  [pdf, other]

    cs.LG stat.ML

    I'm sorry Dave, I'm afraid I can't do that, Deep Q-learning from forbidden action

    Authors: Mathieu Seurin, Philippe Preux, Olivier Pietquin

    Abstract: The use of Reinforcement Learning (RL) is still restricted to simulation or to enhance human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind implemented in the shape of valid actions masks or contingency controllers. For example, the range of motion and the angles of the motors of a rob…

    Submitted 13 August, 2020; v1 submitted 4 October, 2019; originally announced October 2019.

    Comments: Accepted at International Joint Conference on Neural Networks (IJCNN'2020)

  12. Self-Attentional Credit Assignment for Transfer in Reinforcement Learning

    Authors: Johan Ferret, Raphaël Marinier, Matthieu Geist, Olivier Pietquin

    Abstract: The ability to transfer knowledge to novel environments and tasks is a sensible desiderata for general learning agents. Despite the apparent promises, transfer in RL is still an open and little exploited research area. In this paper, we take a brand-new perspective about transfer: we suggest that the ability to assign credit unveils structural invariants in the tasks that can be transferred to mak…

    Submitted 22 November, 2019; v1 submitted 18 July, 2019; originally announced July 2019.

    Comments: 21 pages, 10 figures, 3 tables (accepted as an oral presentation at the Learning Transferable Skills workshop, NeurIPS 2019)

    Journal ref: International Joint Conference on Artificial Intelligence. 29 (2020) 2655-2661

  13. arXiv:1907.02633  [pdf, other]

    math.OC cs.LG stat.ML

    On the Convergence of Model Free Learning in Mean Field Games

    Authors: Romuald Elie, Julien Pérolat, Mathieu Laurière, Matthieu Geist, Olivier Pietquin

    Abstract: Learning by experience in Multi-Agent Systems (MAS) is a difficult and exciting task, due to the lack of stationarity of the environment, whose dynamics evolves as the population learns. In order to design scalable algorithms for systems with a large population of interacting agents (e.g. swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite. Recently…

    Submitted 20 February, 2020; v1 submitted 4 July, 2019; originally announced July 2019.

    Journal ref: AAAI 2020 conference proceedings

  14. arXiv:1907.00868  [pdf, other]

    cs.LG cs.AI stat.ML

    MULEX: Disentangling Exploitation from Exploration in Deep RL

    Authors: Lucas Beyer, Damien Vincent, Olivier Teboul, Sylvain Gelly, Matthieu Geist, Olivier Pietquin

    Abstract: An agent learning through interactions should balance its action selection process between probing the environment to discover new rewards and using the information acquired in the past to adopt useful behaviour. This trade-off is usually obtained by perturbing either the agent's actions (e.g., e-greedy or Gibbs sampling) or the agent's parameters (e.g., NoisyNet), or by modifying the reward it re…

    Submitted 1 July, 2019; originally announced July 2019.

  15. arXiv:1906.09831  [pdf, other]

    cs.GT cs.AI cs.LG

    Foolproof Cooperative Learning

    Authors: Alexis Jacq, Julien Perolat, Matthieu Geist, Olivier Pietquin

    Abstract: This paper extends the notion of learning equilibrium in game theory from matrix games to stochastic games. We introduce Foolproof Cooperative Learning (FCL), an algorithm that converges to a Tit-for-Tat behavior. It allows cooperative strategies when played against itself while being not exploitable by selfish players. We prove that in repeated symmetric games, this algorithm is a learning equili…

    Submitted 15 October, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

    Journal ref: Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:401-416, 2020

  16. arXiv:1906.09784  [pdf, other]

    cs.LG stat.ML

    Deep Conservative Policy Iteration

    Authors: Nino Vieillard, Olivier Pietquin, Matthieu Geist

    Abstract: Conservative Policy Iteration (CPI) is a founding algorithm of Approximate Dynamic Programming (ADP). Its core principle is to stabilize greediness through stochastic mixtures of consecutive policies. It comes with strong theoretical guarantees, and inspired approaches in deep Reinforcement Learning (RL). However, CPI itself has rarely been implemented, never with neural networks, and only experim…

    Submitted 6 January, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

    Comments: AAAI 2020 (long version)

  17. arXiv:1905.12282  [pdf, other]

    cs.LG cs.CR stat.ML

    CopyCAT: Taking Control of Neural Policies with Constant Attacks

    Authors: Léonard Hussenot, Matthieu Geist, Olivier Pietquin

    Abstract: We propose a new perspective on adversarial attacks against deep reinforcement learning agents. Our main contribution is CopyCAT, a targeted attack able to consistently lure an agent into following an outsider's policy. It is pre-computed, therefore fast inferred, and could thus be usable in a real-time scenario. We show its effectiveness on Atari 2600 games in the novel read-only setting. In this…

    Submitted 21 January, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: AAMAS 2020

  18. arXiv:1903.01004  [pdf, other]

    cs.LG cs.AI stat.ML

    Budgeted Reinforcement Learning in Continuous State Space

    Authors: Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin

    Abstract: A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an - adjustable - threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state-of-the-art to…

    Submitted 27 May, 2019; v1 submitted 3 March, 2019; originally announced March 2019.

    Comments: N. Carrara and E. Leurent have equally contributed

  19. arXiv:1901.11275  [pdf, other]

    cs.LG stat.ML

    A Theory of Regularized Markov Decision Processes

    Authors: Matthieu Geist, Bruno Scherrer, Olivier Pietquin

    Abstract: Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both p…

    Submitted 4 June, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

    Comments: ICML 2019

  20. arXiv:1809.07802  [pdf, other]

    cs.LG cs.CV stat.ML

    Playing the Game of Universal Adversarial Perturbations

    Authors: Julien Perolat, Mateusz Malinowski, Bilal Piot, Olivier Pietquin

    Abstract: We study the problem of learning classifiers robust to universal adversarial perturbations. While prior work approaches this problem via robust optimization, adversarial training, or input transformation, we instead phrase it as a two-player zero-sum game. In this new formulation, both players simultaneously play the same game, where one player chooses a classifier that minimizes a classification…

    Submitted 25 September, 2018; v1 submitted 20 September, 2018; originally announced September 2018.

  21. arXiv:1808.04446  [pdf, other]

    cs.CV cs.CL cs.LG stat.ML

    Visual Reasoning with Multi-hop Feature Modulation

    Authors: Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin

    Abstract: Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to…

    Submitted 12 October, 2018; v1 submitted 3 August, 2018; originally announced August 2018.

    Comments: In Proc of ECCV 2018

  22. arXiv:1805.11593  [pdf, other]

    cs.LG cs.AI stat.ML

    Observe and Look Further: Achieving Consistent Performance on Atari

    Authors: Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

    Abstract: Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games. We identify three key challenges that any algorithm needs to master in order to perform well on all games: processing diverse reward distributions, reasoning over long time horizons, and explori…

    Submitted 29 May, 2018; originally announced May 2018.

  23. arXiv:1802.04200  [pdf, other]

    cs.CL

    End-to-End Automatic Speech Translation of Audiobooks

    Authors: Alexandre Bérard, Laurent Besacier, Ali Can Kocabiyikoglu, Olivier Pietquin

    Abstract: We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task. Previous works investigated the extreme case where source language transcription is not available during learning nor decoding, but we also study a midway case where source language transcription is available at training time only. In this case, a single model is trained to decode s…

    Submitted 12 February, 2018; originally announced February 2018.

    Comments: Accepted to ICASSP 2018 (poster presentation)

  24. arXiv:1707.08817  [pdf, other]

    cs.AI

    Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

    Authors: Mel Vecerik, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, Martin Riedmiller

    Abstract: We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to use demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mecha…

    Submitted 8 October, 2018; v1 submitted 27 July, 2017; originally announced July 2017.

  25. arXiv:1707.05118  [pdf, other]

    cs.CL

    LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task

    Authors: Alexandre Berard, Olivier Pietquin, Laurent Besacier

    Abstract: This paper presents the LIG-CRIStAL submission to the shared Automatic Post-Editing task of WMT 2017. We propose two neural post-editing models: a monosource model with a task-specific attention mechanism, which performs particularly well in a low-resource scenario; and a chained architecture which makes use of the source sentence to provide extra context. This latter architecture manages to slig…

    Submitted 17 July, 2017; originally announced July 2017.

    Comments: keywords: neural post-edition, attention models

  26. arXiv:1707.00683  [pdf, other]

    cs.CV cs.CL cs.LG

    Modulating early visual processing by language

    Authors: Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville

    Abstract: It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and pro…

    Submitted 18 December, 2017; v1 submitted 2 July, 2017; originally announced July 2017.

    Comments: Advances in Neural Information Processing Systems 30 (NIPS 2017)

  27. arXiv:1706.10295  [pdf, other]

    cs.LG stat.ML

    Noisy Networks for Exploration

    Authors: Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

    Abstract: We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find…

    Submitted 9 July, 2019; v1 submitted 30 June, 2017; originally announced June 2017.

    Comments: ICLR 2018

  28. arXiv:1706.06617  [pdf, other]

    cs.LG cs.AI stat.ML

    Observational Learning by Reinforcement Learning

    Authors: Diana Borsa, Bilal Piot, Rémi Munos, Olivier Pietquin

    Abstract: Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been found to be employed in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other a…

    Submitted 20 June, 2017; originally announced June 2017.

  29. arXiv:1704.03732  [pdf, ps, other]

    cs.AI cs.LG

    Deep Q-learning from Demonstrations

    Authors: Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

    Abstract: Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world…

    Submitted 22 November, 2017; v1 submitted 12 April, 2017; originally announced April 2017.

    Comments: Published at AAAI 2018. Previously on arxiv as "Learning from Demonstrations for Real World Reinforcement Learning"

  30. arXiv:1703.05423  [pdf, other]

    cs.CL

    End-to-end optimization of goal-driven and visually grounded dialogue systems

    Authors: Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin

    Abstract: End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too s…

    Submitted 15 March, 2017; originally announced March 2017.

  31. arXiv:1612.01744  [pdf, other]

    cs.CL

    Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation

    Authors: Alexandre Berard, Olivier Pietquin, Christophe Servan, Laurent Besacier

    Abstract: This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language transcription during learning or decoding. We propose a model for direct speech-to-text translation, which gives promising results on a small French-English synthetic corpus. Relaxing the need for source language transcription would drastically change the data collection…

    Submitted 6 December, 2016; originally announced December 2016.

    Comments: accepted to NIPS workshop on End-to-end Learning for Speech and Audio Processing

  32. arXiv:1611.08481  [pdf, other]

    cs.AI cs.CV

    GuessWhat?! Visual object discovery through multi-modal dialogue

    Authors: Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville

    Abstract: We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the colle…

    Submitted 6 February, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: 23 pages; CVPR 2017 submission; see https://guesswhat.ai

  33. arXiv:1606.08718  [pdf, ps, other]

    cs.GT

    Learning Nash Equilibrium for General-Sum Markov Games from Batch Data

    Authors: Julien Pérolat, Florian Strub, Bilal Piot, Olivier Pietquin

    Abstract: This paper addresses the problem of learning a Nash equilibrium in $γ$-discounted multiplayer general-sum Markov Games (MG). A key component of this model is the possibility for the players to either collaborate or team apart to increase their rewards. Building an artificial player for general-sum MGs implies to learn more complex strategies which are impossible to obtain by using techniques devel…

    Submitted 6 March, 2017; v1 submitted 28 June, 2016; originally announced June 2016.

    Comments: 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017, Fort Lauderdale, Florida, USA. JMLR: W&CP volume 54

    Report number: CRIStAL, UMR 9189

  34. arXiv:1606.07636  [pdf, other]

    cs.LG stat.ML

    Is the Bellman residual a bad proxy?

    Authors: Matthieu Geist, Bilal Piot, Olivier Pietquin

    Abstract: This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, that are usually designed to maximize the mean value, and derive a method that minimizes the residual…

    Submitted 12 December, 2017; v1 submitted 24 June, 2016; originally announced June 2016.

    Comments: Final NIPS 2017 version (title, among other things, changed)

  35. arXiv:1606.01128  [pdf, other]

    math.OC cs.LG stat.ML

    Difference of Convex Functions Programming Applied to Control with Expert Data

    Authors: Bilal Piot, Matthieu Geist, Olivier Pietquin

    Abstract: This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. This is made possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is DC. Improvement in performance is demonstrated on two specific algorithms, namely Reward-regula…

    Submitted 5 September, 2016; v1 submitted 3 June, 2016; originally announced June 2016.

  36. Kalman Temporal Differences

    Authors: Matthieu Geist, Olivier Pietquin

    Abstract: Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest this last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample-efficiency, non-linear approximation, non-stationarity handling and uncertain…

    Submitted 16 January, 2014; originally announced June 2014.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 39, pages 483-532, 2010