Search | arXiv e-print repository

On the importance of data collection for training general goal-reaching policies

Authors: Alexis Jacq, Manu Orsini, Gabriel Dulac-Arnold, Olivier Pietquin, Matthieu Geist, Olivier Bachem

Abstract: Recent advances in ML suggest that the quantity of data available to a model is one of the primary bottlenecks to high performance. Although for language-based tasks there exist almost unlimited amounts of reasonably coherent data to train from, this is generally not the case for Reinforcement Learning, especially when dealing with a novel environment. In effect, even a relatively trivial continuo… ▽ More Recent advances in ML suggest that the quantity of data available to a model is one of the primary bottlenecks to high performance. Although for language-based tasks there exist almost unlimited amounts of reasonably coherent data to train from, this is generally not the case for Reinforcement Learning, especially when dealing with a novel environment. In effect, even a relatively trivial continuous environment has an almost limitless number of states, but simply sampling random states and actions will likely not provide transitions that are interesting or useful for any potential downstream task. How should one generate massive amounts of useful data given only an MDP with no indication of downstream tasks? Are the quantity and quality of data truly transformative to the performance of a general controller? We propose to answer both of these questions. First, we introduce a principled unsupervised exploration method, ChronoGEM, which aims to achieve uniform coverage over the manifold of achievable states, which we believe is the most reasonable goal given no prior task information. Secondly, we investigate the effects of both data quantity and data quality on the training of a downstream goal-achievement policy, and show that both large quantities and high-quality of data are essential to train a general controller: a high-precision pose-achievement policy capable of attaining a large number of poses over numerous continuous control embodiments including humanoid. △ Less

Submitted 20 February, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

arXiv:2203.08542 [pdf, other]

Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act

Authors: Alexis Jacq, Johan Ferret, Olivier Pietquin, Matthieu Geist

Abstract: Traditionally, Reinforcement Learning (RL) aims at deciding how to act optimally for an artificial agent. We argue that deciding when to act is equally important. As humans, we drift from default, instinctive or memorized behaviors to focused, thought-out behaviors when required by the situation. To enhance RL agents with this aptitude, we propose to augment the standard Markov Decision Process an… ▽ More Traditionally, Reinforcement Learning (RL) aims at deciding how to act optimally for an artificial agent. We argue that deciding when to act is equally important. As humans, we drift from default, instinctive or memorized behaviors to focused, thought-out behaviors when required by the situation. To enhance RL agents with this aptitude, we propose to augment the standard Markov Decision Process and make a new mode of action available: being lazy, which defers decision-making to a default policy. In addition, we penalize non-lazy actions in order to encourage minimal effort and have agents focus on critical decisions only. We name the resulting formalism lazy-MDPs. We study the theoretical properties of lazy-MDPs, expressing value functions and characterizing optimal solutions. Then we empirically demonstrate that policies learned in lazy-MDPs generally come with a form of interpretability: by construction, they show us the states where the agent takes control over the default policy. We deem those states and corresponding actions important since they explain the difference in performance between the default and the new, lazy policy. With suboptimal policies as default (pretrained or random), we observe that agents are able to get competitive performance in Atari games while only taking control in a limited subset of states. △ Less

Submitted 16 March, 2022; originally announced March 2022.

Comments: AAMAS 2022 (14 pages extended version, added Sec. 7.4 and appendix K)

Journal ref: Autonomous Agents and Multi-Agent Systems (2022)

arXiv:2006.00979 [pdf, other]

Acme: A Research Framework for Distributed Reinforcement Learning

Authors: Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang , et al. (14 additional authors not shown)

Abstract: Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce publishe… ▽ More Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents that are built using simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research. In this work we describe the major design decisions made within Acme and give further details as to how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms as well as showing how these algorithms can be scaled up for much larger and more complex environments. This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that can run at massive scales while still maintaining the inherent readability of that implementation. This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme. △ Less

Submitted 20 September, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme

arXiv:1906.09831 [pdf, other]

Foolproof Cooperative Learning

Authors: Alexis Jacq, Julien Perolat, Matthieu Geist, Olivier Pietquin

Abstract: This paper extends the notion of learning equilibrium in game theory from matrix games to stochastic games. We introduce Foolproof Cooperative Learning (FCL), an algorithm that converges to a Tit-for-Tat behavior. It allows cooperative strategies when played against itself while being not exploitable by selfish players. We prove that in repeated symmetric games, this algorithm is a learning equili… ▽ More This paper extends the notion of learning equilibrium in game theory from matrix games to stochastic games. We introduce Foolproof Cooperative Learning (FCL), an algorithm that converges to a Tit-for-Tat behavior. It allows cooperative strategies when played against itself while being not exploitable by selfish players. We prove that in repeated symmetric games, this algorithm is a learning equilibrium. We illustrate the behavior of FCL on symmetric matrix and grid games, and its robustness to selfish learners. △ Less

Submitted 15 October, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

Journal ref: Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:401-416, 2020

arXiv:1602.06703 [pdf, other]

Cognitive Architecture for Mutual Modelling

Authors: Alexis Jacq, Wafa Johal, Pierre Dillenbourg, Ana Paiva

Abstract: In social robotics, robots needs to be able to be understood by humans. Especially in collaborative tasks where they have to share mutual knowledge. For instance, in an educative scenario, learners share their knowledge and they must adapt their behaviour in order to make sure they are understood by others. Learners display behaviours in order to show their understanding and teachers adapt in orde… ▽ More In social robotics, robots needs to be able to be understood by humans. Especially in collaborative tasks where they have to share mutual knowledge. For instance, in an educative scenario, learners share their knowledge and they must adapt their behaviour in order to make sure they are understood by others. Learners display behaviours in order to show their understanding and teachers adapt in order to make sure that the learners' knowledge is the required one. This ability requires a model of their own mental states perceived by others: \textit{"has the human understood that I(robot) need this object for the task or should I explain it once again ?"} In this paper, we discuss the importance of a cognitive architecture enabling second-order Mutual Modelling for Human-Robot Interaction in educative contexts. △ Less

Submitted 22 February, 2016; originally announced February 2016.

Comments: Presented at "2nd Workshop on Cognitive Architectures for Social Human-Robot Interaction 2016 (arXiv:1602.01868)

Report number: CogArch4sHRI/2016/07

Showing 1–5 of 5 results for author: Jacq, A