Showing 1–47 of 47 results for author: Dabney, W

  1. arXiv:2407.01800  [pdf, other]

    cs.LG cs.AI

    Normalization and effective learning rates in reinforcement learning

    Authors: Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, Hado van Hasselt, Razvan Pascanu, Will Dabney

    Abstract: Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combatting overestimation bias. However, normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network paramet…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.02035  [pdf, other]

    cs.LG cs.AI

    A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

    Authors: Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana Borsa, Arthur Guez, Will Dabney

    Abstract: Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model for self-predictive representation le…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2405.08448  [pdf, other]

    cs.LG cs.AI

    Understanding the performance gap between online and offline alignment algorithms

    Authors: Yunhao Tang, Daniel Zhaohan Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, Will Dabney

    Abstract: Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, rising popularity in offline alignment algorithms challenges the need for on-policy sampling in RLHF. Within the context of reward over-optimization, we start with an opening set of experiments that demonstrate the clear advantage of online methods over offline methods. This pro…

    Submitted 14 May, 2024; originally announced May 2024.

  4. arXiv:2402.18762  [pdf, other]

    cs.LG

    Disentangling the Causes of Plasticity Loss in Neural Networks

    Authors: Clare Lyle, Zeyu Zheng, Khimya Khetarpal, Hado van Hasselt, Razvan Pascanu, James Martens, Will Dabney

    Abstract: Underpinning the past decades of work on the design, initialization, and optimization of neural networks is a seemingly innocuous assumption: that the network is trained on a stationary data distribution. In settings where this assumption is violated, e.g. deep reinforcement learning, learning algorithms become unstable and brittle with respect to hyperparameters and even random seeds. O…

    Submitted 28 February, 2024; originally announced February 2024.

  5. arXiv:2402.08530  [pdf, other]

    cs.LG cs.AI stat.ML

    A Distributional Analogue to the Successor Representation

    Authors: Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

    Abstract: This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this beha…

    Submitted 24 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024. First two authors contributed equally

  6. arXiv:2402.07598  [pdf, other]

    cs.LG stat.ML

    Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

    Authors: Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney

    Abstract: We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open question of Zhang et al. (2023). Our analysis provides new theoretical results on categorical approaches to distributional RL, and also introduces a new distributiona…

    Submitted 12 February, 2024; originally announced February 2024.

  7. arXiv:2402.05766  [pdf, other]

    cs.LG

    Off-policy Distributional Q($λ$): Distributional RL without Importance Sampling

    Authors: Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney

    Abstract: We introduce off-policy distributional Q($λ$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($λ$) does not apply importance sampling for off-policy learning, which introduces intriguing interactions with signed measures. Such unique properties distinguish distributional Q($λ$) from other existing alternatives such as distributional Retrace. We cha…

    Submitted 8 February, 2024; originally announced February 2024.

  8. arXiv:2306.10171  [pdf, other]

    cs.LG cs.AI stat.ML

    Bootstrapped Representations in Reinforcement Learning

    Authors: Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

    Abstract: In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated i…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  9. arXiv:2305.18388  [pdf, other]

    cs.LG stat.ML

    The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

    Authors: Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney

    Abstract: We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach the surprising conclusion that even if a practitioner has no interest in the return distribution beyond the mean, QTD (which learns predictions abou…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ICML 2023

  10. arXiv:2305.15555  [pdf, other]

    cs.LG cs.AI

    Deep Reinforcement Learning with Plasticity Injection

    Authors: Evgenii Nikishin, Junhyuk Oh, Georg Ostrovski, Clare Lyle, Razvan Pascanu, Will Dabney, André Barreto

    Abstract: A growing body of evidence suggests that neural networks employed in deep reinforcement learning (RL) gradually lose their plasticity, the ability to learn from new data; however, the analysis and mitigation of this phenomenon is hampered by the complex relationship between plasticity, exploration, and performance in RL. This paper introduces plasticity injection, a minimalistic intervention that…

    Submitted 3 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 camera-ready

  11. arXiv:2305.00654  [pdf, other]

    cs.LG cs.AI

    Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition

    Authors: Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Remi Munos, Will Dabney, Diana L Borsa

    Abstract: Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve the underlying transition structure in the domain. Perhaps interestingly, we show that these representations also capture the relative frequency of state visitati…

    Submitted 2 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Accepted at the 40th International Conference on Machine Learning (ICML 2023)

  12. arXiv:2303.01486  [pdf, other]

    cs.LG

    Understanding plasticity in neural networks

    Authors: Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila Pires, Razvan Pascanu, Will Dabney

    Abstract: Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness of deep reinforcement learning systems. Deep neural networks are known to lose plasticity over the course of training even in relatively simple learning problems, but the mechanisms driving this phenomenon are still poorly understood. This p…

    Submitted 27 November, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to ICML 2023 (oral presentation)

  13. arXiv:2301.04462  [pdf, other]

    cs.LG stat.ML

    An Analysis of Quantile Temporal-Difference Learning

    Authors: Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

    Abstract: We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed with standard stochastic appro…

    Submitted 20 May, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: Accepted to JMLR

  14. arXiv:2212.10420  [pdf, other]

    cs.AI cs.LG math.ST

    Settling the Reward Hypothesis

    Authors: Michael Bowling, John D. Martin, David Abel, Will Dabney

    Abstract: The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. This will not conclude with a simple affirmation or refutation, but rather specify completely the implicit requirements on goals and purposes under which the hy…

    Submitted 16 September, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  15. arXiv:2212.03319  [pdf, other]

    cs.LG cs.AI

    Understanding Self-Predictive Learning for Reinforcement Learning

    Authors: Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

    Abstract: We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirabl…

    Submitted 6 December, 2022; originally announced December 2022.

  16. arXiv:2207.07570  [pdf, other]

    cs.LG

    The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

    Authors: Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare

    Abstract: We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our study reveals intriguing and fundamental differences between the two cases in the multi-step setting. We identify a novel notion of path-dependent distributional TD error, which is indispensable for principled multi-step distributional RL. The…

    Submitted 15 July, 2022; originally announced July 2022.

  17. arXiv:2206.08736  [pdf, other]

    stat.ML cs.LG

    Generalised Policy Improvement with Geometric Policy Composition

    Authors: Shantanu Thakoor, Mark Rowland, Diana Borsa, Will Dabney, Rémi Munos, André Barreto

    Abstract: We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL. The new method builds on the concept of a geometric horizon model (GHM, also known as a gamma-model), which models the discounted state-visitation distribution of a given policy. We show that we can evaluate…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: ICML 2022

  18. arXiv:2206.02126  [pdf, other]

    cs.LG

    Learning Dynamics and Generalization in Reinforcement Learning

    Authors: Clare Lyle, Mark Rowland, Will Dabney, Marta Kwiatkowska, Yarin Gal

    Abstract: Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal difference algorithms to gain novel insight into the tension between these two objectives. We show theoretically that temporal difference learning encourages agents to…

    Submitted 5 June, 2022; originally announced June 2022.

  19. arXiv:2204.09560  [pdf, other]

    cs.LG

    Understanding and Preventing Capacity Loss in Reinforcement Learning

    Authors: Clare Lyle, Mark Rowland, Will Dabney

    Abstract: The reinforcement learning (RL) problem is rife with sources of non-stationarity, making it a notoriously difficult problem domain for the application of neural networks. We identify a mechanism by which non-stationary prediction targets can prevent learning progress in deep RL agents: capacity loss, whereby networks trained on a sequence of target values lose their ability to quickly upd…

    Submitted 4 May, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: Presented at ICLR 2022

  20. arXiv:2111.00876  [pdf, other]

    cs.LG cs.AI

    On the Expressivity of Markov Reward

    Authors: David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

    Abstract: Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajector…

    Submitted 18 January, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted to NeurIPS 2021

  21. arXiv:2110.14020  [pdf, other]

    cs.LG cs.AI

    The Difficulty of Passive Learning in Deep Reinforcement Learning

    Authors: Georg Ostrovski, Pablo Samuel Castro, Will Dabney

    Abstract: Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL). Recent approaches involve constraints on the learned policy or conservative updates, preventing strong deviations from the state-action distribution of the dataset. Although these methods are evaluated using non-linear function approximation, theoretical justif…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted paper at NeurIPS 2021

  22. arXiv:2103.00107  [pdf, other]

    cs.LG cs.AI stat.ML

    Revisiting Peng's Q($λ$) for Modern Reinforcement Learning

    Authors: Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

    Abstract: Off-policy multi-step reinforcement learning algorithms consist of conservative and non-conservative algorithms: the former actively cut traces, whereas the latter do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithms to an optimal Q-function. In contrast, non-conservative algorithms are thought to be unsafe and have a limited or no theoretical guarantee. Nonethel…

    Submitted 26 February, 2021; originally announced March 2021.

    Comments: 26 pages, 7 figures, 2 tables

  23. arXiv:2102.13089  [pdf, other]

    cs.LG

    On The Effect of Auxiliary Tasks on Representation Dynamics

    Authors: Clare Lyle, Mark Rowland, Georg Ostrovski, Will Dabney

    Abstract: While auxiliary tasks play a key role in shaping the representations learnt by reinforcement learning agents, much is still unknown about the mechanisms through which this is achieved. This work develops our understanding of the relationship between auxiliary tasks, environment structure, and representations by analysing the dynamics of temporal difference algorithms. Through this approach, we est…

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: AISTATS 2021

  24. arXiv:2011.09464  [pdf, other]

    cs.LG

    Counterfactual Credit Assignment in Model-Free Reinforcement Learning

    Authors: Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos

    Abstract: Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to…

    Submitted 14 December, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

  25. arXiv:2007.06700  [pdf, other]

    cs.LG stat.ML

    Revisiting Fundamentals of Experience Replay

    Authors: William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

    Abstract: Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and a…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Published at ICML 2020. First two authors contributed equally and code available at https://github.com/google-research/google-research/tree/master/experience_replay

  26. arXiv:2007.03750  [pdf, other]

    cs.AI cs.LG q-bio.NC

    Deep Reinforcement Learning and its Neuroscientific Implications

    Authors: Matthew Botvinick, Jane X. Wang, Will Dabney, Kevin J. Miller, Zeb Kurth-Nelson

    Abstract: The emergence of powerful artificial intelligence is defining new research directions in neuroscience. To date, this research has focused largely on deep neural networks trained using supervised learning, in tasks such as image classification. However, there is another area of recent AI work which has so far received less attention from neuroscientists, but which may have profound neuroscientific…

    Submitted 7 July, 2020; originally announced July 2020.

    Comments: 22 pages, 5 figures

  27. arXiv:2006.02243  [pdf, other]

    cs.LG stat.ML

    The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

    Authors: Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

    Abstract: In value-based reinforcement learning (RL), unlike in supervised learning, the agent faces not a single, stationary, approximation problem, but a sequence of value prediction problems. Each time the policy improves, the nature of the problem changes, shifting both the distribution of states and their values. In this paper we take a novel perspective, arguing that the value prediction problems face…

    Submitted 4 January, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: AAAI-21

  28. arXiv:2006.01782  [pdf, other]

    cs.LG stat.ML

    Temporally-Extended ε-Greedy Exploration

    Authors: Will Dabney, Georg Ostrovski, André Barreto

    Abstract: Recent work on exploration in reinforcement learning (RL) has led to a series of increasingly complex solutions to the problem. This increase in complexity often comes at the expense of generality. Recent empirical studies suggest that, when applied to a broader set of domains, some sophisticated exploration methods are outperformed by simpler counterparts, such as ε-greedy. In this paper we propo…

    Submitted 2 June, 2020; originally announced June 2020.

  29. arXiv:1912.06910  [pdf, other]

    cs.LG cs.AI stat.ML

    Adapting Behaviour for Learning Progress

    Authors: Tom Schaul, Diana Borsa, David Ding, David Szepesvari, Georg Ostrovski, Will Dabney, Simon Osindero

    Abstract: Determining what experience to generate to best facilitate learning (i.e. exploration) is one of the distinguishing features and open challenges in reinforcement learning. The advent of distributed agents that interact with parallel instances of the environment has enabled larger scales and greater flexibility, but has not removed the need to tune exploration to the task, because the ideal data fo…

    Submitted 14 December, 2019; originally announced December 2019.

  30. arXiv:1912.02503  [pdf, other]

    cs.LG stat.ML

    Hindsight Credit Assignment

    Authors: Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos

    Abstract: We consider the problem of efficient credit assignment in reinforcement learning. In order to efficiently and meaningfully utilize new data, we propose to explicitly assign credit to past decisions based on the likelihood of them having led to the observed outcome. This approach uses new information in hindsight, rather than employing foresight. Somewhat surprisingly, we show that value functions…

    Submitted 5 December, 2019; originally announced December 2019.

    Comments: NeurIPS 2019

  31. arXiv:1910.07479  [pdf, other]

    cs.LG stat.ML

    Conditional Importance Sampling for Off-Policy Learning

    Authors: Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

    Abstract: The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms th…

    Submitted 30 July, 2020; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: AISTATS 2020 camera-ready version

  32. arXiv:1910.07478  [pdf, other]

    cs.LG stat.ML

    Adaptive Trade-Offs in Off-Policy Learning

    Authors: Mark Rowland, Will Dabney, Rémi Munos

    Abstract: A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms, and consider their trade-offs of three fundamental quantities: update variance, fixed-point bias, an…

    Submitted 30 July, 2020; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: AISTATS 2020 camera-ready version

  33. arXiv:1906.05030  [pdf, other]

    cs.LG cs.AI stat.ML

    Fast Task Inference with Variational Intrinsic Successor Features

    Authors: Steven Hansen, Will Dabney, Andre Barreto, Tom Van de Wiele, David Warde-Farley, Volodymyr Mnih

    Abstract: It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies (Gregor et al., 2016; Eysenbach et al., 2018; Warde-Farley et al., 2018). However, one limitation of this formulation is generalizing behaviors beyond the finite set being explicitly learned, as is nee…

    Submitted 27 January, 2020; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: Accepted for publication at ICLR 2020

  34. arXiv:1902.09996  [pdf, other]

    cs.AI cs.LG stat.ML

    The Termination Critic

    Authors: Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Remi Munos, Doina Precup

    Abstract: In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination condition, as opposed to -- as is common -- the policy. The termination condition is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a dif…

    Submitted 26 February, 2019; originally announced February 2019.

    Comments: AISTATS 2019

  35. arXiv:1902.08102  [pdf, other]

    stat.ML cs.LG

    Statistics and Samples in Distributional Reinforcement Learning

    Authors: Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney

    Abstract: We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution. Our key insight is that DRL algorithms can be decomposed as the combination of some statistical estimator and a method for imputing a return distribution consistent with that set of statistics. With this new und…

    Submitted 21 February, 2019; originally announced February 2019.

  36. arXiv:1901.11530  [pdf, other]

    cs.LG cs.AI stat.ML

    A Geometric Perspective on Optimal Representations for Reinforcement Learning

    Authors: Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle

    Abstract: We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary po…

    Submitted 25 June, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

  37. arXiv:1806.06923  [pdf, other]

    cs.LG cs.AI stat.ML

    Implicit Quantile Networks for Distributional Reinforcement Learning

    Authors: Will Dabney, Georg Ostrovski, David Silver, Rémi Munos

    Abstract: In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined re…

    Submitted 14 June, 2018; originally announced June 2018.

    Comments: ICML 2018

  38. arXiv:1806.05575  [pdf, other]

    cs.LG stat.ML

    Autoregressive Quantile Networks for Generative Modeling

    Authors: Georg Ostrovski, Will Dabney, Rémi Munos

    Abstract: We introduce autoregressive implicit quantile networks (AIQN), a fundamentally different approach to generative modeling than those commonly used, that implicitly captures the distribution using quantile regression. AIQN is able to achieve superior perceptual quality and improvements in evaluation metrics, without incurring a loss of sample diversity. The method can be applied to many existing mod…

    Submitted 14 June, 2018; originally announced June 2018.

    Comments: ICML 2018

  39. arXiv:1805.04955  [pdf, other]

    cs.LG cs.AI stat.ML

    Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery

    Authors: Thomas Stepleton, Razvan Pascanu, Will Dabney, Siddhant M. Jayakumar, Hubert Soyer, Remi Munos

    Abstract: Reinforcement learning (RL) agents performing complex tasks must be able to remember observations and actions across sizable time intervals. This is especially true during the initial learning stages, when exploratory behaviour can increase the delay between specific actions and their effects. Many new or popular approaches for learning these distant correlations employ backpropagation through tim…

    Submitted 13 May, 2018; originally announced May 2018.

  40. arXiv:1804.08617  [pdf, other]

    cs.LG cs.AI stat.ML

    Distributed Distributional Deterministic Policy Gradients

    Authors: Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap

    Abstract: This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning in order to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple improve…

    Submitted 23 April, 2018; originally announced April 2018.

  41. arXiv:1710.10044  [pdf, other]

    cs.AI cs.LG stat.ML

    Distributional Reinforcement Learning with Quantile Regression

    Authors: Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos

    Abstract: In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on…

    Submitted 27 October, 2017; originally announced October 2017.

  42. arXiv:1710.02298  [pdf, other]

    cs.AI cs.LG

    Rainbow: Combining Improvements in Deep Reinforcement Learning

    Authors: Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

    Abstract: The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 260…

    Submitted 6 October, 2017; originally announced October 2017.

    Comments: Under review as a conference paper at AAAI 2018

  43. arXiv:1707.06887  [pdf, other]

    cs.LG cs.AI stat.ML

    A Distributional Perspective on Reinforcement Learning

    Authors: Marc G. Bellemare, Will Dabney, Rémi Munos

    Abstract: In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been…

    Submitted 21 July, 2017; originally announced July 2017.

    Comments: ICML 2017

  44. arXiv:1705.10743  [pdf, other]

    cs.LG stat.ML

    The Cramer Distance as a Solution to Biased Wasserstein Gradients

    Authors: Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos

    Abstract: The Wasserstein probability metric has received much attention from the machine learning community. Unlike the Kullback-Leibler divergence, which strictly measures change in probability, the Wasserstein metric reflects the underlying geometry between outcomes. The value of being sensitive to this geometry has been demonstrated, among others, in ordinal regression and generative modelling. In this…

    Submitted 30 May, 2017; originally announced May 2017.

  45. arXiv:1704.04651  [pdf, other]

    cs.AI

    The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning

    Authors: Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Remi Munos

    Abstract: In this work we present a new agent architecture, called Reactor, which combines multiple algorithmic and architectural contributions to produce an agent with higher sample-efficiency than Prioritized Dueling DQN (Wang et al., 2016) and Categorical DQN (Bellemare et al., 2017), while giving better run-time performance than A3C (Mnih et al., 2016). Our first contribution is a new policy evaluation…

    Submitted 19 June, 2018; v1 submitted 15 April, 2017; originally announced April 2017.

  46. arXiv:1606.05312  [pdf, other]

    cs.AI

    Successor Features for Transfer in Reinforcement Learning

    Authors: André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver

    Abstract: Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. We propose a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same. Our approach rests on two key ideas: "successor features", a value function representation that decouples the dynamics o…

    Submitted 12 April, 2018; v1 submitted 16 June, 2016; originally announced June 2016.

    Comments: Published at NIPS 2017

  47. arXiv:1405.6757  [pdf, other]

    cs.LG

    Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces

    Authors: Sridhar Mahadevan, Bo Liu, Philip Thomas, Will Dabney, Steve Giguere, Nicholas Jacek, Ian Gemp, Ji Liu

    Abstract: In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarant…

    Submitted 26 May, 2014; originally announced May 2014.

    Comments: 121 pages