
Showing 1–50 of 62 results for author: Bellemare, M

Searching in archive cs.
  1. arXiv:2406.00244

    cs.CL

    Controlling Large Language Model Agents with Entropic Activation Steering

    Authors: Nate Rahn, Pierluca D'Oro, Marc G. Bellemare

    Abstract: The generality of pretrained large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. To be successful, such agents must form beliefs about how to achieve their goals based on limited interaction with their environment, resulting in uncertainty about the best action to take at each step. In this paper, we study how LLM agents form and act on these b…

    Submitted 31 May, 2024; originally announced June 2024.

  2. arXiv:2402.08530

    cs.LG cs.AI stat.ML

    A Distributional Analogue to the Successor Representation

    Authors: Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

    Abstract: This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this beha…

    Submitted 24 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024. First two authors contributed equally
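For readers new to the successor representation mentioned in arXiv:2402.08530: in a tabular MDP with policy-induced transition matrix P and discount γ, the classical (expected) SR satisfies M = I + γPM, so M = (I − γP)⁻¹, and state values follow as V = Mr. The Python sketch below computes that classical SR by fixed-point iteration. It is only background: it is not the paper's distributional successor measure, and the helper names are hypothetical.

```python
def successor_representation(P, gamma, iters=500):
    """Classical tabular SR: iterate M <- I + gamma * P @ M until it settles.

    P is a row-stochastic transition matrix under a fixed policy (list of lists).
    Hypothetical helper for illustration; converges for 0 <= gamma < 1.
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        # M[i][j] accumulates expected discounted visits to state j starting from i.
        M = [
            [identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)
        ]
    return M


def values_from_sr(M, r):
    """State values V = M r for a state-based reward vector r."""
    return [sum(m_ij * r_j for m_ij, r_j in zip(row, r)) for row in M]


# Two states: state 0 always moves to state 1, which is absorbing.
P = [[0.0, 1.0], [0.0, 1.0]]
M = successor_representation(P, gamma=0.5)
print(M)                               # approximately [[1.0, 1.0], [0.0, 2.0]]
print(values_from_sr(M, [0.0, 1.0]))   # approximately [1.0, 2.0]
```

Each row of M sums to 1/(1 − γ), the total discounted time spent anywhere, which is a quick sanity check on any SR estimate.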

  3. arXiv:2311.17894

    cond-mat.mes-hall cond-mat.mtrl-sci cs.LG

    Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

    Authors: Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore

    Abstract: We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural n…

    Submitted 21 November, 2023; originally announced November 2023.

  4. arXiv:2310.03882

    cs.LG cs.AI

    Small batch deep reinforcement learning

    Authors: Johan Obando-Ceron, Marc G. Bellemare, Pablo Samuel Castro

    Abstract: In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant pe…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Published at NeurIPS 2023
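To make the batch-size parameter in arXiv:2310.03882 concrete: with a replay memory, each gradient update draws `batch_size` stored transitions. The Python sketch below shows uniform sampling from a hypothetical `ReplayBuffer` class with first-in, first-out eviction; it is an illustration under those assumptions, not the code used in the paper, and sampling with replacement is a simplification.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical uniform replay memory with first-in, first-out eviction."""

    def __init__(self, capacity, seed=0):
        self.storage = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling with replacement: batch_size transitions per gradient update.
        return self.rng.choices(self.storage, k=batch_size)


buffer = ReplayBuffer(capacity=1000)
for t in range(5000):
    buffer.add((t, 0, 0.0, t + 1))  # placeholder (state, action, reward, next_state)

small_batch = buffer.sample(8)   # fewer transitions per gradient update
large_batch = buffer.sample(32)  # a larger batch for comparison
```

Holding the number of gradient updates fixed, a smaller `batch_size` means each update is computed from fewer transitions; the trade-offs of that choice are what the paper studies empirically.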

  5. arXiv:2309.14597

    cs.LG

    Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

    Authors: Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare

    Abstract: Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy param…

    Submitted 10 April, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023 Accepted Paper. The first two authors contributed equally

  6. arXiv:2306.10171

    cs.LG cs.AI stat.ML

    Bootstrapped Representations in Reinforcement Learning

    Authors: Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

    Abstract: In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated i…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  7. arXiv:2305.19452

    cs.LG cs.AI

    Bigger, Better, Faster: Human-level Atari with human-level efficiency

    Authors: Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

    Abstract: We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a dis…

    Submitted 13 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ICML 2023, revised version

  8. arXiv:2305.18388

    cs.LG stat.ML

    The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

    Authors: Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney

    Abstract: We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach the surprising conclusion that even if a practitioner has no interest in the return distribution beyond the mean, QTD (which learns predictions abou…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ICML 2023
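As background for arXiv:2305.18388 (and the analysis in arXiv:2301.04462), here is a minimal tabular sketch of a QTD update in Python. It assumes m quantile estimates per state at levels τ_i = (2i − 1)/(2m) and, for one observed transition (x, r, x′), moves each estimate θ(x, i) by α · (1/m) Σ_j [τ_i − 1{r + γθ(x′, j) < θ(x, i)}]. The function names and the toy environment are illustrative assumptions, not code from either paper.

```python
import random


def quantile_levels(m):
    """Midpoint quantile levels tau_i = (2i + 1) / (2m) for 0-based i = 0..m-1."""
    return [(2 * i + 1) / (2 * m) for i in range(m)]


def qtd_update(theta, x, r, x_next, gamma, alpha, terminal=False):
    """One tabular QTD step for transition (x, r, x_next); replaces theta[x]."""
    m = len(theta[x])
    taus = quantile_levels(m)
    # Bootstrapped targets r + gamma * theta(x', j); just r at the end of an episode.
    targets = [r] * m if terminal else [r + gamma * q for q in theta[x_next]]
    new_estimates = []
    for tau, q in zip(taus, theta[x]):
        # Quantile-regression step: tau minus the fraction of targets below q.
        step = sum(tau - (1.0 if t < q else 0.0) for t in targets) / m
        new_estimates.append(q + alpha * step)
    theta[x] = new_estimates


# Toy check: one-step episodes with reward 0 or 1, each with probability 1/2.
rng = random.Random(0)
theta = {0: [0.5, 0.5]}  # two quantiles, at levels 0.25 and 0.75
for _ in range(5000):
    qtd_update(theta, 0, rng.choice([0.0, 1.0]), None, gamma=0.9, alpha=0.01, terminal=True)
print(theta[0])  # hovers near [0.0, 1.0], the 0.25- and 0.75-quantiles of the reward
```

Note that the update only uses the sign of each target-minus-estimate comparison, not its magnitude, which is one reason QTD behaves differently from classical TD.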

  9. arXiv:2304.12567

    cs.LG cs.AI stat.ML

    Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

    Authors: Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treate…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: ICLR 2023. 22 pages, 8 figures. Code and models are available at https://github.com/google-research/google-research/tree/master/pvn

  10. arXiv:2301.07385

    cs.CV

    Three-dimensional reconstruction and characterization of bladder deformations

    Authors: Augustin C. Ogier, Stanislas Rapacchi, Marc-Emmanuel Bellemare

    Abstract: Background and Objective: Pelvic floor disorders are prevalent diseases and patient care remains difficult as the dynamics of the pelvic floor remain poorly known. So far, only 2D dynamic observations of straining exercises at excretion are available in the clinics and the understanding of three-dimensional pelvic organs mechanical defects is not yet achievable. In this context, we proposed a com…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: 17 pages, 7 figures, full article paper

  11. arXiv:2301.04462

    cs.LG stat.ML

    An Analysis of Quantile Temporal-Difference Learning

    Authors: Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

    Abstract: We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed with standard stochastic appro…

    Submitted 20 May, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: Accepted to JMLR

  12. arXiv:2212.04025

    cs.LG cs.AI stat.ML

    A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces

    Authors: Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G. Bellemare

    Abstract: Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the d-dimensional principal subspace of…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 8 pages in main content, 2 pages of bibliography and 5 pages in Appendix

  13. arXiv:2207.07570

    cs.LG

    The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

    Authors: Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare

    Abstract: We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our study reveals intriguing and fundamental differences between the two cases in the multi-step setting. We identify a novel notion of path-dependent distributional TD error, which is indispensable for principled multi-step distributional RL. The…

    Submitted 15 July, 2022; originally announced July 2022.

  14. arXiv:2206.01626

    cs.LG cs.AI stat.ML

    Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

    Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

    Abstract: Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from s…

    Submitted 4 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022. Code and agents at https://agarwl.github.io/reincarnating_rl

  15. arXiv:2205.12184

    cs.LG math.OC stat.ML

    Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning

    Authors: Harley Wiltzer, David Meger, Marc G. Bellemare

    Abstract: Continuous-time reinforcement learning offers an appealing formalism for describing control problems in which the passage of time is not naturally divided into discrete increments. Here we consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time, stochastic environment. Accurate return predictions have proven useful for determining optima…

    Submitted 17 June, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  16. arXiv:2203.00543

    cs.LG cs.AI stat.ML

    On the Generalization of Representations in Reinforcement Learning

    Authors: Charline Le Lan, Stephen Tu, Adam Oberman, Rishabh Agarwal, Marc G. Bellemare

    Abstract: In reinforcement learning, state representations are used to tractably deal with large problem spaces. State representations serve both to approximate the value function with few parameters and to generalize to newly encountered states. Their features may be learned implicitly (as part of a neural network) or explicitly (for example, the successor representation of Dayan (1993)…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: Accepted at AISTATS 2022

  17. arXiv:2110.08654

    cs.NI

    LEO Satellites in 5G and Beyond Networks: A Review from a Standardization Perspective

    Authors: Tasneem Darwish, Gunes Karabulut Kurt, Halim Yanikomeroglu, Michel Bellemare, Guillaume Lamontagne

    Abstract: Low Earth Orbit (LEO) Satellite Networks (SatNets) with their mega-constellations are expected to play a key role in providing ubiquitous Internet and communications services in the future. LEO SatNets will provide wide-area coverage and support service availability, continuity, and scalability. To support the integration of SatNets and terrestrial Fifth Generation (5G) networks and beyond, the satel…

    Submitted 16 October, 2021; originally announced October 2021.

  18. arXiv:2109.11052

    cs.LG

    On Bonus-Based Exploration Methods in the Arcade Learning Environment

    Authors: Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

    Abstract: Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-base…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: Full version of arXiv:1908.02388

    Journal ref: Published as a conference paper at ICLR 2020
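The abstract of arXiv:2109.11052 describes bonus-based methods as exploring by augmenting the environment reward. As an illustration only, here is the simplest tabular version of that idea in Python: a count-based bonus β/√N(s) added to the reward, in the spirit of MBIE-EB. The class name and the value of β are assumptions; the Atari methods studied in the paper rely on learned approximations (such as pseudo-counts) rather than exact visit counts.

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical tabular exploration bonus: r + beta / sqrt(N(s))."""

    def __init__(self, beta):
        self.beta = beta
        self.counts = defaultdict(int)  # visit counts N(s), default 0

    def augmented_reward(self, state, reward):
        self.counts[state] += 1  # count the current visit before computing the bonus
        return reward + self.beta / math.sqrt(self.counts[state])


bonus = CountBonus(beta=0.1)
print(bonus.augmented_reward("s0", 0.0))  # 0.1 on the first visit
print(bonus.augmented_reward("s0", 0.0))  # about 0.0707: the bonus decays with visits
```

The agent then learns from the augmented reward instead of the raw one, so rarely visited states look temporarily more valuable.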

  19. arXiv:2108.13264

    cs.LG cs.AI stat.ME stat.ML

    Deep Reinforcement Learning at the Edge of the Statistical Precipice

    Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

    Abstract: Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Lea…

    Submitted 5 January, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Outstanding Paper Award at NeurIPS 2021. Website: https://agarwl.github.io/rliable. 28 Pages, 33 Figures
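arXiv:2108.13264 argues that point estimates such as mean or median scores ignore the uncertainty that comes from a finite number of training runs. A minimal Python sketch of the general remedy, assuming one score per run: report a percentile-bootstrap confidence interval around an aggregate, here the mean or an interquartile mean (the mean of the middle 50% of runs). This simplified version resamples runs only; it is not the API of the authors' rliable tooling at the website above, which implements their own procedures.

```python
import random
import statistics


def interquartile_mean(scores):
    """Mean of the middle 50% of scores (drops floor(n/4) runs from each end)."""
    xs = sorted(scores)
    cut = len(xs) // 4
    return statistics.fmean(xs[cut:len(xs) - cut])


def bootstrap_ci(scores, statistic, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(scores), resampling runs with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = sorted(
        statistic([scores[rng.randrange(n)] for _ in range(n)]) for _ in range(reps)
    )
    lower = estimates[int(alpha / 2 * reps)]
    upper = estimates[int((1 - alpha / 2) * reps) - 1]
    return lower, upper


# Hypothetical normalized scores from eight independent training runs.
runs = [0.2, 0.4, 0.5, 0.6, 0.9, 1.1, 1.3, 3.0]
print(statistics.fmean(runs), bootstrap_ci(runs, statistics.fmean))
print(interquartile_mean(runs), bootstrap_ci(runs, interquartile_mean))
```

With only eight runs the intervals are wide, and the single outlying run pulls the mean well above the interquartile mean; both observations are lost when only a point estimate is reported.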

  20. arXiv:2102.01514

    cs.LG cs.AI stat.ML

    Metrics and continuity in reinforcement learning

    Authors: Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro

    Abstract: In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and top…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: Accepted at AAAI 2021

  21. Location Management in IP-based Future LEO Satellite Networks: A Review

    Authors: Tasneem Darwish, Gunes Kurt, Halim Yanikomeroglu, Guillaume Lamontagne, Michel Bellemare

    Abstract: Future integrated terrestrial, aerial, and space networks will involve thousands of Low Earth Orbit (LEO) satellites forming a network of mega-constellations, which will play a significant role in providing communication and Internet services everywhere, at any time, and for everything. Due to its very large scale and highly dynamic nature, future LEO satellite networks (SatNets) management is a v…

    Submitted 11 June, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

    Comments: Submitted to the Proceedings of the IEEE

    Journal ref: IEEE Open Journal of the Communications Society, vol. 3, pp. 1035-1062, 2022

  22. arXiv:2101.05265

    cs.LG cs.AI stat.ML

    Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

    Authors: Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoreti…

    Submitted 18 March, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: ICLR 2021 (Spotlight). Website: https://agarwl.github.io/pse

  23. Characterization of surface motion patterns in highly deformable soft tissue organs from dynamic MRI: An application to assess 4D bladder motion

    Authors: Karim Makki, Amine Bohi, Augustin C. Ogier, Marc-Emmanuel Bellemare

    Abstract: Dynamic MRI may capture temporal anatomical changes in soft tissue organs with high contrast but the obtained sequences usually suffer from limited volume coverage which makes the high resolution reconstruction of organ shape trajectories a major challenge in temporal studies. Because of the variability of abdominal organ shapes across time and subjects, the objective of this study is to go toward…

    Submitted 14 November, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:2003.08332

  24. arXiv:2009.06799

    cs.AI cs.LG

    The Importance of Pessimism in Fixed-Dataset Policy Optimization

    Authors: Jacob Buckman, Carles Gelada, Marc G. Bellemare

    Abstract: We study worst-case guarantees on the expected return of fixed-dataset policy optimization algorithms. Our core contribution is a unified conceptual and mathematical framework for the study of algorithms in this regime. This analysis reveals that for naive approaches, the possibility of erroneous value overestimation leads to a difficult-to-satisfy requirement: in order to guarantee that we select…

    Submitted 29 November, 2020; v1 submitted 14 September, 2020; originally announced September 2020.

  25. arXiv:2007.05520

    cs.LG cs.AI stat.ML

    Representations for Stable Off-Policy Reinforcement Learning

    Authors: Dibya Ghosh, Marc G. Bellemare

    Abstract: Reinforcement learning with function approximation can be unstable and even divergent, especially when combined with off-policy learning and Bellman updates. In deep reinforcement learning, these issues have been dealt with empirically by adapting and regularizing the representation, in particular with auxiliary tasks. This suggests that representation learning may provide a means to guarantee sta…

    Submitted 2 October, 2020; v1 submitted 10 July, 2020; originally announced July 2020.

    Comments: ICML 2020

  26. arXiv:2006.02243

    cs.LG stat.ML

    The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

    Authors: Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

    Abstract: In value-based reinforcement learning (RL), unlike in supervised learning, the agent faces not a single, stationary, approximation problem, but a sequence of value prediction problems. Each time the policy improves, the nature of the problem changes, shifting both the distribution of states and their values. In this paper we take a novel perspective, arguing that the value prediction problems face…

    Submitted 4 January, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: AAAI-21

  27. arXiv:2003.12239

    cs.LG cs.AI stat.ML

    A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

    Authors: Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

    Abstract: We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(λ) and Q-learning have update rules which are contractive in the space of distributions of functions,…

    Submitted 27 March, 2020; originally announced March 2020.

    Comments: AISTATS 2020

  28. arXiv:2003.08332

    cs.CV math.AP

    A new geodesic-based feature for characterization of 3D shapes: application to soft tissue organ temporal deformations

    Authors: Karim Makki, Amine Bohi, Augustin C. Ogier, Marc-Emmanuel Bellemare

    Abstract: In this paper, we propose a method for characterizing 3D shapes from point clouds and we show a direct application on a study of organ temporal deformations. As an example, we characterize the behavior of a bladder during a forced respiratory motion with a reduced number of 3D surface points: first, a set of equidistant points representing the vertices of a quadrilateral mesh for the surface in the…

    Submitted 18 March, 2020; originally announced March 2020.

  29. arXiv:2003.04069

    cs.LG stat.ML

    Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

    Authors: Ahmed Touati, Adrien Ali Taiga, Marc G. Bellemare

    Abstract: Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces. In this paper, we consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric that characterizes the proximity b…

    Submitted 9 March, 2020; originally announced March 2020.

  30. arXiv:2002.12499

    cs.LG cs.AI stat.ML

    On Catastrophic Interference in Atari 2600 Games

    Authors: William Fedus, Dibya Ghosh, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle

    Abstract: Model-free deep reinforcement learning is sample inefficient. One hypothesis -- speculated, but not confirmed -- is that catastrophic interference within an environment inhibits learning. We test this hypothesis through a large-scale empirical study in the Arcade Learning Environment (ALE) and, indeed, find supporting evidence. We show that interference causes performance to plateau; the network c…

    Submitted 9 June, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

    Comments: First two authors contributed equally. Code available to reproduce experiments at https://github.com/google-research/google-research/tree/master/memento

  31. arXiv:1911.12511

    cs.AI cs.LG

    Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction

    Authors: Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare

    Abstract: Text-based games are a natural challenge domain for deep reinforcement learning algorithms. Their state and action spaces are combinatorially large, their reward function is sparse, and they are partially observable: the agent is informed of the consequences of its actions through textual feedback. In this paper we emphasize this latter point and consider the design of a deep reinforcement learnin…

    Submitted 27 November, 2019; originally announced November 2019.

    Comments: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). Accepted for Oral presentation

  32. arXiv:1908.02388

    cs.LG stat.ML

    Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment

    Authors: Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

    Abstract: This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE). We study the use of different reward bonuses that incentivize exploration in reinforcement learning. We do so by fixing the learning algorithm used and focusing only on the impact of the different exploration bonuses on the agent's performance. We use Rainbow, the s…

    Submitted 24 September, 2021; v1 submitted 6 August, 2019; originally announced August 2019.

    Comments: Accepted at the second Exploration in Reinforcement Learning Workshop at the 36th International Conference on Machine Learning, Long Beach, California. The full version arxiv.org/abs/2109.11052 was published as a conference paper at ICLR 2020

  33. arXiv:1906.02736

    cs.LG stat.ML

    DeepMDP: Learning Continuous Latent Space Models for Representation Learning

    Authors: Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare

    Abstract: Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states.…

    Submitted 6 June, 2019; originally announced June 2019.

    Comments: 13 pages main text, 16 pages appendix. ICML 2019

  34. arXiv:1902.08102

    stat.ML cs.LG

    Statistics and Samples in Distributional Reinforcement Learning

    Authors: Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney

    Abstract: We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution. Our key insight is that DRL algorithms can be decomposed as the combination of some statistical estimator and a method for imputing a return distribution consistent with that set of statistics. With this new und…

    Submitted 21 February, 2019; originally announced February 2019.

  35. arXiv:1902.06865

    stat.ML cs.LG

    Hyperbolic Discounting and Learning over Multiple Horizons

    Authors: William Fedus, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, Hugo Larochelle

    Abstract: Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process. The discount factor values future rewards by an exponential scheme that leads to theoretical convergence guarantees of the Bellman equation. However, evidence from psychology, economics and neuroscience suggests that humans and animals instead have hyperbolic time-preferences. In this work we re…

    Submitted 28 February, 2019; v1 submitted 18 February, 2019; originally announced February 2019.
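One identity behind learning over multiple horizons, as in arXiv:1902.06865: for k ≥ 0 and t ≥ 0, the hyperbolic discount 1/(1 + kt) equals ∫₀¹ γ^(kt) dγ, so it can be recovered by averaging exponential discount curves across many discount factors γ. The Python sketch below checks this numerically with a midpoint rule over a grid of γ values; the function names and grid size are assumptions, and the paper's own estimator may weight or sample horizons differently.

```python
def hyperbolic_discount(t, k):
    """Hyperbolic discount weight 1 / (1 + k t)."""
    return 1.0 / (1.0 + k * t)


def averaged_exponential_discount(t, k, n_gammas=1000):
    """Midpoint-rule estimate of the integral over gamma in (0, 1) of gamma ** (k t)."""
    total = 0.0
    for i in range(n_gammas):
        gamma = (i + 0.5) / n_gammas  # midpoint of the i-th sub-interval of (0, 1)
        total += gamma ** (k * t)     # exponential discount curve for this gamma
    return total / n_gammas


for t in (0, 1, 5, 20):
    print(t, hyperbolic_discount(t, k=0.5), averaged_exponential_discount(t, k=0.5))
```

Because each term in the average is an ordinary exponential discount, an agent that learns values for a set of γ values can, in principle, combine them into a hyperbolically discounted estimate.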

  36. arXiv:1902.03149

    cs.LG stat.ML

    Distributional reinforcement learning with linear function approximation

    Authors: Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra

    Abstract: Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramér distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramé…

    Submitted 8 February, 2019; originally announced February 2019.

    Comments: To appear

    Journal ref: Proceedings of AISTATS 2019
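The abstract of arXiv:1902.03149 refers to the Cramér distance, the ℓ2 distance between cumulative distribution functions: ℓ2(P, Q) = (∫ (F_P(x) − F_Q(x))² dx)^(1/2). For two categorical distributions on the same sorted support, as in C51, both CDFs are step functions, so the integral reduces to a finite sum. A minimal Python sketch under that assumption (hypothetical function name; C51's softmax parameterization and projection step are not modelled):

```python
import math
from itertools import accumulate


def cramer_distance(p, q, support):
    """l2 (Cramér) distance between categorical distributions p and q on a sorted support."""
    F = list(accumulate(p))  # CDF of p evaluated at each support point
    G = list(accumulate(q))  # CDF of q evaluated at each support point
    total = 0.0
    # Both CDFs are constant on [z_i, z_{i+1}); beyond the last atom they both equal 1.
    for i in range(len(support) - 1):
        width = support[i + 1] - support[i]
        total += width * (F[i] - G[i]) ** 2
    return math.sqrt(total)


support = [0.0, 1.0, 2.0, 3.0]
print(cramer_distance([1, 0, 0, 0], [0, 0, 0, 1], support))  # sqrt(3), about 1.732
print(cramer_distance([0.5, 0, 0, 0.5], [0.5, 0, 0, 0.5], support))  # 0.0
```

For two point masses at a and b the distance is √|b − a|, so unlike a per-atom comparison it accounts for how far apart the atoms are.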

  37. The Hanabi Challenge: A New Frontier for AI Research

    Authors: Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling

    Abstract: From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains…

    Submitted 6 December, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: 32 pages, 5 figures, In Press (Artificial Intelligence)

  38. arXiv:1901.11530

    cs.LG cs.AI stat.ML

    A Geometric Perspective on Optimal Representations for Reinforcement Learning

    Authors: Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle

    Abstract: We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary po…

    Submitted 25 June, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

  39. arXiv:1901.11528

    cs.HC cs.AI cs.CL cs.LG

    Shaping the Narrative Arc: An Information-Theoretic Approach to Collaborative Dialogue

    Authors: Kory W. Mathewson, Pablo Samuel Castro, Colin Cherry, George Foster, Marc G. Bellemare

    Abstract: We consider the problem of designing an artificial agent capable of interacting with humans in collaborative dialogue to produce creative, engaging narratives. In this task, the goal is to establish universe details, and to collaborate on an interesting story in that universe, through a series of natural dialogue exchanges. Our model can augment any probabilistic conversational agent by allowing i…

    Submitted 31 January, 2019; originally announced January 2019.

    Comments: 20 pages, 9 figures

  40. arXiv:1901.11524

    cs.LG cs.AI stat.ML

    The Value Function Polytope in Reinforcement Learning

    Authors: Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare

    Abstract: We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem…

    Submitted 15 May, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

  41. arXiv:1901.11084  [pdf, other]

    cs.LG stat.ML

    A Comparative Analysis of Expected and Distributional Reinforcement Learning

    Authors: Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin…

    Submitted 21 February, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

    Comments: To appear in the Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence

  42. arXiv:1901.09455  [pdf, other]

    cs.LG stat.ML

    Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift

    Authors: Carles Gelada, Marc G. Bellemare

    Abstract: In this paper we revisit the method of off-policy corrections for reinforcement learning (COP-TD) pioneered by Hallak et al. (2017). Under this method, online updates to the value function are reweighted to avoid divergence issues typical of off-policy learning. While Hallak et al.'s solution is appealing, it cannot easily be transferred to nonlinear function approximation. First, it requires a pr…

    Submitted 27 January, 2019; originally announced January 2019.

    Comments: AAAI 2019

  43. arXiv:1812.07069  [pdf, other]

    cs.NE

    An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents

    Authors: Felipe Petroski Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Jiale Zhi, Ludwig Schubert, Marc G. Bellemare, Jeff Clune, Joel Lehman

    Abstract: Much human and computational effort has aimed to improve how deep reinforcement learning algorithms perform on benchmarks such as the Arcade Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and investigating and comparing the representations learned by different families of reinforcement learning (RL) algorithms. Sources of friction…

    Submitted 29 May, 2019; v1 submitted 17 December, 2018; originally announced December 2018.

  44. arXiv:1812.06110  [pdf, other]

    cs.LG cs.AI

    Dopamine: A Research Framework for Deep Reinforcement Learning

    Authors: Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare

    Abstract: Deep reinforcement learning (deep RL) research has grown significantly in recent years. A number of software offerings now exist that provide stable, comprehensive implementations for benchmarking. At the same time, recent deep RL research has become more diverse in its goals. In this paper we introduce Dopamine, a new research framework for deep RL that aims to support some of that diversity. Dop…

    Submitted 14 December, 2018; originally announced December 2018.

  45. arXiv:1811.12560  [pdf, other]

    cs.LG cs.AI stat.ML

    An Introduction to Deep Reinforcement Learning

    Authors: Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, Joelle Pineau

    Abstract: Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introductio…

    Submitted 3 December, 2018; v1 submitted 29 November, 2018; originally announced November 2018.

    Journal ref: Foundations and Trends in Machine Learning: Vol. 11, No. 3-4, 2018

  46. arXiv:1811.07004  [pdf, ps, other]

    cs.AI cs.LG

    The Barbados 2018 List of Open Issues in Continual Learning

    Authors: Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

    Abstract: We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments. The purpose of this report is to sketch a research outline, share some of the most important open issues we are facing, and stimulate further discussion in the community. The content is based on some of our discussions during a week-…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: NIPS Continual Learning Workshop 2018

  47. arXiv:1808.09819  [pdf, other]

    cs.LG cs.AI stat.ML

    Approximate Exploration through State Abstraction

    Authors: Adrien Ali Taïga, Aaron Courville, Marc G. Bellemare

    Abstract: Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. Our main goal is to further our theoretical understanding of pseudo-count based exploration bonuses (Bellemare et al., 2016), a practical exp…

    Submitted 24 January, 2019; v1 submitted 29 August, 2018; originally announced August 2018.

  48. arXiv:1807.11622  [pdf, other]

    cs.LG cs.AI stat.ML

    Count-Based Exploration with the Successor Representation

    Authors: Marlos C. Machado, Marc G. Bellemare, Michael Bowling

    Abstract: In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required. Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by…

    Submitted 26 November, 2019; v1 submitted 30 July, 2018; originally announced July 2018.

    Comments: This paper appears in the Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020)

  49. arXiv:1710.10044  [pdf, other]

    cs.AI cs.LG stat.ML

    Distributional Reinforcement Learning with Quantile Regression

    Authors: Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos

    Abstract: In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on…

    Submitted 27 October, 2017; originally announced October 2017.

  50. arXiv:1709.06009  [pdf, other]

    cs.LG

    Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents

    Authors: Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling

    Abstract: The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of different problem settings and it has been receiving increasing attention from the scientific community, leading to some high-profile success stories such as the much publicized Deep Q-Networks (DQN). In t…

    Submitted 30 November, 2017; v1 submitted 18 September, 2017; originally announced September 2017.