
Showing 1–3 of 3 results for author: Wiltzer, H

Searching in archive cs.
  1. arXiv:2402.08530  [pdf, other]

    cs.LG cs.AI stat.ML

    A Distributional Analogue to the Successor Representation

    Authors: Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

    Abstract: This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this beha…

    Submitted 24 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024. First two authors contributed equally

  2. arXiv:2309.14597  [pdf, other]

    cs.LG

    Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

    Authors: Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare

    Abstract: Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy param…

    Submitted 10 April, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023 Accepted Paper. The first two authors contributed equally

  3. arXiv:2205.12184  [pdf, other]

    cs.LG math.OC stat.ML

    Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning

    Authors: Harley Wiltzer, David Meger, Marc G. Bellemare

    Abstract: Continuous-time reinforcement learning offers an appealing formalism for describing control problems in which the passage of time is not naturally divided into discrete increments. Here we consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time, stochastic environment. Accurate return predictions have proven useful for determining optima…

    Submitted 17 June, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022