Showing 1–12 of 12 results for author: Hunt, J J

  1. arXiv:2210.01542  [pdf, other]

    cs.LG cs.AI

    Hyperbolic Deep Reinforcement Learning

    Authors: Edoardo Cetin, Benjamin Chamberlain, Michael Bronstein, Jonathan J Hunt

    Abstract: We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space. Sequential decision-making requires reasoning about the possible future consequences of current behavior. Consequently, capturing the relationship between key evolving features for a given task is conducive to recovering effective policies. To this end, hyperbolic geometry p…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: Preprint

  2. arXiv:2208.06193  [pdf, other]

    cs.LG stat.ML

    Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

    Authors: Zhendong Wang, Jonathan J Hunt, Mingyuan Zhou

    Abstract: Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly in this regime due to the function approximation errors on out-of-distribution actions. While a variety of regularization methods have been proposed to mitigate this issue, they are often constrained by poli…

    Submitted 25 August, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: ICLR 2023

  3. arXiv:2202.08812  [pdf, other]

    cs.IR cs.LG

    Should I send this notification? Optimizing push notifications decision making by modeling the future

    Authors: Conor O'Brien, Huasen Wu, Shaodan Zhai, Dalin Guo, Wenzhe Shi, Jonathan J Hunt

    Abstract: Most recommender systems are myopic, that is they optimize based on the immediate response of the user. This may be misaligned with the true objective, such as creating long term user satisfaction. In this work we focus on mobile push notifications, where the long term effects of recommender system decisions can be particularly strong. For example, sending too many or irrelevant notifications may…

    Submitted 17 February, 2022; originally announced February 2022.

  4. arXiv:2201.12666  [pdf, other]

    cs.LG cs.CR cs.IR

    Challenges and approaches to privacy preserving post-click conversion prediction

    Authors: Conor O'Brien, Arvind Thiagarajan, Sourav Das, Rafael Barreto, Chetan Verma, Tim Hsu, James Neufield, Jonathan J Hunt

    Abstract: Online advertising has typically been more personalized than offline advertising, through the use of machine learning models and real-time auctions for ad targeting. One specific task, predicting the likelihood of conversion (i.e. the probability a user will purchase the advertised product), is crucial to the advertising ecosystem for both targeting and pricing ads. Currently, these models are of…

    Submitted 29 January, 2022; originally announced January 2022.

  5. arXiv:2201.07681  [pdf, ps, other]

    cs.IR cs.LG

    Learning to Rank For Push Notifications Using Pairwise Expected Regret

    Authors: Yuguang Yue, Yuanpu Xie, Huasen Wu, Haofeng Jia, Shaodan Zhai, Wenzhe Shi, Jonathan J Hunt

    Abstract: Listwise ranking losses have been widely studied in recommender systems. However, new paradigms of content consumption present new challenges for ranking methods. In this work we contribute an analysis of learning to rank for personalized mobile push notifications and discuss the unique challenges this presents compared to traditional ranking problems. To address these challenges, we introduce a n…

    Submitted 19 January, 2022; originally announced January 2022.

  6. An Analysis Of Entire Space Multi-Task Models For Post-Click Conversion Prediction

    Authors: Conor O'Brien, Kin Sum Liu, James Neufeld, Rafael Barreto, Jonathan J Hunt

    Abstract: Industrial recommender systems are frequently tasked with approximating probabilities for multiple, often closely related, user actions. For example, predicting if a user will click on an advertisement and if they will then purchase the advertised product. The conceptual similarity between these tasks has promoted the use of multi-task learning: a class of algorithms that aim to bring positive ind…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: RecSys 21 Late Breaking Results

  7. arXiv:2009.05524  [pdf, other]

    cs.AI cs.LG

    Physically Embedded Planning Problems: New Challenges for Reinforcement Learning

    Authors: Mehdi Mirza, Andrew Jaegle, Jonathan J. Hunt, Arthur Guez, Saran Tunyasuvunakool, Alistair Muldal, Théophane Weber, Peter Karkus, Sébastien Racanière, Lars Buesing, Timothy Lillicrap, Nicolas Heess

    Abstract: Recent work in deep reinforcement learning (RL) has produced algorithms capable of mastering challenging games such as Go, chess, or shogi. In these works the RL agent directly observes the natural state of the game and controls that state directly with its actions. However, when humans play such games, they do not just reason about the moves but also interact with their physical environment. They…

    Submitted 29 October, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

    Comments: 17 pages + appendix. Updated text and references

  8. arXiv:1812.02216  [pdf, other]

    cs.LG stat.ML

    Composing Entropic Policies using Divergence Correction

    Authors: Jonathan J Hunt, Andre Barreto, Timothy P Lillicrap, Nicolas Heess

    Abstract: Composing previously mastered skills to solve novel tasks promises dramatic improvements in the data efficiency of reinforcement learning. Here, we analyze two recent works composing behaviors represented in the form of action-value functions and show that they perform poorly in some situations. As part of this analysis, we extend an important generalization of policy improvement to the maximum en…

    Submitted 5 July, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

  9. arXiv:1610.09027  [pdf, other]

    cs.LG

    Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

    Authors: Jack W Rae, Jonathan J Hunt, Tim Harley, Ivo Danihelka, Andrew Senior, Greg Wayne, Alex Graves, Timothy P Lillicrap

    Abstract: Neural networks augmented with external memory have the ability to learn algorithmic solutions to complex tasks. These models appear promising for applications such as language modeling and machine translation. However, they scale poorly in both space and time as the amount of memory grows, limiting their applicability to real-world domains. Here, we present an end-to-end differentiable memory…

    Submitted 27 October, 2016; originally announced October 2016.

    Comments: In 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain

  10. arXiv:1606.05312  [pdf, other]

    cs.AI

    Successor Features for Transfer in Reinforcement Learning

    Authors: André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver

    Abstract: Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. We propose a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same. Our approach rests on two key ideas: "successor features", a value function representation that decouples the dynamics o…

    Submitted 12 April, 2018; v1 submitted 16 June, 2016; originally announced June 2016.

    Comments: Published at NIPS 2017

  11. arXiv:1512.04455  [pdf, other]

    cs.LG

    Memory-based control with recurrent neural networks

    Authors: Nicolas Heess, Jonathan J Hunt, Timothy P Lillicrap, David Silver

    Abstract: Partially observed control problems are a challenging aspect of reinforcement learning. We extend two related, model-free algorithms for continuous control -- deterministic policy gradient and stochastic value gradient -- to solve partially observed domains using recurrent neural networks trained with backpropagation through time. We demonstrate that this approach, coupled with long-short term m…

    Submitted 14 December, 2015; originally announced December 2015.

    Comments: NIPS Deep Reinforcement Learning Workshop 2015

  12. arXiv:1509.02971  [pdf, other]

    cs.LG stat.ML

    Continuous control with deep reinforcement learning

    Authors: Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra

    Abstract: We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic pr…

    Submitted 5 July, 2019; v1 submitted 9 September, 2015; originally announced September 2015.

    Comments: 10 pages + supplementary