Showing 1–15 of 15 results for author: Veeriah, V

  1. arXiv:2308.09175

    cs.AI cs.LG

    Diversifying AI: Towards Creative Chess with AlphaZero

    Authors: Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh

    Abstract: In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. This work explores whether AI can benefit from creative decision-making mechanisms when pushed to the limits of its computational rationality. In par…

    Submitted 31 July, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

  2. arXiv:2302.01275

    cs.LG

    ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs

    Authors: Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy

    Abstract: In recent years, Reinforcement Learning (RL) has been applied to real-world problems with increasing success. Such applications often require to put constraints on the agent's behavior. Existing algorithms for constrained RL (CRL) rely on gradient descent-ascent, but this approach comes with a caveat. While these algorithms are guaranteed to converge on average, they do not guarantee last-iterate…

    Submitted 5 March, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  3. arXiv:2202.04772

    cs.LG cs.AI

    GrASP: Gradient-Based Affordance Selection for Planning

    Authors: Vivek Veeriah, Zeyu Zheng, Richard Lewis, Satinder Singh

    Abstract: Planning with a learned model is arguably a key component of intelligence. There are several challenges in realizing such a component in large-scale reinforcement learning (RL) problems. One such challenge is dealing effectively with continuous action spaces when using tree-search planning (e.g., it is not feasible to consider every action even at just the root node of the tree). In this paper we…

    Submitted 7 February, 2022; originally announced February 2022.

  4. arXiv:2102.06741

    cs.LG cs.AI

    Discovery of Options via Meta-Learned Subgoals

    Authors: Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster. However, despite prior work on this topic, the problem of discovering options through interaction with an environment remains a challenge. In this paper, we introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments. Our approach is based…

    Submitted 12 February, 2021; originally announced February 2021.

  5. arXiv:2102.04897

    cs.LG cs.AI

    Learning State Representations from Random Deep Action-conditional Predictions

    Authors: Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, Satinder Singh

    Abstract: Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions -- random both in what feature of observations they predict as well as in the sequence of actions the predictions are conditioned upon -- form good auxiliary tasks for reinforcement learning (RL) problems. In particular, we show that random deep action-co…

    Submitted 5 November, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021

  6. arXiv:2007.06703

    cs.LG cs.AI

    Learning Retrospective Knowledge with Reverse Reinforcement Learning

    Authors: Shangtong Zhang, Vivek Veeriah, Shimon Whiteson

    Abstract: We present a Reverse Reinforcement Learning (Reverse RL) approach for representing retrospective knowledge. General Value Functions (GVFs) have enjoyed great success in representing predictive knowledge, i.e., answering questions about possible future outcomes such as "how much fuel will be consumed in expectation if we drive from A to B?". GVFs, however, cannot answer questions like "how much fue…

    Submitted 1 November, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: NeurIPS 2020

  7. arXiv:2002.12928

    stat.ML cs.LG

    A Self-Tuning Actor-Critic Algorithm

    Authors: Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain. In this paper, we take a step towards addressing this issue by using metagradients to automatically adapt hyperparameters online by meta-gradient descent (Xu et al., 2018). We apply our algorithm, Self-…

    Submitted 14 April, 2021; v1 submitted 28 February, 2020; originally announced February 2020.

  8. arXiv:1912.07045

    cs.AI

    How Should an Agent Practice?

    Authors: Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh

    Abstract: We present a method for learning intrinsic reward functions to drive the learning of an agent during periods of practice in which extrinsic task rewards are not available. During practice, the environment may differ from the one available for training and evaluation with extrinsic rewards. We refer to this setup of alternating periods of practice and objective evaluation as practice-match, drawing…

    Submitted 15 December, 2019; originally announced December 2019.

    Comments: AAAI-2020

  9. arXiv:1909.04607

    cs.AI cs.LG

    Discovery of Useful Questions as Auxiliary Tasks

    Authors: Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions. We present a novel method for a reinforcement learning (RL) agent to discover questions formulated as general value…

    Submitted 10 September, 2019; originally announced September 2019.

  10. arXiv:1903.03252

    cs.LG cs.AI stat.ML

    Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning

    Authors: Alex Kearney, Vivek Veeriah, Jaden Travnik, Patrick M. Pilarski, Richard S. Sutton

    Abstract: There is a long history of using meta learning as representation learning, specifically for determining the relevance of inputs. In this paper, we examine an instance of meta-learning in which feature relevance is learned by adapting step size parameters of stochastic gradient descent---building on a variety of prior work in stochastic approximation, machine learning, and artificial neural network…

    Submitted 7 March, 2019; originally announced March 2019.

  11. arXiv:1806.09605

    cs.LG cs.AI stat.ML

    Many-Goals Reinforcement Learning

    Authors: Vivek Veeriah, Junhyuk Oh, Satinder Singh

    Abstract: All-goals updating exploits the off-policy nature of Q-learning to update all possible goals an agent could have from each transition in the world, and was introduced into Reinforcement Learning (RL) by Kaelbling (1993). In prior work this was mostly explored in small-state RL problems that allowed tabular representations and where all possible goals could be explicitly enumerated and learned sepa…

    Submitted 22 June, 2018; originally announced June 2018.

  12. arXiv:1804.03334

    cs.LG stat.ML

    TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

    Authors: Alex Kearney, Vivek Veeriah, Jaden B. Travnik, Richard S. Sutton, Patrick M. Pilarski

    Abstract: In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning. The performance of TD methods often depends on well chosen step-sizes, yet few algorithms have been developed for setting the step-size automatically for TD learning. An important limitation of current methods is that they adapt a single step-size shared by all the weights of the learning system.…

    Submitted 10 April, 2018; originally announced April 2018.

    Comments: Version as submitted to the 31st Conference on Neural Information Processing Systems (NIPS 2017) on May 19, 2017. 9 pages, 5 figures. Extended version in preparation for journal submission

  13. arXiv:1612.02879

    cs.LG cs.AI stat.ML

    Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks

    Authors: Vivek Veeriah, Shangtong Zhang, Richard S. Sutton

    Abstract: Representations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used for representing the data. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn these representations through stochastic gradient descent in multi-layer neural networks, which is called backprop. Lea…

    Submitted 27 April, 2017; v1 submitted 8 December, 2016; originally announced December 2016.

  14. arXiv:1606.02807

    cs.HC cs.AI

    Face valuing: Training user interfaces with facial expressions and reinforcement learning

    Authors: Vivek Veeriah, Patrick M. Pilarski, Richard S. Sutton

    Abstract: An important application of interactive machine learning is extending or amplifying the cognitive and physical capabilities of a human. To accomplish this, machines need to learn about their human users' intentions and adapt to their preferences. In most current research, a user has conveyed preferences to a machine using explicit corrective or instructive feedback; explicit feedback imposes a cog…

    Submitted 8 June, 2016; originally announced June 2016.

    Comments: 7 pages, 4 figures, IJCAI 2016 - Interactive Machine Learning Workshop

  15. arXiv:1504.06678

    cs.CV

    Differential Recurrent Neural Networks for Action Recognition

    Authors: Vivek Veeriah, Naifan Zhuang, Guo-Jun Qi

    Abstract: The long short-term memory (LSTM) neural network is capable of processing complex sequential information since it utilizes special gating schemes for learning representations from long input sequences. It has the potential to model any sequential time-series data, where the current hidden state has to be considered in the context of the past hidden states. This property makes LSTM an ideal choice…

    Submitted 24 April, 2015; originally announced April 2015.