Showing 1–8 of 8 results for author: Sahni, H

Searching in archive cs.
  1. arXiv:2312.09187

    cs.LG

    Vision-Language Models as a Source of Rewards

    Authors: Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Dmitry Nikulin, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, et al. (2 additional authors not shown)

    Abstract: Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of…

    Submitted 12 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 10 pages, 5 figures

  2. arXiv:2210.14215

    cs.LG cs.AI

    In-context Reinforcement Learning with Algorithm Distillation

    Authors: Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih

    Abstract: We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transf…

    Submitted 25 October, 2022; originally announced October 2022.

  3. arXiv:2103.06371

    cs.AI

    Hard Attention Control By Mutual Information Maximization

    Authors: Himanshu Sahni, Charles Isbell

    Abstract: Biological agents have adopted the principle of attention to limit the rate of incoming information from the environment. One question that arises is if an artificial agent has access to only a limited view of its surroundings, how can it control its attention to effectively solve tasks? We propose an approach for learning how to control a hard attention window by maximizing the mutual information…

    Submitted 10 March, 2021; originally announced March 2021.

  4. arXiv:2002.09505

    cs.LG cs.AI stat.ML

    Estimating Q(s,s') with Deep Deterministic Dynamics Gradients

    Authors: Ashley D. Edwards, Himanshu Sahni, Rosanne Liu, Jane Hung, Ankit Jain, Rui Wang, Adrien Ecoffet, Thomas Miconi, Charles Isbell, Jason Yosinski

    Abstract: In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still…

    Submitted 25 August, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: Accepted into ICML 2020

  5. arXiv:1901.11529

    cs.AI

    Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs

    Authors: Himanshu Sahni, Toby Buckley, Pieter Abbeel, Ilya Kuzovkin

    Abstract: Reinforcement Learning (RL) algorithms typically require millions of environment interactions to learn successful policies in sparse reward settings. Hindsight Experience Replay (HER) was introduced as a technique to increase sample efficiency by reimagining unsuccessful trajectories as successful ones by altering the originally intended goals. However, it cannot be directly applied to visual envi…

    Submitted 29 October, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

    Comments: To appear in Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. Code available at https://github.com/offworld-projects/research-halgan

  6. arXiv:1805.07914

    cs.LG stat.ML

    Imitating Latent Policies from Observation

    Authors: Ashley D. Edwards, Himanshu Sahni, Yannick Schroecker, Charles L. Isbell

    Abstract: In this paper, we describe a novel approach to imitation learning that infers latent policies directly from state observations. We introduce a method that characterizes the causal effects of latent actions on observations while simultaneously predicting their likelihood. We then outline an action alignment procedure that leverages a small amount of environment interactions to determine a mapping b…

    Submitted 13 May, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: Accepted to ICML 2019

  7. arXiv:1711.11289

    cs.AI

    Learning to Compose Skills

    Authors: Himanshu Sahni, Saurabh Kumar, Farhan Tejani, Charles Isbell

    Abstract: We present a differentiable framework capable of learning a wide variety of compositions of simple policies that we call skills. By recursively composing skills with themselves, we can create hierarchies that display complex behavior. Skill networks are trained to generate skill-state embeddings that are provided as inputs to a trainable composition function, which in turn outputs a policy for the…

    Submitted 30 November, 2017; originally announced November 2017.

    Comments: Presented at NIPS 2017 Deep RL Symposium

  8. arXiv:1705.08997

    cs.AI cs.LG stat.ML

    State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning

    Authors: Himanshu Sahni, Saurabh Kumar, Farhan Tejani, Yannick Schroecker, Charles Isbell

    Abstract: Typical reinforcement learning (RL) agents learn to complete tasks specified by reward functions tailored to their domain. As such, the policies they learn do not generalize even to similar domains. To address this issue, we develop a framework through which a deep RL agent learns to generalize policies from smaller, simpler domains to more complex ones using a recurrent attention mechanism. The t…

    Submitted 24 May, 2017; originally announced May 2017.

    Comments: 5 pages, 6 figures; 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2017), Ann Arbor, Michigan