Showing 1–8 of 8 results for author: Nagarajan, P

Searching in archive cs.
  1. arXiv:2312.02355  [pdf, other]

    cs.LG cs.AI

    When is Offline Policy Selection Sample Efficient for Reinforcement Learning?

    Authors: Vincent Liu, Prabhat Nagarajan, Andrew Patterson, Martha White

    Abstract: Offline reinforcement learning algorithms often require careful hyperparameter tuning. Consequently, before deployment, we need to select amongst a set of candidate policies. As yet, however, there is little understanding about the fundamental limits of this offline policy selection (OPS) problem. In this work we aim to provide clarity on when sample efficient OPS is possible, primarily by connect…

    Submitted 4 December, 2023; originally announced December 2023.

  2. arXiv:2010.06505  [pdf, other]

    cs.SE

    A Lean and Highly-automated Model-Based Software Development Process Based on DO-178C/DO-331

    Authors: Konstantin Dmitriev, Shanza Ali Zafar, Kevin Schmiechen, Yi Lai, Micheal Saleab, Pranav Nagarajan, Daniel Dollinger, Markus Hochstrasser, Stephan Myschik, Florian Holzapfel

    Abstract: The emergence of a global market for urban air mobility and unmanned aerial systems has attracted many startups across the world. These organizations have little training or experience in the traditional processes used in civil aviation for the development of software and electronic hardware. They are also constrained in the resources they can allocate for dedicated teams of professionals to follo…

    Submitted 13 October, 2020; originally announced October 2020.

  3. arXiv:2007.08082  [pdf, other]

    cs.RO cs.AI cs.DC cs.LG stat.ML

    Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators

    Authors: Yasuhiro Fujita, Kota Uenishi, Avinash Ummadisingu, Prabhat Nagarajan, Shimpei Masuda, Mario Ynocente Castro

    Abstract: Developing personal robots that can perform a diverse range of manipulation tasks in unstructured environments necessitates solving several challenges for robotic grasping systems. We take a step towards this broader goal by presenting the first RL-based system, to our knowledge, for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn comple…

    Submitted 14 October, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

    Comments: Accepted at IROS 2020

  4. arXiv:2002.00149  [pdf, other]

    cs.LG cs.AI

    Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning

    Authors: Zhang-Wei Hong, Prabhat Nagarajan, Guilherme Maeda

    Abstract: Off-policy ensemble reinforcement learning (RL) methods have demonstrated impressive results across a range of RL benchmark tasks. Recent works suggest that directly imitating experts' policies in a supervised manner before or during the course of training enables faster policy improvement for an RL agent. Motivated by these recent insights, we propose Periodic Intra-Ensemble Knowledge Distillatio…

    Submitted 1 February, 2020; originally announced February 2020.

    Comments: 8 pages

  5. arXiv:1912.04201  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Latent State Spaces for Planning through Reward Prediction

    Authors: Aaron Havens, Yi Ouyang, Prabhat Nagarajan, Yasuhiro Fujita

    Abstract: Model-based reinforcement learning methods typically learn models for high-dimensional state spaces by aiming to reconstruct and predict the original observations. However, drawing inspiration from model-free reinforcement learning, we propose learning a latent dynamics model directly from rewards. In this work, we introduce a model-based planning framework which learns a latent reward prediction…

    Submitted 9 December, 2019; originally announced December 2019.

    Comments: Deep RL Workshop, NeurIPS 2019, Vancouver

  6. arXiv:1912.03905  [pdf, other]

    cs.LG cs.AI stat.ML

    ChainerRL: A Deep Reinforcement Learning Library

    Authors: Yasuhiro Fujita, Prabhat Nagarajan, Toshiki Kataoka, Takahiro Ishikawa

    Abstract: In this paper, we introduce ChainerRL, an open-source deep reinforcement learning (DRL) library built using Python and the Chainer deep learning framework. ChainerRL implements a comprehensive set of DRL algorithms and techniques drawn from state-of-the-art research in the field. To foster reproducible research, and for instructional purposes, ChainerRL provides scripts that closely replicate the…

    Submitted 11 April, 2021; v1 submitted 9 December, 2019; originally announced December 2019.

    Comments: Journal of Machine Learning Research

    Journal ref: Journal of Machine Learning Research 22(77) (2021) 1-14

  7. arXiv:1904.06387  [pdf, other]

    cs.LG stat.ML

    Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

    Authors: Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum

    Abstract: A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator. This is because IRL typically seeks a reward function that makes the demonstrator appear near-optimal, rather than inferring the underlying intentions of the demonstrator that may have been poorly executed in practice. In this paper, we introduce a novel reward-…

    Submitted 8 July, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

    Comments: In proceedings of Thirty-sixth International Conference on Machine Learning (ICML 2019)

  8. arXiv:1809.05676  [pdf, other]

    cs.AI

    Deterministic Implementations for Reproducibility in Deep Reinforcement Learning

    Authors: Prabhat Nagarajan, Garrett Warnell, Peter Stone

    Abstract: While deep reinforcement learning (DRL) has led to numerous successes in recent years, reproducing these successes can be extremely challenging. One reproducibility challenge particularly relevant to DRL is nondeterminism in the training process, which can substantially affect the results. Motivated by this challenge, we study the positive impacts of deterministic implementations in eliminating no…

    Submitted 9 June, 2019; v1 submitted 15 September, 2018; originally announced September 2018.

    Comments: 17 pages