Showing 1–50 of 56 results for author: Niekum, S

  1. arXiv:2406.15599

    cs.LG cs.AI

    Pareto-Optimal Learning from Preferences with Hidden Context

    Authors: Ryan Boldi, Li Ding, Lee Spector, Scott Niekum

    Abstract: Ensuring AI models align with human values is essential for their safety and functionality. Reinforcement learning from human feedback (RLHF) uses human preferences to achieve this alignment. However, preferences sourced from diverse populations can result in point estimates of human values that may be sub-optimal or unfair to specific groups. We propose Pareto Optimal Preference Learning (POPL),…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2406.08805

    cs.LG cs.AI cs.RO

    A Dual Approach to Imitation Learning from Observations with Offline Datasets

    Authors: Harshit Sikchi, Caleb Chuck, Amy Zhang, Scott Niekum

    Abstract: Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult. However, demonstrating expert behavior in the action space of the agent becomes unwieldy when robots have complex, unintuitive morphologies. We consider the practical setting where an agent has a dataset of prior interactions with the environment and is…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Under submission. 23 pages

  3. arXiv:2406.02900

    cs.LG cs.AI cs.CL

    Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

    Authors: Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, Bradley Knox, Chelsea Finn, Scott Niekum

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs); however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represent human preferences, which is in turn used by an online reinforcement learning (RL) algorithm to optimize the LLM. A prominent issue with such methods…

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2405.03113

    cs.RO cs.AI

    Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

    Authors: Caleb Chuck, Carl Qi, Michael J. Munje, Shuozhe Li, Max Rudolph, Chang Shi, Siddhant Agarwal, Harshit Sikchi, Abhinav Peri, Sarthak Dayal, Evan Kuo, Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, Scott Niekum

    Abstract: Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail. To effectively reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. By augmenting air hockey with a large family of tasks ranging from easy tasks like…

    Submitted 5 May, 2024; originally announced May 2024.

  5. arXiv:2405.01511

    cs.CL

    D2PO: Discriminator-Guided DPO with Response Evaluation Models

    Authors: Prasann Singhal, Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett

    Abstract: Varied approaches for aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO. Although DPO has rapidly gained popularity due to its straightforward training process and competitive results, there is an open question of whether there remain practical advantages of using a discriminator, like a reward model, to evaluate respon…

    Submitted 6 August, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 20 pages, 12 figures, Accepted to COLM 2024

  6. arXiv:2404.10883

    cs.AI cs.LG stat.ME

    Automated Discovery of Functional Actual Causes in Complex Environments

    Authors: Caleb Chuck, Sankaran Vaidyanathan, Stephen Giguere, Amy Zhang, David Jensen, Scott Niekum

    Abstract: Reinforcement learning (RL) algorithms often struggle to learn policies that generalize to novel situations due to issues such as causal confusion, overfitting to irrelevant factors, and failure to isolate control of state factors. These issues stem from a common source: a failure to accurately identify and exploit state-specific causal relationships in the environment. While some prior works in R…

    Submitted 16 April, 2024; originally announced April 2024.

  7. arXiv:2403.16369

    cs.LG cs.AI stat.ML

    Learning Action-based Representations Using Invariance

    Authors: Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, Amy Zhang

    Abstract: Robust reinforcement learning agents using high-dimensional observations must be able to identify relevant state features amidst many exogenous distractors. A representation that captures controllability identifies these state elements by determining what affects agent control. While methods such as inverse dynamics and mutual information capture controllability for a limited number of timesteps,…

    Submitted 24 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: Published at the Reinforcement Learning Conference 2024

  8. arXiv:2311.02013

    cs.LG cs.AI cs.RO

    SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning

    Authors: Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum

    Abstract: Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions. Offline GCRL is pivotal for developing generalist agents capable of leveraging pre-existing datasets to learn diverse and reusable skills without hand-engineering reward functions. However, contemporary approaches to…

    Submitted 28 February, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: Published at the International Conference on Learning Representations (ICLR) 2024. 26 pages

  9. arXiv:2310.13639

    cs.LG cs.AI

    Contrastive Preference Learning: Learning from Human Feedback without RL

    Authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to rewa…

    Submitted 30 April, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Code released at https://github.com/jhejna/cpl

  10. arXiv:2310.02456

    cs.LG cs.AI

    Learning Optimal Advantage from Preferences and Mistaking it for Reward

    Authors: W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum

    Abstract: We consider algorithms for learning reward functions from human preferences over pairs of trajectory segments, as used in reinforcement learning from human feedback (RLHF). Most recent work assumes that human preferences are generated based only upon the reward accrued within those segments, or their partial return. Recent work casts doubt on the validity of this assumption, proposing an alternati…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 8 pages (16 pages with references and appendix), 11 figures

    ACM Class: I.2.6; I.2.8

  11. arXiv:2307.02728

    cs.LG cs.AI cs.RO

    Hierarchical Empowerment: Towards Tractable Empowerment-Based Skill Learning

    Authors: Andrew Levy, Sreehari Rammohan, Alessandro Allievi, Scott Niekum, George Konidaris

    Abstract: General purpose agents will require large repertoires of skills. Empowerment -- the maximum mutual information between skills and states -- provides a pathway for learning large collections of distinct skills, but mutual information is difficult to optimize. We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable by integrating concepts from Goal-Con…

    Submitted 3 October, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Additional baseline comparisons

  12. arXiv:2306.09509

    cs.AI cs.RO

    Granger-Causal Hierarchical Skill Discovery

    Authors: Caleb Chuck, Kevin Black, Aditya Arjun, Yuke Zhu, Scott Niekum

    Abstract: Reinforcement Learning (RL) has demonstrated promising results in learning policies for complex tasks, but it often suffers from low sample efficiency and limited transferability. Hierarchical RL (HRL) methods aim to address the difficulty of learning long-horizon tasks by decomposing policies into skills, abstracting states, and reusing skills in new tasks. However, many HRL methods require some…

    Submitted 18 March, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted to TMLR 2024

  13. arXiv:2302.08560

    cs.LG cs.AI cs.RO

    Dual RL: Unification and New Methods for Reinforcement and Imitation Learning

    Authors: Harshit Sikchi, Qinqing Zheng, Amy Zhang, Scott Niekum

    Abstract: The goal of reinforcement learning (RL) is to find a policy that maximizes the expected cumulative return. It has been shown that this objective can be represented as an optimization problem of state-action visitation distribution under linear constraints. The dual problem of this formulation, which we refer to as dual RL, is unconstrained and easier to optimize. In this work, we first cast severa…

    Submitted 26 January, 2024; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Published as a conference paper (spotlight) at ICLR 2024. 48 pages

  14. arXiv:2301.09770

    cs.AI

    Language-guided Task Adaptation for Imitation Learning

    Authors: Prasoon Goyal, Raymond J. Mooney, Scott Niekum

    Abstract: We introduce a novel setting, wherein an agent needs to learn a task from a demonstration of a related task with the difference between the tasks communicated in natural language. The proposed setting allows reusing demonstrations from other tasks, by providing low effort language descriptions, and can also be used to provide feedback to correct agent errors, which are both important desiderata fo…

    Submitted 23 January, 2023; originally announced January 2023.

  15. arXiv:2211.00352

    cs.RO

    Understanding Acoustic Patterns of Human Teachers Demonstrating Manipulation Tasks to Robots

    Authors: Akanksha Saran, Kush Desai, Mai Lee Chang, Rudolf Lioutikov, Andrea Thomaz, Scott Niekum

    Abstract: Humans use audio signals in the form of spoken language or verbal reactions effectively when teaching new skills or tasks to other humans. While demonstrations allow humans to teach robots in a natural way, learning from trajectories alone does not leverage other available modalities including audio from human teachers. To effectively utilize audio cues accompanying human demonstrations, first it…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: IROS 2022

  16. arXiv:2206.02231

    cs.LG cs.AI eess.SY

    Models of human preference for learning reward functions

    Authors: W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi

    Abstract: The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments, a type of reinforcement learning from human feedback (RLHF). These human preferences are typically assumed to be informed solely by pa…

    Submitted 6 September, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: 16 pages (40 pages with references and appendix), 23 figures

    ACM Class: I.2.6; I.2.8

  17. arXiv:2206.00695

    cs.LG

    Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL

    Authors: Wonjoon Goo, Scott Niekum

    Abstract: We introduce an offline reinforcement learning (RL) algorithm that explicitly clones a behavior policy to constrain value learning. In offline RL, it is often important to prevent a policy from selecting unobserved actions, since the consequence of these actions cannot be presumed without additional information about the environment. One straightforward way to implement such a constraint is to exp…

    Submitted 1 June, 2022; originally announced June 2022.

  18. arXiv:2204.11134

    cs.RO cs.AI

    Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?

    Authors: Yuchen Cui, Scott Niekum, Abhinav Gupta, Vikash Kumar, Aravind Rajeswaran

    Abstract: Task specification is at the core of programming autonomous robots. A low-effort modality for task specification is critical for engagement of non-expert end-users and ultimate adoption of personalized robot agents. A widely studied approach to task specification is through goals, using either compact state vectors or goal images from the same robot scene. The former is hard to interpret for non-e…

    Submitted 23 April, 2022; originally announced April 2022.

    Comments: 30 pages with appendix, published as a conference paper at L4DC 2022

  19. arXiv:2202.03481

    cs.LG cs.AI cs.RO

    A Ranking Game for Imitation Learning

    Authors: Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum

    Abstract: We propose a new framework for imitation learning -- treating imitation as a two-player ranking-based game between a policy and a reward. In this game, the reward agent learns to satisfy pairwise performance rankings between behaviors, while the policy agent learns to maximize this reward. In imitation learning, near-optimal expert data can be difficult to obtain, and even in the limit of infinite…

    Submitted 16 January, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: Published in Transactions on Machine Learning Research 2023. 38 pages

  20. arXiv:2111.03936

    cs.LG

    SOPE: Spectrum of Off-Policy Estimators

    Authors: Christina J. Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum

    Abstract: Many sequential decision making problems are high-stakes and require off-policy evaluation (OPE) of a new policy using historical data collected using some other policy. One of the most common OPE techniques that provides unbiased estimates is trajectory based importance sampling (IS). However, due to the high variance of trajectory IS estimates, importance sampling methods based on state-action v…

    Submitted 2 December, 2021; v1 submitted 6 November, 2021; originally announced November 2021.

    Comments: Accepted at Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

  21. arXiv:2110.02304

    cs.LG cs.RO

    You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL

    Authors: Wonjoon Goo, Scott Niekum

    Abstract: The goal of offline reinforcement learning (RL) is to find an optimal policy given prerecorded trajectories. Many current approaches customize existing off-policy RL algorithms, especially actor-critic algorithms in which policy evaluation and improvement are iterated. However, the convergence of such approaches is not guaranteed due to the use of complex non-linear function approximation and an i…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: In proceedings of 5th Annual Conference on Robot Learning (CoRL) 2021

  22. arXiv:2108.05875

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Distributional Depth-Based Estimation of Object Articulation Models

    Authors: Ajinkya Jain, Stephen Giguere, Rudolf Lioutikov, Scott Niekum

    Abstract: We propose a method that efficiently learns distributions over articulation model parameters directly from depth images without the need to know articulation model categories a priori. By contrast, existing methods that learn articulation models from raw observations typically only predict point estimates of the model parameters, which are insufficient to guarantee the safe manipulation of articul…

    Submitted 25 October, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: In the proceedings of the 5th Annual Conference on Robot Learning (CoRL), 2021. Project webpage: https://pearl-utexas.github.io/DUST-net/ . 18 pages, 10 figures, 4 tables

  23. arXiv:2107.00116

    cs.LG

    On the Benefits of Inducing Local Lipschitzness for Robust Generative Adversarial Imitation Learning

    Authors: Farzan Memarian, Abolfazl Hashemi, Scott Niekum, Ufuk Topcu

    Abstract: We explore methodologies to improve the robustness of generative adversarial imitation learning (GAIL) algorithms to observation noise. Towards this objective, we study the effect of local Lipschitzness of the discriminator and the generator on the robustness of policies learned by GAIL. In many robotics applications, the policies learned by GAIL typically suffer from degraded performance at tes…

    Submitted 15 January, 2024; v1 submitted 30 June, 2021; originally announced July 2021.

  24. arXiv:2106.02972

    cs.AI cs.CL cs.LG

    Zero-shot Task Adaptation using Natural Language

    Authors: Prasoon Goyal, Raymond J. Mooney, Scott Niekum

    Abstract: Imitation learning and instruction-following are two common approaches to communicate a user's intent to a learning agent. However, as the complexity of tasks grows, it could be beneficial to use both demonstrations and language to communicate with an agent. In this work, we propose a novel setting where an agent is given both a demonstration and a description, and must combine information from bo…

    Submitted 5 June, 2021; originally announced June 2021.

  25. arXiv:2105.13345

    cs.LG

    Adversarial Intrinsic Motivation for Reinforcement Learning

    Authors: Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

    Abstract: Learning with an objective to minimize the mismatch with a reference distribution has been shown to be useful for generative modeling and imitation learning. In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks. Specifically,…

    Submitted 28 October, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

  26. arXiv:2104.12820

    cs.LG

    Universal Off-Policy Evaluation

    Authors: Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

    Abstract: When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return…

    Submitted 2 November, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

    Comments: Accepted at Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

  27. arXiv:2103.04529

    cs.LG cs.RO

    Self-Supervised Online Reward Shaping in Sparse-Reward Environments

    Authors: Farzan Memarian, Wonjoon Goo, Rudolf Lioutikov, Scott Niekum, Ufuk Topcu

    Abstract: We introduce Self-supervised Online Reward Shaping (SORS), which aims to improve the sample efficiency of any RL algorithm in sparse-reward environments by automatically densifying rewards. The proposed framework alternates between classification-based reward inference and policy update steps -- the original sparse reward provides a self-supervisory signal for reward inference by ranking trajector…

    Submitted 25 July, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

    Comments: Accepted for publication in IROS 2021

  28. arXiv:2102.08442

    cs.RO

    SCAPE: Learning Stiffness Control from Augmented Position Control Experiences

    Authors: Mincheol Kim, Scott Niekum, Ashish D. Deshpande

    Abstract: We introduce a sample-efficient method for learning state-dependent stiffness control policies for dexterous manipulation. The ability to control stiffness facilitates safe and reliable manipulation by providing compliance and robustness to uncertainties. Most current reinforcement learning approaches to achieve robotic manipulation have exclusively focused on position control, often due to the di…

    Submitted 14 September, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: Accepted at CoRL 2021

  29. arXiv:2012.01557

    cs.LG

    Value Alignment Verification

    Authors: Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum

    Abstract: As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important to be able to efficiently evaluate an agent's performance and correctness. In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human's values. T…

    Submitted 11 June, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: In proceedings of the International Conference on Machine Learning (ICML) 2021

  30. arXiv:2009.13649

    cs.HC cs.RO

    The EMPATHIC Framework for Task Learning from Implicit Human Feedback

    Authors: Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox

    Abstract: Reactions such as gestures, facial expressions, and vocalizations are an abundant, naturally occurring channel of information that humans provide during interactions. A robot or other agent could leverage an understanding of such implicit human feedback to improve its task performance at no cost to the human. This approach contrasts with common agent teaching methods based on demonstrations, criti…

    Submitted 7 December, 2020; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: Conference on Robot Learning 2020

  31. arXiv:2008.10518

    cs.RO cs.AI

    ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory

    Authors: Ajinkya Jain, Rudolf Lioutikov, Caleb Chuck, Scott Niekum

    Abstract: Robots in human environments will need to interact with a wide variety of articulated objects such as cabinets, drawers, and dishwashers while assisting humans in performing day-to-day tasks. Existing methods either require objects to be textured or need to know the articulation model category a priori for estimating the model parameters for an articulated object. We propose ScrewNet, a novel appr…

    Submitted 19 July, 2021; v1 submitted 24 August, 2020; originally announced August 2020.

    Comments: Presented at ICRA'21. Project webpage: https://pearl-utexas.github.io/ScrewNet/

  32. arXiv:2007.15543

    cs.LG cs.AI stat.ML

    PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards

    Authors: Prasoon Goyal, Scott Niekum, Raymond J. Mooney

    Abstract: Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively large numbers of interactions with the environment, thereby limiting its applicability to complex problems. To address this, several prior approaches have used natural language to guide the agent's exploration. However, these approaches typically operate on structured representations of the environmen…

    Submitted 19 November, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Conference on Robot Learning (CoRL), 2020

  33. arXiv:2007.12315

    cs.LG stat.ML

    Bayesian Robust Optimization for Imitation Learning

    Authors: Daniel S. Brown, Scott Niekum, Marek Petrik

    Abstract: One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and corresponding optimal policy. Existing safe…

    Submitted 29 February, 2024; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: In proceedings of NeurIPS 2020

  34. arXiv:2002.12500

    cs.LG cs.AI

    Efficiently Guiding Imitation Learning Agents with Human Gaze

    Authors: Akanksha Saran, Ruohan Zhang, Elaine Schaertl Short, Scott Niekum

    Abstract: Human gaze is known to be an intention-revealing signal in human demonstrations of tasks. In this work, we use gaze cues from human demonstrators to enhance the performance of agents trained via three popular imitation learning methods -- behavioral cloning (BC), behavioral cloning from observation (BCO), and Trajectory-ranked Reward EXtrapolation (T-REX). Based on similarities between the attenti…

    Submitted 21 April, 2021; v1 submitted 27 February, 2020; originally announced February 2020.

    Comments: AAMAS 2021

  35. arXiv:2002.09089

    cs.LG stat.ML

    Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

    Authors: Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum

    Abstract: Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning. However, Bayesian reward learning methods are typically computationally intractable for complex control problems. We propose Bayesian Reward Extrapolation (Bayesian REX), a highly efficient Bayesian reward learning algorithm that scales to high-dimensional imitation lea…

    Submitted 17 December, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: In proceedings of ICML 2020

  36. arXiv:2002.03272

    cs.LG stat.ML

    Local Nonparametric Meta-Learning

    Authors: Wonjoon Goo, Scott Niekum

    Abstract: A central goal of meta-learning is to find a learning rule that enables fast adaptation across a set of tasks, by learning the appropriate inductive bias for that set. Most meta-learning algorithms try to find a global learning rule that encodes this inductive bias. However, a global learning rule represented by a fixed-size representation is prone to meta-underfitting or -overfitting sin…

    Submitted 8 February, 2020; originally announced February 2020.

  37. arXiv:1912.04472

    cs.LG cs.AI stat.ML

    Deep Bayesian Reward Learning from Preferences

    Authors: Daniel S. Brown, Scott Niekum

    Abstract: Bayesian inverse reinforcement learning (IRL) methods are ideal for safe imitation learning, as they allow a learning agent to reason about reward uncertainty and the safety of a learned policy. However, Bayesian IRL is computationally intractable for high-dimensional problems because each sample from the posterior requires solving an entire Markov Decision Process (MDP). While there exist non-Bay…

    Submitted 9 December, 2019; originally announced December 2019.

    Comments: Workshop on Safety and Robustness in Decision Making at the 33rd Conference on Neural Information Processing Systems (NeurIPS) 2019

  38. arXiv:1907.09014

    cs.RO

    Learning Hybrid Object Kinematics for Efficient Hierarchical Planning Under Uncertainty

    Authors: Ajinkya Jain, Scott Niekum

    Abstract: Sudden changes in the dynamics of robotic tasks, such as contact with an object or the latching of a door, are often viewed as inconvenient discontinuities that make manipulation difficult. However, when these transitions are well-understood, they can be leveraged to reduce uncertainty or aid manipulation -- for example, wiggling a screw to determine if it is fully inserted or not. Current model-fr…

    Submitted 5 August, 2020; v1 submitted 21 July, 2019; originally announced July 2019.

    Comments: Accepted in IROS'20

  39. arXiv:1907.07202

    cs.RO

    Understanding Teacher Gaze Patterns for Robot Learning

    Authors: Akanksha Saran, Elaine Schaertl Short, Andrea Thomaz, Scott Niekum

    Abstract: Human gaze is known to be a strong indicator of underlying human intentions and goals during manipulation tasks. This work studies gaze patterns of human teachers demonstrating tasks to robots and proposes ways in which such patterns can be used to enhance robot learning. Using both kinesthetic teaching and video demonstrations, we identify novel intention-revealing gaze behaviors during teaching.…

    Submitted 29 November, 2021; v1 submitted 16 July, 2019; originally announced July 2019.

    Comments: Updated acknowledgements. Published in Conference on Robot Learning (CoRL), 2019

    Journal ref: PMLR (2019) 1247-1258

  40. arXiv:1907.03976

    cs.LG stat.ML

    Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations

    Authors: Daniel S. Brown, Wonjoon Goo, Scott Niekum

    Abstract: The performance of imitation learning is typically upper-bounded by the performance of the demonstrator. While recent empirical results demonstrate that ranked demonstrations allow for better-than-demonstrator performance, preferences over demonstrations may be difficult to obtain, and little is known theoretically about when such methods can be expected to successfully extrapolate beyond the perf…

    Submitted 14 October, 2019; v1 submitted 9 July, 2019; originally announced July 2019.

    Comments: In proceedings of 3rd Conference on Robot Learning (CoRL) 2019

  41. arXiv:1907.03146

    cs.RO

    A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms

    Authors: Oliver Kroemer, Scott Niekum, George Konidaris

    Abstract: A key challenge in intelligent robotics is creating robots that are capable of directly interacting with the world around them to achieve their goals. The last decade has seen substantial growth in research on the problem of robot manipulation, which aims to exploit the increasing availability of affordable robot arms and grippers to create robots capable of directly interacting with the world to…

    Submitted 6 November, 2020; v1 submitted 6 July, 2019; originally announced July 2019.

  42. arXiv:1906.01408

    cs.LG cs.AI stat.ML

    Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning

    Authors: Caleb Chuck, Supawit Chockchowwat, Scott Niekum

    Abstract: Deep reinforcement learning (DRL) is capable of learning high-performing policies on a variety of complex high-dimensional tasks, ranging from video games to robotic manipulation. However, standard DRL methods often suffer from poor sample efficiency, partially because they aim to be entirely problem-agnostic. In this work, we introduce a novel approach to exploration and hierarchical skill learni…

    Submitted 3 March, 2020; v1 submitted 27 May, 2019; originally announced June 2019.

    Comments: Submitted to IROS 2020

  43. arXiv:1905.02780  [pdf, other]

    cs.LG cs.RO stat.ML

    Uncertainty-Aware Data Aggregation for Deep Imitation Learning

    Authors: Yuchen Cui, David Isele, Scott Niekum, Kikuo Fujimura

    Abstract: Estimating statistical uncertainties allows autonomous agents to communicate their confidence during task execution and is important for applications in safety-critical domains such as autonomous driving. In this work, we present the uncertainty-aware imitation learning (UAIL) algorithm for improving end-to-end control systems via data aggregation. UAIL applies Monte Carlo Dropout to estimate unce…

    Submitted 7 May, 2019; originally announced May 2019.

    Comments: Accepted to International Conference on Robotics and Automation 2019

  44. arXiv:1904.06387  [pdf, other]

    cs.LG stat.ML

    Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

    Authors: Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum

    Abstract: A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator. This is because IRL typically seeks a reward function that makes the demonstrator appear near-optimal, rather than inferring the underlying intentions of the demonstrator that may have been poorly executed in practice. In this paper, we introduce a novel reward-…

    Submitted 8 July, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

    Comments: In proceedings of Thirty-sixth International Conference on Machine Learning (ICML 2019)

  45. arXiv:1903.02020  [pdf, other]

    cs.LG cs.AI stat.ML

    Using Natural Language for Reward Shaping in Reinforcement Learning

    Authors: Prasoon Goyal, Scott Niekum, Raymond J. Mooney

    Abstract: Recent reinforcement learning (RL) approaches have shown strong performance in complex domains such as Atari games, but are often highly sample inefficient. A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent intermediate rewards for progress towards the goal. However, designing appro…

    Submitted 31 May, 2019; v1 submitted 5 March, 2019; originally announced March 2019.

    Comments: IJCAI 2019

  46. arXiv:1901.02161  [pdf, other]

    cs.LG stat.ML

    Risk-Aware Active Inverse Reinforcement Learning

    Authors: Daniel S. Brown, Yuchen Cui, Scott Niekum

    Abstract: Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning. Existing work has explored a variety of active query strategies; however, to our knowledge, none of these strategies directly minimize the performance risk of the policy the robot is learning. Utilizing recent advances in performance bounds for inverse reinforcement learnin…

    Submitted 3 June, 2019; v1 submitted 8 January, 2019; originally announced January 2019.

    Comments: In proceedings of the 2nd Conference on Robot Learning (CoRL) 2018

  47. arXiv:1811.03563  [pdf, other]

    cs.RO

    LAAIR: A Layered Architecture for Autonomous Interactive Robots

    Authors: Yuqian Jiang, Nick Walker, Minkyu Kim, Nicolas Brissonneau, Daniel S. Brown, Justin W. Hart, Scott Niekum, Luis Sentis, Peter Stone

    Abstract: When developing general purpose robots, the overarching software architecture can greatly affect the ease of accomplishing various tasks. Initial efforts to create unified robot systems in the 1990s led to hybrid architectures, emphasizing a hierarchy in which deliberative plans direct the use of reactive skills. However, since that time there has been significant progress in the low-level skills…

    Submitted 8 November, 2018; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: Presented at LTA AAAI-FSS, 2018

  48. arXiv:1810.01036  [pdf, other]

    cs.RO

    Towards Online Learning from Corrective Demonstrations

    Authors: Reymundo A. Gutierrez, Elaine Schaertl Short, Scott Niekum, Andrea L. Thomaz

    Abstract: Robots operating in real-world human environments will likely encounter task execution failures. To address this, we would like to allow co-present humans to refine the robot's task model as errors are encountered. Existing approaches to task model modification require reasoning over the entire dataset and model, limiting the rate of corrective updates. We introduce the State-Indexed Task Updates…

    Submitted 1 October, 2018; originally announced October 2018.

    Comments: Presented at AI-HRI AAAI-FSS, 2018 (arXiv:1809.06606)

    Report number: AI-HRI/2018/12

  49. arXiv:1806.11244  [pdf, other]

    cs.LG cs.RO stat.ML

    One-Shot Learning of Multi-Step Tasks from Observation via Activity Localization in Auxiliary Video

    Authors: Wonjoon Goo, Scott Niekum

    Abstract: Due to burdensome data requirements, learning from demonstration often falls short of its promise to allow users to quickly and naturally program robots. Demonstrations are inherently ambiguous and incomplete, making correct generalization to unseen situations difficult without a large number of demonstrations in varying conditions. By contrast, humans are often able to learn complex tasks from a…

    Submitted 26 April, 2019; v1 submitted 28 June, 2018; originally announced June 2018.

    Comments: ICRA 2019

  50. arXiv:1806.01347  [pdf, other]

    cs.LG cs.AI stat.ML

    Importance Sampling Policy Evaluation with an Estimated Behavior Policy

    Authors: Josiah P. Hanna, Scott Niekum, Peter Stone

    Abstract: We consider the problem of off-policy evaluation in Markov decision processes. Off-policy evaluation is the task of evaluating the expected return of one policy with data generated by a different, behavior policy. Importance sampling is a technique for off-policy evaluation that re-weights off-policy returns to account for differences in the likelihood of the returns between the two policies. In t…

    Submitted 9 May, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: Accepted to ICML 2019