
Showing 1–16 of 16 results for author: Huang, S H

Searching in archive cs.
  1. arXiv:2312.11374  [pdf, other]

    cs.RO

    Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots

    Authors: Thomas Lampe, Abbas Abdolmaleki, Sarah Bechtle, Sandy H. Huang, Jost Tobias Springenberg, Michael Bloesch, Oliver Groth, Roland Hafner, Tim Hertweck, Michael Neunert, Markus Wulfmeier, Jingwei Zhang, Francesco Nori, Nicolas Heess, Martin Riedmiller

    Abstract: Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient through re-using previously collected sub-optimal data. In this paper we demonstrate how the increased understanding of off-policy learning methods and…

    Submitted 18 December, 2023; originally announced December 2023.

  2. arXiv:2305.16498  [pdf, other]

    cs.LG

    Coherent Soft Imitation Learning

    Authors: Joe Watson, Sandy H. Huang, Nicolas Heess

    Abstract: Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward. Such methods enable agents to learn complex tasks from humans that are difficult to capture with hand-designed reward functions. Choosing BC or IRL for imitation depends on the quality and state-action coverage of the demonstrations, as…

    Submitted 6 December, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 51 pages, 49 figures. DeepMind internship report. Accepted as a spotlight paper at Advances in Neural Information Processing Systems 2023

  3. Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

    Authors: Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Jan Humplik, Markus Wulfmeier, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, et al. (3 additional authors not shown)

    Abstract: We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust…

    Submitted 11 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: Project website: https://sites.google.com/view/op3-soccer

  4. arXiv:2206.08353  [pdf, other]

    cs.LG stat.ML

    Towards Understanding How Machines Can Learn Causal Overhypotheses

    Authors: Eliza Kosoy, David M. Chan, Adrian Liu, Jasmine Collins, Bryanna Kaufmann, Sandy Han Huang, Jessica B. Hamrick, John Canny, Nan Rosemary Ke, Alison Gopnik

    Abstract: Recent work in machine learning and cognitive science has suggested that understanding causal information is essential to the development of intelligence. The extensive literature in cognitive science using the "blicket detector" environment shows that children are adept at many kinds of causal inference and learning. We propose to adapt that environment for machine learning agents. One of the k…

    Submitted 16 June, 2022; originally announced June 2022.

  5. arXiv:2202.10430  [pdf, other]

    cs.LG cs.AI cs.NE

    Learning Causal Overhypotheses through Exploration in Children and Computational Models

    Authors: Eliza Kosoy, Adrian Liu, Jasmine Collins, David M Chan, Jessica B Hamrick, Nan Rosemary Ke, Sandy H Huang, Bryanna Kaufmann, John Canny, Alison Gopnik

    Abstract: Despite recent progress in reinforcement learning (RL), RL algorithms for exploration still remain an active area of research. Existing methods often focus on state-based metrics, which do not consider the underlying causal structures of the environment, and while recent research has begun to explore RL environments for causal learning, these environments primarily leverage causal information thro…

    Submitted 21 February, 2022; originally announced February 2022.

  6. arXiv:2106.08199  [pdf, other]

    cs.LG cs.RO

    On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

    Authors: Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

    Abstract: Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step. This includes ideas as far ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors. Often, the task reward and au…

    Submitted 1 August, 2023; v1 submitted 15 June, 2021; originally announced June 2021.

  7. arXiv:2101.11935  [pdf, other]

    cs.LG eess.IV

    A Machine Learning Challenge for Prognostic Modelling in Head and Neck Cancer Using Multi-modal Data

    Authors: Michal Kazmierski, Mattea Welch, Sejin Kim, Chris McIntosh, Princess Margaret Head, Neck Cancer Group, Katrina Rey-McIntyre, Shao Hui Huang, Tirth Patel, Tony Tadic, Michael Milosevic, Fei-Fei Liu, Andrew Hope, Scott Bratman, Benjamin Haibe-Kains

    Abstract: Accurate prognosis for an individual patient is a key component of precision oncology. Recent advances in machine learning have enabled the development of models using a wider range of data, including imaging. Radiomics aims to extract quantitative predictive and prognostic biomarkers from routine medical imaging, but evidence for computed tomography radiomics for prognosis remains inconclusive. W…

    Submitted 28 January, 2021; originally announced January 2021.

    Comments: 27 pages, 7 figures, under review

  8. arXiv:2005.07513  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    A Distributional View on Multi-Objective Policy Optimization

    Authors: Abbas Abdolmaleki, Sandy H. Huang, Leonard Hasenclever, Michael Neunert, H. Francis Song, Martina Zambelli, Murilo F. Martins, Nicolas Heess, Raia Hadsell, Martin Riedmiller

    Abstract: Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for obj…

    Submitted 15 May, 2020; originally announced May 2020.

  9. arXiv:2001.04077  [pdf, other]

    cs.LG cs.CV

    Residual Attention Net for Superior Cross-Domain Time Sequence Modeling

    Authors: Seth H. Huang, Xu Lingjie, Jiang Congwei

    Abstract: We present a novel architecture, residual attention net (RAN), which merges a sequence architecture, universal transformer, and a computer vision architecture, residual net, with a high-way architecture for cross-domain sequence modeling. The architecture aims at addressing the long dependency issue often faced by recurrent-neural-net-based structures. This paper serves as a proof-of-concept for a…

    Submitted 13 January, 2020; originally announced January 2020.

  10. arXiv:1911.02320  [pdf, other]

    cs.RO cs.HC cs.LG

    Nonverbal Robot Feedback for Human Teachers

    Authors: Sandy H. Huang, Isabella Huang, Ravi Pandya, Anca D. Dragan

    Abstract: Robots can learn preferences from human demonstrations, but their success depends on how informative these demonstrations are. Being informative is unfortunately very challenging, because during teaching, people typically get no transparency into what the robot already knows or has learned so far. In contrast, human students naturally provide a wealth of nonverbal feedback that reveals their level…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: CoRL 2019

  11. arXiv:1906.12330  [pdf, other]

    cs.SI cs.CL cs.LG

    Graph Star Net for Generalized Multi-Task Learning

    Authors: Lu Haonan, Seth H. Huang, Tian Ye, Guo Xiuyan

    Abstract: In this work, we present graph star net (GraphStar), a novel and unified graph neural net architecture which utilizes message-passing relay and attention mechanism for multiple prediction tasks - node classification, graph classification and link prediction. GraphStar addresses many earlier challenges facing graph neural nets and achieves non-local representation without increasing the model depth…

    Submitted 20 June, 2019; originally announced June 2019.

  12. arXiv:1903.08542  [pdf, other]

    cs.RO

    Learning Gentle Object Manipulation with Curiosity-Driven Deep Reinforcement Learning

    Authors: Sandy H. Huang, Martina Zambelli, Jackie Kay, Murilo F. Martins, Yuval Tassa, Patrick M. Pilarski, Raia Hadsell

    Abstract: Robots must know how to be gentle when they need to interact with fragile objects, or when the robot itself is prone to wear and tear. We propose an approach that enables deep reinforcement learning to train policies that are gentle, both during exploration and task execution. In a reward-based learning environment, a natural approach involves augmenting the (task) reward with a penalty for non-ge…

    Submitted 20 March, 2019; originally announced March 2019.

  13. arXiv:1812.09376  [pdf, other]

    cs.AI

    Human-AI Learning Performance in Multi-Armed Bandits

    Authors: Ravi Pandya, Sandy H. Huang, Dylan Hadfield-Menell, Anca D. Dragan

    Abstract: People frequently face challenging decision-making problems in which outcomes are uncertain or unknown. Artificial intelligence (AI) algorithms exist that can outperform humans at learning such tasks. Thus, there is an opportunity for AI agents to assist people in learning these tasks more effectively. In this work, we use a multi-armed bandit as a controlled setting in which to explore this direc…

    Submitted 21 December, 2018; originally announced December 2018.

    Comments: Artificial Intelligence, Ethics and Society (AIES) 2019

  14. arXiv:1810.08174  [pdf, other]

    cs.RO

    Establishing Appropriate Trust via Critical States

    Authors: Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan

    Abstract: In order to effectively interact with or supervise a robot, humans need to have an accurate mental model of its capabilities and how it acts. Learned neural network policies make that particularly challenging. We propose an approach for helping end-users build a mental model of such policies. Our key observation is that for most tasks, the essence of the policy is captured in a few critical states…

    Submitted 18 October, 2018; originally announced October 2018.

    Comments: IROS 2018

  15. Expressing Robot Incapability

    Authors: Minae Kwon, Sandy H. Huang, Anca D. Dragan

    Abstract: Our goal is to enable robots to express their incapability, and to do so in a way that communicates both what they are trying to accomplish and why they are unable to accomplish it. We frame this as a trajectory optimization problem: maximize the similarity between the motion expressing incapability and what would amount to successful task execution, while obeying the physical limits of the robot.…

    Submitted 12 June, 2020; v1 submitted 18 October, 2018; originally announced October 2018.

    Comments: HRI 2018

  16. Enabling Robots to Communicate their Objectives

    Authors: Sandy H. Huang, David Held, Pieter Abbeel, Anca D. Dragan

    Abstract: The overarching goal of this work is to efficiently enable end-users to correctly anticipate a robot's behavior in novel situations. Since a robot's behavior is often a direct result of its underlying objective function, our insight is that end-users need to have an accurate mental model of this objective function in order to understand and predict what the robot will do. While people naturally de…

    Submitted 18 October, 2018; v1 submitted 11 February, 2017; originally announced February 2017.

    Comments: RSS 2017