Showing 1–42 of 42 results for author: Wulfmeier, M

  1. arXiv:2405.02425  [pdf, other]

    cs.RO cs.AI

    Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning

    Authors: Dhruva Tirumala, Markus Wulfmeier, Ben Moran, Sandy Huang, Jan Humplik, Guy Lever, Tuomas Haarnoja, Leonard Hasenclever, Arunkumar Byravan, Nathan Batchelor, Neil Sreendra, Kushal Patel, Marlon Gwira, Francesco Nori, Martin Riedmiller, Nicolas Heess

    Abstract: We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially-observable, multi-agent domain. We rely on large-scale, simulation-b…

    Submitted 3 May, 2024; originally announced May 2024.

  2. arXiv:2404.04253  [pdf, other]

    cs.LG cs.AI cs.RO

    Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution

    Authors: Tim Seyde, Peter Werner, Wilko Schwarting, Markus Wulfmeier, Daniela Rus

    Abstract: Recent reinforcement learning approaches have shown surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action space discretizations often yield favourable exploration characteristics while final performance does not visibly suffer in the absence of action penalization in line with optimal control theory. In robotics applications,…

    Submitted 5 April, 2024; originally announced April 2024.

  3. arXiv:2402.06102  [pdf, other]

    cs.RO cs.LG

    Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning

    Authors: Mohak Bhardwaj, Thomas Lampe, Michael Neunert, Francesco Romano, Abbas Abdolmaleki, Arunkumar Byravan, Markus Wulfmeier, Martin Riedmiller, Jonas Buchli

    Abstract: Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale. However, domains such as fluid dynamical systems exhibit complex dynamic phenomena that are hard to simulate at high integration rates, limiting the direct application of modern deep RL algorithms to often expensive or safety critical hardware. In this work,…

    Submitted 8 February, 2024; originally announced February 2024.

  4. arXiv:2312.11374  [pdf, other]

    cs.RO

    Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots

    Authors: Thomas Lampe, Abbas Abdolmaleki, Sarah Bechtle, Sandy H. Huang, Jost Tobias Springenberg, Michael Bloesch, Oliver Groth, Roland Hafner, Tim Hertweck, Michael Neunert, Markus Wulfmeier, Jingwei Zhang, Francesco Nori, Nicolas Heess, Martin Riedmiller

    Abstract: Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient through re-using previously collected sub-optimal data. In this paper we demonstrate how the increased understanding of off-policy learning methods and…

    Submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2312.01939  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Foundations for Transfer in Reinforcement Learning: A Taxonomy of Knowledge Modalities

    Authors: Markus Wulfmeier, Arunkumar Byravan, Sarah Bechtle, Karol Hausman, Nicolas Heess

    Abstract: Contemporary artificial intelligence systems exhibit rapidly growing abilities accompanied by the growth of required resources, expansive datasets and corresponding investments into computing infrastructure. Although earlier successes predominantly focus on constrained settings, recent strides in fundamental research and applications aspire to create increasingly general systems. This evolving lan…

    Submitted 4 December, 2023; originally announced December 2023.

  6. arXiv:2311.15951  [pdf, other]

    cs.LG cs.AI cs.RO

    Replay across Experiments: A Natural Extension of Off-Policy RL

    Authors: Dhruva Tirumala, Thomas Lampe, Jose Enrique Chen, Tuomas Haarnoja, Sandy Huang, Guy Lever, Ben Moran, Tim Hertweck, Leonard Hasenclever, Martin Riedmiller, Nicolas Heess, Markus Wulfmeier

    Abstract: Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL). We present an effective yet simple framework to extend the use of replays across multiple experiments, minimally adapting the RL workflow for sizeable improvements in controller performance and research iteration times. At its core, Replay Across Experiments (RaE) involve…

    Submitted 28 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

  7. arXiv:2309.07578  [pdf, other]

    cs.LG cs.AI cs.RO

    Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning

    Authors: Cristina Pinneri, Sarah Bechtle, Markus Wulfmeier, Arunkumar Byravan, Jingwei Zhang, William F. Whitney, Martin Riedmiller

    Abstract: We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to improve the agent's ability to generalize to out-of-distribution goals. To achieve this, we propose to learn a dynamics model and check if it is equivariant with re…

    Submitted 14 September, 2023; originally announced September 2023.

  8. arXiv:2308.07741  [pdf, other]

    cs.RO cs.LG

    Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World

    Authors: Nico Gürtler, Felix Widmaier, Cansu Sancaktar, Sebastian Blaes, Pavel Kolev, Stefan Bauer, Manuel Wüthrich, Markus Wulfmeier, Martin Riedmiller, Arthur Allshire, Qiang Wang, Robert McCarthy, Hangyeol Kim, Jongchan Baek, Wookyong Kwon, Shanliang Qian, Yasunori Toshimitsu, Mike Yan Michelis, Amirhossein Kazemipour, Arman Raayatsanati, Hehui Zheng, Barnabas Gavin Cangan, Bernhard Schölkopf, Georg Martius

    Abstract: Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation do not necessarily translate to real robots, in particular for tasks involving complex interactions with the environment. The Real Robot Challenge 2022 therefore…

    Submitted 24 November, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Typo in author list fixed

  9. arXiv:2307.09668  [pdf, other]

    cs.RO cs.AI cs.LG

    Towards A Unified Agent with Foundation Models

    Authors: Norman Di Palo, Arunkumar Byravan, Leonard Hasenclever, Markus Wulfmeier, Nicolas Heess, Martin Riedmiller

    Abstract: Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In this work, we investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents. We design a framework that uses language as the core rea…

    Submitted 18 July, 2023; originally announced July 2023.

  10. arXiv:2305.11290  [pdf, other]

    cs.LG

    Massively Scalable Inverse Reinforcement Learning in Google Maps

    Authors: Matt Barnes, Matthew Abueg, Oliver F. Lange, Matt Deeds, Jason Trader, Denali Molitor, Markus Wulfmeier, Shawn O'Banion

    Abstract: Inverse reinforcement learning (IRL) offers a powerful and general framework for learning humans' latent preferences in route recommendation, yet no approach has successfully addressed planetary-scale problems with hundreds of millions of states and demonstration trajectories. In this paper, we introduce scaling techniques based on graph compression, spatial parallelization, and improved initializ…

    Submitted 5 March, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

  11. Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

    Authors: Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Jan Humplik, Markus Wulfmeier, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, et al. (3 additional authors not shown)

    Abstract: We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust…

    Submitted 11 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: Project website: https://sites.google.com/view/op3-soccer

  12. arXiv:2211.13743  [pdf, other]

    cs.LG cs.AI cs.RO

    SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

    Authors: Giulia Vezzani, Dhruva Tirumala, Markus Wulfmeier, Dushyant Rao, Abbas Abdolmaleki, Ben Moran, Tuomas Haarnoja, Jan Humplik, Roland Hafner, Michael Neunert, Claudio Fantacci, Tim Hertweck, Thomas Lampe, Fereshteh Sadeghi, Nicolas Heess, Martin Riedmiller

    Abstract: The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitations. For example, fine-tuning an existing policy frequently fails, as the policy can degrade rapidly early in training. In a similar vein, distillation of expert be…

    Submitted 11 January, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

  13. arXiv:2210.12566  [pdf, other]

    cs.LG cs.AI cs.RO

    Solving Continuous Control via Q-learning

    Authors: Tim Seyde, Peter Werner, Wilko Schwarting, Igor Gilitschenski, Martin Riedmiller, Daniela Rus, Markus Wulfmeier

    Abstract: While there has been substantial success for solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilisation, compute requirements and wider hyperparameter search spaces. We show that a…

    Submitted 25 September, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

  14. arXiv:2209.01947  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    MO2: Model-Based Offline Options

    Authors: Sasha Salter, Markus Wulfmeier, Dhruva Tirumala, Nicolas Heess, Martin Riedmiller, Raia Hadsell, Dushyant Rao

    Abstract: The ability to discover useful behaviours from past experience and transfer them to new tasks is considered a core component of natural embodied intelligence. Inspired by neuroscience, discovering behaviours that switch at bottleneck states have been long sought after for inducing plans of minimum description length across tasks. Prior approaches have either only supported online, on-policy, bottl…

    Submitted 5 September, 2022; originally announced September 2022.

    Comments: Accepted at 1st Conference on Lifelong Learning Agents (CoLLAs) Conference Track, 2022

  15. arXiv:2204.05893  [pdf, other]

    cs.RO cs.AI cs.LG

    Forgetting and Imbalance in Robot Lifelong Learning with Off-policy Data

    Authors: Wenxuan Zhou, Steven Bohez, Jan Humplik, Abbas Abdolmaleki, Dushyant Rao, Markus Wulfmeier, Tuomas Haarnoja, Nicolas Heess

    Abstract: Robots will experience non-stationary environment dynamics throughout their lifetime: the robot dynamics can change due to wear and tear, or its surroundings may change over time. Eventually, the robots should perform well in all of the environment variations it has encountered. At the same time, it should still be able to learn fast in a new environment. We identify two challenges in Reinforcemen…

    Submitted 18 August, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: Published at 1st Conference on Lifelong Learning Agents, 2022

  16. arXiv:2203.17138  [pdf, other]

    cs.RO cs.AI cs.LG

    Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors

    Authors: Steven Bohez, Saran Tunyasuvunakool, Philemon Brakel, Fereshteh Sadeghi, Leonard Hasenclever, Yuval Tassa, Emilio Parisotto, Jan Humplik, Tuomas Haarnoja, Roland Hafner, Markus Wulfmeier, Michael Neunert, Ben Moran, Noah Siegel, Andrea Huber, Francesco Romano, Nathan Batchelor, Federico Casarini, Josh Merel, Raia Hadsell, Nicolas Heess

    Abstract: We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our appro…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: 30 pages, 9 figures, 8 tables, 14 videos at https://bit.ly/robot-npmp , submitted to Science Robotics

  17. arXiv:2201.11861  [pdf, other]

    cs.LG

    The Challenges of Exploration for Offline Reinforcement Learning

    Authors: Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller

    Abstract: Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the offline setting, but just as critical to data-efficient RL is the collection of informative data. The task-agnostic setting for data collection, where the task is…

    Submitted 18 February, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

  18. arXiv:2112.05062  [pdf, other]

    cs.LG cs.AI cs.RO

    Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies

    Authors: Dushyant Rao, Fereshteh Sadeghi, Leonard Hasenclever, Markus Wulfmeier, Martina Zambelli, Giulia Vezzani, Dhruva Tirumala, Yusuf Aytar, Josh Merel, Nicolas Heess, Raia Hadsell

    Abstract: For robots operating in the real world, it is desirable to learn reusable behaviours that can effectively be transferred and adapted to numerous tasks and scenarios. We propose an approach to learn abstract motor skills from data using a hierarchical mixture latent variable model. In contrast to existing work, our method exploits a three-level hierarchy of both discrete and continuous latent varia…

    Submitted 14 March, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

  19. arXiv:2112.00597  [pdf, other]

    cs.RO stat.ML

    Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation

    Authors: Todor Davchev, Oleg Sushkov, Jean-Baptiste Regli, Stefan Schaal, Yusuf Aytar, Markus Wulfmeier, Jon Scholz

    Abstract: Complex sequential tasks in continuous-control settings often require agents to successfully traverse a set of "narrow passages" in their state space. Solving such tasks with a sparse reward in a sample-efficient manner poses a challenge to modern reinforcement learning (RL) due to the associated long-horizon nature of the problem and the lack of sufficient positive signal during learning. Various…

    Submitted 22 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Journal ref: International Conference on Learning Representations (ICLR 2022)

  20. arXiv:2111.02552  [pdf, other]

    cs.LG cs.AI cs.RO

    Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies

    Authors: Tim Seyde, Igor Gilitschenski, Wilko Schwarting, Bartolomeo Stellato, Martin Riedmiller, Markus Wulfmeier, Daniela Rus

    Abstract: Reinforcement learning (RL) for continuous control typically employs distributions whose support covers the entire action space. In this work, we investigate the colloquially known phenomenon that trained agents often prefer actions at the boundaries of that space. We draw theoretical connections to the emergence of bang-bang behavior in optimal control, and provide extensive empirical evaluation…

    Submitted 3 November, 2021; originally announced November 2021.

  21. arXiv:2109.08603  [pdf, other]

    cs.LG cs.NE cs.RO

    Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration

    Authors: Oliver Groth, Markus Wulfmeier, Giulia Vezzani, Vibhavari Dasagi, Tim Hertweck, Roland Hafner, Nicolas Heess, Martin Riedmiller

    Abstract: Curiosity-based reward schemes can present powerful exploration mechanisms which facilitate the discovery of solutions for complex, sparse or long-horizon tasks. However, as the agent learns to reach previously unexplored spaces and the objective adapts to reward new areas, many behaviours emerge only to disappear due to being overwritten by the constantly shifting objective. We argue that merely…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 2 tables

    ACM Class: I.2.6; I.2.9

  22. arXiv:2105.12196  [pdf, other]

    cs.AI cs.MA cs.NE cs.RO

    From Motor Control to Team Play in Simulated Humanoid Football

    Authors: Siqi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess

    Abstract: Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents…

    Submitted 25 May, 2021; originally announced May 2021.

  23. arXiv:2011.01758  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Representation Matters: Improving Perception and Exploration for Robotics

    Authors: Markus Wulfmeier, Arunkumar Byravan, Tim Hertweck, Irina Higgins, Ankush Gupta, Tejas Kulkarni, Malcolm Reynolds, Denis Teplyashin, Roland Hafner, Thomas Lampe, Martin Riedmiller

    Abstract: Projecting high-dimensional environment observations into lower-dimensional structured representations can considerably improve data-efficiency for reinforcement learning in domains with limited data such as robotics. Can a single generally useful representation be found? In order to answer this question, it is important to understand how the representation will be used by the agent and what prope…

    Submitted 21 March, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Published at ICRA 2021

  24. arXiv:2010.15492  [pdf, other]

    cs.RO

    "What, not how": Solving an under-actuated insertion task from scratch

    Authors: Giulia Vezzani, Michael Neunert, Markus Wulfmeier, Rae Jeong, Thomas Lampe, Noah Siegel, Roland Hafner, Abbas Abdolmaleki, Martin Riedmiller, Francesco Nori

    Abstract: Robot manipulation requires a complex set of skills that need to be carefully combined and coordinated to solve a task. Yet, most Reinforcement Learning (RL) approaches in robotics study tasks which actually consist only of a single manipulation skill, such as grasping an object or inserting a pre-grasped object. As a result the skill ('how' to solve the task) but not the actual goal of a complete…

    Submitted 30 October, 2020; v1 submitted 29 October, 2020; originally announced October 2020.

  25. arXiv:2008.12228  [pdf, other]

    cs.RO cs.AI cs.LG stat.ML

    Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion

    Authors: Roland Hafner, Tim Hertweck, Philipp Klöppner, Michael Bloesch, Michael Neunert, Markus Wulfmeier, Saran Tunyasuvunakool, Nicolas Heess, Martin Riedmiller

    Abstract: Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs. Their attraction is due in part to the fact that they can represent a general class of methods that allow to learn a solution with a reasonably set reward and minimal prior knowledge, even in situations where it is difficult or expensive for a human expert. For RL to tr…

    Submitted 6 August, 2020; originally announced August 2020.

  26. arXiv:2007.15588  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Data-efficient Hindsight Off-policy Option Learning

    Authors: Markus Wulfmeier, Dushyant Rao, Roland Hafner, Thomas Lampe, Abbas Abdolmaleki, Tim Hertweck, Michael Neunert, Dhruva Tirumala, Noah Siegel, Nicolas Heess, Martin Riedmiller

    Abstract: We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm. Given any trajectory, HO2 infers likely option choices and backpropagates through the dynamic programming inference procedure to robustly train all policy components off-policy and end-to-end. The approach outperforms existing option learning methods on common benchmarks. To better understand the option fr…

    Submitted 15 June, 2021; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Published at ICML2021

  27. arXiv:2005.07541  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Simple Sensor Intentions for Exploration

    Authors: Tim Hertweck, Martin Riedmiller, Michael Bloesch, Jost Tobias Springenberg, Noah Siegel, Markus Wulfmeier, Roland Hafner, Nicolas Heess

    Abstract: Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reduce the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic sy…

    Submitted 15 May, 2020; originally announced May 2020.

  28. arXiv:2001.00449  [pdf, other]

    cs.LG cs.RO stat.ML

    Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics

    Authors: Michael Neunert, Abbas Abdolmaleki, Markus Wulfmeier, Thomas Lampe, Jost Tobias Springenberg, Roland Hafner, Francesco Romano, Jonas Buchli, Nicolas Heess, Martin Riedmiller

    Abstract: Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully continuous or…

    Submitted 2 January, 2020; originally announced January 2020.

    Comments: Presented at the 3rd Conference on Robot Learning (CoRL 2019), Osaka, Japan. Video: https://youtu.be/eUqQDLQXb7I

  29. arXiv:1911.10866  [pdf, other]

    cs.LG stat.ML

    Disentangled Cumulants Help Successor Representations Transfer to New Tasks

    Authors: Christopher Grimm, Irina Higgins, Andre Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, Satinder Singh

    Abstract: Biological intelligence can learn to solve many diverse tasks in a data efficient manner by re-using basic knowledge and skills from one task to another. Furthermore, many of such skills are acquired without explicit supervision in an intrinsically driven fashion. This is in contrast to the state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch an…

    Submitted 25 November, 2019; originally announced November 2019.

  30. arXiv:1911.08363  [pdf, other]

    cs.AI cs.LG

    Attention-Privileged Reinforcement Learning

    Authors: Sasha Salter, Dushyant Rao, Markus Wulfmeier, Raia Hadsell, Ingmar Posner

    Abstract: Image-based Reinforcement Learning is known to suffer from poor sample efficiency and generalisation to unseen visuals such as distractors (task-independent aspects of the observation space). Visual domain randomisation encourages transfer by training over visual factors of variation that may be encountered in the target domain. This increases learning complexity, can negatively impact learning ra…

    Submitted 11 January, 2021; v1 submitted 19 November, 2019; originally announced November 2019.

    Comments: Published at Conference on Robot Learning (CoRL) 2020

  31. arXiv:1906.11228  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Compositional Transfer in Hierarchical Reinforcement Learning

    Authors: Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Michael Neunert, Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

    Abstract: The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multip…

    Submitted 19 May, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: Robotics Science and Systems 2020

  32. arXiv:1904.07346  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Efficient Supervision for Robot Learning via Imitation, Simulation, and Adaptation

    Authors: Markus Wulfmeier

    Abstract: Recent successes in machine learning have led to a shift in the design of autonomous systems, improving performance on existing tasks and rendering new applications possible. Data-focused approaches gain relevance across diverse, intricate applications when developing data collection and curation pipelines becomes more effective than manual behaviour design. The following work aims at increasing t…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: Dissertation Summary

  33. arXiv:1806.06003  [pdf, other]

    stat.ML cs.AI cs.LG cs.RO

    On Machine Learning and Structure for Mobile Robots

    Authors: Markus Wulfmeier

    Abstract: Due to recent advances - compute, data, models - the role of learning in autonomous systems has expanded significantly, rendering new applications possible for the first time. While some of the most significant benefits are obtained in the perception modules of the software stack, other aspects continue to rely on known manual procedures based on prior knowledge on geometry, dynamics, kinematics e…

    Submitted 15 June, 2018; originally announced June 2018.

    Comments: Informal Review

  34. arXiv:1806.05502  [pdf, other]

    stat.ML cs.AI cs.CV cs.LG

    Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes

    Authors: Fabian B. Fuchs, Oliver Groth, Adam R. Kosiorek, Alex Bewley, Markus Wulfmeier, Andrea Vedaldi, Ingmar Posner

    Abstract: Visually predicting the stability of block towers is a popular task in the domain of intuitive physics. While previous work focusses on prediction accuracy, a one-dimensional performance measure, we provide a broader analysis of the learned physical understanding of the final model and how the learning process can be guided. To this end, we introduce neural stethoscopes as a general purpose framew…

    Submitted 6 September, 2019; v1 submitted 14 June, 2018; originally announced June 2018.

  35. arXiv:1803.01840  [pdf, other]

    cs.LG stat.ML

    TACO: Learning Task Decomposition via Temporal Alignment for Control

    Authors: Kyriacos Shiarlis, Markus Wulfmeier, Sasha Salter, Shimon Whiteson, Ingmar Posner

    Abstract: Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, they provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD focus either on learning a single high-level t…

    Submitted 10 August, 2018; v1 submitted 2 March, 2018; originally announced March 2018.

    Comments: 12 Pages. Published at ICML 2018

  36. arXiv:1712.07436  [pdf, other]

    stat.ML cs.CV cs.RO

    Incremental Adversarial Domain Adaptation for Continually Changing Environments

    Authors: Markus Wulfmeier, Alex Bewley, Ingmar Posner

    Abstract: Continuous appearance shifts such as changes in weather and lighting conditions can impact the performance of deployed machine learning models. While unsupervised domain adaptation aims to address this challenge, current approaches do not utilise the continuity of the occurring shifts. In particular, many robotics applications exhibit these conditions and thus facilitate the potential to increment…

    Submitted 24 February, 2018; v1 submitted 20 December, 2017; originally announced December 2017.

    Comments: International Conference on Robotics and Automation 2018

  37. arXiv:1707.07907  [pdf, other]

    cs.AI

    Mutual Alignment Transfer Learning

    Authors: Markus Wulfmeier, Ingmar Posner, Pieter Abbeel

    Abstract: Training robots for operation in the real world is a complex, time consuming and potentially expensive task. Despite significant success of reinforcement learning in games and simulations, research in real robot applications has not been able to match similar progress. While sample complexity can be reduced by training policies in simulation, such policies can perform sub-optimally on the real pla…

    Submitted 26 September, 2017; v1 submitted 25 July, 2017; originally announced July 2017.

  38. arXiv:1707.05300  [pdf, other]

    cs.AI cs.LG cs.NE cs.RO

    Reverse Curriculum Generation for Reinforcement Learning

    Authors: Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, Pieter Abbeel

    Abstract: Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of explorati…

    Submitted 23 July, 2018; v1 submitted 17 July, 2017; originally announced July 2017.

    Comments: Published at the 1st Conference on Robot Learning (CoRL 2017)

  39. arXiv:1703.01461  [pdf, other]

    cs.RO cs.LG

    Addressing Appearance Change in Outdoor Robotics with Adversarial Domain Adaptation

    Authors: Markus Wulfmeier, Alex Bewley, Ingmar Posner

    Abstract: Appearance changes due to weather and seasonal conditions represent a strong impediment to the robust implementation of machine learning systems in outdoor robotics. While supervised learning optimises a model for the training domain, it will deliver degraded performance in application domains that underlie distributional shifts caused by these changes. Traditionally, this problem has been address…

    Submitted 17 September, 2017; v1 submitted 4 March, 2017; originally announced March 2017.

    Comments: In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)

  40. arXiv:1612.04318  [pdf, other]

    cs.RO cs.AI cs.LG

    Incorporating Human Domain Knowledge into Large Scale Cost Function Learning

    Authors: Markus Wulfmeier, Dushyant Rao, Ingmar Posner

    Abstract: Recent advances have shown the capability of Fully Convolutional Neural Networks (FCN) to model cost functions for motion planning in the context of learning driving preferences purely based on demonstration data from human drivers. While pure learning from demonstrations in the framework of Inverse Reinforcement Learning (IRL) is a promising approach, we can benefit from well informed human prior…

    Submitted 13 December, 2016; originally announced December 2016.

    Comments: Neural Information Processing Systems 2016, Deep Reinforcement Learning Workshop

  41. arXiv:1607.02329  [pdf, other]

    cs.RO cs.LG

    Watch This: Scalable Cost-Function Learning for Path Planning in Urban Environments

    Authors: Markus Wulfmeier, Dominic Zeng Wang, Ingmar Posner

    Abstract: In this work, we present an approach to learn cost maps for driving in complex urban environments from a very large number of demonstrations of driving behaviour by human experts. The learned cost maps are constructed directly from raw sensor measurements, bypassing the effort of manually designing cost maps as well as features. When deploying the learned cost maps, the trajectories generated not…

    Submitted 8 July, 2016; originally announced July 2016.

    Comments: Accepted for publication in the Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016)

  42. arXiv:1507.04888  [pdf, other]

    cs.LG

    Maximum Entropy Deep Inverse Reinforcement Learning

    Authors: Markus Wulfmeier, Peter Ondruska, Ingmar Posner

    Abstract: This paper presents a general framework for exploiting the representational capacity of neural networks to approximate complex, nonlinear reward functions in the context of solving the inverse reinforcement learning (IRL) problem. We show in this context that the Maximum Entropy paradigm for IRL lends itself naturally to the efficient training of deep architectures. At test time, the approach lead…

    Submitted 11 March, 2016; v1 submitted 17 July, 2015; originally announced July 2015.