-
Mind Mansion: Exploring Metaphorical Interactions to Engage with Negative Thoughts in Virtual Reality
Authors:
Julian Rasch,
Michelle Johanna Zender,
Sophia Sakel,
Nadine Wagener
Abstract:
Recurrent negative thoughts can significantly disrupt daily life and contribute to negative emotional states. Facing, confronting, and noticing such thoughts without support can be challenging. To provide a playful setting and leverage the technical maturation of Virtual Reality (VR), our VR experience, Mind Mansion, places the user in an initially cluttered virtual apartment. Here we utilize esta…
▽ More
Recurrent negative thoughts can significantly disrupt daily life and contribute to negative emotional states. Facing, confronting, and noticing such thoughts without support can be challenging. To provide a playful setting and leverage the technical maturation of Virtual Reality (VR), our VR experience, Mind Mansion, places the user in an initially cluttered virtual apartment. Here we utilize established concepts from traditional therapy and metaphors identified in prior works to let users engage metaphorically with representations of thoughts, gradually sorting the space, fostering awareness of thoughts, and supporting mental self-care. The results of our user study (n = 30) reveal that Mind Mansion encourages the exploration of alternative perspectives, fosters acceptance, and potentially offers new coping mechanisms. Our findings suggest that this VR intervention can reduce negative affect and improve overall emotional awareness.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
H-GAP: Humanoid Control with a Generalist Planner
Authors:
Zhengyao Jiang,
Yingchen Xu,
Nolan Wagener,
Yicheng Luo,
Michael Janner,
Edward Grefenstette,
Tim Rocktäschel,
Yuandong Tian
Abstract:
Humanoid control is an important research challenge offering avenues for integration into human-centric infrastructures and enabling physics-driven humanoid animations. The daunting challenges in this field stem from the difficulty of optimizing in high-dimensional action spaces and the instability introduced by the bipedal morphology of humanoids. However, the extensive collection of human motion…
▽ More
Humanoid control is an important research challenge offering avenues for integration into human-centric infrastructures and enabling physics-driven humanoid animations. The daunting challenges in this field stem from the difficulty of optimizing in high-dimensional action spaces and the instability introduced by the bipedal morphology of humanoids. However, the extensive collection of human motion-captured data and the derived datasets of humanoid trajectories, such as MoCapAct, paves the way to tackle these challenges. In this context, we present Humanoid Generalist Autoencoding Planner (H-GAP), a state-action trajectory generative model trained on humanoid trajectories derived from human motion-captured data, capable of adeptly handling downstream control tasks with Model Predictive Control (MPC). For 56 degrees of freedom humanoid, we empirically demonstrate that H-GAP learns to represent and generate a wide range of motor behaviours. Further, without any learning from online interactions, it can also flexibly transfer these behaviors to solve novel downstream control tasks via planning. Notably, H-GAP excels established MPC baselines that have access to the ground truth dynamics model, and is superior or comparable to offline RL methods trained for individual tasks. Finally, we do a series of empirical studies on the scaling properties of H-GAP, showing the potential for performance gains via additional data but not computing. Code and videos are available at https://ycxuyingchen.github.io/hgap/.
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
Acquire Driving Scenarios Efficiently: A Framework for Prospective Assessment of Cost-Optimal Scenario Acquisition
Authors:
Christoph Glasmacher,
Michael Schuldes,
Hendrik Weber,
Nicolas Wagener,
Lutz Eckstein
Abstract:
Scenario-based testing is becoming increasingly important in safety assurance for automated driving. However, comprehensive and sufficiently complete coverage of the scenario space requires significant effort and resources if using only real-world data. To address this issue, driving scenario generation methods are developed and used more frequently, but the benefit of substituting generated data…
▽ More
Scenario-based testing is becoming increasingly important in safety assurance for automated driving. However, comprehensive and sufficiently complete coverage of the scenario space requires significant effort and resources if using only real-world data. To address this issue, driving scenario generation methods are developed and used more frequently, but the benefit of substituting generated data for real-world data has not yet been quantified. Additionally, the coverage of a set of concrete scenarios within a given logical scenario space has not been predicted yet. This paper proposes a methodology to quantify the cost-optimal usage of scenario generation approaches to reach a certainly complete scenario space coverage under given quality constraints and parametrization. Therefore, individual process steps for scenario generation and usage are investigated and evaluated using a meta model for the abstraction of knowledge-based and data-driven methods. Furthermore, a methodology is proposed to fit the meta model including the prediction of reachable complete coverage, quality criteria, and costs. Finally, the paper exemplary examines the suitability of a hybrid generation model under technical, economical, and quality constraints in comparison to different real-world scenario mining methods.
△ Less
Submitted 21 July, 2023;
originally announced July 2023.
-
TerrainNet: Visual Modeling of Complex Terrain for High-speed, Off-road Navigation
Authors:
Xiangyun Meng,
Nathan Hatch,
Alexander Lambert,
Anqi Li,
Nolan Wagener,
Matthew Schmittle,
JoonHo Lee,
Wentao Yuan,
Zoey Chen,
Samuel Deng,
Greg Okopal,
Dieter Fox,
Byron Boots,
Amirreza Shaban
Abstract:
Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for se…
▽ More
Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for semantic and geometric terrain prediction for aggressive, off-road navigation. The approach relies on several key insights and practical considerations for achieving reliable terrain modeling. The network includes a multi-headed output representation to capture fine- and coarse-grained terrain features necessary for estimating traversability. Accurate depth estimation is achieved using self-supervised depth completion with multi-view RGB and stereo inputs. Requirements for real-time performance and fast inference speeds are met using efficient, learned image feature projections. Furthermore, the model is trained on a large-scale, real-world off-road dataset collected across a variety of diverse outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integration into a planning module. We demonstrate the performance of TerrainNet through extensive comparison to current state-of-the-art baselines for camera-only scene prediction. Finally, we showcase the effectiveness of integrating TerrainNet within a complete autonomous-driving stack by conducting a real-world vehicle test in a challenging off-road scenario.
△ Less
Submitted 29 May, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control
Authors:
Nolan Wagener,
Andrey Kolobov,
Felipe Vieira Frujeri,
Ricky Loynd,
Ching-An Cheng,
Matthew Hausknecht
Abstract:
Simulated humanoids are an appealing research domain due to their physical capabilities. Nonetheless, they are also challenging to control, as a policy must drive an unstable, discontinuous, and high-dimensional physical system. One widely studied approach is to utilize motion capture (MoCap) data to teach the humanoid agent low-level skills (e.g., standing, walking, and running) that can then be…
▽ More
Simulated humanoids are an appealing research domain due to their physical capabilities. Nonetheless, they are also challenging to control, as a policy must drive an unstable, discontinuous, and high-dimensional physical system. One widely studied approach is to utilize motion capture (MoCap) data to teach the humanoid agent low-level skills (e.g., standing, walking, and running) that can then be re-used to synthesize high-level behaviors. However, even with MoCap data, controlling simulated humanoids remains very hard, as MoCap data offers only kinematic information. Finding physical control inputs to realize the demonstrated motions requires computationally intensive methods like reinforcement learning. Thus, despite the publicly available MoCap data, its utility has been limited to institutions with large-scale compute. In this work, we dramatically lower the barrier for productive research on this topic by training and releasing high-quality agents that can track over three hours of MoCap data for a simulated humanoid in the dm_control physics-based environment. We release MoCapAct (Motion Capture with Actions), a dataset of these expert agents and their rollouts, which contain proprioceptive observations and actions. We demonstrate the utility of MoCapAct by using it to train a single hierarchical policy capable of tracking the entire MoCap dataset within dm_control and show the learned low-level component can be re-used to efficiently learn downstream high-level tasks. Finally, we use MoCapAct to train an autoregressive GPT model and show that it can control a simulated humanoid to perform natural motion completion given a motion prompt.
Videos of the results and links to the code and dataset are available at https://microsoft.github.io/MoCapAct.
△ Less
Submitted 13 January, 2023; v1 submitted 15 August, 2022;
originally announced August 2022.
-
Consistent Dropout for Policy Gradient Reinforcement Learning
Authors:
Matthew Hausknecht,
Nolan Wagener
Abstract:
Dropout has long been a staple of supervised learning, but is rarely used in reinforcement learning. We analyze why naive application of dropout is problematic for policy-gradient learning algorithms and introduce consistent dropout, a simple technique to address this instability. We demonstrate consistent dropout enables stable training with A2C and PPO in both continuous and discrete action envi…
▽ More
Dropout has long been a staple of supervised learning, but is rarely used in reinforcement learning. We analyze why naive application of dropout is problematic for policy-gradient learning algorithms and introduce consistent dropout, a simple technique to address this instability. We demonstrate consistent dropout enables stable training with A2C and PPO in both continuous and discrete action environments across a wide range of dropout probabilities. Finally, we show that consistent dropout enables the online training of complex architectures such as GPT without needing to disable the model's native dropout.
△ Less
Submitted 23 February, 2022;
originally announced February 2022.
-
Safe Reinforcement Learning Using Advantage-Based Intervention
Authors:
Nolan Wagener,
Byron Boots,
Ching-An Cheng
Abstract:
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still s…
▽ More
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention. Our code is available at https://github.com/nolanwagener/safe_rl.
△ Less
Submitted 19 July, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
6-Layer Model for a Structured Description and Categorization of Urban Traffic and Environment
Authors:
Maike Scholtes,
Lukas Westhofen,
Lara Ruth Turner,
Katrin Lotto,
Michael Schuldes,
Hendrik Weber,
Nicolas Wagener,
Christian Neurohr,
Martin Bollmann,
Franziska Körtke,
Johannes Hiller,
Michael Hoss,
Julian Bock,
Lutz Eckstein
Abstract:
Verification and validation of automated driving functions impose large challenges. Currently, scenario-based approaches are investigated in research and industry, aiming at a reduction of testing efforts by specifying safety relevant scenarios. To define those scenarios and operate in a complex real-world design domain, a structured description of the environment is needed. Within the PEGASUS res…
▽ More
Verification and validation of automated driving functions impose large challenges. Currently, scenario-based approaches are investigated in research and industry, aiming at a reduction of testing efforts by specifying safety relevant scenarios. To define those scenarios and operate in a complex real-world design domain, a structured description of the environment is needed. Within the PEGASUS research project, the 6-Layer Model (6LM) was introduced for the description of highway scenarios. This paper refines the 6LM and extends it to urban traffic and environment. As defined in PEGASUS, the 6LM provides the possibility to categorize the environment and, therefore, functions as a structured basis for subsequent scenario description. The model enables a structured description and categorization of the general environment, without incorporating any knowledge or anticipating any functions of actors. Beyond that, there is a variety of other applications of the 6LM, which are elaborated in this paper. The 6LM includes a description of the road network and traffic guidance objects, roadside structures, temporary modifications of the former, dynamic objects, environmental conditions and digital information. The work at hand specifies each layer by categorizing its items. Guidelines are formulated and explanatory examples are given to standardize the application of the model for an objective environment description. In contrast to previous publications, the model and its design are described in far more detail. Finally, the holistic description of the 6LM presented includes remarks on possible future work when expanding the concept to machine perception aspects.
△ Less
Submitted 2 February, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
An Online Learning Approach to Model Predictive Control
Authors:
Nolan Wagener,
Ching-An Cheng,
Jacob Sacks,
Byron Boots
Abstract:
Model predictive control (MPC) is a powerful technique for solving dynamic control tasks. In this paper, we show that there exists a close connection between MPC and online learning, an abstract theoretical framework for analyzing online decision making in the optimization literature. This new perspective provides a foundation for leveraging powerful online learning algorithms to design MPC algori…
▽ More
Model predictive control (MPC) is a powerful technique for solving dynamic control tasks. In this paper, we show that there exists a close connection between MPC and online learning, an abstract theoretical framework for analyzing online decision making in the optimization literature. This new perspective provides a foundation for leveraging powerful online learning algorithms to design MPC algorithms. Specifically, we propose a new algorithm based on dynamic mirror descent (DMD), an online learning algorithm that is designed for non-stationary setups. Our algorithm, Dynamic Mirror Descent Model Predictive Control (DMD-MPC), represents a general family of MPC algorithms that includes many existing techniques as special instances. DMD-MPC also provides a fresh perspective on previous heuristics used in MPC and suggests a principled way to design new MPC algorithms. In the experimental section of this paper, we demonstrate the flexibility of DMD-MPC, presenting a set of new MPC algorithms on a simple simulated cartpole and a simulated and real-world aggressive driving task. Videos of the real-world experiments can be found at https://youtu.be/vZST3v0_S9w and https://youtu.be/MhuqiHo2t98.
△ Less
Submitted 9 October, 2019; v1 submitted 24 February, 2019;
originally announced February 2019.
-
Fast Policy Learning through Imitation and Reinforcement
Authors:
Ching-An Cheng,
Xinyan Yan,
Nolan Wagener,
Byron Boots
Abstract:
Imitation learning (IL) consists of a set of tools that leverage expert demonstrations to quickly learn policies. However, if the expert is suboptimal, IL can yield policies with inferior performance compared to reinforcement learning (RL). In this paper, we aim to provide an algorithm that combines the best aspects of RL and IL. We accomplish this by formulating several popular RL and IL algorith…
▽ More
Imitation learning (IL) consists of a set of tools that leverage expert demonstrations to quickly learn policies. However, if the expert is suboptimal, IL can yield policies with inferior performance compared to reinforcement learning (RL). In this paper, we aim to provide an algorithm that combines the best aspects of RL and IL. We accomplish this by formulating several popular RL and IL algorithms in a common mirror descent framework, showing that these algorithms can be viewed as a variation on a single approach. We then propose LOKI, a strategy for policy learning that first performs a small but random number of IL iterations before switching to a policy gradient RL method. We show that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient from scratch. Finally, we evaluate the performance of LOKI experimentally in several simulated environments.
△ Less
Submitted 25 May, 2018;
originally announced May 2018.
-
Learning Contact-Rich Manipulation Skills with Guided Policy Search
Authors:
Sergey Levine,
Nolan Wagener,
Pieter Abbeel
Abstract:
Autonomous learning of object manipulation skills can enable robots to acquire rich behavioral repertoires that scale to the variety of objects found in the real world. However, current motion skill learning methods typically restrict the behavior to a compact, low-dimensional representation, limiting its expressiveness and generality. In this paper, we extend a recently developed policy search me…
▽ More
Autonomous learning of object manipulation skills can enable robots to acquire rich behavioral repertoires that scale to the variety of objects found in the real world. However, current motion skill learning methods typically restrict the behavior to a compact, low-dimensional representation, limiting its expressiveness and generality. In this paper, we extend a recently developed policy search method \cite{la-lnnpg-14} and use it to learn a range of dynamic manipulation behaviors with highly general policy representations, without using known models or example demonstrations. Our approach learns a set of trajectories for the desired motion skill by using iteratively refitted time-varying linear models, and then unifies these trajectories into a single control policy that can generalize to new situations. To enable this method to run on a real robot, we introduce several improvements that reduce the sample count and automate parameter selection. We show that our method can acquire fast, fluent behaviors after only minutes of interaction time, and can learn robust controllers for complex tasks, including putting together a toy airplane, stacking tight-fitting lego blocks, placing wooden rings onto tight-fitting pegs, inserting a shoe tree into a shoe, and screwing bottle caps onto bottles.
△ Less
Submitted 26 February, 2015; v1 submitted 22 January, 2015;
originally announced January 2015.