On the functional graph of $f(X)=X(X^{q-1}-c)^{q+1},$ over quadratic extensions of finite fields

Authors: Josimar J. R. Aguirre, Abílio Lemos, Victor G. L. Neumann

Abstract: Let $\mathbb{F}_{q}$ be the finite field with $q$ elements. In this paper we will describe the dynamics of the map $f(X)=X(X^{q-1}-c)^{q+1},$ with $c\in\mathbb{F}_{q}^{\ast},$ over the finite field $\mathbb{F}_{q^2}$.

Submitted 16 August, 2024; originally announced August 2024.

MSC Class: 12E20; 11T06; 05C20

arXiv:2407.15660 [pdf, other]

MuTT: A Multimodal Trajectory Transformer for Robot Skills

Authors: Claudius Kienle, Benjamin Alt, Onur Celik, Philipp Becker, Darko Katic, Rainer Jäkel, Gerhard Neumann

Abstract: High-level robot skills represent an increasingly popular paradigm in robot programming. However, configuring the skills' parameters for a specific task remains a manual and time-consuming endeavor. Existing approaches for learning or optimizing these parameters often require numerous real-world executions or do not work in dynamic environments. To address these challenges, we propose MuTT, a novel encoder-decoder transformer architecture designed to predict environment-aware executions of robot skills by integrating vision, trajectory, and robot skill parameters. Notably, we pioneer the fusion of vision and trajectory, introducing a novel trajectory projection. Furthermore, we illustrate MuTT's efficacy as a predictor when combined with a model-based robot skill optimizer. This approach facilitates the optimization of robot skill parameters for the current environment, without the need for real-world executions during optimization. Designed for compatibility with any representation of robot skills, MuTT demonstrates its versatility across three comprehensive experiments, showcasing superior performance across two different skill representations.

Submitted 22 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

arXiv:2406.15131 [pdf, other]

KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty

Authors: Philipp Becker, Niklas Freymuth, Gerhard Neumann

Abstract: Probabilistic State Space Models (SSMs) are essential for Reinforcement Learning (RL) from high-dimensional, partial information as they provide concise representations for control. Yet, they lack the computational efficiency of their recent deterministic counterparts such as S4 or Mamba. We propose KalMamba, an efficient architecture to learn representations for RL that combines the strengths of probabilistic SSMs with the scalability of deterministic SSMs. KalMamba leverages Mamba to learn the dynamics parameters of a linear Gaussian SSM in a latent space. Inference in this latent space amounts to standard Kalman filtering and smoothing. We realize these operations using parallel associative scanning, similar to Mamba, to obtain a principled, highly efficient, and scalable probabilistic SSM. Our experiments show that KalMamba competes with state-of-the-art SSM approaches in RL while significantly improving computational efficiency, especially on longer interaction sequences.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14161 [pdf, other]

Iterative Sizing Field Prediction for Adaptive Mesh Generation From Expert Demonstrations

Authors: Niklas Freymuth, Philipp Dahlinger, Tobias Würth, Philipp Becker, Aleksandar Taranovic, Onno Grönheim, Luise Kärger, Gerhard Neumann

Abstract: Many engineering systems require accurate simulations of complex physical systems. Yet, analytical solutions are only available for simple problems, necessitating numerical approximations such as the Finite Element Method (FEM). The cost and accuracy of the FEM scale with the resolution of the underlying computational mesh. To balance computational speed and accuracy, meshes with adaptive resolution are used, allocating more resources to critical parts of the geometry. Currently, practitioners often resort to hand-crafted meshes, which require extensive expert knowledge and are thus costly to obtain. Our approach, Adaptive Meshing By Expert Reconstruction (AMBER), views mesh generation as an imitation learning problem. AMBER combines a graph neural network with an online data acquisition scheme to predict the projected sizing field of an expert mesh on a given intermediate mesh, creating a more accurate subsequent mesh. This iterative process ensures efficient and accurate imitation of expert mesh resolutions on arbitrary new geometries during inference. We experimentally validate AMBER on heuristic 2D meshes and 3D meshes provided by a human expert, closely matching the provided demonstrations and outperforming a single-step CNN baseline.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted as a workshop paper in AI4Science@ICML 2024

arXiv:2406.12538 [pdf, other]

Variational Distillation of Diffusion Policies into Mixture of Experts

Authors: Hongyi Zhou, Denis Blessing, Ge Li, Onur Celik, Xiaogang Jia, Gerhard Neumann, Rudolf Lioutikov

Abstract: This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion Models are the current state-of-the-art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows Diffusion Models to replicate the inherent diversity in human behavior, making them the preferred models in behavior learning such as Learning from Human Demonstrations (LfD). However, diffusion models come with some drawbacks, including the intractability of likelihoods and long inference times due to their iterative sampling process. The inference times, in particular, pose a significant challenge to real-time applications such as robot control. In contrast, MoEs effectively address the aforementioned issues while retaining the ability to represent complex distributions but are notoriously difficult to train. VDD is the first method that distills pre-trained diffusion models into MoE models, and hence, combines the expressiveness of Diffusion Models with the benefits of Mixture Models. Specifically, VDD leverages a decompositional upper bound of the variational objective that allows the training of each expert separately, resulting in a robust optimization scheme for MoEs. Across nine complex behavior learning tasks, VDD demonstrates that it is able to: i) accurately distill complex distributions learned by the diffusion model, ii) outperform existing state-of-the-art distillation methods, and iii) surpass conventional methods for training MoE.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.08440 [pdf, other]

Adaptive Swarm Mesh Refinement using Deep Reinforcement Learning with Local Rewards

Authors: Niklas Freymuth, Philipp Dahlinger, Tobias Würth, Simon Reisch, Luise Kärger, Gerhard Neumann

Abstract: Simulating physical systems is essential in engineering, but analytical solutions are limited to straightforward problems. Consequently, numerical methods like the Finite Element Method (FEM) are widely used. However, the FEM becomes computationally expensive as problem complexity and accuracy demands increase. Adaptive Mesh Refinement (AMR) improves the FEM by dynamically allocating mesh elements on the domain, balancing computational speed and accuracy. Classical AMR depends on heuristics or expensive error estimators, limiting its use in complex simulations. While learning-based AMR methods are promising, they currently only scale to simple problems. In this work, we formulate AMR as a system of collaborating, homogeneous agents that iteratively split into multiple new agents. This agent-wise perspective enables a spatial reward formulation focused on reducing the maximum mesh element error. Our approach, Adaptive Swarm Mesh Refinement (ASMR), offers efficient, stable optimization and generates highly adaptive meshes at user-defined resolution during inference. Extensive experiments, including volumetric meshes and Neumann boundary conditions, demonstrate that ASMR exceeds heuristic approaches and learned baselines, matching the performance of expensive error-based oracle AMR strategies. ASMR additionally generalizes to different domains during inference, and produces meshes that simulate up to 2 orders of magnitude faster than uniform refinements in more demanding settings.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Submitted to Journal of Machine Learning Research (JMLR)

arXiv:2406.08234 [pdf, other]

MaIL: Improving Imitation Learning with Mamba

Authors: Xiaogang Jia, Qian Wang, Atalay Donat, Bowen Xing, Ge Li, Hongyi Zhou, Onur Celik, Denis Blessing, Rudolf Lioutikov, Gerhard Neumann

Abstract: This work introduces Mamba Imitation Learning (MaIL), a novel imitation learning (IL) architecture that offers a computationally efficient alternative to state-of-the-art (SoTA) Transformer policies. Transformer-based policies have achieved remarkable results due to their ability to handle human-recorded data with inherently non-Markovian behavior. However, their high performance comes with the drawback of large models that complicate effective training. While state space models (SSMs) have been known for their efficiency, they were not able to match the performance of Transformers. Mamba significantly improves the performance of SSMs and rivals Transformers, positioning it as an appealing alternative for IL policies. MaIL leverages Mamba as a backbone and introduces a formalism that allows using Mamba in the encoder-decoder structure. This formalism makes it a versatile architecture that can be used as a standalone policy or as part of a more advanced architecture, such as a diffuser in the diffusion process. Extensive evaluations on the LIBERO IL benchmark and three real robot experiments show that MaIL: i) outperforms Transformers in all LIBERO tasks, ii) achieves good performance even with small datasets, iii) is able to effectively process multi-modal sensory inputs, and iv) is more robust to input noise compared to Transformers.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07423 [pdf, other]

Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling

Authors: Denis Blessing, Xiaogang Jia, Johannes Esslinger, Francisco Vargas, Gerhard Neumann

Abstract: Monte Carlo methods, Variational Inference, and their combinations play a pivotal role in sampling from intractable probability distributions. However, current studies lack a unified evaluation framework, relying on disparate performance measures and limited method comparisons across diverse tasks, complicating the assessment of progress and hindering the decision-making of practitioners. In response to these challenges, our work introduces a benchmark that evaluates sampling methods using a standardized task suite and a broad range of performance criteria. Moreover, we study existing metrics for quantifying mode collapse and introduce novel metrics for this purpose. Our findings provide insights into strengths and weaknesses of existing sampling methods, serving as a valuable reference for future developments. The code is publicly available here.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2403.06966 [pdf, other]

Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

Authors: Onur Celik, Aleksandar Taranovic, Gerhard Neumann

Abstract: Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy. However, learning diverse skills is challenging in RL due to the commonly used Gaussian policy parameterization. We propose Diverse Skill Learning (Di-SkilL), an RL method for learning diverse skills using Mixture of Experts, where each expert formalizes a skill as a contextual motion primitive. Di-SkilL optimizes each expert and its associated context distribution to a maximum entropy objective that incentivizes learning diverse skills in similar contexts. The per-expert context distribution enables automatic curricula learning, allowing each expert to focus on its best-performing sub-region of the context space. To overcome hard discontinuities and multi-modalities without any prior knowledge of the environment's unknown context probability space, we leverage energy-based models to represent the per-expert context distributions and demonstrate how we can efficiently train them using the standard policy gradient objective. We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills. Videos and code are available on the project webpage: https://alrhub.github.io/di-skill-website/

Submitted 10 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: International conference on machine learning (ICML)

arXiv:2403.04453 [pdf, other]

Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation

Authors: Fabian Otto, Philipp Becker, Ngo Anh Vien, Gerhard Neumann

Abstract: Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation, which can be problematic in high-dimensional action spaces due to the curse of dimensionality. This reliance results in data inefficiency as maintaining a state-action-value function in such spaces is challenging. We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning. This approach, which we refer to as Vlearn, effectively circumvents the limitations of existing methods by eliminating the necessity for an explicit state-action-value function. To this end, we introduce a novel importance sampling loss for learning deep value functions from off-policy data. While this is common for linear methods, it has not been combined with deep value function networks. This transfer to deep methods is not straightforward and requires novel design choices such as robust policy updates, twin value function networks to avoid an optimization bias, and importance weight clipping. We also present a novel analysis of the variance of our estimate compared to commonly used importance sampling estimators such as V-trace. Our approach improves sample complexity as well as final performance and ensures consistent and robust performance across various benchmark tasks. Eliminating the state-action-value function in Vlearn facilitates a streamlined learning process, enabling more effective exploration and exploitation in complex environments.

Submitted 20 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.14606 [pdf, other]

Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations

Authors: Xiaogang Jia, Denis Blessing, Xinkai Jiang, Moritz Reuss, Atalay Donat, Rudolf Lioutikov, Gerhard Neumann

Abstract: Imitation learning with human data has demonstrated remarkable success in teaching robots a wide range of skills. However, the inherent diversity in human behavior leads to the emergence of multi-modal data distributions, thereby presenting a formidable challenge for existing imitation learning algorithms. Quantifying a model's capacity to capture and replicate this diversity effectively is still an open problem. In this work, we introduce simulation benchmark environments and the corresponding Datasets with Diverse human Demonstrations for Imitation Learning (D3IL), designed explicitly to evaluate a model's ability to learn multi-modal behavior. Our environments are designed to involve multiple sub-tasks that need to be solved, consider manipulation of multiple objects, which increases the diversity of the behavior, and can only be solved by policies that rely on closed-loop sensory feedback. Other available datasets are missing at least one of these challenging properties. To address the challenge of diversity quantification, we introduce tractable metrics that provide valuable insights into a model's ability to acquire and reproduce diverse behaviors. These metrics offer a practical means to assess the robustness and versatility of imitation learning algorithms. Furthermore, we conduct a thorough evaluation of state-of-the-art methods on the proposed task suite. This evaluation serves as a benchmark for assessing their capability to learn diverse behaviors. Our findings shed light on the effectiveness of these methods in tackling the intricate problem of capturing and generalizing multi-modal human behaviors, offering a valuable reference for the design of future imitation learning algorithms.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.10681 [pdf, other]

doi 10.1016/j.cma.2024.117102

Physics-informed MeshGraphNets (PI-MGNs): Neural finite element solvers for non-stationary and nonlinear simulations on arbitrary meshes

Authors: Tobias Würth, Niklas Freymuth, Clemens Zimmerling, Gerhard Neumann, Luise Kärger

Abstract: Engineering components must meet increasing technological demands in ever shorter development cycles. To face these challenges, a holistic approach is essential that allows for the concurrent development of part design, material system and manufacturing process. Current approaches employ numerical simulations, which, however, quickly become computation-intensive, especially for iterative optimization. Data-driven machine learning methods can be used to replace time- and resource-intensive numerical simulations. In particular, MeshGraphNets (MGNs) have shown promising results. They enable fast and accurate predictions on unseen mesh geometries while being fully differentiable for optimization. However, these models rely on large amounts of expensive training data, such as numerical simulations. Physics-informed neural networks (PINNs) offer an opportunity to train neural networks with partial differential equations instead of labeled data, but have not yet been extended to handle time-dependent simulations of arbitrary meshes. This work introduces PI-MGNs, a hybrid approach that combines PINNs and MGNs to quickly and accurately solve non-stationary and nonlinear partial differential equations (PDEs) on arbitrary meshes. The method is exemplified for thermal process simulations of unseen parts with inhomogeneous material distribution. Further results show that the model scales well to large and complex meshes, although it is trained on small generic meshes only.

Submitted 16 February, 2024; originally announced February 2024.

Comments: Submitted to CMAME

arXiv:2401.11437 [pdf, other]

Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning

Authors: Ge Li, Hongyi Zhou, Dominik Roth, Serge Thilges, Fabian Otto, Rudolf Lioutikov, Gerhard Neumann

Abstract: Current advancements in reinforcement learning (RL) have predominantly focused on learning step-based policies that generate actions for each perceived state. While these methods efficiently leverage step information from environmental interaction, they often ignore the temporal correlation between actions, resulting in inefficient exploration and unsmooth trajectories that are challenging to implement on real hardware. Episodic RL (ERL) seeks to overcome these challenges by exploring in a parameter space that captures the correlation of actions. However, these approaches typically compromise data efficiency, as they treat trajectories as opaque 'black boxes'. In this work, we introduce a novel ERL algorithm, Temporally-Correlated Episodic RL (TCE), which effectively utilizes step information in episodic policy updates, opening the 'black box' in existing ERL methods while retaining the smooth and consistent exploration in parameter space. TCE synergistically combines the advantages of step-based and episodic RL, achieving comparable performance to recent ERL methods while maintaining data efficiency akin to state-of-the-art (SoTA) step-based RL.

Submitted 21 January, 2024; originally announced January 2024.

Comments: Codebase, see: https://github.com/BruceGeLi/TCE_RL

arXiv:2401.09352 [pdf, other]

Neural Contractive Dynamical Systems

Authors: Hadi Beik-Mohammadi, Søren Hauberg, Georgios Arvanitidis, Nadia Figueroa, Gerhard Neumann, Leonel Rozo

Abstract: Stability guarantees are crucial when ensuring a fully autonomous robot does not take undesirable or potentially harmful actions. Unfortunately, global stability guarantees are hard to provide in dynamical systems learned from data, especially when the learned dynamics are governed by neural networks. We propose a novel methodology to learn neural contractive dynamical systems, where our neural architecture ensures contraction, and hence, global stability. To efficiently scale the method to high-dimensional dynamical systems, we develop a variant of the variational autoencoder that learns dynamics in a low-dimensional latent representation space while retaining contractive stability after decoding. We further extend our approach to learning contractive systems on the Lie group of rotations to account for full-pose end-effector dynamic motions. The result is the first highly flexible learning architecture that provides contractive stability guarantees with capability to perform obstacle avoidance. Empirically, we demonstrate that our approach encodes the desired dynamics more accurately than the current state-of-the-art, which provides less strong stability guarantees.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2312.13905 [pdf, ps, other]

Domain-Specific Fine-Tuning of Large Language Models for Interactive Robot Programming

Authors: Benjamin Alt, Urs Keßner, Aleksandar Taranovic, Darko Katic, Andreas Hermann, Rainer Jäkel, Gerhard Neumann

Abstract: Industrial robots are applied in a widening range of industries, but robot programming mostly remains a task limited to programming experts. We propose a natural language-based assistant for programming of advanced, industrial robotic applications and investigate strategies for domain-specific fine-tuning of foundation models with limited data and compute.

Submitted 21 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 5 pages, 1 figure, presented at the 2024 European Robotics Forum in Rimini, Italy

MSC Class: 68T40

ACM Class: I.2.9; I.2.5; I.2.6; I.2.7

arXiv:2312.10008 [pdf, ps, other]

doi 10.1109/LRA.2024.3382529

Movement Primitive Diffusion: Learning Gentle Robotic Manipulation of Deformable Objects

Authors: Paul Maria Scheikl, Nicolas Schreiber, Christoph Haas, Niklas Freymuth, Gerhard Neumann, Rudolf Lioutikov, Franziska Mathis-Ullrich

Abstract: Policy learning in robot-assisted surgery (RAS) lacks data efficient and versatile methods that exhibit the desired motion quality for delicate surgical interventions. To this end, we introduce Movement Primitive Diffusion (MPD), a novel method for imitation learning (IL) in RAS that focuses on gentle manipulation of deformable objects. The approach combines the versatility of diffusion-based imitation learning (DIL) with the high-quality motion generation capabilities of Probabilistic Dynamic Movement Primitives (ProDMPs). This combination enables MPD to achieve gentle manipulation of deformable objects, while maintaining data efficiency critical for RAS applications where demonstration data is scarce. We evaluate MPD across various simulated and real world robotic tasks on both state and image observations. MPD outperforms state-of-the-art DIL methods in success rate, motion quality, and data efficiency. Project page: https://scheiklp.github.io/movement-primitive-diffusion/

Submitted 10 June, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Journal ref: IEEE Robotics and Automation Letters 9 (2024) 5338-5345

arXiv:2311.08240 [pdf, other]

Investigating the Encoding of Words in BERT's Neurons using Feature Textualization

Authors: Tanja Baeumel, Soniya Vijayakumar, Josef van Genabith, Guenter Neumann, Simon Ostermann

Abstract: Pretrained language models (PLMs) form the basis of most state-of-the-art NLP technologies. Nevertheless, they are essentially black boxes: Humans do not have a clear understanding of what knowledge is encoded in different parts of the models, especially in individual neurons. The situation is different in computer vision, where feature visualization provides a decompositional interpretability technique for neurons of vision models. Activation maximization is used to synthesize inherently interpretable visual representations of the information encoded in individual neurons. Our work is inspired by this but presents a cautionary tale on the interpretability of single neurons, based on the first large-scale attempt to adapt activation maximization to NLP, and, more specifically, large PLMs. We propose feature textualization, a technique to produce dense representations of neurons in the PLM word embedding space. We apply feature textualization to the BERT model (Devlin et al., 2019) to investigate whether the knowledge encoded in individual neurons can be interpreted and symbolized. We find that the produced representations can provide insights about the knowledge encoded in individual neurons, but that individual neurons do not represent clearcut symbolic units of language such as words. Additionally, we use feature textualization to investigate how many neurons are needed to encode words in BERT.

Submitted 14 November, 2023; originally announced November 2023.

Comments: To be published in 'BlackboxNLP 2023: The 6th Workshop on Analysing and Interpreting Neural Networks for NLP'. Camera-ready version

arXiv:2311.07357 [pdf, other]

Registered and Segmented Deformable Object Reconstruction from a Single View Point Cloud

Authors: Pit Henrich, Balázs Gyenes, Paul Maria Scheikl, Gerhard Neumann, Franziska Mathis-Ullrich

Abstract: In deformable object manipulation, we often want to interact with specific segments of an object that are only defined in non-deformed models of the object. We thus require a system that can recognize and locate these segments in sensor data of deformed real world objects. This is normally done using deformable object registration, which is problem specific and complex to tune. Recent methods utilize neural occupancy functions to improve deformable object registration by registering to an object reconstruction. Going one step further, we propose a system that in addition to reconstruction learns segmentation of the reconstructed object. As the resulting output already contains the information about the segments, we can skip the registration process. Tested on a variety of deformable objects in simulation and the real world, we demonstrate that our method learns to robustly find these segments. We also introduce a simple sampling algorithm to generate better training data for occupancy learning.

Submitted 13 November, 2023; originally announced November 2023.

Comments: Accepted at WACV 2024

arXiv:2311.05256 [pdf, other]

Latent Task-Specific Graph Network Simulators

Authors: Philipp Dahlinger, Niklas Freymuth, Michael Volpp, Tai Hoang, Gerhard Neumann

Abstract: Simulating dynamic physical interactions is a critical challenge across multiple scientific domains, with applications ranging from robotics to material science. For mesh-based simulations, Graph Network Simulators (GNSs) pose an efficient alternative to traditional physics-based simulators. Their inherent differentiability and speed make them particularly well-suited for inverse design problems. Yet, adapting to new tasks from limited available data is an important aspect for real-world applications that current methods struggle with. We frame mesh-based simulation as a meta-learning problem and use a recent Bayesian meta-learning method to improve GNSs adaptability to new scenarios by leveraging context data and handling uncertainties. Our approach, latent task-specific graph network simulator, uses non-amortized task posterior approximations to sample latent descriptions of unknown system properties. Additionally, we leverage movement primitives for efficient full trajectory prediction, effectively addressing the issue of accumulating errors encountered by previous auto-regressive methods. We validate the effectiveness of our approach through various experiments, performing on par with or better than established baseline methods. Movement primitives further allow us to accommodate various types of context data, as demonstrated through the utilization of point clouds during inference. By combining GNSs with meta-learning, we bring them closer to real-world applicability, particularly in scenarios with smaller datasets.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.20574 [pdf, other]

Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization

Authors: Philipp Dahlinger, Philipp Becker, Maximilian Hüttenrauch, Gerhard Neumann

Abstract: Stochastic gradient-based optimization is crucial to optimize neural networks. While popular approaches heuristically adapt the step size and direction by rescaling gradients, a more principled approach to improve optimizers requires second-order information. Such methods precondition the gradient using the objective's Hessian. Yet, computing the Hessian is usually expensive and effectively using second-order information in the stochastic gradient setting is non-trivial. We propose using Information-Theoretic Trust Region Optimization (arTuRO) for improved updates with uncertain second-order information. By modeling the network parameters as a Gaussian distribution and using a Kullback-Leibler divergence-based trust region, our approach takes bounded steps accounting for the objective's curvature and uncertainty in the parameters. Before each update, it solves the trust region problem for an optimal step size, resulting in a more stable and faster optimization process. We approximate the diagonal elements of the Hessian from stochastic gradients using a simple recursive least squares approach, constructing a model of the expected Hessian over time using only first-order information. We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the generalization capabilities of SGD.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.18534 [pdf, other]

Multi Time Scale World Models

Authors: Vaisakh Shaj, Saleh Gholam Zadeh, Ozan Demir, Luiz Ricardo Douat, Gerhard Neumann

Abstract: Intelligent agents use internal world models to reason and make predictions about different courses of their actions at many scales. Devising learning paradigms and architectures that allow machines to learn world models that operate at multiple levels of temporal abstractions while dealing with complex uncertainty predictions is a major technical hurdle. In this work, we propose a probabilistic formalism to learn multi-time scale world models which we call the Multi Time Scale State Space (MTS3) model. Our model uses a computationally efficient inference scheme on multiple time scales for highly accurate long-horizon predictions and uncertainty estimates over several seconds into the future. Our experiments, which focus on action conditional long horizon future predictions, show that MTS3 outperforms recent methods on several system identification benchmarks including complex simulated and real-world dynamical systems. Code is available at this repository: https://github.com/ALRhub/MTS3.

Submitted 4 December, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: Accepted as spotlight at NeurIPS 2023

arXiv:2309.16832 [pdf, other]

doi 10.1093/mnras/stad2916

ALMA Lensing Cluster Survey: average dust, gas, and star formation properties of cluster and field galaxies from stacking analysis

Authors: Andrea Guerrero, Neil Nagar, Kotaro Kohno, Seiji Fujimoto, Vasily Kokorev, Gabriel Brammer, Jean-Baptiste Jolly, Kirsten Knudsen, Fengwu Sun, Franz E. Bauer, Gabriel B. Caminha, Karina Caputi, Gerald Neumann, Gustavo Orellana-González, Pierluigi Cerulo, Jorge González-López, Nicolas Laporte, Anton M. Koekemoer, Yiping Ao, Daniel Espada, Alejandra M. Muñoz Arancibia

Abstract: We develop new tools for continuum and spectral stacking of ALMA data, and apply these to the ALMA Lensing Cluster Survey (ALCS). We derive average dust masses, gas masses and star formation rates (SFR) from the stacked observed 260 GHz continuum of 3402 individually undetected star-forming galaxies, of which 1450 are cluster galaxies and 1952 field galaxies, over three redshift and stellar mass bins (over $z = 0$-1.6 and log $M_{*} [M_{\odot}] = 8$-11.7), and derive the average molecular gas content by stacking the emission line spectra in a SFR-selected subsample. The average SFRs and specific SFRs of both cluster and field galaxies are lower than those expected for Main Sequence (MS) star-forming galaxies, and only galaxies with stellar mass of log $M_{*} [M_{\odot}] = 9.35$-10.6 show dust and gas fractions comparable to those in the MS. The ALMA-traced average 'highly obscured' SFRs are typically lower than the SFRs observed from optical to near-IR spectral analysis. Cluster and field galaxies show similar trends in their contents of dust and gas, even when field galaxies were brighter in the stacked maps. From spectral stacking we find a potential CO ($J=4\to3$) line emission (SNR $\sim4$) when stacking cluster and field galaxies with the highest SFRs.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 13 pages, 8 figures, 1 table (+4 pages, 4 figures in Appendix). Accepted for publication in MNRAS

arXiv:2308.16528 [pdf, other]

SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects

Authors: Ning Gao, Ngo Anh Vien, Hanna Ziesche, Gerhard Neumann

Abstract: To enable meaningful robotic manipulation of objects in the real-world, 6D pose estimation is one of the critical aspects. Most existing approaches have difficulties to extend predictions to scenarios where novel object instances are continuously introduced, especially with heavy occlusions. In this work, we propose a few-shot pose estimation (FSPE) approach called SA6D, which uses a self-adaptive segmentation module to identify the novel target object and construct a point cloud model of the target object using only a small number of cluttered reference images. Unlike existing methods, SA6D does not require object-centric reference images or any additional object information, making it a more generalizable and scalable solution across categories. We evaluate SA6D on real-world tabletop object datasets and demonstrate that SA6D outperforms existing FSPE methods, particularly in cluttered scenes with occlusions, while requiring fewer reference images.

Submitted 31 August, 2023; originally announced August 2023.

Journal ref: Conference on Robot Learning (CoRL), 2023

arXiv:2308.11369 [pdf, other]

Enhancing Interpretable Object Abstraction via Clustering-based Slot Initialization

Authors: Ning Gao, Bernard Hohmann, Gerhard Neumann

Abstract: Object-centric representations using slots have shown the advances towards efficient, flexible and interpretable abstraction from low-level perceptual features in a compositional scene. Current approaches randomize the initial state of slots followed by an iterative refinement. As we show in this paper, the random slot initialization significantly affects the accuracy of the final slot prediction. Moreover, current approaches require a predetermined number of slots from prior knowledge of the data, which limits the applicability in the real world. In our work, we initialize the slot representations with clustering algorithms conditioned on the perceptual input features. This requires an additional layer in the architecture to initialize the slots given the identified clusters. We design permutation invariant and permutation equivariant versions of this layer to enable the exchangeable slot representations after clustering. Additionally, we employ mean-shift clustering to automatically identify the number of slots for a given scene. We evaluate our method on object discovery and novel view synthesis tasks with various datasets. The results show that our method outperforms prior works consistently, especially for complex scenes.

Submitted 22 August, 2023; originally announced August 2023.

Journal ref: The 34th British Machine Vision Conference (BMVC), 2023

arXiv:2308.00456 [pdf, other]

DMFC-GraspNet: Differentiable Multi-Fingered Robotic Grasp Generation in Cluttered Scenes

Authors: Philipp Blättner, Johannes Brand, Gerhard Neumann, Ngo Anh Vien

Abstract: Robotic grasping is a fundamental skill required for object manipulation in robotics. Multi-fingered robotic hands, which mimic the structure of the human hand, can potentially perform complex object manipulation. Nevertheless, current techniques for multi-fingered robotic grasping frequently predict only a single grasp for each inference time, limiting computational efficiency and their versatility, i.e. unimodal grasp distribution. This paper proposes a differentiable multi-fingered grasp generation network (DMFC-GraspNet) with three main contributions to address this challenge. Firstly, a novel neural grasp planner is proposed, which predicts a new grasp representation to enable versatile and dense grasp predictions. Secondly, a scene creation and label mapping method is developed for dense labeling of multi-fingered robotic hands, which allows a dense association of ground truth grasps. Thirdly, we propose to train DMFC-GraspNet end-to-end using a forward-backward automatic differentiation approach with a supervised loss, a differentiable collision loss, and a generalized Q1 grasp metric loss. The proposed approach is evaluated using the Shadow Dexterous Hand on Mujoco simulation and ablated by different choices of loss functions. The results demonstrate the effectiveness of the proposed approach in predicting versatile and dense grasps, and in advancing the field of multi-fingered robotic grasping.

Submitted 16 August, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: Submitted to IROS 2023 workshop "Policy Learning in Geometric Spaces"

arXiv:2307.00306 [pdf, other]

SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation

Authors: Fabian Duffhauss, Sebastian Koch, Hanna Ziesche, Ngo Anh Vien, Gerhard Neumann

Abstract: Detecting objects and estimating their 6D poses is essential for automated systems to interact safely with the environment. Most 6D pose estimators, however, rely on a single camera frame and suffer from occlusions and ambiguities due to object symmetries. We overcome this issue by presenting a novel symmetry-aware multi-view 6D pose estimator called SyMFM6D. Our approach efficiently fuses the RGB-D frames from multiple perspectives in a deep multi-directional fusion network and predicts predefined keypoints for all objects in the scene simultaneously. Based on the keypoints and an instance semantic segmentation, we efficiently compute the 6D poses by least-squares fitting. To address the ambiguity issues for symmetric objects, we propose a novel training procedure for symmetry-aware keypoint detection including a new objective function. Our SyMFM6D network significantly outperforms the state-of-the-art in both single-view and multi-view 6D pose estimation. We furthermore show the effectiveness of our symmetry-aware training procedure and demonstrate that our approach is robust towards inaccurate camera calibration and dynamic camera setups.

Submitted 1 July, 2023; originally announced July 2023.

Comments: Accepted at the IEEE Robotics and Automation Letters (RA-L) 2023

arXiv:2306.12729 [pdf, other]

MP3: Movement Primitive-Based (Re-)Planning Policy

Authors: Fabian Otto, Hongyi Zhou, Onur Celik, Ge Li, Rudolf Lioutikov, Gerhard Neumann

Abstract: We introduce a novel deep reinforcement learning (RL) approach called Movement Primitive-based Planning Policy (MP3). By integrating movement primitives (MPs) into the deep RL framework, MP3 enables the generation of smooth trajectories throughout the whole learning process while effectively learning from sparse and non-Markovian rewards. Additionally, MP3 maintains the capability to adapt to changes in the environment during execution. Although many early successes in robot RL have been achieved by combining RL with MPs, these approaches are often limited to learning single stroke-based motions, lacking the ability to adapt to task variations or adjust motions during execution. Building upon our previous work, which introduced an episode-based RL method for the non-linear adaptation of MP parameters to different task variations, this paper extends the approach to incorporating replanning strategies. This allows adaptation of the MP parameters throughout motion execution, addressing the lack of online motion adaptation in stochastic domains requiring feedback. We compared our approach against state-of-the-art deep RL and RL with MPs methods. The results demonstrated improved performance in sophisticated, sparse reward settings and in domains requiring replanning.

Submitted 2 July, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: The video demonstration can be accessed at https://intuitive-robots.github.io/mp3_website/. arXiv admin note: text overlap with arXiv:2210.09622

arXiv:2306.12306 [pdf, ps, other]

Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift

Authors: Florian Seligmann, Philipp Becker, Michael Volpp, Gerhard Neumann

Abstract: Bayesian deep learning (BDL) is a promising approach to achieve well-calibrated predictions on distribution-shifted data. Nevertheless, there exists no large-scale survey that evaluates recent SOTA methods on diverse, realistic, and challenging benchmark tasks in a systematic manner. To provide a clear picture of the current state of BDL research, we evaluate modern BDL algorithms on real-world datasets from the WILDS collection containing challenging classification and regression tasks, with a focus on generalization capability and calibration under distribution shift. We compare the algorithms on a wide range of large, convolutional and transformer-based neural network architectures. In particular, we investigate a signed version of the expected calibration error that reveals whether the methods are over- or under-confident, providing further insight into the behavior of the methods. Further, we provide the first systematic evaluation of BDL for fine-tuning large pre-trained models, where training from scratch is prohibitively expensive. Finally, given the recent success of Deep Ensembles, we extend popular single-mode posterior approximations to multiple modes by the use of ensembles. While we find that ensembling single-mode approximations generally improves the generalization capability and calibration of the models by a significant margin, we also identify a failure mode of ensembles when finetuning large transformer-based language models. In this setting, variational inference based approaches such as last-layer Bayes By Backprop outperform other methods in terms of accuracy by a large margin, while modern approximate inference algorithms such as SWAG achieve the best calibration.

Submitted 24 October, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

Comments: Code at https://github.com/Feuermagier/Beyond_Deep_Ensembles

arXiv:2304.05171 [pdf, other]

doi 10.1109/ICRA48891.2023.10160543

Curriculum-Based Imitation of Versatile Skills

Authors: Maximilian Xiling Li, Onur Celik, Philipp Becker, Denis Blessing, Rudolf Lioutikov, Gerhard Neumann

Abstract: Learning skills by imitation is a promising concept for the intuitive teaching of robots. A common way to learn such skills is to learn a parametric model by maximizing the likelihood given the demonstrations. Yet, human demonstrations are often multi-modal, i.e., the same task is solved in multiple ways which is a major challenge for most imitation learning methods that are based on such a maximum likelihood (ML) objective. The ML objective forces the model to cover all data, it prevents specialization in the context space and can cause mode-averaging in the behavior space, leading to suboptimal or potentially catastrophic behavior. Here, we alleviate those issues by introducing a curriculum using a weight for each data point, allowing the model to specialize on data it can represent while incentivizing it to cover as much data as possible by an entropy bonus. We extend our algorithm to a Mixture of (linear) Experts (MoE) such that the single components can specialize on local context regions, while the MoE covers all data points. We evaluate our approach in complex simulated and real robot control tasks and show it learns from versatile human demonstrations and significantly outperforms current SOTA methods. A reference implementation can be found at https://github.com/intuitive-robots/ml-cur

Submitted 11 April, 2023; originally announced April 2023.

Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2304.00818 [pdf, other]

Swarm Reinforcement Learning For Adaptive Mesh Refinement

Authors: Niklas Freymuth, Philipp Dahlinger, Tobias Würth, Simon Reisch, Luise Kärger, Gerhard Neumann

Abstract: Adaptive Mesh Refinement (AMR) enhances the Finite Element Method, an important technique for simulating complex problems in engineering, by dynamically refining mesh regions, enabling a favorable trade-off between computational speed and simulation accuracy. Classical methods for AMR depend on heuristics or expensive error estimators, hindering their use for complex simulations. Recent learning-based AMR methods tackle these issues, but so far scale only to simple toy examples. We formulate AMR as a novel Adaptive Swarm Markov Decision Process in which a mesh is modeled as a system of simple collaborating agents that may split into multiple new agents. This framework allows for a spatial reward formulation that simplifies the credit assignment problem, which we combine with Message Passing Networks to propagate information between neighboring mesh elements. We experimentally validate our approach, Adaptive Swarm Mesh Refinement (ASMR), on challenging refinement tasks. Our approach learns reliable and efficient refinement strategies that can robustly generalize to different domains during inference. Additionally, it achieves a speedup of up to $2$ orders of magnitude compared to uniform refinements in more demanding simulations. We outperform learned baselines and heuristics, achieving a refinement quality that is on par with costly error-based oracle AMR strategies.

Submitted 9 October, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: Accepted at Neural Information Processing Systems (NeurIPS) 2023. Version 1 of this paper is a preliminary version that was accepted as a workshop paper in the International Conference on Learning Representations (ICLR) 2023 Workshop on Physics for Machine Learning

arXiv:2303.15349 [pdf, other]

Information Maximizing Curriculum: A Curriculum-Based Approach for Imitating Diverse Skills

Authors: Denis Blessing, Onur Celik, Xiaogang Jia, Moritz Reuss, Maximilian Xiling Li, Rudolf Lioutikov, Gerhard Neumann

Abstract: Imitation learning uses data for training policies to solve complex tasks. However, when the training data is collected from human demonstrators, it often leads to multimodal distributions because of the variability in human actions. Most imitation learning methods rely on a maximum likelihood (ML) objective to learn a parameterized policy, but this can result in suboptimal or unsafe behavior due to the mode-averaging property of the ML objective. In this work, we propose Information Maximizing Curriculum, a curriculum-based approach that assigns a weight to each data point and encourages the model to specialize in the data it can represent, effectively mitigating the mode-averaging problem by allowing the model to ignore data from modes it cannot represent. To cover all modes and thus, enable diverse behavior, we extend our approach to a mixture of experts (MoE) policy, where each mixture component selects its own subset of the training data for learning. A novel, maximum entropy-based objective is proposed to achieve full coverage of the dataset, thereby enabling the policy to encompass all modes within the data distribution. We demonstrate the effectiveness of our approach on complex simulated control tasks using diverse human demonstrations, achieving superior performance compared to state-of-the-art methods.

Submitted 31 October, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

arXiv:2302.11864 [pdf, other]

Grounding Graph Network Simulators using Physical Sensor Observations

Authors: Jonas Linkerhägner, Niklas Freymuth, Paul Maria Scheikl, Franziska Mathis-Ullrich, Gerhard Neumann

Abstract: Physical simulations that accurately model reality are crucial for many engineering disciplines such as mechanical engineering and robotic motion planning. In recent years, learned Graph Network Simulators produced accurate mesh-based simulations while requiring only a fraction of the computational cost of traditional simulators. Yet, the resulting predictors are confined to learning from data generated by existing mesh-based simulators and thus cannot include real world sensory information such as point cloud data. As these predictors have to simulate complex physical systems from only an initial state, they exhibit a high error accumulation for long-term predictions. In this work, we integrate sensory information to ground Graph Network Simulators on real world observations. In particular, we predict the mesh state of deformable objects by utilizing point cloud data. The resulting model allows for accurate predictions over longer time horizons, even under uncertainties in the simulation, such as unknown material properties. Since point clouds are usually not available for every time step, especially in online settings, we employ an imputation-based model. The model can make use of such additional information only when provided, and resorts to a standard Graph Network Simulator, otherwise. We experimentally validate our approach on a suite of prediction tasks for mesh-based interactions between soft and rigid bodies. Our method results in utilization of additional point cloud information to accurately predict stable simulations where existing Graph Network Simulators fail.

Submitted 7 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted as a poster at the 11th International Conference on Learning Representations (ICLR), 2023

arXiv:2302.09606 [pdf, other]

LapGym -- An Open Source Framework for Reinforcement Learning in Robot-Assisted Laparoscopic Surgery

Authors: Paul Maria Scheikl, Balázs Gyenes, Rayan Younis, Christoph Haas, Gerhard Neumann, Martin Wagner, Franziska Mathis-Ullrich

Abstract: Recent advances in reinforcement learning (RL) have increased the promise of introducing cognitive assistance and automation to robot-assisted laparoscopic surgery (RALS). However, progress in algorithms and methods depends on the availability of standardized learning environments that represent skills relevant to RALS. We present LapGym, a framework for building RL environments for RALS that models the challenges posed by surgical tasks, and sofa_env, a diverse suite of 12 environments. Motivated by surgical training, these environments are organized into 4 tracks: Spatial Reasoning, Deformable Object Manipulation & Grasping, Dissection, and Thread Manipulation. Each environment is highly parametrizable for increasing difficulty, resulting in a high performance ceiling for new algorithms. We use Proximal Policy Optimization (PPO) to establish a baseline for model-free RL algorithms, investigating the effect of several environment parameters on task difficulty. Finally, we show that many environments and parameter configurations reflect well-known, open problems in RL research, allowing researchers to continue exploring these fundamental problems in a surgical context. We aim to provide a challenging, standard environment suite for further development of RL for RALS, ultimately helping to realize the full potential of cognitive surgical robotics. LapGym is publicly accessible through GitHub (https://github.com/ScheiklP/lap_gym).

Submitted 19 February, 2023; originally announced February 2023.

arXiv:2302.05342 [pdf, other]

Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL

Authors: Philipp Becker, Sebastian Mossburger, Fabian Otto, Gerhard Neumann

Abstract: Learning self-supervised representations using reconstruction or contrastive losses improves performance and sample complexity of image-based and multimodal reinforcement learning (RL). Here, different self-supervised loss functions have distinct advantages and limitations depending on the information density of the underlying sensor modality. Reconstruction provides strong learning signals but is susceptible to distractions and spurious information. While contrastive approaches can ignore those, they may fail to capture all relevant details and can lead to representation collapse. For multimodal RL, this suggests that different modalities should be treated differently based on the amount of distractions in the signal. We propose Contrastive Reconstructive Aggregated representation Learning (CoRAL), a unified framework enabling us to choose the most appropriate self-supervised loss for each sensor modality and allowing the representation to better focus on relevant aspects. We evaluate CoRAL's benefits on a wide range of tasks with images containing distractions or occlusions, a new locomotion suite, and a challenging manipulation suite with visually realistic distractions. Our results show that learning a multimodal representation by combining contrastive and reconstruction-based losses can significantly improve performance and solve tasks that are out of reach for more naive representation learning approaches and other recent baselines.

Submitted 26 June, 2024; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: Published in "Reinforcement Learning Conference (RLC)", August 2024

arXiv:2211.02114 [pdf, ps, other]

$r$-primitive $k$-normal elements in arithmetic progressions over finite fields

Authors: Josimar J. R. Aguirre, Abílio Lemos, Victor G. L. Neumann, Sávio Ribas

Abstract: Let $\mathbb{F}_{q^n}$ be a finite field with $q^n$ elements. For a positive divisor $r$ of $q^n-1$, the element $α\in \mathbb{F}_{q^n}^*$ is called $r$-primitive if its multiplicative order is $(q^n-1)/r$. Also, for a non-negative integer $k$, the element $α\in \mathbb{F}_{q^n}$ is $k$-normal over $\mathbb{F}_q$ if $\gcd(αx^{n-1}+ α^q x^{n-2} + \ldots + α^{q^{n-2}}x + α^{q^{n-1}} , x^n-1)$ in $\mathbb{F}_{q^n}[x]$ has degree $k$. In this paper we discuss the existence of elements in arithmetic progressions $\{α, α+β, α+2β, \ldots, α+(m-1)β\} \subset \mathbb{F}_{q^n}$ with $α+(i-1)β$ being $r_i$-primitive and at least one of the elements in the arithmetic progression being $k$-normal over $\mathbb{F}_q$. We obtain asymptotic results for general $k, r_1, \dots, r_m$ and concrete results when $k = r_i = 2$ for $i \in \{1, \dots, m\}$.

Submitted 31 July, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: To appear in Communications in Algebra. arXiv admin note: substantial text overlap with arXiv:2210.11504

MSC Class: 12E20; 11T23

arXiv:2210.11504 [pdf, ps, other]

Pairs of $r$-primitive and $k$-normal elements in finite fields

Authors: Josimar J. R. Aguirre, Victor G. L. Neumann

Abstract: Let $\mathbb{F}_{q^n}$ be a finite field with $q^n$ elements and $r$ be a positive divisor of $q^n-1$. An element $α\in \mathbb{F}_{q^n}^*$ is called $r$-primitive if its multiplicative order is $(q^n-1)/r$. Also, $α\in \mathbb{F}_{q^n}$ is $k$-normal over $\mathbb{F}_q$ if the greatest common divisor of the polynomials $g_α(x) = αx^{n-1}+ α^q x^{n-2} + \ldots + α^{q^{n-2}}x + α^{q^{n-1}}$ and $x^n-1$ in $\mathbb{F}_{q^n}[x]$ has degree $k$. These concepts generalize the ideas of primitive and normal elements, respectively. In this paper, we consider non-negative integers $m_1,m_2,k_1,k_2$, positive integers $r_1,r_2$ and rational functions $F(x)=F_1(x)/F_2(x) \in \mathbb{F}_{q^n}(x)$ with $\deg(F_i) \leq m_i$ for $i\in\{ 1,2\}$ satisfying certain conditions and we present sufficient conditions for the existence of $r_1$-primitive $k_1$-normal elements $α\in \mathbb{F}_{q^n}$ over $\mathbb{F}_q$, such that $F(α)$ is an $r_2$-primitive $k_2$-normal element over $\mathbb{F}_q$. Finally, as an example, we study the case where $r_1=2$, $r_2=3$, $k_1=2$, $k_2=1$, $m_1=2$ and $m_2=1$, with $n \ge 7$.

Submitted 20 October, 2022; originally announced October 2022.

MSC Class: 12E20; 11T23

arXiv:2210.09622 [pdf, other]

Deep Black-Box Reinforcement Learning with Movement Primitives

Authors: Fabian Otto, Onur Celik, Hongyi Zhou, Hanna Ziesche, Ngo Anh Vien, Gerhard Neumann

Abstract: Episode-based reinforcement learning (ERL) algorithms treat reinforcement learning (RL) as a black-box optimization problem where we learn to select a parameter vector of a controller, often represented as a movement primitive, for a given task descriptor called a context. ERL offers several distinct benefits in comparison to step-based RL. It generates smooth control trajectories, can handle non-Markovian reward definitions, and the resulting exploration in parameter space is well suited for solving sparse reward settings. Yet, the high dimensionality of the movement primitive parameters has so far hampered the effective use of deep RL methods. In this paper, we present a new algorithm for deep ERL. It is based on differentiable trust region layers, a successful on-policy deep RL algorithm. These layers allow us to specify trust regions for the policy update that are solved exactly for each state using convex optimization, which enables policy learning with the high precision required for ERL. We compare our ERL algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks. In doing so, we investigate different reward formulations: dense, sparse, and non-Markovian. While step-based algorithms perform well only on dense rewards, ERL performs favorably on sparse and non-Markovian rewards. Moreover, our results show that the sparse and the non-Markovian rewards are also often better suited to define the desired behavior, allowing us to obtain considerably higher quality policies compared to step-based RL.

Submitted 18 October, 2022; originally announced October 2022.

Comments: Accepted at CoRL 2022

arXiv:2210.09256 [pdf, other]

On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning

Authors: Philipp Becker, Gerhard Neumann

Abstract: Improved state space models, such as Recurrent State Space Models (RSSMs), are a key factor behind recent advances in model-based reinforcement learning (RL). Yet, despite their empirical success, many of the underlying design choices are not well understood. We show that RSSMs use a suboptimal inference scheme and that models trained using this inference overestimate the aleatoric uncertainty of the ground truth system. We find this overestimation implicitly regularizes RSSMs and allows them to succeed in model-based RL. We postulate that this implicit regularization fulfills the same functionality as explicitly modeling epistemic uncertainty, which is crucial for many other model-based RL approaches. Yet, overestimating aleatoric uncertainty can also impair performance in cases where accurately estimating it matters, e.g., when we have to deal with occlusions, missing observations, or fusing sensor modalities at different frequencies. Moreover, the implicit regularization is a side-effect of the inference scheme and not the result of a rigorous, principled formulation, which renders analyzing or improving RSSMs difficult. Thus, we propose an alternative approach building on well-understood components for modeling aleatoric and epistemic uncertainty, dubbed Variational Recurrent Kalman Network (VRKN). This approach uses Kalman updates for exact smoothing inference in a latent space and Monte Carlo Dropout to model epistemic uncertainty. Due to the Kalman updates, the VRKN can naturally handle missing observations or sensor fusion problems with varying numbers of observations per time step. Our experiments show that using the VRKN instead of the RSSM improves performance in tasks where appropriately capturing aleatoric uncertainty is crucial while matching it in the deterministic standard benchmarks.

Submitted 17 October, 2022; originally announced October 2022.

Comments: Published in TMLR, October 2022

arXiv:2210.08121 [pdf, other]

Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors

Authors: Niklas Freymuth, Nicolas Schreiber, Philipp Becker, Aleksandar Taranovic, Gerhard Neumann

Abstract: Humans intuitively solve tasks in versatile ways, varying their behavior in terms of trajectory-based planning and for individual steps. Thus, they can easily generalize and adapt to new and changing environments. Current Imitation Learning algorithms often only consider unimodal expert demonstrations and act in a state-action-based setting, making it difficult for them to imitate human behavior in case of versatile demonstrations. Instead, we combine a mixture of movement primitives with a distribution matching objective to learn versatile behaviors that match the expert's behavior and versatility. To facilitate generalization to novel task configurations, we do not directly match the agent's and expert's trajectory distributions but rather work with concise geometric descriptors which generalize well to unseen task configurations. We empirically validate our method on various robot tasks using versatile human demonstrations and compare to imitation learning algorithms in a state-action setting as well as a trajectory-based setting. We find that the geometric descriptors greatly help in generalizing to new task configurations and that combining them with our distribution-matching objective is crucial for representing and reproducing versatile behavior.

Submitted 9 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

Comments: Accepted as a poster at the 6th Conference on Robot Learning (CoRL), 2022

arXiv:2210.01531 [pdf, other]

ProDMPs: A Unified Perspective on Dynamic and Probabilistic Movement Primitives

Authors: Ge Li, Zeqi Jin, Michael Volpp, Fabian Otto, Rudolf Lioutikov, Gerhard Neumann

Abstract: Movement Primitives (MPs) are a well-known concept to represent and generate modular trajectories. MPs can be broadly categorized into two types: (a) dynamics-based approaches that generate smooth trajectories from any initial state, e.g., Dynamic Movement Primitives (DMPs), and (b) probabilistic approaches that capture higher-order statistics of the motion, e.g., Probabilistic Movement Primitives (ProMPs). To date, however, there is no method that unifies both, i.e., that can generate smooth trajectories from an arbitrary initial state while capturing higher-order statistics. In this paper, we introduce a unified perspective of both approaches by solving the ODE underlying the DMPs. We convert expensive online numerical integration of DMPs into basis functions that can be computed offline. These basis functions can be used to represent trajectories or trajectory distributions similar to ProMPs while maintaining all the properties of dynamical systems. Since we inherit the properties of both methodologies, we call our proposed model Probabilistic Dynamic Movement Primitives (ProDMPs). Additionally, we embed ProDMPs in a deep neural network architecture and propose a new cost function for efficient end-to-end learning of higher-order trajectory statistics. To this end, we leverage Bayesian Aggregation for non-linear iterative conditioning on sensory inputs. Our proposed model achieves smooth trajectory generation, goal-attractor convergence, correlation analysis, non-linear conditioning, and online re-planning in one framework.

Submitted 4 October, 2022; originally announced October 2022.

Comments: 12 pages, 13 figures

arXiv:2209.11533 [pdf, other]

A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models

Authors: Oleg Arenz, Philipp Dahlinger, Zihan Ye, Michael Volpp, Gerhard Neumann

Abstract: Variational inference with Gaussian mixture models (GMMs) enables learning of highly tractable yet multi-modal approximations of intractable target distributions with up to a few hundred dimensions. The two currently most effective methods for GMM-based variational inference, VIPS and iBayes-GMM, both employ independent natural gradient updates for the individual components and their weights. We show for the first time that their derived updates are equivalent, although their practical implementations and theoretical guarantees differ. We identify several design choices that distinguish both approaches, namely with respect to sample selection, natural gradient estimation, stepsize adaptation, and whether trust regions are enforced or the number of components adapted. We argue that for both approaches, the quality of the learned approximations can heavily suffer from the respective design choices: By updating the individual components using samples from the mixture model, iBayes-GMM often fails to produce meaningful updates to low-weight components, and by using a zero-order method for estimating the natural gradient, VIPS scales badly to higher-dimensional problems. Furthermore, we show that information-geometric trust-regions (used by VIPS) are effective even when using first-order natural gradient estimates, and often outperform the improved Bayesian learning rule (iBLR) update used by iBayes-GMM. We systematically evaluate the effects of design choices and show that a hybrid approach significantly outperforms both prior works. Along with this work, we publish our highly modular and efficient implementation for natural gradient variational inference with Gaussian mixture models, which supports 432 different combinations of design choices, facilitates the reproduction of all our experiments, and may prove valuable for the practitioner.

Submitted 17 July, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: This version corresponds to the camera ready version published at Transactions of Machine Learning Research (TMLR). https://openreview.net/forum?id=tLBjsX4tjs

Journal ref: Transactions on Machine Learning Research (2023) ISSN: 2835-8856

arXiv:2209.11277 [pdf, other]

FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion

Authors: Fabian Duffhauss, Ngo Anh Vien, Hanna Ziesche, Gerhard Neumann

Abstract: Sensor fusion can significantly improve the performance of many computer vision tasks. However, traditional fusion approaches are either not data-driven, and thus can neither exploit prior knowledge nor find regularities in a given dataset, or they are restricted to a single application. We overcome this shortcoming by presenting a novel deep hierarchical variational autoencoder called FusionVAE that can serve as a basis for many fusion tasks. Our approach is able to generate diverse image samples that are conditioned on multiple noisy, occluded, or only partially visible input images. We derive and optimize a variational lower bound for the conditional log-likelihood of FusionVAE. In order to assess the fusion capabilities of our model thoroughly, we created three novel datasets for image fusion based on popular computer vision datasets. In our experiments, we show that FusionVAE learns a representation of aggregated information that is relevant to fusion tasks. The results demonstrate that our approach outperforms traditional methods significantly. Furthermore, we present the advantages and disadvantages of different design choices.

Submitted 22 September, 2022; originally announced September 2022.

Comments: Accepted at ECCV 2022

arXiv:2208.01172 [pdf, other]

MV6D: Multi-View 6D Pose Estimation on RGB-D Frames Using a Deep Point-wise Voting Network

Authors: Fabian Duffhauss, Tobias Demmler, Gerhard Neumann

Abstract: Estimating 6D poses of objects is an essential computer vision task. However, most conventional approaches rely on camera data from a single perspective and therefore suffer from occlusions. We overcome this issue with our novel multi-view 6D pose estimation method called MV6D which accurately predicts the 6D poses of all objects in a cluttered scene based on RGB-D images from multiple perspectives. We base our approach on the PVN3D network that uses a single RGB-D image to predict keypoints of the target objects. We extend this approach by using a combined point cloud from multiple views and fusing the images from each view with a DenseFusion layer. In contrast to current multi-view pose detection networks such as CosyPose, our MV6D can learn the fusion of multiple perspectives in an end-to-end manner and does not require multiple prediction stages or subsequent fine tuning of the prediction. Furthermore, we present three novel photorealistic datasets of cluttered scenes with heavy occlusions. All of them contain RGB-D images from multiple perspectives and the ground truth for instance semantic segmentation and 6D pose estimation. MV6D significantly outperforms the state-of-the-art in multi-view 6D pose estimation even in cases where the camera poses are known inaccurately. Furthermore, we show that our approach is robust towards dynamic camera setups and that its accuracy increases incrementally with an increasing number of perspectives.

Submitted 1 August, 2022; originally announced August 2022.

Comments: Accepted at IROS 2022

arXiv:2208.00478 [pdf, other]

Robot Policy Learning from Demonstration Using Advantage Weighting and Early Termination

Authors: Abdalkarim Mohtasib, Gerhard Neumann, Heriberto Cuayahuitl

Abstract: Learning robotic tasks in the real world is still highly challenging and effective practical solutions remain to be found. Traditional methods used in this area are imitation learning and reinforcement learning, but they both have limitations when applied to real robots. Combining reinforcement learning with pre-collected demonstrations is a promising approach that can help in learning control policies to solve robotic tasks. In this paper, we propose an algorithm that uses novel techniques to leverage offline expert data using offline and online training to obtain faster convergence and improved performance. The proposed algorithm (AWET) weights the critic losses with a novel agent advantage weight to improve over the expert data. In addition, AWET makes use of an automatic early termination technique to stop and discard policy rollouts that are not similar to expert trajectories, in order to prevent drifting far from the expert data. In an ablation study, AWET showed improved and promising performance when compared to state-of-the-art baselines on four standard robotic tasks.

Submitted 31 July, 2022; originally announced August 2022.

arXiv:2206.14697 [pdf, other]

Hidden Parameter Recurrent State Space Models For Changing Dynamics Scenarios

Authors: Vaisakh Shaj, Dieter Buchler, Rohit Sonker, Philipp Becker, Gerhard Neumann

Abstract: Recurrent State-space models (RSSMs) are highly expressive models for learning patterns in time series data and system identification. However, these models assume that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios. Many control applications exhibit tasks with similar but not identical dynamics which can be modeled as a latent variable. We introduce the Hidden Parameter Recurrent State Space Models (HiP-RSSMs), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors. We present a simple and effective way of learning and performing inference over this Gaussian graphical model that avoids approximations like variational inference. We show that HiP-RSSMs outperform RSSMs and competing multi-task models on several challenging robotic benchmarks both on real-world systems and simulations.

Submitted 12 October, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: Published at the International Conference on Learning Representations, ICLR 2022

arXiv:2206.07162 [pdf, other]

Category-Agnostic 6D Pose Estimation with Conditional Neural Processes

Authors: Yumeng Li, Ning Gao, Hanna Ziesche, Gerhard Neumann

Abstract: We present a novel meta-learning approach for 6D pose estimation on unknown objects. In contrast to "instance-level" and "category-level" pose estimation methods, our algorithm learns object representation in a category-agnostic way, which endows it with strong generalization capabilities across object categories. Specifically, we employ a neural process-based meta-learning approach to train an encoder to capture texture and geometry of an object in a latent representation, based on very few RGB-D images and ground-truth keypoints. The latent representation is then used by a simultaneously meta-trained decoder to predict the 6D pose of the object in new images. Furthermore, we propose a novel geometry-aware decoder for the keypoint prediction using a Graph Neural Network (GNN), which explicitly takes geometric constraints specific to each object into consideration. To evaluate our algorithm, extensive experiments are conducted on the LineMOD dataset, and on our new fully-annotated synthetic datasets generated from Multiple Categories in Multiple Scenes (MCMS). Experimental results demonstrate that our model performs well on unseen objects with very different shapes and appearances. Remarkably, our model also shows robust performance on occluded scenes although trained fully on data without occlusion. To our knowledge, this is the first work exploring cross-category level 6D pose estimation.

Submitted 19 October, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Accepted at CVPR2022 workshop: Women in Computer Vision (WiCV)

Journal ref: CVPR2022 workshop: Women in Computer Vision (WiCV)

arXiv:2206.06090 [pdf, other]

Regret-Aware Black-Box Optimization with Natural Gradients, Trust-Regions and Entropy Control

Authors: Maximilian Hüttenrauch, Gerhard Neumann

Abstract: Most successful stochastic black-box optimizers, such as CMA-ES, use rankings of the individual samples to obtain a new search distribution. Yet, the use of rankings also introduces several issues. First, the underlying optimization objective is often unclear, i.e., we do not optimize the expected fitness. Further, while these algorithms typically produce a high-quality mean estimate of the search distribution, the produced samples can have poor quality as these algorithms are ignorant of the regret. Lastly, noisy fitness function evaluations may result in solutions that are highly sub-optimal in expectation. In contrast, stochastic optimizers that are motivated by policy gradients, such as the Model-based Relative Entropy Stochastic Search (MORE) algorithm, directly optimize the expected fitness function without the use of rankings. MORE can be derived by applying natural policy gradients and compatible function approximation, and uses information-theoretic constraints to ensure the stability of the policy update. While MORE does not suffer from the listed limitations, it often cannot achieve state-of-the-art performance in comparison to ranking-based methods. We improve MORE in three ways: by decoupling the update of the mean and covariance of the search distribution, allowing for more aggressive updates on the mean while keeping the update on the covariance conservative; by introducing an improved entropy scheduling technique based on an evolution path, which results in faster convergence; and by using a simplified and more effective model learning approach in comparison to the original paper. We compare our algorithm to state-of-the-art black-box optimization algorithms on standard optimization tasks as well as on episodic RL tasks in robotics where it is also crucial to have small regret. We obtain competitive results on benchmark functions and clearly outperform ranking-based methods in terms of regret on the RL tasks.

Submitted 24 May, 2022; originally announced June 2022.

Comments: 26 pages, 15 figures

arXiv:2206.02852 [pdf, other]

CompartOS: CHERI Compartmentalization for Embedded Systems

Authors: Hesham Almatary, Michael Dodson, Jessica Clarke, Peter Rugg, Ivan Gomes, Michal Podhradsky, Peter G. Neumann, Simon W. Moore, Robert N. M. Watson

Abstract: Existing high-end embedded systems face frequent security attacks. Software compartmentalization is one technique to limit the attacks' effects to the compromised compartment and not the entire system. Unfortunately, the existing state-of-the-art embedded hardware-software solutions do not work well to enforce software compartmentalization for high-end embedded systems. MPUs are not fine-grained and suffer from significant scalability limitations as they can only protect a small and fixed number of memory regions. On the other hand, MMUs suffer from non-determinism and coarse-grained protection. This paper introduces CompartOS as a lightweight linkage-based compartmentalization model for high-end, complex, mainstream embedded systems. CompartOS builds on CHERI, a capability-based hardware architecture, to meet scalability, availability, compatibility, and fine-grained security goals. Microbenchmarks show that CompartOS' protection-domain crossing is 95% faster than MPU-based IPC. We applied the CompartOS model, with low effort, to complex existing systems, including TCP servers and a safety-critical automotive demo. CompartOS not only catches 10 out of 13 FreeRTOS-TCP published vulnerabilities that MPU-based protection (e.g., uVisor) cannot catch but can also recover from them. Further, our TCP throughput evaluations show that our CompartOS prototype is 52% faster than relevant MPU-based compartmentalization models (e.g., ACES), with a 15% overhead compared to an unprotected system.
This comes at an FPGA's LUTs overhead of 10.4% to support CHERI for an unprotected baseline RISC-V processor, compared to 7.6% to support MPU, while CHERI only incurs 1.3% of the registers area overhead compared to 2% for MPU.

Submitted 11 June, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

arXiv:2205.13804 [pdf, other]

End-to-End Learning of Hybrid Inverse Dynamics Models for Precise and Compliant Impedance Control

Authors: Moritz Reuss, Niels van Duijkeren, Robert Krug, Philipp Becker, Vaisakh Shaj, Gerhard Neumann

Abstract: It is well known that inverse dynamics models can improve tracking performance in robot control. These models need to precisely capture the robot dynamics, which consist of well-understood components, e.g., rigid body dynamics, and effects that remain challenging to capture, e.g., stick-slip friction and mechanical flexibilities. Such effects exhibit hysteresis and partial observability, rendering them particularly challenging to model. Hence, hybrid models, which combine a physical prior with data-driven approaches, are especially well-suited in this setting. We present a novel hybrid model formulation that enables us to identify fully physically consistent inertial parameters of a rigid body dynamics model which is paired with a recurrent neural network architecture, allowing us to capture unmodeled partially observable effects using the network memory. We compare our approach against state-of-the-art inverse dynamics models on a 7 degree of freedom manipulator. Using data sets obtained through an optimal experiment design approach, we study the accuracy of offline torque prediction and generalization capabilities of joint learning methods. In control experiments on the real system, we evaluate the model as a feed-forward term for impedance control and show the feedback gains can be drastically reduced to achieve a given tracking accuracy.

Submitted 27 May, 2022; originally announced May 2022.

Comments: Accepted for publication at Robotics: Science and Systems XVIII (RSS), 2022. Paper length is 13 pages (i.e., 9 pages of technical content, 1 page of bibliography/references, and 3 pages of appendix)

arXiv:2205.11110 [pdf, other]

Meta-Learning Regrasping Strategies for Physical-Agnostic Objects

Authors: Ning Gao, Jingyu Zhang, Ruijie Chen, Ngo Anh Vien, Hanna Ziesche, Gerhard Neumann

Abstract: Grasping inhomogeneous objects in real-world applications remains a challenging task due to the unknown physical properties such as mass distribution and coefficient of friction. In this study, we propose a meta-learning algorithm called ConDex, which incorporates Conditional Neural Processes (CNP) with DexNet-2.0 to autonomously discern the underlying physical properties of objects using depth images. ConDex efficiently acquires physical embeddings from limited trials, enabling precise grasping point estimation. Furthermore, ConDex is capable of updating the predicted grasping quality iteratively from new trials in an online fashion. To the best of our knowledge, we are the first to generate two object datasets focusing on inhomogeneous physical properties with varying mass distributions and friction coefficients. Extensive evaluations in simulation demonstrate ConDex's superior performance over DexNet-2.0 and existing meta-learning-based grasping pipelines. Furthermore, ConDex shows robust generalization to previously unseen real-world objects despite training solely in simulation. The synthetic and real-world datasets will be published as well.

Submitted 14 September, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: Accepted as spotlight in ICRA 2022 Workshop: Scaling Robot Learning

Showing 1–50 of 115 results for author: Neumann, G