-
Ambiguous Annotations: When is a Pedestrian not a Pedestrian?
Authors:
Luisa Schwirten,
Jannes Scholz,
Daniel Kondermann,
Janis Keuper
Abstract:
Datasets labelled by human annotators are widely used in the training and testing of machine learning models. In recent years, researchers have paid increasing attention to label quality. However, it is not always possible to objectively determine whether an assigned label is correct. The present work investigates this ambiguity in the annotation of autonomous driving datasets as an important dimension of data quality. Our experiments show that excluding highly ambiguous data from training improves the performance of a state-of-the-art pedestrian detector in terms of log-average miss rate (LAMR), precision and F1 score, while also saving training time and annotation costs. Furthermore, we demonstrate that, in order to remove ambiguous instances safely and keep the training data representative, an understanding of the properties of the dataset and the class under investigation is crucial.
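LAMR is the log-average miss rate used in pedestrian-detection benchmarks (the convention popularised by the Caltech pedestrian benchmark): the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between 10^-2 and 10^0, and the samples are combined by a geometric mean. The sketch below computes it from a precomputed miss-rate/FPPI curve. The function name, the sampling rule (take the miss rate at the largest FPPI not exceeding each reference point, or 1.0 if the curve never gets that low) and the clamping constant are assumptions of this illustration, not details taken from the paper.

```python
import math
from typing import Sequence


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float],
                          num_points: int = 9) -> float:
    """Geometric mean of the miss rate sampled at `num_points` FPPI values
    evenly spaced in log space over [1e-2, 1e0].

    `fppi` and `miss_rate` describe one detector's curve as parallel
    sequences sorted by increasing FPPI (an assumption of this sketch).
    """
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        # Miss rate at the largest FPPI not exceeding the reference point;
        # if the curve never gets that low, count it as a total miss (1.0).
        below = [m for f, m in zip(fppi, miss_rate) if f <= ref]
        sampled.append(below[-1] if below else 1.0)
    # Clamp before taking logs so a perfect detector does not hit log(0).
    logs = [math.log(max(m, 1e-10)) for m in sampled]
    return math.exp(sum(logs) / len(logs))
```

For a curve that only reaches down to 0.5 FPPI, the seven lower reference points count as total misses, which is how LAMR penalises detectors that cannot operate at low false-positive rates. Lower LAMR is better.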
Submitted 14 May, 2024;
originally announced May 2024.
-
Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks
Authors:
Ben Eisner,
Yi Yang,
Todor Davchev,
Mel Vecerik,
Jonathan Scholz,
David Held
Abstract:
Many robot manipulation tasks can be framed as geometric reasoning tasks, in which an agent must precisely manipulate an object, starting from a set of initial conditions, into a position that satisfies the task. Often, task success is defined by the relationship between two objects, for instance hanging a mug on a rack. In such cases, the solution should be equivariant to the initial positions of the objects and of the agent, and invariant to the pose of the camera. This poses a challenge for learning systems that attempt to solve such tasks by learning directly from high-dimensional demonstrations: the agent must learn to be both equivariant and precise, which can be difficult without inductive biases about the problem. In this work, we propose a method for precise relative pose prediction that is provably SE(3)-equivariant, can be learned from only a few demonstrations, and can generalize across variations within a class of objects. We accomplish this by factoring the problem into learning an SE(3)-invariant, task-specific representation of the scene and then interpreting this representation with novel geometric reasoning layers that are provably SE(3)-equivariant. We demonstrate that our method yields substantially more precise placement predictions in simulated placement tasks than previous methods trained with the same amount of data, and that it can accurately represent relative placement relationships in data collected from real-world demonstrations. Supplementary information and videos can be found at https://sites.google.com/view/reldist-iclr-2023.
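The distinction between SE(3) invariance and SE(3) equivariance that this abstract relies on can be checked numerically. The plain-Python sketch below applies a rigid transform (rotation R, translation t) to a small point cloud: pairwise distances, an SE(3)-invariant description, stay unchanged, while the centroid, a simple SE(3)-equivariant output, moves exactly as the input does. It illustrates the two properties only; it is not the paper's architecture, and all function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def rot_z(a: float) -> Mat:
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(a: float) -> Mat:
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(A: Mat, B: Mat) -> Mat:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(R: Mat, t: Vec, p: Vec) -> Vec:
    """Rigid transform p -> R p + t, i.e. the action of an element of SE(3)."""
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]


def pairwise_distances(points: List[Vec]) -> List[List[float]]:
    """SE(3)-invariant: unchanged when every point is moved by the same (R, t)."""
    return [[math.dist(p, q) for q in points] for p in points]


def centroid(points: List[Vec]) -> Vec:
    """SE(3)-equivariant: transforming the input transforms the output identically."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


if __name__ == "__main__":
    pts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.5, 0.5, 3.0]]
    R, t = matmul(rot_z(0.7), rot_x(-1.2)), [0.3, -2.0, 5.0]
    moved = [apply(R, t, p) for p in pts]
    print(pairwise_distances(pts)[0][3], pairwise_distances(moved)[0][3])
    print(centroid(moved), apply(R, t, centroid(pts)))
```

An architecture built from invariant features, followed by layers whose outputs transform like the centroid here, inherits equivariance by construction rather than having to learn it from data.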
Submitted 20 April, 2024;
originally announced April 2024.
-
Sim2Real for Environmental Neural Processes
Authors:
Jonas Scholz,
Tom R. Andersson,
Anna Vaughan,
James Requeima,
Richard E. Turner
Abstract:
Machine learning (ML)-based weather models have recently undergone rapid improvements. These models are typically trained on gridded reanalysis data from numerical data assimilation systems. However, reanalysis data comes with limitations, such as assumptions about physical laws and low spatiotemporal resolution. The gap between reanalysis and reality has sparked growing interest in training ML models directly on observations such as weather stations. Modelling scattered and sparse environmental observations requires scalable and flexible ML architectures, one of which is the convolutional conditional neural process (ConvCNP). ConvCNPs can learn to condition on both gridded and off-the-grid context data to make uncertainty-aware predictions at target locations. However, the sparsity of real observations presents a challenge for data-hungry deep learning models like the ConvCNP. One potential solution is 'Sim2Real': pre-training on reanalysis and fine-tuning on observational data. We analyse Sim2Real with a ConvCNP trained to interpolate surface air temperature over Germany, using varying numbers of weather stations for fine-tuning. On held-out weather stations, Sim2Real training substantially outperforms the same model architecture trained only with reanalysis data or only with station data, showing that reanalysis data can serve as a stepping stone for learning from real observations. Sim2Real could thus enable more accurate models for weather prediction and climate monitoring.
Submitted 30 October, 2023;
originally announced October 2023.
-
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation
Authors:
Mel Vecerik,
Carl Doersch,
Yi Yang,
Todor Davchev,
Yusuf Aytar,
Guangyao Zhou,
Raia Hadsell,
Lourdes Agapito,
Jon Scholz
Abstract:
For robots to be useful outside labs and specialized factories, we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or the data efficiency to do so in an amount of time that enables practical use. In this work, we explore dense tracking as a representational vehicle for faster and more general learning from demonstration. Our approach uses Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration and to parameterize a low-level controller that reproduces this motion across changes in the scene configuration. We show that this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching and stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.
Submitted 31 August, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
Prediction of Tourism Flow with Sparse Geolocation Data
Authors:
Julian Lemmel,
Zahra Babaiee,
Marvin Kleinlehner,
Ivan Majic,
Philipp Neubauer,
Johannes Scholz,
Radu Grosu,
Sophie A. Neubauer
Abstract:
Modern tourism in the 21st century faces numerous challenges. Among the biggest is the rapidly growing number of tourists visiting space-limited regions such as historical cities, museums and bottlenecks such as bridges. In this context, accurate prediction of tourism volume and tourism flow within a given area is critical for visitor management tasks such as sustainable treatment of the environment and prevention of overcrowding. Static flow control methods, such as conventional low-level controllers or limiting access to overcrowded venues, have not yet solved the problem. In this paper, we empirically evaluate the performance of state-of-the-art deep-learning methods such as RNNs, GNNs and Transformers, as well as the classic statistical ARIMA method. Limited granular data supplied by a tourism region is extended with exogenous data such as geolocation trajectories of individual tourists, weather and holidays. In this way, we increase the accuracy of visitor flow predictions made from sparse data, by incorporating modern input feature handling and by mapping geolocation data onto discrete POI data.
Submitted 28 August, 2023;
originally announced August 2023.
-
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Authors:
Konstantinos Bousmalis,
Giulia Vezzani,
Dushyant Rao,
Coline Devin,
Alex X. Lee,
Maria Bauza,
Todor Davchev,
Yuxiang Zhou,
Agrim Gupta,
Akhil Raju,
Antoine Laurens,
Claudio Fantacci,
Valentin Dalibard,
Martina Zambelli,
Murilo Martins,
Rugile Pevceviciute,
Michiel Blokzijl,
Misha Denil,
Nathan Batchelor,
Thomas Lampe,
Emilio Parisotto,
Konrad Żołna,
Scott Reed,
Sergio Gómez Colmenarejo,
Jon Scholz,
et al. (14 additional authors not shown)
Abstract:
The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities through large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer but also becomes more efficient at adapting to new tasks.
Submitted 22 December, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation
Authors:
Mohit Sharma,
Claudio Fantacci,
Yuxiang Zhou,
Skanda Koppula,
Nicolas Heess,
Jon Scholz,
Yusuf Aytar
Abstract:
Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that fine-tuning the full model can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation and causes representational drift towards the fine-tuned task, leading to a loss of the versatility of the original model. We introduce "lossless adaptation" to address this shortcoming of classical fine-tuning. We demonstrate that appropriate placement of our parameter-efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changing the original representation, thus preserving the original capabilities of the pretrained model. We perform a comprehensive investigation across three major model architectures (ViTs, NFNets and ResNets), supervised (ImageNet-1K classification) and self-supervised pretrained weights (CLIP, BYOL, Visual MAE), 3 task domains and 35 individual tasks, and demonstrate that our claims hold across these varied settings.
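One common way to build parameter-efficient adapters that leave a pretrained representation untouched is a residual bottleneck adapter whose up-projection is initialised to zero, so the adapted network initially computes exactly what the frozen backbone computes. The plain-Python sketch below shows that generic design. The class name, the stand-in `frozen_block`, the bottleneck size and the initialisation scale are hypothetical, and the abstract does not state that the paper's adapters take this exact form or where they are placed.

```python
import random
from typing import List


class ResidualAdapter:
    """Generic bottleneck adapter: h -> h + W_up . relu(W_down . h).

    W_up starts at zero, so a freshly inserted adapter is the identity map.
    Only W_down and W_up would be trained; backbone weights are never modified.
    """

    def __init__(self, dim: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random down-projection (dim -> bottleneck).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Zero up-projection (bottleneck -> dim): the residual branch outputs 0.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h: List[float]) -> List[float]:
        z = [max(0.0, sum(w * x for w, x in zip(row, h))) for row in self.w_down]
        return [x + sum(w * zi for w, zi in zip(row, z))
                for x, row in zip(h, self.w_up)]


def frozen_block(h: List[float]) -> List[float]:
    """Hypothetical stand-in for one frozen pretrained layer."""
    return [2.0 * x + 1.0 for x in h]


if __name__ == "__main__":
    adapter = ResidualAdapter(dim=4, bottleneck=2)
    h = frozen_block([0.5, -1.0, 2.0, 0.0])
    print(h, adapter(h))  # identical before any adapter training
```

Because the backbone weights are never updated, removing the adapters recovers the original model, which is one sense in which such adaptation can be called lossless.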
Submitted 13 April, 2023;
originally announced April 2023.
-
How Many Equations of Motion Describe a Moving Human?
Authors:
Gabriele De Luca,
Thomas J. Lampoltshammer,
Johannes Scholz
Abstract:
A human is a thing that moves in space. Like all things that move in space, we can in principle use differential equations to describe their motion as a set of functions that map time to position (and velocity, acceleration, and so on). For inanimate objects, we can reliably predict trajectories using differential equations that account for up to the second-order time derivative of position, as is commonly done in analytical mechanics. For animate objects, though, and for humans in particular, we do not know the cardinality of the set of equations that define their trajectory. We may be tempted to think, for example, that because humans are more complex in cognition or behaviour than, say, a rock, their motion requires a more complex description than the one generally used for physical systems. In this paper, we examine a real-world dataset on human mobility, consider the information added by each additional (computed, but denoised) time derivative, and find the maximum order of derivative of the position that, for that particular dataset, cannot be expressed as a linear transformation of the previous ones. In this manner, we identify the dimensionality of a minimal model that correctly describes the observed trajectories. We find that every higher-order derivative after the acceleration is linearly dependent upon one of the previous time derivatives. This measure is robust against noise and against the choice of differentiation technique used to compute the time derivatives numerically from the measured position. This result imposes empirical constraints on the possible sets of differential equations that can be used to describe the kinematics of a moving human.
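The core test described here, computing successive numerical derivatives and asking whether each new one is a linear combination of the lower-order ones, can be sketched on a synthetic signal. The plain-Python sketch below uses second-order central differences and an ordinary least-squares fit, reporting a relative residual per derivative order. The trajectory x(t) = sin t + sin 2t is an assumption chosen so that the fourth derivative satisfies x'''' = -4x - 5x'' exactly, while the second and third derivatives are not linear combinations of lower orders. It says nothing about the human-mobility dataset, omits the denoising step the abstract mentions, and all function names are hypothetical.

```python
import math
from typing import Dict, List


def central_diff(y: List[float], h: float) -> List[float]:
    """Second-order central difference; the result is 2 samples shorter."""
    return [(y[i + 1] - y[i - 1]) / (2.0 * h) for i in range(1, len(y) - 1)]


def relative_residual(columns: List[List[float]], y: List[float]) -> float:
    """Least-squares fit y ~ sum_k c_k * columns[k] via the normal equations;
    returns ||y - fit|| / ||y||."""
    m = len(columns)
    G = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(m)]
         for i in range(m)]
    b = [sum(a * v for a, v in zip(columns[i], y)) for i in range(m)]
    # Gaussian elimination with partial pivoting on G c = b.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(G[r][col]))
        G[col], G[piv] = G[piv], G[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = G[r][col] / G[col][col]
            for c in range(col, m):
                G[r][c] -= f * G[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        coef[r] = (b[r] - sum(G[r][c] * coef[c] for c in range(r + 1, m))) / G[r][r]
    resid = [v - sum(coef[k] * columns[k][i] for k in range(m))
             for i, v in enumerate(y)]
    return math.sqrt(sum(r * r for r in resid)) / math.sqrt(sum(v * v for v in y))


def dependency_profile(x: List[float], h: float, max_order: int = 4) -> Dict[int, float]:
    """For each order k >= 1, the relative residual of regressing the k-th
    derivative on all lower orders (order 0 is position). A value near zero
    means the k-th derivative adds no new linear information."""
    derivs = [x]
    for _ in range(max_order):
        derivs.append(central_diff(derivs[-1], h))
    profile = {}
    for k in range(1, max_order + 1):
        n = len(derivs[k])
        # Trim each lower-order series so it is centred on the same samples.
        lower = [d[(len(d) - n) // 2:(len(d) - n) // 2 + n] for d in derivs[:k]]
        profile[k] = relative_residual(lower, derivs[k])
    return profile


if __name__ == "__main__":
    h = 1e-3
    t = [i * h for i in range(int(4 * math.pi / h))]
    x = [math.sin(s) + math.sin(2 * s) for s in t]
    print(dependency_profile(x, h))
```

On this toy signal the residuals for orders 2 and 3 stay large, while the order-4 residual drops to the level of numerical differentiation error, so the minimal linear model would use position through the third derivative.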
Submitted 2 August, 2022; v1 submitted 28 July, 2022;
originally announced July 2022.
-
Platial mobility: expanding place and mobility in GIS via platio-temporal representations and the mobilities paradigm
Authors:
Farrukh Chishtie,
Rizwan Bulbul,
Panka Babukova,
Johannes Scholz
Abstract:
While platial representations are being developed for sedentary entities, a parallel and useful endeavour would be to consider time in so-called "platio-temporal" representations, which would also expand notions of mobility in GIScience that currently depend solely on Euclidean space and time. Besides enhancing these aspects of place and mobility via spatio-temporal representations, we also include human aspects by drawing on the sociological notions of mobility in the mobilities paradigm, which can systematically introduce representations of platial information together with the mobilities associated with 'moving places.' We condense these aspects into 'platial mobility,' a novel conceptual framework that integrates GIScience with the mobilities paradigm in sociology and denotes the movement of places in our platio-temporal and sociology-based representations. As illustrative cases for further study using platial mobility as a framework, we explore its benefits and methodological aspects for developing a better understanding of disaster management, disaster risk reduction and pandemics. We then discuss some of these use cases to clarify the concept of platial mobility and its application prospects in those areas. The use cases, which include flood events and the ongoing COVID-19 pandemic, involve displaced and restricted communities that have had to change practices and places, which makes them particularly amenable to the conceptual framework developed in our work.
Submitted 23 July, 2022;
originally announced July 2022.
-
Deep-Learning vs Regression: Prediction of Tourism Flow with Limited Data
Authors:
Julian Lemmel,
Zahra Babaiee,
Marvin Kleinlehner,
Ivan Majic,
Philipp Neubauer,
Johannes Scholz,
Radu Grosu,
Sophie A. Neubauer
Abstract:
Modern tourism in the 21st century faces numerous challenges. One of these is the rapidly growing number of tourists in space-limited regions such as historical city centers, museums or geographical bottlenecks like narrow valleys. In this context, accurate prediction of tourism volume and tourism flow within a given area is critical for visitor management tasks such as visitor flow control and prevention of overcrowding. Static flow control methods, such as limiting access to hotspots or using conventional low-level controllers, have not yet solved the problem. In this paper, we empirically evaluate the performance of several state-of-the-art deep-learning methods for visitor flow prediction with limited data, using granular data supplied by a tourism region, and compare the results to ARIMA, a classical statistical method. Our results show that the deep-learning models yield better predictions than the ARIMA method, while also offering faster inference and the ability to incorporate additional input features.
Submitted 27 June, 2022;
originally announced June 2022.
-
A method for ethical AI in Defence: A case study on developing trustworthy autonomous systems
Authors:
Tara Roberson,
Stephen Bornstein,
Rain Liivoja,
Simon Ng,
Jason Scholz,
S. Kate Devitt
Abstract:
What does it mean to be responsible and responsive when developing and deploying trusted autonomous systems in Defence? In this short reflective article, we describe a case study of building a trusted autonomous system - Athena AI - within an industry-led, government-funded project with diverse collaborators and stakeholders. Using this case study, we draw out lessons on the value and impact of embedding responsible research and innovation-aligned, ethics-by-design approaches and principles throughout the development of technology at high translation readiness levels.
Submitted 21 June, 2022;
originally announced June 2022.
-
Developing a Trusted Human-AI Network for Humanitarian Benefit
Authors:
Susannah Kate Devitt,
Jason Scholz,
Timo Schless,
Larry Lewis
Abstract:
Artificial intelligence (AI) systems will increasingly participate digitally and physically in conflicts, yet there is a lack of trusted communications with humans for humanitarian purposes. In this paper, we consider integrating a communications protocol (the 'whiteflag protocol'), distributed ledger 'blockchain' technology and information fusion with AI to improve conflict communications, an approach called 'protected assurance understanding situation and entities' (PAUSE). Such a trusted human-AI communication network could provide accountable information exchange regarding protected entities, critical infrastructure, humanitarian signals and status updates for humans and machines in conflicts. We examine several realistic potential case studies for integrating these technologies into a trusted human-AI network for humanitarian benefit, including mapping a conflict zone with civilians and combatants in real time, preparing to avoid incidents, and using the network to manage misinformation. We finish with a real-world example of a PAUSE-like network, the Human Security Information System (HSIS), being developed by USAID, which uses blockchain technology to provide a secure means of better understanding the civilian environment.
Submitted 10 March, 2023; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Few-Shot Keypoint Detection as Task Adaptation via Latent Embeddings
Authors:
Mel Vecerik,
Jackie Kay,
Raia Hadsell,
Lourdes Agapito,
Jon Scholz
Abstract:
Dense object tracking, the ability to localize specific object points with pixel-level accuracy, is an important computer vision task with numerous downstream applications in robotics. Existing approaches either compute dense keypoint embeddings in a single forward pass, meaning the model is trained to track everything at once, or allocate their full capacity to a sparse predefined set of points, trading generality for accuracy. In this paper, we explore a middle ground based on the observation that the number of relevant points at a given time is typically small, e.g. grasp points on a target object. Our main contribution is a novel architecture, inspired by few-shot task adaptation, which allows a sparse-style network to condition on a keypoint embedding that indicates which point to track. Our central finding is that this approach provides the generality of dense-embedding models, while offering accuracy significantly closer to sparse-keypoint approaches. We present results illustrating this capacity vs. accuracy trade-off, and demonstrate the ability to zero-shot transfer to new object instances (within-class) using a real-robot pick-and-place task.
Submitted 13 December, 2021; v1 submitted 9 December, 2021;
originally announced December 2021.
-
Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation
Authors:
Todor Davchev,
Oleg Sushkov,
Jean-Baptiste Regli,
Stefan Schaal,
Yusuf Aytar,
Markus Wulfmeier,
Jon Scholz
Abstract:
Complex sequential tasks in continuous-control settings often require agents to successfully traverse a set of "narrow passages" in their state space. Solving such tasks with a sparse reward in a sample-efficient manner poses a challenge to modern reinforcement learning (RL) due to the long-horizon nature of the problem and the lack of sufficient positive signal during learning. Various tools have been applied to address this challenge. When available, large sets of demonstrations can guide agent exploration. Hindsight relabelling, on the other hand, does not require additional sources of information. However, existing strategies explore based on task-agnostic goal distributions, which can render the solution of long-horizon tasks impractical. In this work, we extend hindsight relabelling mechanisms to guide exploration along task-specific distributions implied by a small set of successful demonstrations. We evaluate the approach on four complex, single- and dual-arm robotic manipulation tasks against strong, suitable baselines. The method requires far fewer demonstrations to solve all tasks and achieves significantly higher overall performance as task complexity increases. Finally, we investigate the robustness of the proposed solution with respect to the quality of input representations and the number of demonstrations.
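For reference, standard task-agnostic hindsight relabelling turns a failed episode into useful training signal by treating a state the agent actually reached as if it had been the goal. The plain-Python sketch below implements the common "future" strategy, in which substitute goals are drawn from states reached later in the same episode. The transition layout, the tolerance, `k` and all names are assumptions of this illustration. The paper's contribution, drawing substitute goals from a task-specific distribution implied by a few demonstrations, is not shown here.

```python
import random
from typing import List, NamedTuple, Tuple

State = Tuple[float, ...]


class Transition(NamedTuple):
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State, tol: float = 0.05) -> float:
    """1.0 if the achieved state is within `tol` of the goal (max-norm), else 0.0."""
    return 1.0 if max(abs(a - g) for a, g in zip(achieved, goal)) <= tol else 0.0


def relabel_future(episode: List[Transition], k: int = 4,
                   rng=random) -> List[Transition]:
    """'Future' hindsight relabelling: for each transition, append k copies
    whose goal is a state reached at that step or later in the same episode,
    with the sparse reward recomputed for the substituted goal."""
    out = list(episode)
    for i, tr in enumerate(episode):
        for _ in range(k):
            future_goal = rng.choice(episode[i:]).next_state
            out.append(tr._replace(goal=future_goal,
                                   reward=sparse_reward(tr.next_state, future_goal)))
    return out
```

Even when every original transition has zero reward, the relabelled copies whose goal equals the state they actually reached receive a reward of 1.0, which is the positive signal that sparse-reward tasks otherwise lack.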
Submitted 22 March, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Offline Meta-Reinforcement Learning for Industrial Insertion
Authors:
Tony Z. Zhao,
Jianlan Luo,
Oleg Sushkov,
Rugile Pevceviciute,
Nicolas Heess,
Jon Scholz,
Stefan Schaal,
Sergey Levine
Abstract:
Reinforcement learning (RL) can in principle let robots automatically adapt to new tasks, but current RL methods require a large number of trials to accomplish this. In this paper, we tackle rapid adaptation to new tasks through the framework of meta-learning, which uses past tasks to learn how to adapt, with a specific focus on industrial insertion tasks. Fast adaptation is crucial because a prohibitively large number of on-robot trials could damage the hardware. Effective adaptation is also feasible, because experience from different insertion applications can largely be transferred between them. In this setting, we address two specific challenges when applying meta-learning. First, conventional meta-RL algorithms require lengthy online meta-training. We show that this can be replaced with appropriately chosen offline data, resulting in an offline meta-RL method that only requires demonstrations and trials from each of the prior tasks, without the need to run costly meta-RL procedures online. Second, meta-RL methods can fail to generalize to new tasks that are too different from those seen at meta-training time, which poses a particular challenge in industrial applications, where high success rates are critical. We address this by combining contextual meta-learning with direct online finetuning: if the new task is similar to those seen in the prior data, the contextual meta-learner adapts immediately, and if it is too different, it gradually adapts through finetuning. We show that our approach is able to quickly adapt to a variety of different insertion tasks, with a success rate of 100% using only a fraction of the samples needed for learning the tasks from scratch. Experiment videos and details are available at https://sites.google.com/view/offline-metarl-insertion.
Submitted 1 September, 2022; v1 submitted 8 October, 2021;
originally announced October 2021.
-
Robust Multi-Modal Policies for Industrial Assembly via Reinforcement Learning and Demonstrations: A Large-Scale Study
Authors:
Jianlan Luo,
Oleg Sushkov,
Rugile Pevceviciute,
Wenzhao Lian,
Chang Su,
Mel Vecerik,
Ning Ye,
Stefan Schaal,
Jon Scholz
Abstract:
Over the past several years there has been considerable research investment into learning-based approaches to industrial assembly, but despite significant progress these techniques have yet to be adopted by industry. We argue that it is the prohibitively large design space for Deep Reinforcement Learning (DRL), rather than algorithmic limitations per se, that is truly responsible for this lack of adoption. Pushing these techniques into the industrial mainstream requires an industry-oriented paradigm that differs significantly from the academic mindset. In this paper, we define criteria for industry-oriented DRL and use them to compare one family of learning approaches, DRL from demonstration, against a professional industrial integrator on the recently established NIST assembly benchmark. We explain the design choices, representing several years of investigation, that enabled our DRL system to consistently outperform the integrator baseline in terms of both speed and reliability. Finally, we conclude with a competition between our DRL system and a human on a challenge task of insertion into a randomly moving target. This study suggests that DRL is capable of outperforming not only established engineered approaches but also the human motor system, and that there remains significant room for improvement. Videos can be found on our project website: https://sites.google.com/view/shield-nist.
Submitted 31 July, 2021; v1 submitted 21 March, 2021;
originally announced March 2021.
-
Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision
Authors:
Julien Scholz,
Cornelius Weber,
Muhammad Burhan Hafez,
Stefan Wermter
Abstract:
Using a model of the environment, reinforcement learning agents can plan their future moves and achieve superhuman performance in board games like Chess, Shogi, and Go, while remaining relatively sample-efficient. As demonstrated by the MuZero algorithm, the environment model can even be learned dynamically, generalizing the agent to many more tasks while at the same time achieving state-of-the-art performance. Notably, MuZero uses internal state representations derived from real environment states for its predictions. In this paper, we bind the model's predicted internal state representation to the environment state via two additional terms: a reconstruction model loss and a simpler consistency loss, both of which work independently and without supervision, acting as constraints to stabilize the learning process. Our experiments show that this new integration of reconstruction model loss and simpler consistency loss provides a significant performance increase in OpenAI Gym environments. Our modifications also enable self-supervised pretraining for MuZero, so the algorithm can learn about environment dynamics before a goal is made available.
Submitted 10 February, 2021;
originally announced February 2021.
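To make the idea of binding predicted internal states to real environment states concrete, here is a minimal sketch of one possible consistency term. It assumes, hypothetically, that the term is the mean squared difference between the latent state predicted by unrolling the dynamics model and the representation function's encoding of the actually observed state at the same step. `represent` and `dynamics` are illustrative stand-ins, not MuZero's networks, and this is not claimed to be the paper's exact loss.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def consistency_loss(
    represent: Callable[[Sequence[float]], Vector],
    dynamics: Callable[[Vector, int], Vector],
    observations: Sequence[Sequence[float]],
    actions: Sequence[int],
) -> float:
    """Mean squared gap between model-predicted and encoded latent states.

    Hypothetical formulation: unroll the dynamics model from the latent of the
    first observation and compare every predicted latent with the encoding of
    the real observation reached at that step.
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("need exactly one more observation than actions")
    latent = represent(observations[0])
    total, count = 0.0, 0
    for step, action in enumerate(actions, start=1):
        latent = dynamics(latent, action)         # model-predicted latent
        target = represent(observations[step])    # latent of the real next state
        total += sum((p - t) ** 2 for p, t in zip(latent, target))
        count += len(target)
    return total / count if count else 0.0
```

With an identity representation and a dynamics model that adds the action to each latent component, a trajectory that really moves by the action each step gives a loss of zero; a model that overshoots is penalised in proportion to its squared drift.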
-
Deep Face Fuzzy Vault: Implementation and Performance
Authors:
Christian Rathgeb,
Johannes Merkle,
Johanna Scholz,
Benjamin Tams,
Vanessa Nesterowicz
Abstract:
Biometric technologies, especially face recognition, have become an essential part of identity management systems worldwide. In deployments of biometrics, secure storage of biometric information is necessary in order to protect the users' privacy. In this context, biometric cryptosystems are designed to meet key requirements of biometric information protection, enabling privacy-preserving storage and comparison of biometric data.
This work investigates the application of a well-known biometric cryptosystem, namely the improved fuzzy vault scheme, to facial feature vectors extracted through deep convolutional neural networks. To this end, a feature transformation method is introduced which maps fixed-length real-valued deep feature vectors to integer-valued feature sets. As part of this feature transformation, a detailed analysis of different feature quantisation and binarisation techniques is conducted. During key binding, the obtained feature sets are locked in an unlinkable improved fuzzy vault. For key retrieval, the efficiency of different polynomial reconstruction techniques is investigated. The proposed feature transformation method and template protection scheme are agnostic to the biometric characteristic. In experiments, an unlinkable improved deep face fuzzy vault-based template protection scheme is constructed employing features extracted with a state-of-the-art deep convolutional neural network trained with the additive angular margin loss (ArcFace). For the best configuration, a false non-match rate below 1% at a false match rate of 0.01% is achieved in cross-database experiments on the FERET and FRGCv2 face databases. On average, a security level of up to approximately 28 bits is obtained. This work presents an effective face-based fuzzy vault scheme providing privacy protection of facial reference data as well as digital key derivation from faces.
Submitted 5 November, 2021; v1 submitted 4 February, 2021;
originally announced February 2021.
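The feature transformation step, mapping a fixed-length real-valued vector to an integer-valued set, can be illustrated with one simple quantisation scheme. This is an assumption for illustration only; the paper compares several quantisation and binarisation techniques, and this sketch is not claimed to be any of them. Each dimension is quantised into one of `n_bins` intervals using shared cut points (hypothetically, quantiles estimated on training features), and the pair (dimension index, bin) is encoded as a single integer, so a vector of length d yields d distinct set elements.

```python
from bisect import bisect_right
from typing import FrozenSet, Sequence


def to_feature_set(features: Sequence[float], boundaries: Sequence[float]) -> FrozenSet[int]:
    """Map a real-valued vector to a set of integers (hypothetical scheme).

    `boundaries` are sorted interior cut points shared by all dimensions.
    With n_bins = len(boundaries) + 1, dimension i falling into bin b is
    encoded as i * n_bins + b, so elements from different dimensions never collide.
    """
    n_bins = len(boundaries) + 1
    return frozenset(
        i * n_bins + bisect_right(boundaries, x)  # (dimension, bin) -> one integer
        for i, x in enumerate(features)
    )


def overlap(a: FrozenSet[int], b: FrozenSet[int]) -> int:
    """Number of shared elements between an enrolled set and a query set."""
    return len(a & b)
```

In a fuzzy vault, decoding generally succeeds only when the query set shares enough elements with the locked set, so the overlap between sets from the same face should be high and between different faces low; how many elements are required depends on the polynomial degree chosen at key binding.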
-
S3K: Self-Supervised Semantic Keypoints for Robotic Manipulation via Multi-View Consistency
Authors:
Mel Vecerik,
Jean-Baptiste Regli,
Oleg Sushkov,
David Barker,
Rugile Pevceviciute,
Thomas Rothörl,
Christopher Schuster,
Raia Hadsell,
Lourdes Agapito,
Jonathan Scholz
Abstract:
A robot's ability to act is fundamentally constrained by what it can perceive. Many existing approaches to visual representation learning utilize general-purpose training criteria, e.g. image reconstruction, smoothness in latent space, or usefulness for control, or else make use of large datasets annotated with specific features (bounding boxes, segmentations, etc.). However, both approaches often struggle to capture the fine detail required for precision tasks on specific objects, e.g. grasping and mating a plug and socket. We argue that these difficulties arise from a lack of geometric structure in these models. In this work we advocate semantic 3D keypoints as a visual representation, and present a semi-supervised training objective that can allow instance- or category-level keypoints to be trained to 1-5 millimeter accuracy with minimal supervision. Furthermore, unlike local texture-based approaches, our model integrates contextual information from a large area and is therefore robust to occlusion, noise, and lack of discernible texture. We demonstrate that this ability to locate semantic keypoints enables high-level scripting of human-understandable behaviours. Finally, we show that these keypoints provide a good way to define reward functions for reinforcement learning and are a good representation for training agents.
Submitted 13 October, 2020; v1 submitted 30 September, 2020;
originally announced September 2020.
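Multi-view consistency can be illustrated with a generic pinhole-camera check: project a candidate 3D keypoint into each calibrated view and measure how far it lands from that view's 2D keypoint prediction. This is a textbook reprojection error, offered as an assumption about the kind of geometric signal such an objective could use, not as the S3K training loss; the intrinsics `K`, rotation `R` and translation `t` in the usage are hypothetical values.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Camera = Tuple[Matrix, Matrix, Sequence[float]]  # (K, R, t)


def project(point: Sequence[float], K: Matrix, R: Matrix, t: Sequence[float]) -> Tuple[float, float]:
    """Pinhole projection of a world point into pixel coordinates."""
    # Camera coordinates: X_c = R @ X_w + t
    cam = [sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Homogeneous pixel coordinates K @ X_c, then divide by the third component
    pix = [sum(K[r][c] * cam[c] for c in range(3)) for r in range(3)]
    return pix[0] / pix[2], pix[1] / pix[2]


def reprojection_error(
    point: Sequence[float],
    cameras: Sequence[Camera],
    detections: Sequence[Tuple[float, float]],
) -> float:
    """Sum of squared pixel distances between projections and per-view 2D predictions."""
    err = 0.0
    for (K, R, t), (u, v) in zip(cameras, detections):
        pu, pv = project(point, K, R, t)
        err += (pu - u) ** 2 + (pv - v) ** 2
    return err
```

A keypoint estimate that is consistent across views drives this error towards zero; a disagreement in any single view shows up as a positive penalty, which is what makes such a term usable without per-view ground-truth labels.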
-
Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient
Authors:
Kevin Sebastian Luck,
Mel Vecerik,
Simon Stepputtis,
Heni Ben Amor,
Jonathan Scholz
Abstract:
Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially if the actor is deterministic. This work evaluates the use of model-based trajectory optimization methods for exploration in Deep Deterministic Policy Gradient when trained on a latent image embedding. In addition, an extension of DDPG is derived using a value function as critic, making use of a learned deep dynamics model to compute the policy gradient. This approach leads to a symbiotic relationship between the deep reinforcement learning algorithm and the latent trajectory optimizer. The trajectory optimizer benefits from the critic learned by the RL algorithm and the latter from the enhanced exploration generated by the planner. The developed methods are evaluated on two continuous control tasks, one in simulation and one in the real world. In particular, a Baxter robot is trained to perform an insertion task, while only receiving sparse rewards and images as observations from the environment.
Submitted 15 November, 2019;
originally announced November 2019.
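The symbiosis described in this abstract, a planner that uses a learned dynamics model for short imagined rollouts and the RL critic to value where those rollouts end, can be sketched with random shooting in latent space. Random shooting is a deliberately simple stand-in chosen here, not necessarily the trajectory optimizer used in the paper, and `dynamics`, `reward` and `critic` are hypothetical callables supplied by the caller.

```python
import random
from typing import Callable, List, Optional

Latent = List[float]


def plan_first_action(
    z0: Latent,
    dynamics: Callable[[Latent, float], Latent],
    reward: Callable[[Latent, float], float],
    critic: Callable[[Latent], float],
    horizon: int = 5,
    n_candidates: int = 64,
    gamma: float = 0.99,
    low: float = -1.0,
    high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Random-shooting planner in a learned latent space (illustrative only).

    Each candidate action sequence is scored by its discounted model-predicted
    rewards plus the discounted critic value of the final imagined latent state;
    the first action of the best-scoring sequence is returned.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    rng = rng if rng is not None else random.Random(0)
    best_score, best_action = float("-inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(low, high) for _ in range(horizon)]
        z, score, discount = z0, 0.0, 1.0
        for a in actions:
            score += discount * reward(z, a)
            z = dynamics(z, a)          # imagined step with the learned model
            discount *= gamma
        score += discount * critic(z)   # critic bootstraps value beyond the horizon
        if score > best_score:
            best_score, best_action = score, actions[0]
    return best_action
```

Because the critic terminates each rollout, the planner can look past its short horizon, and the actions it proposes (here returned as a scalar) could serve as exploratory actions whose outcomes then feed back into critic training.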
-
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Authors:
Serkan Cabi,
Sergio Gómez Colmenarejo,
Alexander Novikov,
Ksenia Konyushkova,
Scott Reed,
Rae Jeong,
Konrad Zolna,
Yusuf Aytar,
David Budden,
Mel Vecerik,
Oleg Sushkov,
David Barker,
Jonathan Scholz,
Misha Denil,
Nando de Freitas,
Ziyu Wang
Abstract:
We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth.
Submitted 4 June, 2020; v1 submitted 26 September, 2019;
originally announced September 2019.
-
Generative predecessor models for sample-efficient imitation learning
Authors:
Yannick Schroecker,
Mel Vecerik,
Jonathan Scholz
Abstract:
We propose Generative Predecessor Models for Imitation Learning (GPRIL), a novel imitation learning algorithm that matches the state-action distribution to the distribution observed in expert demonstrations, using generative models to reason probabilistically about alternative histories of demonstrated states. We show that this approach allows an agent to learn robust policies using only a small number of expert demonstrations and self-supervised interactions with the environment. We derive this approach from first principles and compare it empirically to a state-of-the-art imitation learning method, showing that it outperforms or matches its performance on two simulated robot manipulation tasks, and we demonstrate significantly higher sample efficiency by applying the algorithm to a real robot.
Submitted 1 April, 2019;
originally announced April 2019.
-
Genetic Algorithms and the Traveling Salesman Problem a historical Review
Authors:
Jan Scholz
Abstract:
In this paper a highly abstracted view of the historical development of Genetic Algorithms for the Traveling Salesman Problem is given. In a meta-data analysis, three phases in the development can be distinguished: first, exponential growth in interest can be observed until 1996; growth then stays linear until 2011; and after that, the number of publications declines. These three phases are examined and the major milestones are presented. Lastly, an outlook on future work in this field is inferred.
Submitted 17 January, 2019;
originally announced January 2019.
-
A Practical Approach to Insertion with Variable Socket Position Using Deep Reinforcement Learning
Authors:
Mel Vecerik,
Oleg Sushkov,
David Barker,
Thomas Rothörl,
Todd Hester,
Jon Scholz
Abstract:
Insertion is a challenging haptic and visual control problem with significant practical value for manufacturing. Existing approaches in the model-based robotics community can be highly effective when task geometry is known, but are complex and cumbersome to implement, and must be tailored to each individual problem by a qualified engineer. Within the learning community there is a long history of insertion research, but existing approaches are typically either too sample-inefficient to run on real robots, or assume access to high-level object features, e.g. socket pose. In this paper we show that relatively minor modifications to an off-the-shelf Deep-RL algorithm (DDPG), combined with a small number of human demonstrations, allow the robot to quickly learn to solve these tasks efficiently and robustly. Our approach requires no modeling or simulation, no parameterized search or alignment behaviors, no vision system aside from raw images, and no reward shaping. We evaluate our approach on a narrow-clearance peg-insertion task and a deformable clip-insertion task, both of which include variability in the socket position. Our results show that these tasks can be solved reliably on the real robot in less than 10 minutes of interaction time, and that the resulting policies are robust to variance in the socket position and orientation.
Submitted 8 October, 2018; v1 submitted 2 October, 2018;
originally announced October 2018.
-
Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
Authors:
Mel Vecerik,
Todd Hester,
Jonathan Scholz,
Fumin Wang,
Olivier Pietquin,
Bilal Piot,
Nicolas Heess,
Thomas Rothörl,
Thomas Lampe,
Martin Riedmiller
Abstract:
We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics problems with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to use demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer, and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mechanism. Typically, carefully engineered shaping rewards are required to enable the agents to efficiently explore on high-dimensional control problems such as robotics. They are also required for model-based acceleration methods relying on local solvers such as iLQG (e.g. Guided Policy Search and Normalized Advantage Function). The demonstrations replace the need for carefully engineered rewards, and reduce the exploration problem encountered by classical RL approaches in these domains. Demonstrations are collected by a robot kinesthetically force-controlled by a human demonstrator. Results on four simulated insertion tasks show that DDPG from demonstrations outperforms DDPG and does not require engineered rewards. Finally, we demonstrate the method on a real robotics task consisting of inserting a clip (flexible object) into a rigid object.
Submitted 8 October, 2018; v1 submitted 27 July, 2017;
originally announced July 2017.
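The automatic tuning of the sampling ratio between demonstrations and agent transitions can be sketched with proportional prioritized replay in which demonstration transitions receive a constant priority bonus. The constants `alpha`, `eps` and `eps_demo`, the use of |TD error| alone as the priority signal, and the unbounded buffer are simplifying assumptions for illustration; the paper's exact priority definition may include further terms.

```python
import random
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Entry:
    transition: Any
    is_demo: bool
    priority: float


class DemoReplayBuffer:
    """Proportional prioritized replay with a constant bonus for demonstrations (sketch)."""

    def __init__(self, alpha: float = 0.6, eps: float = 1e-3, eps_demo: float = 1.0,
                 rng: Optional[random.Random] = None):
        self.alpha, self.eps, self.eps_demo = alpha, eps, eps_demo
        self.entries: List[Entry] = []
        self.rng = rng if rng is not None else random.Random(0)

    def _priority(self, td_error: float, is_demo: bool) -> float:
        # Demonstrations get an extra bonus so they keep being replayed
        p = abs(td_error) + self.eps + (self.eps_demo if is_demo else 0.0)
        return p ** self.alpha

    def add(self, transition: Any, is_demo: bool, td_error: float = 1.0) -> None:
        self.entries.append(Entry(transition, is_demo, self._priority(td_error, is_demo)))

    def sample(self, k: int) -> List[int]:
        """Indices drawn with replacement, with probability proportional to priority."""
        weights = [e.priority for e in self.entries]
        return self.rng.choices(range(len(self.entries)), weights=weights, k=k)

    def update(self, index: int, td_error: float) -> None:
        """Refresh a transition's priority after a learning step."""
        e = self.entries[index]
        e.priority = self._priority(td_error, e.is_demo)
```

No explicit demo-to-agent ratio is set anywhere: while agent transitions have small TD errors the demonstrations dominate sampling, and as agent transitions become more informative (larger TD errors) their share rises on its own.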
-
PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations
Authors:
Rico Jonschkowski,
Roland Hafner,
Jonathan Scholz,
Martin Riedmiller
Abstract:
We propose position-velocity encoders (PVEs) which learn, without supervision, to encode images to positions and velocities of task-relevant objects. PVEs encode a single image into a low-dimensional position state and compute the velocity state from finite differences in position. In contrast to autoencoders, position-velocity encoders are not trained by image reconstruction, but by making the position-velocity representation consistent with priors about interacting with the physical world. We applied PVEs to several simulated control tasks from pixels and achieved promising preliminary results.
Submitted 24 July, 2017; v1 submitted 27 May, 2017;
originally announced May 2017.
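The velocity computation described in this abstract is simple enough to show directly: given the positions encoded from two consecutive frames, the velocity state is their finite difference. The encoder itself is not shown; the position vectors and the time step `dt` are hypothetical inputs, and dividing by `dt` rather than using the raw difference is an assumption.

```python
from typing import List, Sequence


def velocity_state(prev_pos: Sequence[float], pos: Sequence[float], dt: float = 1.0) -> List[float]:
    """Backward finite difference v_t = (p_t - p_{t-1}) / dt for each position dimension."""
    if len(prev_pos) != len(pos):
        raise ValueError("position vectors must have the same length")
    return [(p - q) / dt for q, p in zip(prev_pos, pos)]


def position_velocity_state(prev_pos: Sequence[float], pos: Sequence[float], dt: float = 1.0) -> List[float]:
    """Concatenate the current position and its finite-difference velocity into one state."""
    return list(pos) + velocity_state(prev_pos, pos, dt)
```

Because the velocity is derived rather than encoded, only the position encoder has learnable parameters, and the velocity stays consistent with the positions by construction.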
-
Optimized network structure and routing metric in wireless multihop ad hoc communication
Authors:
Wolfram Krause,
Jan Scholz,
Martin Greiner
Abstract:
Inspired by the Statistical Physics of complex networks, wireless multihop ad hoc communication networks are considered in abstracted form. Since such engineered networks are able to modify their structure via topology control, we search for optimized network structures which maximize the end-to-end throughput performance. A modified version of betweenness centrality is introduced and shown to be very relevant for the respective modeling. The calculated optimized network structures lead to a significant increase of the end-to-end throughput. The discussion of the resulting structural properties reveals that it will be almost impossible to construct these optimized topologies in a technologically efficient distributive manner. However, the modified betweenness centrality also allows us to propose a new routing metric for the end-to-end communication traffic. This approach leads to an even larger increase of throughput capacity and is easily implementable in a technologically relevant manner.
Submitted 7 March, 2005; v1 submitted 3 March, 2005;
originally announced March 2005.
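The routing metric in this abstract builds on a modified betweenness centrality whose definition is not given here. For orientation only, the sketch below computes standard (unmodified) node betweenness on an unweighted, undirected graph with Brandes' algorithm; the adjacency-dict representation is an assumption, and the authors' modification is not reproduced.

```python
from collections import deque
from collections.abc import Hashable
from typing import Dict, List

Graph = Dict[Hashable, List[Hashable]]


def betweenness(graph: Graph) -> Dict[Hashable, float]:
    """Standard node betweenness (Brandes' algorithm), unweighted undirected graph.

    Every node must appear as a key and edges must be listed in both directions.
    Each unordered source-target pair is counted once.
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Single-source shortest paths by breadth-first search
        stack: List[Hashable] = []
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint
    return {v: c / 2.0 for v, c in bc.items()}
```

Nodes with high betweenness are the relays that carry a large share of shortest-path traffic; a centrality-aware routing metric would, for example, steer routes away from such nodes, though whether that matches the authors' proposed metric is not established here.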