-
Exploring Domain Shift on Radar-Based 3D Object Detection Amidst Diverse Environmental Conditions
Authors:
Miao Zhang,
Sherif Abdulatif,
Benedikt Loesch,
Marco Altmann,
Marius Schwarz,
Bin Yang
Abstract:
The rapid evolution of deep learning and its integration with autonomous driving systems have led to substantial advancements in 3D perception using multimodal sensors. Notably, radar sensors show greater robustness compared to cameras and lidar under adverse weather and varying illumination conditions. This study delves into the often-overlooked yet crucial issue of domain shift in 4D radar-based object detection, examining how varying environmental conditions, such as different weather patterns and road types, impact 3D object detection performance. Our findings highlight distinct domain shifts across various weather scenarios, revealing unique dataset sensitivities that underscore the critical role of radar point cloud generation. Additionally, we demonstrate that transitioning between different road types, especially from highways to urban settings, introduces notable domain shifts, emphasizing the necessity for diverse data collection across varied road environments. To the best of our knowledge, this is the first comprehensive analysis of domain shift effects on 4D radar-based object detection. We believe this empirical study contributes to understanding the complex nature of domain shifts in radar data and suggests paths forward for data collection strategy in the face of environmental variability.
Submitted 13 August, 2024;
originally announced August 2024.
-
Self-centering 3-DOF feet controller for hands-free locomotion control in telepresence and virtual reality
Authors:
Raphael Memmesheimer,
Christian Lenz,
Max Schwarz,
Michael Schreiber,
Sven Behnke
Abstract:
We present a novel seated 3-DOF foot controller for locomotion control in telepresence robotics and virtual reality environments. Tilting the feet on two axes results in forward, backward and sideways motion. In addition, a separate rotary joint allows for rotation around the vertical axis. Attached springs on all joints self-center the controller. The HTC Vive tracker is used to translate the trackers' orientation into locomotion commands. The proposed self-centering foot controller was used successfully for the ANA Avatar XPRIZE competition, where a naive operator traversed the robot through a longer distance, surpassing obstacles while solving various interaction and manipulation tasks in between. We publicly provide the models of the mostly 3D-printed feet controller for reproduction.
Submitted 5 August, 2024;
originally announced August 2024.
-
FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features
Authors:
Andre Rochow,
Max Schwarz,
Sven Behnke
Abstract:
The task of face reenactment is to transfer the head motion and facial expressions from a driving video to the appearance of a source image, which may be of a different person (cross-reenactment). Most existing methods are CNN-based and estimate optical flow from the source image to the current driving frame, which is then inpainted and refined to produce the output animation. We propose a transformer-based encoder for computing a set-latent representation of the source image(s). We then predict the output color of a query pixel using a transformer-based decoder, which is conditioned with keypoints and a facial expression vector extracted from the driving frame. Latent representations of the source person are learned in a self-supervised manner that factorize their appearance, head pose, and facial expressions. Thus, they are perfectly suited for cross-reenactment. In contrast to most related work, our method naturally extends to multiple source images and can thus adapt to person-specific facial dynamics. We also propose data augmentation and regularization schemes that are necessary to prevent overfitting and support generalizability of the learned representations. We evaluated our approach in a randomized user study. The results indicate superior performance compared to the state-of-the-art in terms of motion transfer quality and temporal consistency.
Submitted 10 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping
Authors:
Anas Gouda,
Max Schwarz,
Christopher Reining,
Sven Behnke,
Alice Kirchheim
Abstract:
Foundation models are a strong trend in deep learning and computer vision. These models serve as a base for applications as they require minor or no further fine-tuning by developers to integrate into their applications. Foundation models for zero-shot object segmentation such as Segment Anything (SAM) output segmentation masks from images without any further object information. When they are followed in a pipeline by an object identification model, they can perform object detection without training. Here, we focus on training such an object identification model. A crucial practical aspect for an object identification model is to be flexible in input size. As object identification is an image retrieval problem, a suitable method should handle multi-query multi-gallery situations without constraining the number of input images (e.g. by having fixed-size aggregation layers). The key solution to train such a model is the centroid triplet loss (CTL), which aggregates image features to their centroids. CTL yields high accuracy, avoids misleading training signals and keeps the model input size flexible. In our experiments, we establish a new state of the art on the ArmBench object identification task, which shows general applicability of our model. We furthermore demonstrate an integrated unseen object detection pipeline on the challenging HOPE dataset, which requires fine-grained detection. There, our pipeline matches and surpasses related methods which have been trained on dataset-specific data.
Submitted 8 July, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Non-Numerical Weakly Relational Domains
Authors:
Helmut Seidl,
Julian Erhard,
Sarah Tilscher,
Michael Schwarz
Abstract:
The weakly relational domain of Octagons offers a decent compromise between precision and efficiency for numerical properties. Here, we are concerned with the construction of non-numerical relational domains. We provide a general construction of weakly relational domains, which we exemplify with an extension of constant propagation by disjunctions. Since for the resulting domain of 2-disjunctive formulas, satisfiability is NP-complete, we provide a general construction for a further, more abstract weakly relational domain where the abstract operations of restriction and least upper bound can be efficiently implemented. In the second step, we consider a relational domain that tracks conjunctions of inequalities between variables, and between variables and constants for arbitrary partial orders of values. Examples are sub(multi)sets, as well as prefix, substring or scattered substring orderings on strings. When the partial order is a lattice, we provide precise polynomial algorithms for satisfiability, restriction, and the best abstraction of disjunction. Complementary to the constructions for lattices, we find that, in general, satisfiability of conjunctions is NP-complete. We therefore again provide polynomial abstract versions of restriction, conjunction, and join. By using our generic constructions, these domains are extended to weakly relational domains that additionally track disjunctions. For all our domains, we indicate how abstract transformers for assignments and guards can be constructed.
Submitted 10 January, 2024;
originally announced January 2024.
-
Attention-Based VR Facial Animation with Visual Mouth Camera Guidance for Immersive Telepresence Avatars
Authors:
Andre Rochow,
Max Schwarz,
Sven Behnke
Abstract:
Facial animation in virtual reality environments is essential for applications that necessitate clear visibility of the user's face and the ability to convey emotional signals. In our scenario, we animate the face of an operator who controls a robotic Avatar system. The use of facial animation is particularly valuable when the perception of interacting with a specific individual, rather than just a robot, is intended. Purely keypoint-driven animation approaches struggle with the complexity of facial movements. We present a hybrid method that uses both keypoints and direct visual guidance from a mouth camera. Our method generalizes to unseen operators and requires only a quick enrolment step with capture of two short videos. Multiple source images are selected with the intention to cover different facial expressions. Given a mouth camera frame from the HMD, we dynamically construct the target keypoints and apply an attention mechanism to determine the importance of each source image. To resolve keypoint ambiguities and animate a broader range of mouth expressions, we propose to inject visual mouth camera information into the latent space. We enable training on large-scale speaking head datasets by simulating the mouth camera input with its perspective differences and facial deformations. Our method outperforms a baseline in quality, capability, and temporal consistency. In addition, we highlight how the facial animation contributed to our victory at the ANA Avatar XPRIZE Finals.
Submitted 15 December, 2023;
originally announced December 2023.
-
Correctness Witness Validation by Abstract Interpretation
Authors:
Simmo Saan,
Michael Schwarz,
Julian Erhard,
Helmut Seidl,
Sarah Tilscher,
Vesal Vojdani
Abstract:
Witnesses record automated program analysis results and make them exchangeable. To validate correctness witnesses through abstract interpretation, we introduce a novel abstract operation unassume. This operator incorporates witness invariants into the abstract program state. Given suitable invariants, the unassume operation can accelerate fixpoint convergence and yield more precise results. We demonstrate the feasibility of this approach by augmenting an abstract interpreter with unassume operators and evaluating the impact of incorporating witnesses on performance and precision. Using manually crafted witnesses, we can confirm verification results for multi-threaded programs with a reduction in effort ranging from 7% to 47% in CPU time. More intriguingly, we discover that using witnesses from model checkers can guide our analyzer to verify program properties that it could not verify on its own.
Submitted 25 October, 2023;
originally announced October 2023.
-
Reviving Meltdown 3a
Authors:
Daniel Weber,
Fabian Thomas,
Lukas Gerlach,
Ruiyi Zhang,
Michael Schwarz
Abstract:
Since the initial discovery of Meltdown and Spectre in 2017, different variants of these attacks have been discovered. One often overlooked variant is Meltdown 3a, also known as Meltdown-CPL-REG. Even though Meltdown-CPL-REG was initially discovered in 2018, the available information regarding the vulnerability is still sparse. In this paper, we analyze Meltdown-CPL-REG on 19 different CPUs from different vendors using an automated tool. We observe that the impact is more diverse than documented and differs from CPU to CPU. Surprisingly, while the newest Intel CPUs do not seem affected by Meltdown-CPL-REG, the newest available AMD CPUs (Zen3+) are still affected by the vulnerability. Furthermore, given our attack primitive CounterLeak, we show that besides up-to-date patches, Meltdown-CPL-REG can still be exploited as we reenable performance-counter-based attacks on cryptographic algorithms, break KASLR, and mount Spectre attacks. Although Meltdown-CPL-REG is not as powerful as other transient-execution attacks, its attack surface should not be underestimated.
Submitted 6 October, 2023;
originally announced October 2023.
-
Indirect Meltdown: Building Novel Side-Channel Attacks from Transient-Execution Attacks
Authors:
Daniel Weber,
Fabian Thomas,
Lukas Gerlach,
Ruiyi Zhang,
Michael Schwarz
Abstract:
The transient-execution attack Meltdown leaks sensitive information by transiently accessing inaccessible data during out-of-order execution. Although Meltdown is fixed in hardware for recent CPU generations, most currently-deployed CPUs have to rely on software mitigations, such as KPTI. Still, Meltdown is considered non-exploitable on current systems. In this paper, we show that adding another layer of indirection to Meltdown transforms a transient-execution attack into a side-channel attack, leaking metadata instead of data. We show that despite software mitigations, attackers can still leak metadata from other security domains by observing the success rate of Meltdown on non-secret data. With LeakIDT, we present the first cache-line granular monitoring of kernel addresses. LeakIDT allows an attacker to obtain cycle-accurate timestamps for attacker-chosen interrupts. We use our attack to get accurate inter-keystroke timings and fingerprint visited websites. While we propose a low-overhead software mitigation to prevent the exploitation of LeakIDT, we emphasize that the side-channel aspect of transient-execution attacks should not be underestimated.
Submitted 6 October, 2023;
originally announced October 2023.
-
Learning from SAM: Harnessing a Foundation Model for Sim2Real Adaptation by Regularization
Authors:
Mayara E. Bonani,
Max Schwarz,
Sven Behnke
Abstract:
Domain adaptation is especially important for robotics applications, where target domain training data is usually scarce and annotations are costly to obtain. We present a method for self-supervised domain adaptation for the scenario where annotated source domain data (e.g. from synthetic generation) is available, but the target domain data is completely unannotated. Our method targets the semantic segmentation task and leverages a segmentation foundation model (Segment Anything Model) to obtain segment information on unannotated data. We take inspiration from recent advances in unsupervised local feature learning and propose an invariance-variance loss over the detected segments for regularizing feature representations in the target domain. Crucially, this loss structure and network architecture can handle overlapping segments and oversegmentation as produced by Segment Anything. We demonstrate the advantage of our method on the challenging YCB-Video and HomebrewedDB datasets and show that it outperforms prior work and, on YCB-Video, even a network trained with real annotations. Additionally, we provide insight through model ablations and show applicability to a custom robotic application.
Submitted 10 May, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
-
Emergent learning in physical systems as feedback-based aging in a glassy landscape
Authors:
Vidyesh Rao Anisetti,
Ananth Kandala,
J. M. Schwarz
Abstract:
By training linear physical networks to learn linear transformations, we discern how their physical properties evolve due to weight update rules. Our findings highlight a striking similarity between the learning behaviors of such networks and the processes of aging and memory formation in disordered and glassy systems. We show that the learning dynamics resembles an aging process, where the system relaxes in response to repeated application of the feedback boundary forces in presence of an input force, thus encoding a memory of the input-output relationship. With this relaxation comes an increase in the correlation length, which is indicated by the two-point correlation function for the components of the network. We also observe that the square root of the mean-squared error as a function of epoch takes on a non-exponential form, which is a typical feature of glassy systems. This physical interpretation suggests that by encoding more detailed information into input and feedback boundary forces, the process of emergent learning can be rather ubiquitous and, thus, serve as a very early physical mechanism, from an evolutionary standpoint, for learning in biological systems.
Submitted 30 October, 2023; v1 submitted 8 September, 2023;
originally announced September 2023.
-
NimbRo wins ANA Avatar XPRIZE Immersive Telepresence Competition: Human-Centric Evaluation and Lessons Learned
Authors:
Christian Lenz,
Max Schwarz,
Andre Rochow,
Bastian Pätzold,
Raphael Memmesheimer,
Michael Schreiber,
Sven Behnke
Abstract:
Robotic avatar systems can enable immersive telepresence with locomotion, manipulation, and communication capabilities. We present such an avatar system, based on the key components of immersive 3D visualization and transparent force-feedback telemanipulation. Our avatar robot features an anthropomorphic upper body with dexterous hands. The remote human operator drives the arms and fingers through an exoskeleton-based operator station, which provides force feedback both at the wrist and for each finger. The robot torso is mounted on a holonomic base, providing omnidirectional locomotion on flat floors, controlled using a 3D rudder device. Finally, the robot features a 6D movable head with stereo cameras, which stream images to a VR display worn by the operator. Movement latency is hidden using spherical rendering. The head also carries a telepresence screen displaying an animated image of the operator's face, enabling direct interaction with remote persons. Our system won the $10M ANA Avatar XPRIZE competition, which challenged teams to develop intuitive and immersive avatar systems that could be operated by briefly trained judges. We analyze our successful participation in the semifinals and finals and provide insight into our operator training and lessons learned. In addition, we evaluate our system in a user study that demonstrates its intuitive and easy usability.
Submitted 28 August, 2023; v1 submitted 23 August, 2023;
originally announced August 2023.
-
Enhancing Network Slicing Architectures with Machine Learning, Security, Sustainability and Experimental Networks Integration
Authors:
Joberto S. B. Martins,
Tereza C. Carvalho,
Rodrigo Moreira,
Cristiano Both,
Adnei Donatti,
João H. Corrêa,
José A. Suruagy,
Sand L. Corrêa,
Antonio J. G. Abelem,
Moisés R. N. Ribeiro,
Jose-Marcos Nogueira,
Luiz C. S. Magalhães,
Juliano Wickboldt,
Tiago Ferreto,
Ricardo Mello,
Rafael Pasquini,
Marcos Schwarz,
Leobino N. Sampaio,
Daniel F. Macedo,
José F. de Rezende,
Kleber V. Cardoso,
Flávio O. Silva
Abstract:
Network Slicing (NS) is an essential technique extensively used in 5G networks computing strategies, mobile edge computing, mobile cloud computing, and verticals like the Internet of Vehicles and industrial IoT, among others. NS is foreseen as one of the leading enablers for 6G futuristic and highly demanding applications since it allows the optimization and customization of scarce and disputed resources among dynamic, demanding clients with highly distinct application requirements. Various standardization organizations, like 3GPP's proposal for new generation networks and state-of-the-art 5G/6G research projects, are proposing new NS architectures. However, new NS architectures have to deal with an extensive range of requirements that inherently result in having NS architecture proposals typically fulfilling the needs of specific sets of domains with commonalities. The Slicing Future Internet Infrastructures (SFI2) architecture proposal explores the gap resulting from the diversity of NS architectures target domains by proposing a new NS reference architecture with a defined focus on integrating experimental networks and enhancing the NS architecture with Machine Learning (ML) native optimizations, energy-efficient slicing, and slicing-tailored security functionalities. The SFI2 architectural main contribution includes the utilization of the slice-as-a-service paradigm for end-to-end orchestration of resources across multi-domains and multi-technology experimental networks. In addition, the SFI2 reference architecture instantiations will enhance the multi-domain and multi-technology integrated experimental network deployment with native ML optimization, energy-efficient aware slicing, and slicing-tailored security functionalities for the practical domain.
Submitted 18 July, 2023;
originally announced July 2023.
-
TALUS: Reinforcing TEE Confidentiality with Cryptographic Coprocessors (Technical Report)
Authors:
Dhiman Chakraborty,
Michael Schwarz,
Sven Bugiel
Abstract:
Platforms are nowadays typically equipped with trusted execution environments (TEEs), such as Intel SGX and ARM TrustZone. However, recent microarchitectural attacks on TEEs repeatedly broke their confidentiality guarantees, including the leakage of long-term cryptographic secrets. These systems are typically also equipped with a cryptographic coprocessor, such as a TPM or Google Titan. These coprocessors offer a unique set of security features focused on safeguarding cryptographic secrets. Still, despite their simultaneous availability, the integration between these technologies is practically nonexistent, which prevents them from benefitting from each other's strengths. In this paper, we propose TALUS, a general design and a set of three main requirements for a secure symbiosis between TEEs and cryptographic coprocessors. We implement a proof-of-concept of TALUS based on Intel SGX and a hardware TPM. We show that with TALUS, the long-term secrets used in the SGX life cycle can be moved to the TPM. We demonstrate that our design is robust even in the presence of transient execution attacks, preventing an entire class of attacks due to the reduced attack surface on the shared hardware.
Submitted 6 June, 2023;
originally announced June 2023.
-
VR Facial Animation for Immersive Telepresence Avatars
Authors:
Andre Rochow,
Max Schwarz,
Michael Schreiber,
Sven Behnke
Abstract:
VR Facial Animation is necessary in applications requiring clear view of the face, even though a VR headset is worn. In our case, we aim to animate the face of an operator who is controlling our robotic avatar system. We propose a real-time capable pipeline with very fast adaptation for specific operators. In a quick enrollment step, we capture a sequence of source images from the operator without the VR headset which contain all the important operator-specific appearance information. During inference, we then use the operator keypoint information extracted from a mouth camera and two eye cameras to estimate the target expression and head pose, to which we map the appearance of a source still image. In order to enhance the mouth expression accuracy, we dynamically select an auxiliary expression frame from the captured sequence. This selection is done by learning to transform the current mouth keypoints into the source camera space, where the alignment can be determined accurately. We, furthermore, demonstrate an eye tracking pipeline that can be trained in less than a minute, a time efficient way to train the whole pipeline given a dataset that includes only complete faces, show exemplary results generated by our method, and discuss performance at the ANA Avatar XPRIZE semifinals.
Submitted 24 April, 2023;
originally announced April 2023.
-
Audio-based Roughness Sensing and Tactile Feedback for Haptic Perception in Telepresence
Authors:
Bastian Pätzold,
Andre Rochow,
Michael Schreiber,
Raphael Memmesheimer,
Christian Lenz,
Max Schwarz,
Sven Behnke
Abstract:
Haptic perception is highly important for immersive teleoperation of robots, especially for accomplishing manipulation tasks. We propose a low-cost haptic sensing and rendering system, which is capable of detecting and displaying surface roughness. As the robot fingertip moves across a surface of interest, two microphones capture sound coupled directly through the fingertip and through the air, respectively. A learning-based detector system analyzes the data in real time and gives roughness estimates with both high temporal resolution and low latency. Finally, an audio-based vibrational actuator displays the result to the human operator. We demonstrate the effectiveness of our system through lab experiments and our winning entry in the ANA Avatar XPRIZE competition finals, where briefly trained judges solved a roughness-based selection task even without additional vision feedback. We publish our dataset used for training and evaluation together with our trained models to enable reproducibility of results.
Submitted 16 October, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Robust Immersive Telepresence and Mobile Telemanipulation: NimbRo wins ANA Avatar XPRIZE Finals
Authors:
Max Schwarz,
Christian Lenz,
Raphael Memmesheimer,
Bastian Pätzold,
Andre Rochow,
Michael Schreiber,
Sven Behnke
Abstract:
Robotic avatar systems promise to bridge distances and reduce the need for travel. We present the updated NimbRo avatar system, winner of the $5M grand prize at the international ANA Avatar XPRIZE competition, which required participants to build intuitive and immersive robotic telepresence systems that could be operated by briefly trained operators. We describe key improvements for the finals, compared to the system used in the semifinals: To operate without a power- and communications tether, we integrated a battery and a robust redundant wireless communication system. Video and audio data are compressed using low-latency HEVC and Opus codecs. We propose a new locomotion control device with tunable resistance force. To increase flexibility, the robot's upper-body height can be adjusted by the operator. We describe essential monitoring and robustness tools which enabled the success at the competition. Finally, we analyze our performance at the competition finals and discuss lessons learned.
Submitted 6 December, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Clustered Relational Thread-Modular Abstract Interpretation with Local Traces
Authors:
Michael Schwarz,
Simmo Saan,
Helmut Seidl,
Julian Erhard,
Vesal Vojdani
Abstract:
We construct novel thread-modular analyses that track relational information for potentially overlapping clusters of global variables - given that they are protected by common mutexes. We provide a framework to systematically increase the precision of clustered relational analyses by splitting control locations based on abstractions of local traces. As one instance, we obtain an analysis of dynamic thread creation and joining. Interestingly, tracking less relational information for globals may result in higher precision. We consider the class of 2-decomposable domains that encompasses many weakly relational domains (e.g., Octagons). For these domains, we prove that maximal precision is attained already for clusters of globals of sizes at most 2.
Submitted 16 January, 2023;
originally announced January 2023.
-
Interactive Abstract Interpretation: Reanalyzing Whole Programs for Cheap
Authors:
Julian Erhard,
Simmo Saan,
Sarah Tilscher,
Michael Schwarz,
Karoliine Holter,
Vesal Vojdani,
Helmut Seidl
Abstract:
To put static program analysis at the fingertips of the software developer, we propose a framework for interactive abstract interpretation. While providing sound analysis results, abstract interpretation in general can be quite costly. To achieve quick response times, we incrementalize the analysis infrastructure, including postprocessing, without necessitating any modifications to the analysis specifications themselves. We rely on the local generic fixpoint engine TD, which dynamically tracks dependencies, while exploring the unknowns contributing to answering an initial query. Lazy invalidation is employed for analysis results affected by program change. Dedicated improvements support the incremental analysis of concurrency deficiencies such as data races. The framework has been implemented for multithreaded C within the static analyzer Goblint, using MagpieBridge to relay findings to IDEs. We evaluate our implementation w.r.t. the yardsticks of response time and consistency: formerly proven invariants should be retained when they are not affected by the change. The results indicate that with our approach, a reanalysis after small changes only takes a fraction of from-scratch analysis time, while most of the precision is retained. We also provide examples of program development highlighting the usability of the overall approach.
Submitted 25 November, 2022; v1 submitted 21 September, 2022;
originally announced September 2022.
-
Frequency propagation: Multi-mechanism learning in nonlinear physical networks
Authors:
Vidyesh Rao Anisetti,
A. Kandala,
B. Scellier,
J. M. Schwarz
Abstract:
We introduce frequency propagation, a learning algorithm for nonlinear physical networks. In a resistive electrical circuit with variable resistors, an activation current is applied at a set of input nodes at one frequency, and an error current is applied at a set of output nodes at another frequency. The voltage response of the circuit to these boundary currents is the superposition of an 'activation signal' and an 'error signal' whose coefficients can be read in different frequencies of the frequency domain. Each conductance is updated proportionally to the product of the two coefficients. The learning rule is local and proved to perform gradient descent on a loss function. We argue that frequency propagation is an instance of a multi-mechanism learning strategy for physical networks, be it resistive, elastic, or flow networks. Multi-mechanism learning strategies incorporate at least two physical quantities, potentially governed by independent physical mechanisms, to act as activation and error signals in the training process. Locally available information about these two signals is then used to update the trainable parameters to perform gradient descent. We demonstrate how earlier work implementing learning via chemical signaling in flow networks also falls under the rubric of multi-mechanism learning.
Submitted 10 August, 2022;
originally announced August 2022.
-
HyperDbg: Reinventing Hardware-Assisted Debugging (Extended Version)
Authors:
Mohammad Sina Karvandi,
MohammadHossein Gholamrezaei,
Saleh Khalaj Monfared,
Soroush Meghdadizanjani,
Behrooz Abbassi,
Ali Amini,
Reza Mortazavi,
Saeid Gorgin,
Dara Rahmati,
Michael Schwarz
Abstract:
Software analysis, debugging, and reverse engineering have a crucial impact in today's software industry. Efficient and stealthy debuggers are especially relevant for malware analysis. However, existing debugging platforms fail to address a transparent, effective, and high-performance low-level debugger due to their detectable fingerprints, complexity, and implementation restrictions. In this paper, we present HyperDbg, a new hypervisor-assisted debugger for high-performance and stealthy debugging of user and kernel applications. To accomplish this, HyperDbg relies on state-of-the-art hardware features available in today's CPUs, such as VT-x and extended page tables. In contrast to other widely used existing debuggers, we design HyperDbg using a custom hypervisor, making it independent of OS functionality or API. We propose hardware-based instruction-level emulation and OS-level API hooking via extended page tables to increase the stealthiness. Our results of the dynamic analysis of 10,853 malware samples show that HyperDbg's stealthiness allows debugging on average 22% and 26% more samples than WinDbg and x64dbg, respectively. Moreover, in contrast to existing debuggers, HyperDbg is not detected by any of the 13 tested packers and protectors. We improve the performance over other debuggers by deploying a VMX-compatible script engine, eliminating unnecessary context switches. Our experiment on three concrete debugging scenarios shows that compared to WinDbg as the only kernel debugger, HyperDbg performs step-in, conditional breaks, and syscall recording, 2.98x, 1319x, and 2018x faster, respectively. We finally show real-world applications, such as a 0-day analysis, structure reconstruction for reverse engineering, software performance analysis, and code-coverage analysis.
Submitted 2 September, 2022; v1 submitted 29 May, 2022;
originally announced July 2022.
-
Probing for Passwords -- Privacy Implications of SSIDs in Probe Requests
Authors:
Johanna Ansohn McDougall,
Christian Burkert,
Daniel Demmler,
Monina Schwarz,
Vincent Hubbe,
Hannes Federrath
Abstract:
Probe requests help mobile devices discover active Wi-Fi networks. They often contain a multitude of data that can be used to identify and track devices and thereby their users. The past years have been a cat-and-mouse game of improving fingerprinting and introducing countermeasures against fingerprinting. This paper analyses the content of probe requests sent by mobile devices and operating systems in a field experiment. In it, we discover that users (probably by accident) input a wealth of data into the SSID field and find passwords, e-mail addresses, names and holiday locations. With these findings we underline that probe requests should be considered sensitive data and be well protected. To preserve user privacy, we suggest and evaluate a privacy-friendly hash-based construction of probe requests and improved user controls.
Submitted 6 July, 2022; v1 submitted 8 June, 2022;
originally announced June 2022.
-
Predicting Physical Object Properties from Video
Authors:
Martin Link,
Max Schwarz,
Sven Behnke
Abstract:
We present a novel approach to estimating physical properties of objects from video. Our approach consists of a physics engine and a correction estimator. Starting from the initial observed state, object behavior is simulated forward in time. Based on the simulated and observed behavior, the correction estimator then determines refined physical parameters for each object. The method can be iterated for increased precision. Our approach is generic, as it allows for the use of an arbitrary - not necessarily differentiable - physics engine and correction estimator. For the latter, we evaluate both gradient-free hyperparameter optimization and a deep convolutional neural network. We demonstrate faster and more robust convergence of the learned method in several simulated 2D scenarios focusing on bin situations.
Submitted 2 June, 2022;
originally announced June 2022.
-
ConvPoseCNN2: Prediction and Refinement of Dense 6D Object Poses
Authors:
Arul Selvam Periyasamy,
Catherine Capellen,
Max Schwarz,
Sven Behnke
Abstract:
Object pose estimation is a key perceptual capability in robotics. We propose a fully-convolutional extension of the PoseCNN method, which densely predicts object translations and orientations. This has several advantages such as improving the spatial resolution of the orientation predictions -- useful in highly-cluttered arrangements, significant reduction in parameters by avoiding full connectivity, and fast inference. We propose and discuss several aggregation methods for dense orientation predictions that can be applied as a post-processing step, such as averaging and clustering techniques. We demonstrate that our method achieves the same accuracy as PoseCNN on the challenging YCB-Video dataset and provide a detailed ablation study of several variants of our method. Finally, we demonstrate that the model can be further improved by inserting an iterative refinement module into the middle of the network, which enforces consistency of the prediction.
Submitted 23 May, 2022;
originally announced May 2022.
-
Learning by non-interfering feedback chemical signaling in physical networks
Authors:
Vidyesh Rao Anisetti,
B. Scellier,
J. M. Schwarz
Abstract:
Both non-neural and neural biological systems can learn. So rather than focusing on purely brain-like learning, efforts are underway to study learning in physical systems. Such efforts include equilibrium propagation (EP) and coupled learning (CL), which require storage of two different states (the free state and the perturbed state) during the learning process to retain information about gradients. Inspired by slime mold, we propose a new learning algorithm rooted in chemical signaling that does not require storage of two different states. Rather, the output error information is encoded in a chemical signal that diffuses into the network in a similar way as the activation/feedforward signal. The steady state feedback chemical concentration, along with the activation signal, stores the required gradient information locally. We apply our algorithm using a physical, linear flow network and test it using the Iris data set with 93% accuracy. We also prove that our algorithm performs gradient descent. Finally, in addition to comparing our algorithm directly with EP and CL, we address the biological plausibility of the algorithm.
Submitted 23 June, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Synthetic-to-Real Domain Adaptation using Contrastive Unpaired Translation
Authors:
Benedikt T. Imbusch,
Max Schwarz,
Sven Behnke
Abstract:
The usefulness of deep learning models in robotics is largely dependent on the availability of training data. Manual annotation of training data is often infeasible. Synthetic data is a viable alternative, but suffers from domain gap. We propose a multi-step method to obtain training data without manual annotation effort: From 3D object meshes, we generate images using a modern synthesis pipeline. We utilize a state-of-the-art image-to-image translation method to adapt the synthetic images to the real domain, minimizing the domain gap in a learned manner. The translation network is trained from unpaired images, i.e. just requires an un-annotated collection of real images. The generated and refined images can then be used to train deep learning models for a particular task. We also propose and evaluate extensions to the translation method that further increase performance, such as patch-based training, which shortens training time and increases global consistency. We evaluate our method and demonstrate its effectiveness on two robotic datasets. We finally give insight into the learned refinement operations.
Submitted 28 June, 2022; v1 submitted 17 March, 2022;
originally announced March 2022.
-
SFIP: Coarse-Grained Syscall-Flow-Integrity Protection in Modern Systems
Authors:
Claudio Canella,
Sebastian Dorn,
Daniel Gruss,
Michael Schwarz
Abstract:
Growing code bases of modern applications have led to a steady increase in the number of vulnerabilities. Control-Flow Integrity (CFI) is one promising mitigation that is more and more widely deployed and prevents numerous exploits. CFI focuses purely on one security domain. That is, transitions between user space and kernel space are not protected by CFI. Furthermore, if user space CFI is bypassed, the system and kernel interfaces remain unprotected, and an attacker can run arbitrary transitions.
In this paper, we introduce the concept of syscall-flow-integrity protection (SFIP) that complements the concept of CFI with integrity for user-kernel transitions. Our proof-of-concept implementation relies on static analysis during compilation to automatically extract possible syscall transitions. An application can opt-in to SFIP by providing the extracted information to the kernel for runtime enforcement. The concept is built on three fully-automated pillars: First, a syscall state machine, representing possible transitions according to a syscall digraph model. Second, a syscall-origin mapping, which maps syscalls to the locations at which they can occur. Third, an efficient enforcement of syscall-flow integrity in a modified Linux kernel. In our evaluation, we show that SFIP can be applied to large scale applications with minimal slowdowns. In a micro- and a macrobenchmark, it only introduces an overhead of 13.1% and 1.8%, respectively. In terms of security, we discuss and demonstrate its effectiveness in preventing control-flow-hijacking attacks in real-world applications. Finally, to highlight the reduction in attack surface, we perform an analysis of the state machines and syscall-origin mappings of several real-world applications. On average, SFIP decreases the number of possible transitions by 38.6% compared to seccomp and 90.9% when no protection is applied.
Submitted 28 February, 2022;
originally announced February 2022.
-
Target Chase, Wall Building, and Fire Fighting: Autonomous UAVs of Team NimbRo at MBZIRC 2020
Authors:
Marius Beul,
Max Schwarz,
Jan Quenzel,
Malte Splietker,
Simon Bultmann,
Daniel Schleich,
Andre Rochow,
Dmytro Pavlichenko,
Radu Alexandru Rosu,
Patrick Lowin,
Bruno Scheider,
Michael Schreiber,
Finn Süberkrüb,
Sven Behnke
Abstract:
The Mohamed Bin Zayed International Robotics Challenge (MBZIRC) 2020 posed diverse challenges for unmanned aerial vehicles (UAVs). We present our four tailored UAVs, specifically developed for individual aerial-robot tasks of MBZIRC, including custom hardware- and software components.
In Challenge 1, a target UAV is pursued using a high-efficiency, onboard object detection pipeline to capture a ball from the target UAV. A second UAV uses a similar detection method to find and pop balloons scattered throughout the arena.
For Challenge 2, we demonstrate a larger UAV capable of autonomous aerial manipulation: Bricks are found and tracked from camera images. Subsequently, they are approached, picked, transported, and placed on a wall.
Finally, in Challenge 3, our UAV autonomously finds fires using LiDAR and thermal cameras. It extinguishes the fires with an onboard fire extinguisher.
While every robot features task-specific subsystems, all UAVs rely on a standard software stack developed for this particular and future competitions. We present our mostly open-source software solutions, including tools for system configuration, monitoring, robust wireless communication, high-level control, and agile trajectory generation. For solving the MBZIRC 2020 tasks, we advanced the state of the art in multiple research areas like machine vision and trajectory generation.
We present our scientific contributions that constitute the foundation for our algorithms and systems and analyze the results from the MBZIRC competition 2020 in Abu Dhabi, where our systems reached second place in the Grand Challenge. Furthermore, we discuss lessons learned from our participation in this complex robotic challenge.
Submitted 11 January, 2022;
originally announced January 2022.
-
A Structured Analysis of Information Security Incidents in the Maritime Sector
Authors:
Monina Schwarz,
Matthias Marx,
Hannes Federrath
Abstract:
Cyber attacks in the maritime sector can have a major impact on the world economy. However, the severity of this threat can be underestimated because many attacks remain unknown or unnoticed. We present an overview of publicly known cyber incidents in the maritime sector from the past 20 years. In total, we found 90 publicly reported attacks and 15 proofs of concept. Furthermore, we interviewed five IT security experts from the maritime sector. The interviews put the results of our research in perspective and confirm that our view is comprehensive. However, the interviewees highlight that there is a large number of unreported incidents and argue that threat information sharing may help prevent attacks. From these results, we extract threats for players in the maritime sector.
Submitted 13 December, 2021;
originally announced December 2021.
-
Semantic Interaction in Augmented Reality Environments for Microsoft HoloLens
Authors:
Peer Schüett,
Max Schwarz,
Sven Behnke
Abstract:
Augmented Reality is a promising technique for human-machine interaction. Especially in robotics, which always considers systems in their environment, it is highly beneficial to display visualizations and receive user input directly in exactly that environment. We explore this idea using the Microsoft HoloLens, with which we capture indoor environments and display interaction cues with known object classes. The 3D mesh recorded by the HoloLens is annotated on-line, as the user moves, with semantic classes using a projective approach, which allows us to use a state-of-the-art 2D semantic segmentation method. The results are fused onto the mesh; prominent object segments are identified and displayed in 3D to the user. Finally, the user can trigger actions by gesturing at the object. We both present qualitative results and analyze the accuracy and performance of our method in detail on an indoor dataset.
Submitted 18 November, 2021;
originally announced December 2021.
-
Domain Page-Table Isolation
Authors:
Claudio Canella,
Andreas Kogler,
Lukas Giner,
Daniel Gruss,
Michael Schwarz
Abstract:
Modern applications often consist of different security domains that require isolation from each other. While several solutions exist, most of them rely on specialized hardware, hardware extensions, or require less-efficient software instrumentation of the application.
In this paper, we propose Domain Page-Table Isolation (DPTI), a novel mechanism for hardware-enforced security domains that can be readily used on commodity off-the-shelf CPUs. DPTI uses two novel techniques for dynamic, time-limited changes to the memory isolation at security-critical points, called memory freezing and stashing. We demonstrate the versatility and efficacy of DPTI in two scenarios: First, DPTI freezes or stashes memory to support faster and more fine-grained syscall filtering than state-of-the-art seccomp-bpf. With the provided memory safety guarantees, DPTI can even securely support deep argument filtering, such as string comparisons. Second, DPTI freezes or stashes memory to efficiently confine potentially untrusted SGX enclaves, outperforming existing solutions by 14.6%-22% while providing the same security guarantees. Our results show that DPTI is a viable mechanism to isolate domains within applications using only existing mechanisms available on modern CPUs, without relying on special hardware instructions or extensions.
Submitted 21 November, 2021;
originally announced November 2021.
-
Practical Timing Side Channel Attacks on Memory Compression
Authors:
Martin Schwarzl,
Pietro Borrello,
Gururaj Saileshwar,
Hanna Müller,
Michael Schwarz,
Daniel Gruss
Abstract:
Compression algorithms are widely used as they save memory without losing data. However, elimination of redundant symbols and sequences in data leads to a compression side channel. So far, compression attacks have only focused on the compression-ratio side channel, i.e., the size of compressed data, and largely targeted HTTP traffic and website content.
In this paper, we present the first memory compression attacks exploiting timing side channels in compression algorithms, targeting a broad set of applications using compression. Our work systematically analyzes different compression algorithms and demonstrates timing leakage in each. We present Comprezzor, an evolutionary fuzzer which finds memory layouts that lead to amplified latency differences for decompression and therefore enable remote attacks. We demonstrate a remote covert channel exploiting small local timing differences transmitting on average 643.25 bit/h over 14 hops over the internet. We also demonstrate memory compression attacks that can leak secrets bytewise as well as in dictionary attacks in three different case studies. First, we show that an attacker can disclose secrets co-located and compressed with attacker data in PHP applications using Memcached. Second, we present an attack that leaks database records from PostgreSQL, managed by a Python-Flask application, over the internet. Third, we demonstrate an attack that leaks secrets from transparently compressed pages with ZRAM, the memory compression module in Linux. We conclude that memory-compression attacks are a practical threat.
Submitted 16 November, 2021;
originally announced November 2021.
-
Dynamic Process Isolation
Authors:
Martin Schwarzl,
Pietro Borrello,
Andreas Kogler,
Kenton Varda,
Thomas Schuster,
Daniel Gruss,
Michael Schwarz
Abstract:
In the quest for efficiency and performance, edge-computing providers eliminate isolation boundaries between tenants, such as strict process isolation, and instead let them compute in a more lightweight multi-threaded single-process design. Edge-computing providers support a high number of tenants per machine to reduce the physical distance to customers without requiring a large number of machines. Isolation is provided by sandboxing mechanisms, e.g., tenants can only run sandboxed V8 JavaScript code. While this is as secure as a sandbox for software vulnerabilities, microarchitectural attacks can bypass these sandboxes.
In this paper, we show that it is possible to mount a Spectre attack on such a restricted environment, leaking secrets from co-located tenants. Cloudflare Workers is one of the top three edge-computing solutions and handles millions of HTTP requests per second worldwide across tens of thousands of web sites every day. We demonstrate a remote Spectre attack using amplification techniques in combination with a remote timing server, which is capable of leaking 120 bit/h. This motivates our main contribution, Dynamic Process Isolation, a process isolation mechanism that only isolates suspicious worker scripts following a detection mechanism. In the worst case of only false positives, Dynamic Process Isolation simply degrades to process isolation. Our proof-of-concept implementation augments a real-world cloud infrastructure framework, Cloudflare Workers, which is used in production at large scale. With a false-positive rate of only 0.61%, we demonstrate that our solution vastly outperforms strict process isolation in terms of performance. In our security evaluation, we show that Dynamic Process Isolation statistically provides the same security guarantees as strict process isolation, fully mitigating Spectre attacks between multiple tenants.
Submitted 10 October, 2021;
originally announced October 2021.
-
NimbRo Avatar: Interactive Immersive Telepresence with Force-Feedback Telemanipulation
Authors:
Max Schwarz,
Christian Lenz,
Andre Rochow,
Michael Schreiber,
Sven Behnke
Abstract:
Robotic avatars promise immersive teleoperation with human-like manipulation and communication capabilities. We present such an avatar system, based on the key components of immersive 3D visualization and transparent force-feedback telemanipulation. Our avatar robot features an anthropomorphic bimanual arm configuration with dexterous hands. The remote human operator drives the arms and fingers through an exoskeleton-based operator station, which provides force feedback both at the wrist and for each finger. The robot torso is mounted on a holonomic base, providing locomotion capability in typical indoor scenarios, controlled using a 3D rudder device. Finally, the robot features a 6D movable head with stereo cameras, which stream images to a VR HMD worn by the operator. Movement latency is hidden using spherical rendering. The head also carries a telepresence screen displaying a synthesized image of the operator with facial animation, which enables direct interaction with remote persons. We evaluate our system successfully both in a user study with untrained operators as well as a longer and more complex integrated mission. We discuss lessons learned from the trials and possible improvements.
Submitted 28 September, 2021;
originally announced September 2021.
-
Low-Latency Immersive 6D Televisualization with Spherical Rendering
Authors:
Max Schwarz,
Sven Behnke
Abstract:
We present a method for real-time stereo scene capture and remote VR visualization that allows a human operator to freely move their head and thus intuitively control their perspective during teleoperation. The stereo camera is mounted on a 6D robotic arm, which follows the operator's head pose. Existing VR teleoperation systems either induce high latencies on head movements, leading to motion sickness, or use scene reconstruction methods to allow re-rendering of the scene from different perspectives, which cannot handle dynamic scenes effectively. Instead, we present a decoupled approach which renders captured camera images as spheres, assuming constant distance. This allows very fast re-rendering on head pose changes while keeping the resulting temporary distortions during head translations small. We present qualitative examples, quantitative results in the form of lab experiments and a small user study, showing that our method outperforms other visualization methods.
Submitted 23 September, 2021;
originally announced September 2021.
-
Improving Thread-Modular Abstract Interpretation
Authors:
Michael Schwarz,
Simmo Saan,
Helmut Seidl,
Kalmer Apinis,
Julian Erhard,
Vesal Vojdani
Abstract:
We give thread-modular non-relational value analyses as abstractions of a local trace semantics. The semantics as well as the analyses are formulated by means of global invariants and side-effecting constraint systems. We show that a generalization of the analysis provided by the static analyzer Goblint as well as a natural improvement of Antoine Miné's approach can be obtained as instances of this general scheme. We show that these two analyses are incomparable w.r.t. precision and provide a refinement which improves on both precision-wise. We also report on a preliminary experimental comparison of the given analyses on a meaningful suite of benchmarks.
Submitted 17 August, 2021;
originally announced August 2021.
-
SynPick: A Dataset for Dynamic Bin Picking Scene Understanding
Authors:
Arul Selvam Periyasamy,
Max Schwarz,
Sven Behnke
Abstract:
We present SynPick, a synthetic dataset for dynamic scene understanding in bin-picking scenarios. In contrast to existing datasets, our dataset is both situated in a realistic industrial application domain -- inspired by the well-known Amazon Robotics Challenge (ARC) -- and features dynamic scenes with authentic picking actions as chosen by our picking heuristic developed for the ARC 2017. The dataset is compatible with the popular BOP dataset format. We describe the dataset generation process in detail, including object arrangement generation and manipulation simulation using the NVIDIA PhysX physics engine. To cover a large action space, we perform untargeted and targeted picking actions, as well as random moving actions. To establish a baseline for object perception, a state-of-the-art pose estimation approach is evaluated on the dataset. We demonstrate the usefulness of tracking poses during manipulation instead of single-shot estimation even with a naive filtering approach. The generator source code and dataset are publicly available.
Submitted 10 July, 2021;
originally announced July 2021.
-
FaDIV-Syn: Fast Depth-Independent View Synthesis using Soft Masks and Implicit Blending
Authors:
Andre Rochow,
Max Schwarz,
Michael Weinmann,
Sven Behnke
Abstract:
Novel view synthesis is required in many robotic applications, such as VR teleoperation and scene reconstruction. Existing methods are often too slow for these contexts, cannot handle dynamic scenes, and are limited by their explicit depth estimation stage, where incorrect depth predictions can lead to large projection errors. Our proposed method runs in real time on live streaming data and avoids explicit depth estimation by efficiently warping input images into the target frame for a range of assumed depth planes. The resulting plane sweep volume (PSV) is directly fed into our network, which first estimates soft PSV masks in a self-supervised manner, and then directly produces the novel output view. This improves efficiency and performance on transparent, reflective, thin, and feature-less scene parts. FaDIV-Syn can perform both interpolation and extrapolation tasks at 540p in real-time and outperforms state-of-the-art extrapolation methods on the large-scale RealEstate10k dataset. We thoroughly evaluate ablations, such as removing the Soft-Masking network, training from fewer examples as well as generalization to higher resolutions and stronger depth discretization. Our implementation is available.
Submitted 13 May, 2022; v1 submitted 24 June, 2021;
originally announced June 2021.
-
Autonomous Fire Fighting with a UAV-UGV Team at MBZIRC 2020
Authors:
Jan Quenzel,
Malte Splietker,
Dmytro Pavlichenko,
Daniel Schleich,
Christian Lenz,
Max Schwarz,
Michael Schreiber,
Marius Beul,
Sven Behnke
Abstract:
Every day, burning buildings threaten the lives of occupants and first responders trying to save them. Quick action is of the essence, but some areas might not be accessible or may be too dangerous to enter. Robotic systems have become a promising addition to firefighting, but at this stage, they are mostly manually controlled, which is error-prone and requires specially trained personnel.
We present two systems for autonomous firefighting from air and ground we developed for the Mohamed Bin Zayed International Robotics Challenge (MBZIRC) 2020. The systems use LiDAR for reliable localization within narrow, potentially GNSS-restricted environments while maneuvering close to obstacles. Measurements from LiDAR and thermal cameras are fused to track fires, while relative navigation ensures successful extinguishing.
We analyze and discuss our successful participation during the MBZIRC 2020, present further experiments, and provide insights into our lessons learned from the competition.
Submitted 11 June, 2021;
originally announced June 2021.
-
Osiris: Automated Discovery of Microarchitectural Side Channels
Authors:
Daniel Weber,
Ahmad Ibrahim,
Hamed Nemati,
Michael Schwarz,
Christian Rossow
Abstract:
In recent years, a series of side channels has been discovered on CPUs. These side channels have been used in powerful attacks, e.g., on cryptographic implementations, or as building blocks in transient-execution attacks such as Spectre or Meltdown. However, in many cases, discovering side channels is still a tedious manual process.
In this paper, we present Osiris, a fuzzing-based framework to automatically discover microarchitectural side channels. Based on a machine-readable specification of a CPU's ISA, Osiris generates instruction-sequence triples and automatically tests whether they form a timing-based side channel. Furthermore, Osiris evaluates their usability as a side channel in transient-execution attacks, i.e., as the microarchitectural encoding for attacks like Spectre. In total, we discover four novel timing-based side channels on Intel and AMD CPUs. Based on these side channels, we demonstrate exploitation in three case studies. We show that our microarchitectural KASLR break using non-temporal loads, FlushConflict, even works on the new Intel Ice Lake and Comet Lake microarchitectures. We present a cross-core cross-VM covert channel that does not rely on the memory subsystem and transmits up to 1 kbit/s. We demonstrate this channel on the AWS cloud, showing that it is stealthy and noise resistant. Finally, we demonstrate Stream+Reload, a covert channel for transient-execution attacks that, on average, allows leaking 7.83 bytes within a transient window, improving state-of-the-art attacks that only leak up to 3 bytes.
Submitted 7 June, 2021;
originally announced June 2021.
-
Team NimbRo's UGV Solution for Autonomous Wall Building and Fire Fighting at MBZIRC 2020
Authors:
Christian Lenz,
Jan Quenzel,
Arul Selvam Periyasamy,
Jan Razlaw,
Andre Rochow,
Malte Splietker,
Michael Schreiber,
Max Schwarz,
Finn Süberkrüb,
Sven Behnke
Abstract:
Autonomous robotic systems for various applications including transport, mobile manipulation, and disaster response are becoming more and more complex. Evaluating and analyzing such systems is challenging. Robotic competitions are designed to benchmark complete robotic systems on complex state-of-the-art tasks. Participants compete in defined scenarios under equal conditions. We present our UGV solution developed for the Mohamed Bin Zayed International Robotics Challenge 2020. Our hard- and software components to address the challenge tasks of wall building and fire fighting are integrated into a fully autonomous system. The robot consists of a wheeled omnidirectional base, a 6 DoF manipulator arm equipped with a magnetic gripper, a highly efficient storage system to transport box-shaped objects, and a water spraying system to fight fires. The robot perceives its environment using 3D LiDAR as well as RGB and thermal camera-based perception modules, is capable of picking box-shaped objects and constructing a pre-defined wall structure, as well as detecting and localizing heat sources in order to extinguish potential fires. A high-level planner solves the challenge tasks using the robot skills. We analyze and discuss our successful participation during the MBZIRC 2020 finals, present further experiments, and provide insights into our lessons learned.
Submitted 27 May, 2021; v1 submitted 25 May, 2021;
originally announced May 2021.
-
Automating Seccomp Filter Generation for Linux Applications
Authors:
Claudio Canella,
Mario Werner,
Daniel Gruss,
Michael Schwarz
Abstract:
Software vulnerabilities in applications undermine the security of applications. By blocking unused functionality, the impact of potential exploits can be reduced. While seccomp provides a solution for filtering syscalls, it requires manual implementation of filter rules for each individual application. Recent work has investigated automated approaches for detecting and installing the necessary filter rules. However, as we show, these approaches make assumptions that are not necessary or require overly time-consuming analysis.
In this paper, we propose Chestnut, an automated approach for generating strict syscall filters for Linux userspace applications with lower requirements and limitations. Chestnut comprises two phases, with the first phase consisting of two static components, i.e., a compiler and a binary analyzer, that extract the used syscalls during compilation or in an analysis of the binary. The compiler-based approach of Chestnut is up to a factor of 73 faster than previous approaches without affecting the accuracy adversely. On the binary analysis level, we demonstrate that the requirement of position-independent binaries of related work is not needed, enlarging the set of applications for which Chestnut is usable. In an optional second phase, Chestnut provides a dynamic refinement tool that allows restricting the set of allowed syscalls further. We demonstrate that Chestnut on average blocks 302 syscalls (86.5%) via the compiler and 288 (82.5%) using the binary-level analysis on a set of 18 widely used applications. We found that Chestnut blocks the dangerous exec syscall in 50% and 77.7% of the tested applications using the compiler- and binary-based approach, respectively. For the tested applications, Chestnut prevents exploitation of more than 62% of the 175 CVEs that target the kernel via syscalls. Finally, we perform a 6-month long-term study of a sandboxed Nginx server.
Submitted 4 December, 2020;
originally announced December 2020.
-
Autonomous Wall Building with a UGV-UAV Team at MBZIRC 2020
Authors:
Christian Lenz,
Max Schwarz,
Andre Rochow,
Jan Razlaw,
Arul Selvam Periyasamy,
Michael Schreiber,
Sven Behnke
Abstract:
Constructing large structures with robots is a challenging task with many potential applications that requires mobile manipulation capabilities. We present two systems for autonomous wall building that we developed for the Mohamed Bin Zayed International Robotics Challenge 2020. Both systems autonomously perceive their environment, find bricks, and build a predefined wall structure. While the UGV uses a 3D LiDAR-based perception system which measures brick poses with high precision, the UAV employs a real-time camera-based system for visual servoing. We report results and insights from our successful participation at the MBZIRC 2020 Finals, additional lab experiments, and discuss the lessons learned from the competition.
Submitted 3 November, 2020;
originally announced November 2020.
-
Speculative Dereferencing of Registers: Reviving Foreshadow
Authors:
Martin Schwarzl,
Thomas Schuster,
Michael Schwarz,
Daniel Gruss
Abstract:
Since 2016, multiple microarchitectural attacks have exploited an effect that is attributed to prefetching. These works observe that certain user-space operations can fetch kernel addresses into the cache. Fetching user-inaccessible data into the cache enables KASLR breaks and assists various Meltdown-type attacks, especially Foreshadow.
In this paper, we provide a systematic analysis of the root cause of this prefetching effect. While we confirm the empirical results of previous papers, we show that the attribution to a prefetching mechanism is fundamentally incorrect in all previous papers describing or exploiting this effect. In particular, neither the prefetch instruction nor other user-space instructions actually prefetch kernel addresses into the cache, leading to incorrect conclusions and ineffectiveness of proposed defenses. The effect exploited in all of these papers is, in fact, caused by speculative dereferencing of user-space registers in the kernel. Hence, mitigation techniques such as KAISER do not eliminate this leakage as previously believed. Beyond our thorough analysis of these previous works, we also demonstrate new attacks enabled by understanding the root cause, namely an address-translation attack in more restricted contexts, direct leakage of register values in certain scenarios, and the first end-to-end Foreshadow (L1TF) exploit targeting non-L1 data. The latter is effective even with the recommended Foreshadow mitigations enabled and thus revives the Foreshadow attack. We demonstrate that these dereferencing effects exist even on the most recent Intel CPUs with the latest hardware mitigations, and on CPUs previously believed to be unaffected, i.e., ARM, IBM, and AMD CPUs.
Submitted 5 August, 2020;
originally announced August 2020.
-
Mind the GAP: Security & Privacy Risks of Contact Tracing Apps
Authors:
Lars Baumgärtner,
Alexandra Dmitrienko,
Bernd Freisleben,
Alexander Gruler,
Jonas Höchst,
Joshua Kühlberg,
Mira Mezini,
Richard Mitev,
Markus Miettinen,
Anel Muhamedagic,
Thien Duc Nguyen,
Alvar Penning,
Dermot Frederik Pustelnik,
Filipp Roos,
Ahmad-Reza Sadeghi,
Michael Schwarz,
Christian Uhl
Abstract:
Google and Apple have jointly provided an API for exposure notification in order to implement decentralized contact tracing apps using Bluetooth Low Energy, the so-called "Google/Apple Proposal", which we abbreviate by "GAP". We demonstrate that in real-world scenarios the current GAP design is vulnerable to (i) profiling and possibly de-anonymizing infected persons, and (ii) relay-based wormhole attacks that basically can generate fake contacts with the potential of affecting the accuracy of an app-based contact tracing system. For both types of attack, we have built tools that can easily be used on mobile phones or Raspberry Pis (e.g., Bluetooth sniffers). The goal of our work is to perform a reality check towards possibly providing empirical real-world evidence for these two privacy and security risks. We hope that our findings provide valuable input for developing secure and privacy-preserving digital contact tracing systems.
Submitted 6 November, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics
Authors:
Max Schwarz,
Sven Behnke
Abstract:
Training data is the key ingredient for deep learning approaches, but difficult to obtain for the specialized domains often encountered in robotics. We describe a synthesis pipeline capable of producing training data for cluttered scene perception tasks such as semantic segmentation, object detection, and correspondence or pose estimation. Our approach arranges object meshes in physically realistic, dense scenes using physics simulation. The arranged scenes are rendered using high-quality rasterization with randomized appearance and material parameters. Noise and other transformations introduced by the camera sensors are simulated. Our pipeline can be run online during training of a deep neural network, yielding applications in life-long learning and in iterative render-and-compare approaches. We demonstrate the usability by learning semantic segmentation on the challenging YCB-Video dataset without actually using any training frames, where our method achieves performance comparable to a conventionally trained model. Additionally, we show successful application in a real-world regrasping system.
Submitted 12 May, 2020;
originally announced May 2020.
-
Visual Descriptor Learning from Monocular Video
Authors:
Umashankar Deekshith,
Nishit Gajjar,
Max Schwarz,
Sven Behnke
Abstract:
Correspondence estimation is one of the most widely researched and yet only partially solved areas of computer vision with many applications in tracking, mapping, recognition of objects and environment. In this paper, we propose a novel way to estimate dense correspondence on an RGB image where visual descriptors are learned from video examples by training a fully convolutional network. Most deep learning methods solve this by training the network with a large set of expensive labeled data or perform labeling through strong 3D generative models using RGB-D videos. Our method learns from RGB videos using contrastive loss, where relative labeling is estimated from optical flow. We demonstrate the functionality in a quantitative analysis on rendered videos, where ground truth information is available. Not only does the method perform well on test data with the same background, it also generalizes to situations with a new background. The descriptors learned are unique and the representations determined by the network are global. We further show the applicability of the method to real-world videos.
Submitted 15 April, 2020;
originally announced April 2020.
-
ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation
Authors:
Catherine Capellen,
Max Schwarz,
Sven Behnke
Abstract:
6D object pose estimation is a prerequisite for many applications. In recent years, monocular pose estimation has attracted much research interest because it does not need depth measurements. In this work, we introduce ConvPoseCNN, a fully convolutional architecture that avoids cutting out individual objects. Instead we propose pixel-wise, dense prediction of both translation and orientation components of the object pose, where the dense orientation is represented in Quaternion form. We present different approaches for aggregation of the dense orientation predictions, including averaging and clustering schemes. We evaluate ConvPoseCNN on the challenging YCB-Video Dataset, where we show that the approach has far fewer parameters and trains faster than comparable methods without sacrificing accuracy. Furthermore, our results indicate that the dense orientation prediction implicitly learns to attend to trustworthy, occlusion-free, and feature-rich object regions.
Submitted 16 December, 2019;
originally announced December 2019.
-
Refining 6D Object Pose Predictions using Abstract Render-and-Compare
Authors:
Arul Selvam Periyasamy,
Max Schwarz,
Sven Behnke
Abstract:
Robotic systems often require precise scene analysis capabilities, especially in unstructured, cluttered situations, as occurring in human-made environments. While current deep-learning based methods yield good estimates of object poses, they often struggle with large amounts of occlusion and do not take inter-object effects into account. Vision as inverse graphics is a promising concept for detailed scene analysis. A key element for this idea is a method for inferring scene parameter updates from the rasterized 2D scene. However, the rasterization process is notoriously difficult to invert, both due to the projection and occlusion process, but also due to secondary effects such as lighting or reflections. We propose to remove the latter from the process by mapping the rasterized image into an abstract feature space learned in a self-supervised way from pixel correspondences. Using only a light-weight inverse rendering module, this allows us to refine 6D object pose estimations in highly cluttered scenes by optimizing a simple pixel-wise difference in the abstract image representation. We evaluate our approach on the challenging YCB-Video dataset, where it yields large improvements and demonstrates a large basin of attraction towards the correct object poses.
Submitted 8 October, 2019;
originally announced October 2019.
-
Autonomous Bimanual Functional Regrasping of Novel Object Class Instances
Authors:
Dmytro Pavlichenko,
Diego Rodriguez,
Christian Lenz,
Max Schwarz,
Sven Behnke
Abstract:
In human-made scenarios, robots need to be able to fully operate objects in their surroundings, i.e., objects are required to be functionally grasped rather than only picked. This imposes very strict constraints on the object pose such that a direct grasp can be performed. Inspired by the anthropomorphic nature of humanoid robots, we propose an approach that first grasps an object with one hand, obtaining full control over its pose, and performs the functional grasp with the second hand subsequently. Thus, we develop a fully autonomous pipeline for dual-arm functional regrasping of novel familiar objects, i.e., objects never seen before that belong to a known object category, e.g., spray bottles. This process involves semantic segmentation, object pose estimation, non-rigid mesh registration, grasp sampling, handover pose generation and in-hand pose refinement. The latter is used to compensate for the unpredictable object movement during the first grasp. The approach is applied to a human-like upper body. To the best of the authors' knowledge, this is the first system that exhibits autonomous bimanual functional regrasping capabilities. We demonstrate that our system yields reliable success rates and can be applied on-line to real-world tasks using only one off-the-shelf RGB-D sensor.
Submitted 1 October, 2019;
originally announced October 2019.