Search | arXiv e-print repository

Towards Assessing Compliant Robotic Grasping from First-Object Perspective via Instrumented Objects

Authors: Maceon Knopke, Liguo Zhu, Peter Corke, Fangyi Zhang

Abstract: Grasping compliant objects is difficult for robots - applying too little force may cause the grasp to fail, while too much force may lead to object damage. A robot needs to apply the right amount of force to quickly and confidently grasp the objects so that it can perform the required task. Although some methods have been proposed to tackle this issue, performance assessment is still a problem for directly measuring object property changes and possible damage. To fill the gap, a new concept is introduced in this paper to assess compliant robotic grasping using instrumented objects. A proof-of-concept design is proposed to measure the force applied on a cuboid object from a first-object perspective. The design can detect multiple contact locations and applied forces on its surface by using multiple embedded 3D Hall sensors to detect deformation relative to embedded magnets. The contact estimation is achieved by interpreting the Hall-effect signals using neural networks. In comprehensive experiments, the design achieved good performance in estimating contacts from each single face of the cuboid and decent performance in detecting contacts from multiple faces when being used to evaluate grasping from a parallel jaw gripper, demonstrating the effectiveness of the design and the feasibility of the concept.

Submitted 14 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: Under review for RA-L

arXiv:2309.09393 [pdf, other]

Reactive Base Control for On-The-Move Mobile Manipulation in Dynamic Environments

Authors: Ben Burgess-Limerick, Jesse Haviland, Chris Lehnert, Peter Corke

Abstract: We present a reactive base control method that enables high performance mobile manipulation on-the-move in environments with static and dynamic obstacles. Performing manipulation tasks while the mobile base remains in motion can significantly decrease the time required to perform multi-step tasks, as well as improve the gracefulness of the robot's motion. Existing approaches to manipulation on-the-move either ignore the obstacle avoidance problem or rely on the execution of planned trajectories, which is not suitable in environments with dynamic objects and obstacles. The presented controller addresses both of these deficiencies and demonstrates robust performance of pick-and-place tasks in dynamic environments. The performance is evaluated on several simulated and real-world tasks. On a real-world task with static obstacles, we outperform an existing method by 48% in terms of total task time. Further, we present real-world examples of our robot performing manipulation tasks on-the-move while avoiding a second autonomous robot in the workspace. See https://benburgesslimerick.github.io/MotM-BaseControl for supplementary materials.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2308.00514 [pdf, other]

Understanding URDF: A Dataset and Analysis

Authors: Daniella Tola, Peter Corke

Abstract: As the complexity of robot systems increases, it becomes more effective to simulate them before deployment. To do this, a model of the robot's kinematics or dynamics is required, and the most commonly used format is the Unified Robot Description Format (URDF). This article presents, to our knowledge, the first dataset of URDF files from various industrial and research organizations, with metadata describing each robot, its type, manufacturer, and the source of the model. The dataset contains 322 URDF files of which 195 are unique robot models, meaning the excess URDFs are either of a robot that is multiply defined across sources or URDF variants of the same robot. We analyze the files in the dataset, where we, among other things, provide information on how they were generated, which mesh file types are most commonly used, and compare models of multiply defined robots. The intention of this article is to build a foundation of knowledge on URDF and how it is used based on publicly available URDF files. Publishing the dataset, analysis, and the scripts and tools used enables others using, researching or developing URDFs to easily access this data and use it in their own work.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.15363 [pdf, other]

doi 10.1145/3570731

Robotic Vision for Human-Robot Interaction and Collaboration: A Survey and Systematic Review

Authors: Nicole Robinson, Brendan Tidd, Dylan Campbell, Dana Kulić, Peter Corke

Abstract: Robotic vision for human-robot interaction and collaboration is a critical process for robots to collect and interpret detailed information related to human actions, goals, and preferences, enabling robots to provide more useful services to people. This survey and systematic review presents a comprehensive analysis on robotic vision in human-robot interaction and collaboration over the last 10 years. From a detailed search of 3850 articles, systematic extraction and evaluation was used to identify and explore 310 papers in depth. These papers described robots with some level of autonomy using robotic vision for locomotion, manipulation and/or visual communication to collaborate or interact with people. This paper provides an in-depth analysis of current trends, common domains, methods and procedures, technical processes, data sets and models, experimental testing, sample populations, performance metrics and future challenges. This manuscript found that robotic vision was often used in action and gesture recognition, robot movement in human spaces, object handover and collaborative actions, social communication and learning from demonstration. Few high-impact and novel techniques from the computer vision field had been translated into human-robot interaction and collaboration. Overall, notable advancements have been made on how to develop and deploy robots to assist people.

Submitted 28 July, 2023; originally announced July 2023.

Journal ref: ACM Transactions on Human-Robot Interaction (2023) Volume 12 Issue 1 Article No 12 pp 1-66

arXiv:2305.08351 [pdf, other]

Enabling Failure Recovery for On-The-Move Mobile Manipulation

Authors: Ben Burgess-Limerick, Chris Lehnert, Jurgen Leitner, Peter Corke

Abstract: We present a robot base placement and control method that enables a mobile manipulator to gracefully recover from manipulation failures while performing tasks on-the-move. A mobile manipulator in motion has a limited window to complete a task, unlike when stationary where it can make repeated attempts until successful. Existing approaches to manipulation on-the-move are typically based on open-loop execution of planned trajectories which does not allow the base controller to react to manipulation failures, slowing down or stopping as required. To overcome this limitation, we present a reactive base control method that repeatedly evaluates the best base placement given the robot's current state, the immediate manipulation task, as well as the next part of a multi-step task. The result is a system that retains the reliability of traditional mobile manipulation approaches where the base comes to a stop, but leverages the performance gains available by performing manipulation on-the-move. The controller keeps the base in range of the target for as long as required to recover from manipulation failures while making as much progress as possible toward the next objective. See https://benburgesslimerick.github.io/MotM-FailureRecovery for videos of experiments.

Submitted 15 May, 2023; originally announced May 2023.

Comments: Accepted for Workshop on Robot Execution Failures and Failure Management Strategies at ICRA 2023

arXiv:2303.16408 [pdf, other]

The Need for Inherently Privacy-Preserving Vision in Trustworthy Autonomous Systems

Authors: Adam K. Taras, Niko Suenderhauf, Peter Corke, Donald G. Dansereau

Abstract: Vision is a popular and effective sensor for robotics from which we can derive rich information about the environment: the geometry and semantics of the scene, as well as the age, gender, identity, activity and even emotional state of humans within that scene. This raises important questions about the reach, lifespan, and potential misuse of this information. This paper is a call to action to consider privacy in the context of robotic vision. We propose a specific form of privacy preservation in which no images are captured or could be reconstructed by an attacker even with full remote access. We present a set of principles by which such systems can be designed, and through a case study in localisation demonstrate in simulation a specific implementation that delivers an important robotic capability in an inherently privacy-preserving manner. This is a first step, and we hope to inspire future works that expand the range of applications open to sighted robotic systems.

Submitted 10 May, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: 7 pages, 6 figures

arXiv:2303.06656 [pdf, other]

Re-evaluating Parallel Finger-tip Tactile Sensing for Inferring Object Adjectives: An Empirical Study

Authors: Fangyi Zhang, Peter Corke

Abstract: Finger-tip tactile sensors are increasingly used for robotic sensing to establish stable grasps and to infer object properties. Promising performance has been shown in a number of works for inferring adjectives that describe the object, but there remains a question about how each taxel contributes to the performance. This paper explores this question with empirical experiments, leading to insights for future finger-tip tactile sensor usage and design.

Submitted 12 March, 2023; originally announced March 2023.

Comments: under review for IROS 2023

arXiv:2302.13442 [pdf, other]

Understanding URDF: A Survey Based on User Experience

Authors: Daniella Tola, Peter Corke

Abstract: With the increasing complexity of robot systems, it is necessary to simulate them before deployment. To do this, a model of the robot's kinematics or dynamics is required. One of the most commonly used formats for modeling robots is the Unified Robot Description Format (URDF). The goal of this article is to understand how URDF is currently used, what challenges people face when working with it, and how the community sees the future of URDF. The outcome can potentially be used to guide future research. This article presents the results from a survey based on 510 anonymous responses from robotic developers of different backgrounds and levels of experience. We find that 96.8% of the participants have simulated robots before, and of them 95.5% had used URDF. We identify a number of challenges and limitations that complicate the use of URDF, such as the inability to model parallel linkages and closed-chain systems, no real standard, lack of documentation, and a limited number of dynamic parameters to model the robot. Future perspectives for URDF are also determined, where 53.5% believe URDF will be more commonly used in the future, 12.2% believe other standards or tools will make URDF obsolete, and 34.4% are not sure what the future of URDF will be. Most participants agree that there is a need for better tooling to ensure URDF's future use.

Submitted 12 March, 2023; v1 submitted 26 February, 2023; originally announced February 2023.

Comments: Corrected errors in figures 10 and 11, and added additional information in paragraph "Methods to obtain URDF (S14, 439 responses)" on page 5

arXiv:2212.06991 [pdf, other]

An Architecture for Reactive Mobile Manipulation On-The-Move

Authors: Ben Burgess-Limerick, Chris Lehnert, Jurgen Leitner, Peter Corke

Abstract: We present a generalised architecture for reactive mobile manipulation while a robot's base is in motion toward the next objective in a high-level task. By performing tasks on-the-move, overall cycle time is reduced compared to methods where the base pauses during manipulation. Reactive control of the manipulator enables grasping objects with unpredictable motion while improving robustness against perception errors, environmental disturbances, and inaccurate robot control compared to open-loop, trajectory-based planning approaches. We present an example implementation of the architecture and investigate the performance on a series of pick and place tasks with both static and dynamic objects and compare the performance to baseline methods. Our method demonstrated a real-world success rate of over 99%, failing in only a single trial from 120 attempts with a physical robot system. The architecture is further demonstrated on other mobile manipulator platforms in simulation. Our approach reduces task time by up to 48%, while also improving reliability, gracefulness, and predictability compared to existing architectures for mobile manipulation. See https://benburgesslimerick.github.io/ManipulationOnTheMove for supplementary materials.

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2211.02832 [pdf, other]

Learning Fabric Manipulation in the Real World with Human Videos

Authors: Robert Lee, Jad Abou-Chakra, Fangyi Zhang, Peter Corke

Abstract: Fabric manipulation is a long-standing challenge in robotics due to the enormous state space and complex dynamics. Learning approaches stand out as promising for this domain as they allow us to learn behaviours directly from data. Most prior methods, however, rely heavily on simulation, which is still limited by the large sim-to-real gap of deformable objects, or rely on large datasets. A promising alternative is to learn fabric manipulation directly from watching humans perform the task. In this work, we explore how demonstrations for fabric manipulation tasks can be collected directly by humans, providing an extremely natural and fast data collection pipeline. Then, using only a handful of such demonstrations, we show how a pick-and-place policy can be learned and deployed on a real robot, without any robot data collection at all. We demonstrate our approach on a fabric folding task, showing that our policy can reliably reach folded states from crumpled initial configurations. Videos are available at: https://sites.google.com/view/foldingbyhand

Submitted 12 November, 2022; v1 submitted 5 November, 2022; originally announced November 2022.

arXiv:2207.01796 [pdf, ps, other]

doi 10.1109/MRA.2023.3270228

Manipulator Differential Kinematics: Part 1: Kinematics, Velocity, and Applications

Authors: Jesse Haviland, Peter Corke

Abstract: Manipulator kinematics is concerned with the motion of each link within a manipulator without considering mass or force. In this article, which is the first in a two-part tutorial, we provide an introduction to modelling manipulator kinematics using the elementary transform sequence (ETS). Then we formulate the first-order differential kinematics, which leads to the manipulator Jacobian, which is the basis for velocity control and inverse kinematics. We describe essential classical techniques which rely on the manipulator Jacobian before exhibiting some contemporary applications. Part II of this tutorial provides a formulation of second and higher-order differential kinematics, introduces the manipulator Hessian, and illustrates advanced techniques, some of which improve the performance of techniques demonstrated in Part I. We have provided Jupyter Notebooks to accompany each section within this tutorial. The Notebooks are written in Python code and use the Robotics Toolbox for Python, and the Swift Simulator to provide examples and implementations of algorithms. While not absolutely essential, for the most engaging and informative experience, we recommend working through the Jupyter Notebooks while reading this article. The Notebooks and setup instructions can be accessed at https://github.com/jhavl/dkt.

Submitted 16 May, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: See associated Jupyter Notebooks https://github.com/jhavl/dkt. Accepted for publication in IEEE Robotics and Automation Magazine. Published version available at https://doi.org/10.1109/MRA.2023.3270228. arXiv admin note: text overlap with arXiv:2010.08696

arXiv:2207.01794 [pdf, ps, other]

doi 10.1109/MRA.2023.3270221

Manipulator Differential Kinematics: Part 2: Acceleration and Advanced Applications

Authors: Jesse Haviland, Peter Corke

Abstract: This is the second and final article on the tutorial on manipulator differential kinematics. In Part 1, we described a method of modelling kinematics using the elementary transform sequence (ETS), before formulating forward kinematics and the manipulator Jacobian. We then described some basic applications of the manipulator Jacobian including resolved-rate motion control (RRMC), inverse kinematics (IK), and some manipulator performance measures. In this article, we formulate the second-order differential kinematics, leading to a definition of manipulator Hessian. We then describe the differential kinematics' analytical forms, which are essential to dynamics applications. Subsequently, we provide a general formula for higher-order derivatives. The first application we consider is advanced velocity control. In this section, we extend resolved-rate motion control to perform sub-tasks while still achieving the goal before redefining the algorithm as a quadratic program to enable greater flexibility and additional constraints. We then take another look at numerical inverse kinematics with an emphasis on adding constraints. Finally, we analyse how the manipulator Hessian can help to escape singularities. We have provided Jupyter Notebooks to accompany each section within this tutorial. The Notebooks are written in Python code and use the Robotics Toolbox for Python, and the Swift Simulator to provide examples and implementations of algorithms. While not absolutely essential, for the most engaging and informative experience, we recommend working through the Jupyter Notebooks while reading this article. The Notebooks and setup instructions can be accessed at https://github.com/jhavl/dkt.

Submitted 16 May, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: See associated Jupyter Notebooks https://github.com/jhavl/dkt Accepted for publication in IEEE Robotics and Automation Magazine. Published version available at https://doi.org/10.1109/MRA.2023.3270221

arXiv:2204.13879 [pdf, other]

DGBench: An Open-Source, Reproducible Benchmark for Dynamic Grasping

Authors: Ben Burgess-Limerick, Chris Lehnert, Jurgen Leitner, Peter Corke

Abstract: This paper introduces DGBench, a fully reproducible open-source testing system to enable benchmarking of dynamic grasping in environments with unpredictable relative motion between robot and object. We use the proposed benchmark to compare several visual perception arrangements. Traditional perception systems developed for static grasping are unable to provide feedback during the final phase of a grasp due to sensor minimum range, occlusion, and a limited field of view. A multi-camera eye-in-hand perception system is presented that has advantages over commonly used camera configurations. We quantitatively evaluate the performance on a real robot with an image-based visual servoing grasp controller and show a significantly improved success rate on a dynamic grasping task.

Submitted 13 July, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

Comments: Dynamic Grasping Benchmark available: https://github.com/BenBurgessLimerick/DGBench

arXiv:2202.12557 [pdf, other]

doi 10.1109/LRA.2022.3188430

Visibility Maximization Controller for Robotic Manipulation

Authors: Kerry He, Rhys Newbury, Tin Tran, Jesse Haviland, Ben Burgess-Limerick, Dana Kulić, Peter Corke, Akansel Cosgun

Abstract: Occlusions caused by a robot's own body are a common problem for closed-loop control methods employed in eye-to-hand camera setups. We propose an optimization-based reactive controller that minimizes self-occlusions while achieving a desired goal pose. The approach allows coordinated control between the robot's base, arm and head by encoding the line-of-sight visibility to the target as a soft constraint along with other task-related constraints, and solving for feasible joint and base velocities. The generalizability of the approach is demonstrated in simulated and real-world experiments, on robots with fixed or mobile bases, with moving or fixed objects, and multiple objects. The experiments revealed a trade-off between occlusion rates and other task metrics. While a planning-based baseline achieved lower occlusion rates than the proposed controller, it came at the expense of highly inefficient paths and a significant drop in the task success. On the other hand, the proposed controller is shown to improve visibility to the line target object(s) without sacrificing too much from the task success and efficiency. Videos and code can be found at: rhys-newbury.github.io/projects/vmc/.

Submitted 25 February, 2022; originally announced February 2022.

Comments: 8 pages, 6 figures, 7 tables, submitted to RA-L and IROS 2022

arXiv:2109.04749 [pdf, ps, other]

doi 10.1109/LRA.2022.3146554

A Holistic Approach to Reactive Mobile Manipulation

Authors: Jesse Haviland, Niko Sünderhauf, Peter Corke

Abstract: We present the design and implementation of a taskable reactive mobile manipulation system. In contrast to related work, we treat the arm and base degrees of freedom as a holistic structure which greatly improves the speed and fluidity of the resulting motion. At the core of this approach is a robust and reactive motion controller which can achieve a desired end-effector pose, while avoiding joint position and velocity limits, and ensuring the mobile manipulator is manoeuvrable throughout the trajectory. This can support sensor-based behaviours such as closed-loop visual grasping. As no planning is involved in our approach, the robot is never stationary thinking about what to do next. We show the versatility of our holistic motion controller by implementing a pick and place system using behaviour trees and demonstrate this task on a 9-degree-of-freedom mobile manipulator. Additionally, we provide an open-source implementation of our motion controller for both non-holonomic and omnidirectional mobile manipulators available at jhavl.github.io/holistic.

Submitted 2 February, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

Comments: IEEE Robotics and Automation Letters (RA-L). Preprint Version. Accepted January, 2022. The code and videos can be found at https://jhavl.github.io/holistic/

Journal ref: IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3122-3129, April 2022

arXiv:2108.08748 [pdf, other]

FSNet: A Failure Detection Framework for Semantic Segmentation

Authors: Quazi Marufur Rahman, Niko Sünderhauf, Peter Corke, Feras Dayoub

Abstract: Semantic segmentation is an important task that helps autonomous vehicles understand their surroundings and navigate safely. During deployment, even the most mature segmentation models are vulnerable to various external factors that can degrade the segmentation performance with potentially catastrophic consequences for the vehicle and its surroundings. To address this issue, we propose a failure detection framework to identify pixel-level misclassification. We do so by exploiting internal features of the segmentation model and training it simultaneously with a failure detection network. During deployment, the failure detector can flag areas in the image where the segmentation model has failed to segment correctly. We evaluate the proposed approach against state-of-the-art methods and achieve 12.30%, 9.46%, and 9.65% performance improvement in the AUPR-Error metric for Cityscapes, BDD100K, and Mapillary semantic segmentation datasets.

Submitted 27 September, 2021; v1 submitted 19 August, 2021; originally announced August 2021.

arXiv:2104.07188 [pdf, other]

Tabletop Object Rearrangement: Team ACRV's Entry to OCRTOC

Authors: Zheyu Zhang, Rhys Newbury, Kerry He, Steven Martin, Gavin Suddrey, Jun Kwan, Peter Corke, Akansel Cosgun

Abstract: Open Cloud Robot Table Organization Challenge (OCRTOC) is one of the most comprehensive cloud-based robotic manipulation competitions. It focuses on rearranging tabletop objects using vision as its primary sensing modality. In this extended abstract, we present our entry to the OCRTOC2020 and the key challenges the team has experienced.

Submitted 14 April, 2021; originally announced April 2021.

Comments: ICRA 2021 Workshop on Cloud-Based Competitions and Benchmarks for Robotic Manipulation and Grasping

arXiv:2103.15349 [pdf, other]

Refractive Light-Field Features for Curved Transparent Objects in Structure from Motion

Authors: Dorian Tsai, Peter Corke, Thierry Peynot, Donald G. Dansereau

Abstract: Curved refractive objects are common in the human environment, and have a complex visual appearance that can cause robotic vision algorithms to fail. Light-field cameras allow us to address this challenge by capturing the view-dependent appearance of such objects in a single exposure. We propose a novel image feature for light fields that detects and describes the patterns of light refracted through curved transparent objects. We derive characteristic points based on these features allowing them to be used in place of conventional 2D features. Using our features, we demonstrate improved structure-from-motion performance in challenging scenes containing refractive objects, including quantitative evaluations that show improved camera pose estimates and 3D reconstructions. Additionally, our methods converge 15-35% more frequently than the state-of-the-art. Our method is a critical step towards allowing robots to operate around refractive objects, with applications in manufacturing, quality assurance, pick-and-place, and domestic robots working with acrylic, glass and other transparent materials.

Submitted 17 April, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: submitted to IROS-RAL 2021. 8 pages, 9 figures, 2 tables

arXiv:2101.01364 [pdf]

doi 10.1109/ACCESS.2021.3055015

Run-Time Monitoring of Machine Learning for Robotic Perception: A Survey of Emerging Trends

Authors: Quazi Marufur Rahman, Peter Corke, Feras Dayoub

Abstract: As deep learning continues to dominate all state-of-the-art computer vision tasks, it is increasingly becoming an essential building block for robotic perception. This raises important questions concerning the safety and reliability of learning-based perception systems. There is an established field that studies safety certification and convergence guarantees of complex software systems at design-time. However, the unknown future deployment environments of an autonomous system and the complexity of learning-based perception make the generalization of design-time verification to run-time problematic. In the face of this challenge, more attention is starting to focus on run-time monitoring of performance and reliability of perception systems with several trends emerging in the literature. This paper attempts to identify these trends and summarise the various approaches to the topic.

Submitted 11 July, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

Comments: Updated version of 10.1109/ACCESS.2021.3055015. Published at IEEE Access. 27 January 2021

arXiv:2101.00443 [pdf, ps, other]

doi 10.1561/2300000059

Semantics for Robotic Mapping, Perception and Interaction: A Survey

Authors: Sourav Garg, Niko Sünderhauf, Feras Dayoub, Douglas Morrison, Akansel Cosgun, Gustavo Carneiro, Qi Wu, Tat-Jun Chin, Ian Reid, Stephen Gould, Peter Corke, Michael Milford

Abstract: For robots to navigate and interact more richly with the world around them, they will likely require a deeper understanding of the world in which they operate. In robotics and related research fields, the study of understanding is often referred to as semantics, which dictates what the world "means" to a robot, and is strongly tied to the question of how to represent that meaning. With humans and robots increasingly operating in the same world, the prospects of human-robot interaction also bring semantics and ontology of natural language into the picture. Driven by need, as well as by enablers like increasing availability of training data and computational resources, semantics is a rapidly growing research area in robotics. The field has received significant attention in the research literature to date, but most reviews and surveys have focused on particular aspects of the topic: the technical research issues regarding its use in specific robotic topics like mapping or segmentation, or its relevance to one particular application domain like autonomous driving. A new treatment is therefore required, and is also timely because so much relevant research has occurred since many of the key surveys were published. This survey therefore provides an overarching snapshot of where semantics in robotics stands today. We establish a taxonomy for semantics research in or relevant to robotics, split into four broad categories of activity, in which semantics are extracted, used, or both.
Within these broad categories we survey dozens of major topics including fundamentals from the computer vision field and key robotics research areas utilizing semantics, including mapping, navigation and interaction with the world. The survey also covers key practical considerations, including enablers like increased data availability and improved computational hardware, and major application areas where...

Submitted 2 January, 2021; originally announced January 2021.

Comments: 81 pages, 1 figure, published in Foundations and Trends in Robotics, 2020

Journal ref: Foundations and Trends in Robotics: Vol. 8: No. 1-2, pp 1-224 (2020)

arXiv:2010.08696 [pdf, ps, other]

A Systematic Approach to Computing the Manipulator Jacobian and Hessian using the Elementary Transform Sequence

Authors: Jesse Haviland, Peter Corke

Abstract: The elementary transform sequence (ETS) provides a universal method of describing the kinematics of any serial-link manipulator. The ETS notation is intuitive and easy to understand, while avoiding the complexity and limitations of Denavit-Hartenberg frame assignment. In this paper, we describe a systematic method for computing the manipulator Jacobian and Hessian (differential kinematics) using the ETS notation. Differential kinematics have many applications including numerical inverse kinematics, resolved-rate motion control and manipulability motion control. Furthermore, we provide an open-source Python library which implements our algorithm and can be interfaced with any serial-link manipulator (available at github.com/petercorke/robotics-toolbox-python).

Submitted 16 October, 2020; originally announced October 2020.

arXiv:2010.08686 [pdf, ps, other]

doi 10.1109/LRA.2021.3056060

NEO: A Novel Expeditious Optimisation Algorithm for Reactive Motion Control of Manipulators

Authors: Jesse Haviland, Peter Corke

Abstract: We present NEO, a fast and purely reactive motion controller for manipulators which can avoid static and dynamic obstacles while moving to the desired end-effector pose. Additionally, our controller maximises the manipulability of the robot during the trajectory, while avoiding joint position and velocity limits. NEO is wrapped into a strictly convex quadratic programme which, when considering obstacles, joint limits, and manipulability on a 7 degree-of-freedom robot, is generally solved in a few ms. While NEO is not intended to replace state-of-the-art motion planners, our experiments show that it is a viable alternative for scenes with moderate complexity while also being capable of reactive control. For more complex scenes, NEO is better suited as a reactive local controller, in conjunction with a global motion planner. We compare NEO to motion planners on a standard benchmark in simulation and additionally illustrate and verify its operation on a physical robot in a dynamic environment. We provide an open-source library which implements our controller.

Submitted 2 February, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

Comments: IEEE Robotics and Automation Letters (RA-L). Preprint Version. Accepted January, 2021. The code and videos can be found at https://jhavl.github.io/neo/

Journal ref: IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1043-1050, April 2021

arXiv:2010.03209 [pdf, other]

Learning Arbitrary-Goal Fabric Folding with One Hour of Real Robot Experience

Authors: Robert Lee, Daniel Ward, Akansel Cosgun, Vibhavari Dasagi, Peter Corke, Jurgen Leitner

Abstract: Manipulating deformable objects, such as fabric, is a long standing problem in robotics, with state estimation and control posing a significant challenge for traditional methods. In this paper, we show that it is possible to learn fabric folding skills in only an hour of self-supervised real robot experience, without human supervision or simulation. Our approach relies on fully convolutional networks and the manipulation of visual inputs to exploit learned features, allowing us to create an expressive goal-conditioned pick and place policy that can be trained efficiently with real world robot data only. Folding skills are learned with only a sparse reward function and thus do not require reward function engineering, merely an image of the goal configuration. We demonstrate our method on a set of towel-folding tasks, and show that our approach is able to discover sequential folding strategies, purely from trial-and-error. We achieve state-of-the-art results without the need for demonstrations or simulation, used in prior approaches. Videos available at: https://sites.google.com/view/learningtofold

Submitted 7 October, 2020; originally announced October 2020.

arXiv:2006.01797 [pdf, other]

Object-Independent Human-to-Robot Handovers using Real Time Robotic Vision

Authors: Patrick Rosenberger, Akansel Cosgun, Rhys Newbury, Jun Kwan, Valerio Ortenzi, Peter Corke, Manfred Grafinger

Abstract: We present an approach for safe and object-independent human-to-robot handovers using real time robotic vision and manipulation. We aim for general applicability with a generic object detector, a fast grasp selection algorithm and by using a single gripper-mounted RGB-D camera, hence not relying on external sensors. The robot is controlled via visual servoing towards the object of interest. Putting a high emphasis on safety, we use two perception modules: human body part segmentation and hand/finger segmentation. Pixels that are deemed to belong to the human are filtered out from candidate grasp poses, hence ensuring that the robot safely picks the object without colliding with the human partner. The grasp selection and perception modules run concurrently in real-time, which allows monitoring of the progress. In experiments with 13 objects, the robot was able to successfully take the object from the human in 81.9% of the trials.

Submitted 21 September, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

Comments: IEEE Robotics and Automation Letters (RA-L). Preprint Version. Accepted September, 2020. The code and videos can be found at https://patrosat.github.io/h2r_handovers/

arXiv:2003.01314 [pdf, other]

EGAD! an Evolved Grasping Analysis Dataset for diversity and reproducibility in robotic manipulation

Authors: Douglas Morrison, Peter Corke, Jürgen Leitner

Abstract: We present the Evolved Grasping Analysis Dataset (EGAD), comprising over 2000 generated objects aimed at training and evaluating robotic visual grasp detection algorithms. The objects in EGAD are geometrically diverse, filling a space ranging from simple to complex shapes and from easy to difficult to grasp, compared to other datasets for robotic grasping, which may be limited in size or contain only a small number of object classes. Additionally, we specify a set of 49 diverse 3D-printable evaluation objects to encourage reproducible testing of robotic grasping systems across a range of complexity and difficulty. The dataset, code and videos can be found at https://dougsm.github.io/egad/

Submitted 23 April, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: IEEE Robotics and Automation Letters (RA-L). Preprint Version. Accepted April, 2020. The dataset, code and videos can be found at https://dougsm.github.io/egad/

arXiv:2002.11901 [pdf, ps, other]

A Purely-Reactive Manipulability-Maximising Motion Controller

Authors: Jesse Haviland, Peter Corke

Abstract: We present a novel approach to controlling the instantaneous velocity of a robot end-effector that is able to simultaneously maximise manipulability and avoid joint limits. It operates on non-redundant and redundant robots, which is achieved by adding artificial redundancy in the form of controlled path deviation. We formulate the problem as a quadratic programme and provide an open-source Python implementation that provides solutions in just a few milliseconds. It accepts a robot model expressed using URDF or Denavit-Hartenberg parameterisation. We compare our method to previous work in simulation and on a physical robot.

Submitted 16 October, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

Comments: See project website https://jhavl.github.io/mmc

arXiv:2001.11684 [pdf, other]

doi 10.1109/TCDS.2020.2993855

Robot Navigation in Unseen Spaces using an Abstract Map

Authors: Ben Talbot, Feras Dayoub, Peter Corke, Gordon Wyeth

Abstract: Human navigation in built environments depends on symbolic spatial information which has unrealised potential to enhance robot navigation capabilities. Information sources such as labels, signs, maps, planners, spoken directions, and navigational gestures communicate a wealth of spatial information to the navigators of built environments; a wealth of information that robots typically ignore. We present a robot navigation system that uses the same symbolic spatial information employed by humans to purposefully navigate in unseen built environments with a level of performance comparable to humans. The navigation system uses a novel data structure called the abstract map to imagine malleable spatial models for unseen spaces from spatial symbols. Sensorimotor perceptions from a robot are then employed to provide purposeful navigation to symbolic goal locations in the unseen environment. We show how a dynamic system can be used to create malleable spatial models for the abstract map, and provide an open source implementation to encourage future work in the area of symbolic navigation. Symbolic navigation performance of humans and a robot is evaluated in a real-world built environment. The paper concludes with a qualitative analysis of human navigation strategies, providing further insights into how the symbolic navigation capabilities of robots in unseen built environments can be improved in the future.

Submitted 15 May, 2020; v1 submitted 31 January, 2020; originally announced January 2020.

Comments: 15 pages, published in IEEE Transactions on Cognitive and Developmental Systems (http://doi.org/10.1109/TCDS.2020.2993855), see https://btalb.github.io/abstract_map/ for access to software

arXiv:2001.11196 [pdf, other]

Model-free vision-based shaping of deformable plastic materials

Authors: Andrea Cherubini, Valerio Ortenzi, Akansel Cosgun, Robert Lee, Peter Corke

Abstract: We address the problem of shaping deformable plastic materials using non-prehensile actions. Shaping plastic objects is challenging, since they are difficult to model and to track visually. We study this problem by using kinetic sand, a plastic toy material which mimics the physical properties of wet sand. Inspired by a pilot study where humans shape kinetic sand, we define two types of actions: pushing the material from the sides and tapping from above. The chosen actions are executed with a robotic arm using image-based visual servoing. From the current and desired view of the material, we define states based on visual features such as the outer contour shape and the pixel luminosity values. These are mapped to actions, which are repeated iteratively to reduce the image error until convergence is reached. For pushing, we propose three methods for mapping the visual state to an action. These include heuristic methods and a neural network, trained from human actions. We show that it is possible to obtain simple shapes with the kinetic sand, without explicitly modeling the material. Our approach is limited in the types of shapes it can achieve. A richer set of action types and multi-step reasoning is needed to achieve more sophisticated shapes.

Submitted 30 January, 2020; originally announced January 2020.

Comments: Accepted to The International Journal of Robotics Research (IJRR)

arXiv:2001.05650 [pdf, ps, other]

Control of the Final-Phase of Closed-Loop Visual Grasping using Image-Based Visual Servoing

Authors: Jesse Haviland, Feras Dayoub, Peter Corke

Abstract: This paper considers the final approach phase of visual-closed-loop grasping where the RGB-D camera is no longer able to provide valid depth information. Many current robotic grasping controllers are not closed-loop and therefore fail for moving objects. Closed-loop grasp controllers based on RGB-D imagery can track a moving object, but fail when the sensor's minimum object distance is violated just before grasping. To overcome this we propose the use of image-based visual servoing (IBVS) to guide the robot to the object-relative grasp pose using camera RGB information. IBVS robustly moves the camera to a goal pose defined implicitly in terms of an image-plane feature configuration. In this work, the goal image feature coordinates are predicted from RGB-D data to enable RGB-only tracking once depth data becomes unavailable -- this enables more reliable grasping of previously unseen moving objects. Experimental results are provided.

Submitted 27 February, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

Comments: Under review for RA-L and IROS 2020

arXiv:2001.02366 [pdf, other]

What can robotics research learn from computer vision research?

Authors: Peter Corke, Feras Dayoub, David Hall, John Skinner, Niko Sünderhauf

Abstract: The computer vision and robotics research communities are each strong. However, progress in computer vision has become turbo-charged in recent years due to big data, GPU computing, novel learning algorithms and a very effective research methodology. By comparison, progress in robotics seems slower. It is true that robotics came later to exploring the potential of learning -- the advantages over the well-established body of knowledge in dynamics, kinematics, planning and control are still being debated, although reinforcement learning seems to offer real potential. However, the rapid development of computer vision compared to robotics cannot be only attributed to the former's adoption of deep learning. In this paper, we argue that the gains in computer vision are due to research methodology -- evaluation under strict constraints versus experiments; bold numbers versus videos.

Submitted 11 June, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

Comments: 15 pages, to appear in the proceeding of the International Symposium on Robotics Research (ISRR) 2019

arXiv:1811.10800 [pdf, other]

Probabilistic Object Detection: Definition and Evaluation

Authors: David Hall, Feras Dayoub, John Skinner, Haoyang Zhang, Dimity Miller, Peter Corke, Gustavo Carneiro, Anelia Angelova, Niko Sünderhauf

Abstract: We introduce Probabilistic Object Detection, the task of detecting objects in images and accurately quantifying the spatial and semantic uncertainties of the detections. Given the lack of methods capable of assessing such probabilistic object detections, we present the new Probability-based Detection Quality measure (PDQ). Unlike AP-based measures, PDQ has no arbitrary thresholds and rewards spatial and label quality, and foreground/background separation quality while explicitly penalising false positive and false negative detections. We contrast PDQ with existing mAP and moLRP measures by evaluating state-of-the-art detectors and a Bayesian object detector based on Monte Carlo Dropout. Our experiments indicate that conventional object detectors tend to be spatially overconfident and thus perform poorly on the task of probabilistic object detection. Our paper aims to encourage the development of new object detection approaches that provide detections with accurately estimated spatial and label uncertainties and are of critical importance for deployment on robots and embodied AI systems in the real world.

Submitted 30 January, 2020; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: 21 pages, 25 figures, to appear in the proceedings of the winter conference on applications of computer vision WACV 2020

arXiv:1809.08564 [pdf, other]

Multi-View Picking: Next-best-view Reaching for Improved Grasping in Clutter

Authors: Douglas Morrison, Peter Corke, Jürgen Leitner

Abstract: Camera viewpoint selection is an important aspect of visual grasp detection, especially in clutter where many occlusions are present. Where other approaches use a static camera position or fixed data collection routines, our Multi-View Picking (MVP) controller uses an active perception approach to choose informative viewpoints based directly on a distribution of grasp pose estimates in real time, reducing uncertainty in the grasp poses caused by clutter and occlusions. In trials of grasping 20 objects from clutter, our MVP controller achieves 80% grasp success, outperforming a single-viewpoint grasp detector by 12%. We also show that our approach is both more accurate and more efficient than approaches which consider multiple fixed viewpoints.

Submitted 10 May, 2019; v1 submitted 23 September, 2018; originally announced September 2018.

Comments: ICRA 2019 Video: https://youtu.be/Vn3vSPKlaEk Code: https://github.com/dougsm/mvp_grasp

arXiv:1806.07375 [pdf, other]

Distinguishing Refracted Features using Light Field Cameras with Application to Structure from Motion

Authors: Dorian Tsai, Donald G Dansereau, Thierry Peynot, Peter Corke

Abstract: Robots must reliably interact with refractive objects in many applications; however, refractive objects can cause many robotic vision algorithms to become unreliable or even fail, particularly feature-based matching applications, such as structure-from-motion. We propose a method to distinguish between refracted and Lambertian image features using a light field camera. Specifically, we propose to use textural cross-correlation to characterise apparent feature motion in a single light field, and compare this motion to its Lambertian equivalent based on 4D light field geometry. Our refracted feature distinguisher has a 34.3% higher rate of detection compared to state-of-the-art for light fields captured with large baselines relative to the refractive object. Our method also applies to light field cameras with much smaller baselines than previously considered, yielding up to 2 times better detection for 2D-refractive objects, such as a sphere, and up to 8 times better for 1D-refractive objects, such as a cylinder. For structure from motion, we demonstrate that rejecting refracted features using our distinguisher yields up to 42.4% lower reprojection error, and lower failure rate when the robot is approaching refractive objects. Our method leads to more robust robot vision in the presence of refractive objects.

Submitted 31 May, 2018; originally announced June 2018.

Comments: 8 pages, 8 figures, submission to IROS 2018

arXiv:1804.06557 [pdf, other]

The Limits and Potentials of Deep Learning for Robotics

Authors: Niko Sünderhauf, Oliver Brock, Walter Scheirer, Raia Hadsell, Dieter Fox, Jürgen Leitner, Ben Upcroft, Pieter Abbeel, Wolfram Burgard, Michael Milford, Peter Corke

Abstract: The application of deep learning in robotics leads to very specific problems and research questions that are typically not addressed by the computer vision and machine learning communities. In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning. We explain the need for better evaluation metrics, highlight the importance and unique challenges for deep robotic learning in simulation, and explore the spectrum between purely data-driven and model-driven approaches. We hope this paper provides a motivating overview of important research directions to overcome the current limitations, and help fulfill the promising potentials of deep learning in robotics.

Submitted 18 April, 2018; originally announced April 2018.

arXiv:1804.05172 [pdf, other]

Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach

Authors: Douglas Morrison, Peter Corke, Jürgen Leitner

Abstract: This paper presents a real-time, object-independent grasp synthesis method which can be used for closed-loop grasping. Our proposed Generative Grasping Convolutional Neural Network (GG-CNN) predicts the quality and pose of grasps at every pixel. This one-to-one mapping from a depth image overcomes limitations of current deep-learning grasping techniques by avoiding discrete sampling of grasp candidates and long computation times. Additionally, our GG-CNN is orders of magnitude smaller while detecting stable grasps with equivalent performance to current state-of-the-art techniques. The light-weight and single-pass generative nature of our GG-CNN allows for closed-loop control at up to 50Hz, enabling accurate grasping in non-static environments where objects move and in the presence of robot control inaccuracies. In our real-world tests, we achieve an 83% grasp success rate on a set of previously unseen objects with adversarial geometry and 88% on a set of household objects that are moved during the grasp attempt. We also achieve 81% accuracy when grasping in dynamic clutter.

Submitted 15 May, 2018; v1 submitted 14 April, 2018; originally announced April 2018.

Comments: Robotics: Science and Systems (RSS), 2018. Code: http://github.com/dougsm/ggcnn Video: http://www.youtube.com/watch?v=7nOoxuGEcxA

arXiv:1804.02154 [pdf, ps, other]

Assisted Control for Semi-Autonomous Power Infrastructure Inspection using Aerial Vehicles

Authors: Aaron McFadyen, Feras Dayoub, Steve Martin, Jason Ford, Peter Corke

Abstract: This paper presents the design and implementation of an assisted control technology for a small multirotor platform for aerial inspection of fixed energy infrastructure. Sensor placement is supported by a theoretical analysis of expected sensor performance and constrained platform behaviour to speed up implementation. The optical sensors provide relative position information between the platform and the asset, which enables human operator inputs to be autonomously adjusted to ensure safe separation. The assisted control approach is designed to reduce operator workload during close proximity inspection tasks, with collision avoidance and safe separation managed autonomously. The energy infrastructure includes single vertical wooden poles and crossarms with attached overhead wires. Simulated and real experimental results are provided.

Submitted 1 August, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

Comments: to appear in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)

arXiv:1710.01439 [pdf, other]

Design of a Multi-Modal End-Effector and Grasping System: How Integrated Design helped win the Amazon Robotics Challenge

Authors: S. Wade-McCue, N. Kelly-Boxall, M. McTaggart, D. Morrison, A. W. Tow, J. Erskine, R. Grinover, A. Gurman, T. Hunn, D. Lee, A. Milan, T. Pham, G. Rallos, A. Razjigaev, T. Rowntree, R. Smith, K. Vijay, Z. Zhuang, C. Lehnert, I. Reid, P. Corke, J. Leitner

Abstract: We present the grasping system and design approach behind Cartman, the winning entrant in the 2017 Amazon Robotics Challenge. We investigate the design processes leading up to the final iteration of the system and describe the emergent solution by comparing it with key robotics design aspects. Following our experience, we propose a new design aspect, precision vs. redundancy, that should be considered alongside the previously proposed design aspects of modularity vs. integration, generality vs. assumptions, computation vs. embodiment and planning vs. feedback. We present the grasping system behind Cartman, the winning robot in the 2017 Amazon Robotics Challenge. The system makes strong use of redundancy in design by implementing complementary tools, a suction gripper and a parallel gripper. This multi-modal end-effector is combined with three grasp synthesis algorithms to accommodate the range of objects provided by Amazon during the challenge. We provide a detailed system description and an evaluation of its performance before discussing the broader nature of the system with respect to the key aspects of robotic design as initially proposed by the winners of the first Amazon Picking Challenge. To address the principal nature of our grasping system and the reason for its success, we propose an additional robotic design aspect 'precision vs. redundancy'. The full design of our robotic system, including the end-effector, is open sourced and available at http://juxi.net/projects/AmazonRoboticsChallenge/

Submitted 19 June, 2018; v1 submitted 3 October, 2017; originally announced October 2017.

Comments: ACRV Technical Report

Report number: ACRV-TR-2017-03

arXiv:1710.00967 [pdf, other]

Mechanical Design of a Cartesian Manipulator for Warehouse Pick and Place

Authors: M. McTaggart, D. Morrison, A. W. Tow, R. Smith, Norton Kelly-Boxall, Anton Milan, T. Pham, Zheyu Zhuang, J. Leitner, I. Reid, P. Corke, C. Lehnert

Abstract: Robotic manipulation and grasping in cluttered and unstructured environments is a current challenge for robotics. Enabling robots to operate in these challenging environments has direct applications from automating warehouses to harvesting fruit in agriculture. One of the main challenges associated with these difficult robotic manipulation tasks is the motion planning and control problem for multi-DoF (Degree of Freedom) manipulators. This paper presents the design and performance evaluation of a low-cost Cartesian manipulator, Cartman, which took first place in the Amazon Robotics Challenge 2017. It can perform pick and place tasks of household items in a cluttered environment. The robot is capable of linear speeds of 1 m/s and angular speeds of 1.5 rad/s, with sub-millimetre static accuracy and a safe payload capacity of 2 kg. Cartman can be produced for under 10 000 AUD. The complete design is open sourced and can be found at http://juxi.net/projects/AmazonRoboticsChallenge.

Submitted 18 June, 2018; v1 submitted 2 October, 2017; originally announced October 2017.

Comments: ACRV Tech Report

Report number: ACRV-TR-2017-02

arXiv:1709.07665 [pdf, other]

Semantic Segmentation from Limited Training Data

Authors: A. Milan, T. Pham, K. Vijay, D. Morrison, A. W. Tow, L. Liu, J. Erskine, R. Grinover, A. Gurman, T. Hunn, N. Kelly-Boxall, D. Lee, M. McTaggart, G. Rallos, A. Razjigaev, T. Rowntree, T. Shen, R. Smith, S. Wade-McCue, Z. Zhuang, C. Lehnert, G. Lin, I. Reid, P. Corke, J. Leitner

Abstract: We present our approach for robotic perception in cluttered scenes that led to winning the recent Amazon Robotics Challenge (ARC) 2017. Next to small objects with shiny and transparent surfaces, the biggest challenge of the 2017 competition was the introduction of unseen categories. In contrast to traditional approaches which require large collections of annotated data and many hours of training, the task here was to obtain a robust perception pipeline with only a few minutes of data acquisition and training time. To that end, we present two strategies that we explored. One is a deep metric learning approach that works in three separate steps: semantic-agnostic boundary detection, patch classification and pixel-wise voting. The other is a fully-supervised semantic segmentation approach with efficient dataset collection. We conduct an extensive analysis of the two methods on our ARC 2017 dataset. Interestingly, only a few examples of each class are sufficient to fine-tune even very deep convolutional neural networks for this specific task.

Submitted 22 September, 2017; originally announced September 2017.

arXiv:1709.06283 [pdf, other]

Cartman: The low-cost Cartesian Manipulator that won the Amazon Robotics Challenge

Authors: D. Morrison, A. W. Tow, M. McTaggart, R. Smith, N. Kelly-Boxall, S. Wade-McCue, J. Erskine, R. Grinover, A. Gurman, T. Hunn, D. Lee, A. Milan, T. Pham, G. Rallos, A. Razjigaev, T. Rowntree, K. Vijay, Z. Zhuang, C. Lehnert, I. Reid, P. Corke, J. Leitner

Abstract: The Amazon Robotics Challenge enlisted sixteen teams to each design a pick-and-place robot for autonomous warehousing, addressing development in robotic vision and manipulation. This paper presents the design of our custom-built, cost-effective, Cartesian robot system Cartman, which won first place in the competition finals by stowing 14 (out of 16) and picking all 9 items in 27 minutes, scoring a total of 272 points. We highlight our experience-centred design methodology and key aspects of our system that contributed to our competitiveness. We believe these aspects are crucial to building robust and effective robotic systems.

Submitted 25 February, 2018; v1 submitted 19 September, 2017; originally announced September 2017.

Comments: To appear at the IEEE International Conference on Robotics and Automation (ICRA) 2018. 8 pages

arXiv:1709.05746 [pdf, other]

Adversarial Discriminative Sim-to-real Transfer of Visuo-motor Policies

Authors: Fangyi Zhang, Jürgen Leitner, Zongyuan Ge, Michael Milford, Peter Corke

Abstract: Various approaches have been proposed to learn visuo-motor policies for real-world robotic applications. One solution is first learning in simulation then transferring to the real world. In the transfer, most existing approaches need real-world images with labels. However, the labelling process is often expensive or even impractical in many robotic applications. In this paper, we propose an adversarial discriminative sim-to-real transfer approach to reduce the cost of labelling real data. The effectiveness of the approach is demonstrated with modular networks in a table-top object reaching task where a 7 DoF arm is controlled in velocity mode to reach a blue cuboid in clutter through visual observations. The adversarial transfer approach reduced the labelled real data requirement by 50%. Policies can be transferred to real environments with only 93 labelled and 186 unlabelled real images. The transferred visuo-motor policies are robust to novel (not seen in training) objects in clutter and even a moving target, achieving a 97.8% success rate and 1.8 cm control accuracy.

Submitted 31 May, 2018; v1 submitted 17 September, 2017; originally announced September 2017.

Comments: Under review for the International Journal of Robotics Research

arXiv:1705.08940 [pdf, other]

Visual Servoing from Deep Neural Networks

Authors: Quentin Bateux, Eric Marchand, Jürgen Leitner, Francois Chaumette, Peter Corke

Abstract: We present a deep neural network-based method to perform high-precision, robust and real-time 6 DOF visual servoing. The paper describes how to create a dataset simulating various perturbations (occlusions and lighting conditions) from a single real-world image of the scene. A convolutional neural network is fine-tuned using this dataset to estimate the relative pose between two images of the same scene. The output of the network is then employed in a visual servoing control scheme. The method converges robustly even in difficult real-world settings with strong lighting variations and occlusions. A positioning error of less than one millimeter is obtained in experiments with a 6 DOF robot.

Submitted 7 June, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

Comments: fixed authors list

arXiv:1705.05116 [pdf, other]

Tuning Modular Networks with Weighted Losses for Hand-Eye Coordination

Authors: Fangyi Zhang, Jürgen Leitner, Michael Milford, Peter I. Corke

Abstract: This paper introduces an end-to-end fine-tuning method to improve hand-eye coordination in modular deep visuo-motor policies (modular networks) where each module is trained independently. Benefiting from weighted losses, the fine-tuning method significantly improves the performance of the policies for a robotic planar reaching task.

Submitted 15 May, 2017; originally announced May 2017.

Comments: 2 pages, to appear in the Deep Learning for Robotic Vision (DLRV) Workshop in CVPR 2017

arXiv:1703.07473 [pdf, other]

Episode-Based Active Learning with Bayesian Neural Networks

Authors: Feras Dayoub, Niko Sünderhauf, Peter Corke

Abstract: We investigate different strategies for active learning with Bayesian deep neural networks. We focus our analysis on scenarios where new, unlabeled data is obtained episodically, such as commonly encountered in mobile robotics applications. An evaluation of different strategies for acquisition, updating, and final training on the CIFAR-10 dataset shows that incremental network updates with final training on the accumulated acquisition set are essential for best performance, while limiting the amount of required human labeling labor.

Submitted 21 March, 2017; originally announced March 2017.

arXiv:1612.05335 [pdf, other]

Mirrored Light Field Video Camera Adapter

Authors: Dorian Tsai, Donald G. Dansereau, Steve Martin, Peter Corke

Abstract: This paper proposes the design of a custom mirror-based light field camera adapter that is cheap, simple in construction, and accessible. Mirrors of different shape and orientation reflect the scene into an upwards-facing camera to create an array of virtual cameras with overlapping field of view at specified depths, and deliver video frame rate light fields. We describe the design, construction, decoding and calibration processes of our mirror-based light field camera adapter in preparation for an open-source release to benefit the robotic vision community.

Submitted 15 December, 2016; originally announced December 2016.

Comments: tech report, v0.5, 15 pages, 6 figures

arXiv:1610.06781 [pdf, other]

Modular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies

Authors: Fangyi Zhang, Jürgen Leitner, Michael Milford, Peter Corke

Abstract: While deep learning has had significant successes in computer vision thanks to the abundance of visual data, collecting sufficiently large real-world datasets for robot learning can be costly. To increase the practicality of these techniques on real robots, we propose a modular deep reinforcement learning method capable of transferring models trained in simulation to a real-world robotic task. We introduce a bottleneck between perception and control, enabling the networks to be trained independently, but then merged and fine-tuned in an end-to-end manner to further improve hand-eye coordination. On a canonical, planar visually-guided robot reaching task a fine-tuned accuracy of 1.6 pixels is achieved, a significant improvement over naive transfer (17.5 pixels), showing the potential for more complicated and broader applications. Our method provides a technique for more efficient learning and transfer of visuo-motor policies for real robotic systems without relying entirely on large real-world robot datasets.

Submitted 18 December, 2017; v1 submitted 21 October, 2016; originally announced October 2016.

Comments: Australasian Conference on Robotics and Automation (ACRA) 2017, Student Paper Award Finalist

Journal ref: The proceedings of the Australasian Conference on Robotics and Automation (ACRA) 2017

arXiv:1609.05258 [pdf, other]

The ACRV Picking Benchmark (APB): A Robotic Shelf Picking Benchmark to Foster Reproducible Research

Authors: Jürgen Leitner, Adam W. Tow, Jake E. Dean, Niko Suenderhauf, Joseph W. Durham, Matthew Cooper, Markus Eich, Christopher Lehnert, Ruben Mangels, Christopher McCool, Peter Kujala, Lachlan Nicholson, Trung Pham, James Sergeant, Liao Wu, Fangyi Zhang, Ben Upcroft, Peter Corke

Abstract: Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficult to replicate after the main event. We present a new physical benchmark challenge for robotic picking: the ACRV Picking Benchmark (APB). Designed to be reproducible, it consists of a set of 42 common objects, a widely available shelf, and exact guidelines for object arrangement using stencils. A well-defined evaluation protocol enables the comparison of complete robotic systems (including perception and manipulation) instead of sub-systems only. Our paper also describes and reports results achieved by an open baseline system based on a Baxter robot.

Submitted 14 December, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

Comments: 8 pages, submitted to RA:Letters

arXiv:1608.00486 [pdf, other]

doi 10.1109/DICTA.2016.7797039

Exploiting Temporal Information for DCNN-based Fine-Grained Object Classification

Authors: ZongYuan Ge, Chris McCool, Conrad Sanderson, Peng Wang, Lingqiao Liu, Ian Reid, Peter Corke

Abstract: Fine-grained classification is a relatively new field that has concentrated on using information from a single image, while ignoring the enormous potential of using video data to improve classification. In this work we present the novel task of video-based fine-grained object classification, propose a corresponding new video dataset, and perform a systematic study of several recent deep convolutional neural network (DCNN) based approaches, which we specifically adapt to the task. We evaluate three-dimensional DCNNs, two-stream DCNNs, and bilinear DCNNs. Two forms of the two-stream approach are used, where spatial and temporal data from two independent DCNNs are fused either via early fusion (combination of the fully-connected layers) or late fusion (concatenation of the softmax outputs of the DCNNs). For bilinear DCNNs, information from the convolutional layers of the spatial and temporal DCNNs is combined via local co-occurrences. We then fuse the bilinear DCNN and early fusion of the two-stream approach to combine the spatial and temporal information at the local and global level (Spatio-Temporal Co-occurrence). Using the new and challenging video dataset of birds, classification performance is improved from 23.1% (using single images) to 41.1% when using the Spatio-Temporal Co-occurrence system. Incorporating automatically detected bounding box location further improves the classification accuracy to 53.6%.

Submitted 24 October, 2016; v1 submitted 1 August, 2016; originally announced August 2016.

Comments: International Conference on Digital Image Computing: Techniques and Applications, 2016

ACM Class: I.2.6, I.4, I.5

arXiv:1511.09209 [pdf, other]

Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks

Authors: ZongYuan Ge, Alex Bewley, Christopher McCool, Ben Upcroft, Peter Corke, Conrad Sanderson

Abstract: We present a novel deep convolutional neural network (DCNN) system for fine-grained image classification, called a mixture of DCNNs (MixDCNN). The fine-grained image classification problem is characterised by large intra-class variations and small inter-class variations. To overcome these problems our proposed MixDCNN system partitions images into K subsets of similar images and learns an expert DCNN for each subset. The output from each of the K DCNNs is combined to form a single classification decision. In contrast to previous techniques, we provide a formulation to perform joint end-to-end training of the K DCNNs simultaneously. Extensive experiments, on three datasets using two network structures (AlexNet and GoogLeNet), show that the proposed MixDCNN system consistently outperforms other methods. It provides a relative improvement of 12.7% and achieves state-of-the-art results on two datasets.

Submitted 30 November, 2015; originally announced November 2015.

arXiv:1511.03791 [pdf, other]

Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control

Authors: Fangyi Zhang, Jürgen Leitner, Michael Milford, Ben Upcroft, Peter Corke

Abstract: This paper introduces a machine learning based system for controlling a robotic manipulator with visual perception only. The capability to autonomously learn robot controllers solely from raw-pixel images and without any prior knowledge of configuration is shown for the first time. We build upon the success of recent deep reinforcement learning and develop a system for learning target reaching with a three-joint robot manipulator using external visual observation. A Deep Q Network (DQN) was demonstrated to perform target reaching after training in simulation. Transferring the network to real hardware and real observation in a naive approach failed, but experiments show that the network works when replacing camera images with synthetic images.

Submitted 13 November, 2015; v1 submitted 12 November, 2015; originally announced November 2015.

Comments: 8 pages, to appear in the proceedings of Australasian Conference on Robotics and Automation (ACRA) 2015

Showing 1–50 of 53 results for author: Corke, P