Search | arXiv e-print repository

The Limits and Potentials of Deep Learning for Robotics

Authors: Niko Sünderhauf, Oliver Brock, Walter Scheirer, Raia Hadsell, Dieter Fox, Jürgen Leitner, Ben Upcroft, Pieter Abbeel, Wolfram Burgard, Michael Milford, Peter Corke

Abstract: The application of deep learning in robotics leads to very specific problems and research questions that are typically not addressed by the computer vision and machine learning communities. In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning. We explain the need for better evaluation metrics, highlight the importance and unique ch… ▽ More The application of deep learning in robotics leads to very specific problems and research questions that are typically not addressed by the computer vision and machine learning communities. In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning. We explain the need for better evaluation metrics, highlight the importance and unique challenges for deep robotic learning in simulation, and explore the spectrum between purely data-driven and model-driven approaches. We hope this paper provides a motivating overview of important research directions to overcome the current limitations, and help fulfill the promising potentials of deep learning in robotics. △ Less

Submitted 18 April, 2018; originally announced April 2018.

arXiv:1706.06718 [pdf, other]

Multi-Modal Trip Hazard Affordance Detection On Construction Sites

Authors: Sean McMahon, Niko Sünderhauf, Ben Upcroft, Michael Milford

Abstract: Trip hazards are a significant contributor to accidents on construction and manufacturing sites, where over a third of Australian workplace injuries occur [1]. Current safety inspections are labour intensive and limited by human fallibility,making automation of trip hazard detection appealing from both a safety and economic perspective. Trip hazards present an interesting challenge to modern learn… ▽ More Trip hazards are a significant contributor to accidents on construction and manufacturing sites, where over a third of Australian workplace injuries occur [1]. Current safety inspections are labour intensive and limited by human fallibility,making automation of trip hazard detection appealing from both a safety and economic perspective. Trip hazards present an interesting challenge to modern learning techniques because they are defined as much by affordance as by object type; for example wires on a table are not a trip hazard, but can be if lying on the ground. To address these challenges, we conduct a comprehensive investigation into the performance characteristics of 11 different colour and depth fusion approaches, including 4 fusion and one non fusion approach; using colour and two types of depth images. Trained and tested on over 600 labelled trip hazards over 4 floors and 2000m$\mathrm{^{2}}$ in an active construction site,this approach was able to differentiate between identical objects in different physical configurations (see Figure 1). Outperforming a colour-only detector, our multi-modal trip detector fuses colour and depth information to achieve a 4% absolute improvement in F1-score. These investigative results and the extensive publicly available dataset moves us one step closer to assistive or fully automated safety inspection systems on construction sites. △ Less

Submitted 20 June, 2017; originally announced June 2017.

Comments: 9 Pages, 12 Figures, 2 Tables, Accepted to Robotics and Automation Letters (RA-L)

arXiv:1701.08608 [pdf, other]

doi 10.1109/LRA.2017.2651952

Peduncle Detection of Sweet Pepper for Autonomous Crop Harvesting - Combined Colour and 3D Information

Authors: Inkyu Sa, Chris Lehnert, Andrew English, Chris McCool, Feras Dayoub, Ben Upcroft, Tristan Perez

Abstract: This paper presents a 3D visual detection method for the challenging task of detecting peduncles of sweet peppers (Capsicum annuum) in the field. Cutting the peduncle cleanly is one of the most difficult stages of the harvesting process, where the peduncle is the part of the crop that attaches it to the main stem of the plant. Accurate peduncle detection in 3D space is therefore a vital step in re… ▽ More This paper presents a 3D visual detection method for the challenging task of detecting peduncles of sweet peppers (Capsicum annuum) in the field. Cutting the peduncle cleanly is one of the most difficult stages of the harvesting process, where the peduncle is the part of the crop that attaches it to the main stem of the plant. Accurate peduncle detection in 3D space is therefore a vital step in reliable autonomous harvesting of sweet peppers, as this can lead to precise cutting while avoiding damage to the surrounding plant. This paper makes use of both colour and geometry information acquired from an RGB-D sensor and utilises a supervised-learning approach for the peduncle detection task. The performance of the proposed method is demonstrated and evaluated using qualitative and quantitative results (the Area-Under-the-Curve (AUC) of the detection precision-recall curve). We are able to achieve an AUC of 0.71 for peduncle detection on field-grown sweet peppers. We release a set of manually annotated 3D sweet pepper and peduncle images to assist the research community in performing further research on this topic. △ Less

Submitted 30 January, 2017; originally announced January 2017.

Comments: 8 pages, 14 figures, Robotics and Automation Letters

arXiv:1701.05105 [pdf]

Deep Learning Features at Scale for Visual Place Recognition

Authors: Zetao Chen, Adam Jacobson, Niko Sunderhauf, Ben Upcroft, Lingqiao Liu, Chunhua Shen, Ian Reid, Michael Milford

Abstract: The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks. In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature… ▽ More The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks. In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature encoding method to generate condition- and viewpoint-invariant features. To enable this training to occur, we have developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of place appearance change at thousands of different places, as opposed to the semantic place type datasets currently available. This new dataset enables us to set up a training regime that interprets place recognition as a classification problem. We comprehensively evaluate our trained networks on several challenging benchmark place recognition datasets and demonstrate that they achieve an average 10% increase in performance over other place recognition algorithms and pre-trained CNNs. By analyzing the network responses and their differences from pre-trained networks, we provide insights into what a network learns when training for place recognition, and what these results signify for future research in this area. △ Less

Submitted 18 January, 2017; originally announced January 2017.

Comments: 8 pages, 10 figures. Accepted by International Conference on Robotics and Automation (ICRA) 2017. This is the submitted version. The final published version may be slightly different

arXiv:1701.04925 [pdf]

doi 10.1109/ICRA.2017.7989361

Action Recognition: From Static Datasets to Moving Robots

Authors: Fahimeh Rezazadegan, Sareh Shirazi, Ben Upcroft, Michael Milford

Abstract: Deep learning models have achieved state-of-the- art performance in recognizing human activities, but often rely on utilizing background cues present in typical computer vision datasets that predominantly have a stationary camera. If these models are to be employed by autonomous robots in real world environments, they must be adapted to perform independently of background cues and camera motion ef… ▽ More Deep learning models have achieved state-of-the- art performance in recognizing human activities, but often rely on utilizing background cues present in typical computer vision datasets that predominantly have a stationary camera. If these models are to be employed by autonomous robots in real world environments, they must be adapted to perform independently of background cues and camera motion effects. To address these challenges, we propose a new method that firstly generates generic action region proposals with good potential to locate one human action in unconstrained videos regardless of camera motion and then uses action proposals to extract and classify effective shape and motion features by a ConvNet framework. In a range of experiments, we demonstrate that by actively proposing action regions during both training and testing, state-of-the-art or better performance is achieved on benchmarks. We show the outperformance of our approach compared to the state-of-the-art in two new datasets; one emphasizes on irrelevant background, the other highlights the camera motion. We also validate our action recognition method in an abnormal behavior detection scenario to improve workplace safety. The results verify a higher success rate for our method due to the ability of our system to recognize human actions regardless of environment and camera motion. △ Less

Submitted 17 January, 2017; originally announced January 2017.

Journal ref: Robotics and Automation (ICRA), 2017 IEEE International Conference on

arXiv:1609.05258 [pdf, other]

The ACRV Picking Benchmark (APB): A Robotic Shelf Picking Benchmark to Foster Reproducible Research

Authors: Jürgen Leitner, Adam W. Tow, Jake E. Dean, Niko Suenderhauf, Joseph W. Durham, Matthew Cooper, Markus Eich, Christopher Lehnert, Ruben Mangels, Christopher McCool, Peter Kujala, Lachlan Nicholson, Trung Pham, James Sergeant, Liao Wu, Fangyi Zhang, Ben Upcroft, Peter Corke

Abstract: Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficul… ▽ More Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficult to replicate after the main event. We present a new physical benchmark challenge for robotic picking: the ACRV Picking Benchmark (APB). Designed to be reproducible, it consists of a set of 42 common objects, a widely available shelf, and exact guidelines for object arrangement using stencils. A well-defined evaluation protocol enables the comparison of \emph{complete} robotic systems -- including perception and manipulation -- instead of sub-systems only. Our paper also describes and reports results achieved by an open baseline system based on a Baxter robot. △ Less

Submitted 14 December, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

Comments: 8 pages, submitted to RA:Letters

arXiv:1602.00763 [pdf, ps, other]

doi 10.1109/ICIP.2016.7533003

Simple Online and Realtime Tracking

Authors: Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, Ben Upcroft

Abstract: This paper explores a pragmatic approach to multiple object tracking where the main focus is to associate objects efficiently for online and realtime applications. To this end, detection quality is identified as a key factor influencing tracking performance, where changing the detector can improve tracking by up to 18.9%. Despite only using a rudimentary combination of familiar techniques such as… ▽ More This paper explores a pragmatic approach to multiple object tracking where the main focus is to associate objects efficiently for online and realtime applications. To this end, detection quality is identified as a key factor influencing tracking performance, where changing the detector can improve tracking by up to 18.9%. Despite only using a rudimentary combination of familiar techniques such as the Kalman Filter and Hungarian algorithm for the tracking components, this approach achieves an accuracy comparable to state-of-the-art online trackers. Furthermore, due to the simplicity of our tracking method, the tracker updates at a rate of 260 Hz which is over 20x faster than other state-of-the-art trackers. △ Less

Submitted 7 July, 2017; v1 submitted 1 February, 2016; originally announced February 2016.

Comments: Presented at ICIP 2016, code is available at https://github.com/abewley/sort

arXiv:1512.03424 [pdf]

Evaluation of Object Detection Proposals Under Condition Variations

Authors: Fahimeh Rezazadegan, Sareh Shirazi, Michael Milford, Ben Upcroft

Abstract: Object detection is a fundamental task in many computer vision applications, therefore the importance of evaluating the quality of object detection is well acknowledged in this domain. This process gives insight into the capabilities of methods in handling environmental changes. In this paper, a new method for object detection is introduced that combines the Selective Search and EdgeBoxes. We test… ▽ More Object detection is a fundamental task in many computer vision applications, therefore the importance of evaluating the quality of object detection is well acknowledged in this domain. This process gives insight into the capabilities of methods in handling environmental changes. In this paper, a new method for object detection is introduced that combines the Selective Search and EdgeBoxes. We tested these three methods under environmental variations. Our experiments demonstrate the outperformance of the combination method under illumination and view point variations. △ Less

Submitted 10 December, 2015; originally announced December 2015.

Comments: 2 pages, 6 figures, CVPR Workshop, 2015

arXiv:1511.09209 [pdf, other]

Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks

Authors: ZongYuan Ge, Alex Bewley, Christopher McCool, Ben Upcroft, Peter Corke, Conrad Sanderson

Abstract: We present a novel deep convolutional neural network (DCNN) system for fine-grained image classification, called a mixture of DCNNs (MixDCNN). The fine-grained image classification problem is characterised by large intra-class variations and small inter-class variations. To overcome these problems our proposed MixDCNN system partitions images into K subsets of similar images and learns an expert D… ▽ More We present a novel deep convolutional neural network (DCNN) system for fine-grained image classification, called a mixture of DCNNs (MixDCNN). The fine-grained image classification problem is characterised by large intra-class variations and small inter-class variations. To overcome these problems our proposed MixDCNN system partitions images into K subsets of similar images and learns an expert DCNN for each subset. The output from each of the K DCNNs is combined to form a single classification decision. In contrast to previous techniques, we provide a formulation to perform joint end-to-end training of the K DCNNs simultaneously. Extensive experiments, on three datasets using two network structures (AlexNet and GoogLeNet), show that the proposed MixDCNN system consistently outperforms other methods. It provides a relative improvement of 12.7% and achieves state-of-the-art results on two datasets. △ Less

Submitted 30 November, 2015; originally announced November 2015.

arXiv:1511.03791 [pdf, other]

Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control

Authors: Fangyi Zhang, Jürgen Leitner, Michael Milford, Ben Upcroft, Peter Corke

Abstract: This paper introduces a machine learning based system for controlling a robotic manipulator with visual perception only. The capability to autonomously learn robot controllers solely from raw-pixel images and without any prior knowledge of configuration is shown for the first time. We build upon the success of recent deep reinforcement learning and develop a system for learning target reaching wit… ▽ More This paper introduces a machine learning based system for controlling a robotic manipulator with visual perception only. The capability to autonomously learn robot controllers solely from raw-pixel images and without any prior knowledge of configuration is shown for the first time. We build upon the success of recent deep reinforcement learning and develop a system for learning target reaching with a three-joint robot manipulator using external visual observation. A Deep Q Network (DQN) was demonstrated to perform target reaching after training in simulation. Transferring the network to real hardware and real observation in a naive approach failed, but experiments show that the network works when replacing camera images with synthetic images. △ Less

Submitted 13 November, 2015; v1 submitted 12 November, 2015; originally announced November 2015.

Comments: 8 pages, to appear in the proceedings of Australasian Conference on Robotics and Automation (ACRA) 2015

arXiv:1507.02428 [pdf, other]

Place Categorization and Semantic Mapping on a Mobile Robot

Authors: Niko Sünderhauf, Feras Dayoub, Sean McMahon, Ben Talbot, Ruth Schulz, Peter Corke, Gordon Wyeth, Ben Upcroft, Michael Milford

Abstract: In this paper we focus on the challenging problem of place categorization and semantic mapping on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can… ▽ More In this paper we focus on the challenging problem of place categorization and semantic mapping on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can learn to recognize new semantic classes online. Prior domain knowledge is incorporated by embedding the classification system into a Bayesian filter framework that also ensures temporal coherence. We evaluate the classification accuracy of the system on a robot that maps a variety of places on our campus in real-time. We show how semantic information can boost robotic object detection performance and how the semantic map can be used to modulate the robot's behaviour during navigation tasks. The system is made available to the community as a ROS module. △ Less

Submitted 9 July, 2015; originally announced July 2015.

arXiv:1501.04158 [pdf, other]

On the Performance of ConvNet Features for Place Recognition

Authors: Niko Sünderhauf, Feras Dayoub, Sareh Shirazi, Ben Upcroft, Michael Milford

Abstract: After the incredible success of deep learning in the computer vision domain, there has been much interest in applying Convolutional Network (ConvNet) features in robotic fields such as visual navigation and SLAM. Unfortunately, there are fundamental differences and challenges involved. Computer vision datasets are very different in character to robotic camera data, real-time performance is essenti… ▽ More After the incredible success of deep learning in the computer vision domain, there has been much interest in applying Convolutional Network (ConvNet) features in robotic fields such as visual navigation and SLAM. Unfortunately, there are fundamental differences and challenges involved. Computer vision datasets are very different in character to robotic camera data, real-time performance is essential, and performance priorities can be different. This paper comprehensively evaluates and compares the utility of three state-of-the-art ConvNets on the problems of particular relevance to navigation for robots; viewpoint-invariance and condition-invariance, and for the first time enables real-time place recognition performance using ConvNets with large maps by integrating a variety of existing (locality-sensitive hashing) and novel (semantic search space partitioning) optimization techniques. We present extensive experiments on four real world datasets cultivated to evaluate each of the specific challenges in place recognition. The results demonstrate that speed-ups of two orders of magnitude can be achieved with minimal accuracy degradation, enabling real-time performance. We confirm that networks trained for semantic place categorization also perform better at (specific) place recognition when faced with severe appearance changes and provide a reference for which networks and layers are optimal for different aspects of the place recognition problem. △ Less

Submitted 28 July, 2015; v1 submitted 17 January, 2015; originally announced January 2015.

arXiv:cond-mat/0406015 [pdf, ps, other]

doi 10.1088/0953-4075/37/14/009

Foil-based atom chip for Bose-Einstein condensates

Authors: C. J. Vale, B. Upcroft, M. J. Davis, N. R. Heckenberg, H. Rubinsztein-Dunlop

Abstract: We describe a novel method of fabricating atom chips that are well suited to the production and manipulation of atomic Bose-Einstein condensates. Our chip was created using a silver foil and simple micro-cutting techniques without the need for photolithography. It can sustain larger currents than conventional chips, and is compatible with the patterning of complex trapping potentials. A near pur… ▽ More We describe a novel method of fabricating atom chips that are well suited to the production and manipulation of atomic Bose-Einstein condensates. Our chip was created using a silver foil and simple micro-cutting techniques without the need for photolithography. It can sustain larger currents than conventional chips, and is compatible with the patterning of complex trapping potentials. A near pure Bose-Einstein condensate of 4 $\times$ 10$^4$ $^{87}$Rb atoms has been created in a magnetic microtrap formed by currents through wires on the chip. We have observed the fragmentation of atom clouds in close proximity to the silver conductors. The fragmentation has different characteristic features to those seen with copper conductors. △ Less

Submitted 1 June, 2004; originally announced June 2004.

Comments: 10 pages, 5 figures, uses iopart.cls

Journal ref: J. Phys. B 37, 2959-2967 (2004)

Showing 1–13 of 13 results for author: Upcroft, B