Search | arXiv e-print repository

Label-Efficient 3D Object Detection For Road-Side Units

Authors: Minh-Quan Dao, Holger Caesar, Julie Stephany Berrio, Mao Shan, Stewart Worrall, Vincent Frémont, Ezio Malis

Abstract: Occlusion presents a significant challenge for safety-critical applications such as autonomous driving. Collaborative perception has recently attracted a large research interest thanks to the ability to enhance the perception of autonomous vehicles via deep information fusion with intelligent roadside units (RSU), thus minimizing the impact of occlusion. While significant advancement has been made, the data-hungry nature of these methods creates a major hurdle for their real-world deployment, particularly due to the need for annotated RSU data. Manually annotating the vast amount of RSU data required for training is prohibitively expensive, given the sheer number of intersections and the effort involved in annotating point clouds. We address this challenge by devising a label-efficient object detection method for RSU based on unsupervised object discovery. Our paper introduces two new modules: one for object discovery based on a spatial-temporal aggregation of point clouds, and another for refinement. Furthermore, we demonstrate that fine-tuning on a small portion of annotated data allows our object discovery models to narrow the performance gap with, or even surpass, fully supervised models. Extensive experiments are carried out in simulated and real-world datasets to evaluate our method.

Submitted 9 April, 2024; originally announced April 2024.

Comments: IV 2024

arXiv:2403.01644 [pdf, other]

OccFusion: Multi-Sensor Fusion Framework for 3D Semantic Occupancy Prediction

Authors: Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall

Abstract: A comprehensive understanding of 3D scenes is crucial in autonomous vehicles (AVs), and recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, existing methods for 3D occupancy prediction heavily rely on surround-view camera images, making them susceptible to changes in lighting and weather conditions. This paper introduces OccFusion, a novel sensor fusion framework for predicting 3D occupancy. By integrating features from additional sensors, such as lidar and surround-view radars, our framework enhances the accuracy and robustness of occupancy prediction, resulting in top-tier performance on the nuScenes benchmark. Furthermore, extensive experiments conducted on the nuScenes and SemanticKITTI datasets, including challenging night and rainy scenarios, confirm the superior performance of our sensor fusion strategy across various perception ranges. The code for this framework will be made available at https://github.com/DanielMing123/OccFusion.

Submitted 9 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2401.12422 [pdf, other]

InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Authors: Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall

Abstract: This paper introduces InverseMatrixVT3D, an efficient method for transforming multi-view image features into 3D feature volumes for 3D semantic occupancy prediction. Existing methods for constructing 3D volumes often rely on depth estimation, device-specific operators, or transformer queries, which hinders the widespread adoption of 3D occupancy models. In contrast, our approach leverages two projection matrices to store the static mapping relationships and matrix multiplications to efficiently generate global Bird's Eye View (BEV) features and local 3D feature volumes. Specifically, we achieve this by performing matrix multiplications between multi-view image feature maps and two sparse projection matrices. We introduce a sparse matrix handling technique for the projection matrices to optimize GPU memory usage. Moreover, a global-local attention fusion module is proposed to integrate the global BEV features with the local 3D feature volumes to obtain the final 3D volume. We also employ a multi-scale supervision mechanism to enhance performance further. Extensive experiments performed on the nuScenes and SemanticKITTI datasets reveal that our approach not only stands out for its simplicity and effectiveness but also achieves the top performance in detecting vulnerable road users (VRU), crucial for autonomous driving and road safety. The code has been made available at: https://github.com/DanielMing123/InverseMatrixVT3D

Submitted 29 April, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

arXiv:2310.11608 [pdf, other]

Classification of Safety Driver Attention During Autonomous Vehicle Operation

Authors: Santiago Gerling Konrad, Julie Stephany Berrio, Mao Shan, Favio Masson, Stewart Worrall

Abstract: Despite the continual advances in Advanced Driver Assistance Systems (ADAS) and the development of high-level autonomous vehicles (AV), there is a general consensus that for the short to medium term, there is a requirement for a human supervisor to handle the edge cases that inevitably arise. Given this requirement, it is essential that the state of the vehicle operator is monitored to ensure they are contributing to the vehicle's safe operation. This paper introduces a dual-source approach integrating data from an infrared camera facing the vehicle operator and vehicle perception systems to produce a metric for driver alertness in order to promote and ensure safe operator behaviour. The infrared camera detects the driver's head, enabling the calculation of head orientation, which is relevant as the head typically moves according to the individual's focus of attention. By incorporating environmental data from the perception system, it becomes possible to determine whether the vehicle operator observes objects in the surroundings. Experiments were conducted using data collected in Sydney, Australia, simulating AV operations in an urban environment. Our results demonstrate that the proposed system effectively determines a metric for the attention levels of the vehicle operator, enabling interventions such as warnings or reducing autonomous functionality as appropriate. This comprehensive solution shows promise in contributing to ADAS and AVs' overall safety and efficiency in a real-world setting.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2308.05988 [pdf, other]

doi 10.1109/TIV.2024.3441527

MS3D++: Ensemble of Experts for Multi-Source Unsupervised Domain Adaptation in 3D Object Detection

Authors: Darren Tsai, Julie Stephany Berrio, Mao Shan, Eduardo Nebot, Stewart Worrall

Abstract: Deploying 3D detectors in unfamiliar domains has been demonstrated to result in a significant 70-90% drop in detection rate due to variations in lidar, geography, or weather from their training dataset. This domain gap leads to missing detections for densely observed objects, misaligned confidence scores, and increased high-confidence false positives, rendering the detector highly unreliable. To address this, we introduce MS3D++, a self-training framework for multi-source unsupervised domain adaptation in 3D object detection. MS3D++ generates high-quality pseudo-labels, allowing 3D detectors to achieve high performance on a range of lidar types, regardless of their density. Our approach effectively fuses predictions of an ensemble of multi-frame pre-trained detectors from different source domains to improve domain generalization. We subsequently refine predictions temporally to ensure temporal consistency in box localization and object classification. Furthermore, we present an in-depth study into the performance and idiosyncrasies of various 3D detector components in a cross-domain context, providing valuable insights for improved cross-domain detector ensembling. Experimental results on Waymo, nuScenes and Lyft demonstrate that detectors trained with MS3D++ pseudo-labels achieve state-of-the-art performance, comparable to training with human-annotated labels in Bird's Eye View (BEV) evaluation for both low and high density lidar. Code is available at https://github.com/darrenjkt/MS3D

Submitted 4 September, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2307.07196 [pdf, other]

LightFormer: An End-to-End Model for Intersection Right-of-Way Recognition Using Traffic Light Signals and an Attention Mechanism

Authors: Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Eduardo Nebot, Stewart Worrall

Abstract: For smart vehicles driving through signalised intersections, it is crucial to determine whether the vehicle has right of way given the state of the traffic lights. To address this issue, camera-based sensors can be used to determine whether the vehicle has permission to proceed straight, turn left or turn right. This paper proposes a novel end-to-end intersection right-of-way recognition model called LightFormer to generate right-of-way status for available driving directions in complex urban intersections. The model includes a spatial-temporal inner structure with an attention mechanism, which incorporates features from past images to contribute to the classification of the current frame's right-of-way status. In addition, a modified, multi-weight ArcFace loss is introduced to enhance the model's classification performance. Finally, the proposed LightFormer is trained and tested on two public traffic light datasets with manually augmented labels to demonstrate its effectiveness.

Submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.01462 [pdf, other]

Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Authors: Minh-Quan Dao, Julie Stephany Berrio, Vincent Frémont, Mao Shan, Elwan Héry, Stewart Worrall

Abstract: Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic where the ego vehicle must have reliable object detection to avoid collision while its field of view is severely reduced due to the obstruction posed by a large number of road users. Collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse perspective thanks to the presence at multiple locations of connected agents to form a complete scene representation, is an appealing solution. State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach where the Bird-Eye View images of point clouds are exchanged so that the bandwidth consumption is lower than communicating point clouds as in early collaboration, and the detection performance is higher than late collaboration, which fuses agents' output, thanks to a deeper interaction among connected agents. While achieving strong performance, the real-world deployment of most mid-collaboration approaches is hindered by their overly complicated architectures, involving learnable collaboration graphs and autoencoder-based compressor/decompressor, and unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior state-of-the-art methods while minimizing changes made to the single-vehicle detection models and relaxing unrealistic assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98% of the performance of an early-collaboration method, while only consuming the equivalent bandwidth of a late-collaboration method.

Submitted 19 September, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: The code is available at https://github.com/quan-dao/practical-collab-perception

arXiv:2304.02431 [pdf, other]

MS3D: Leveraging Multiple Detectors for Unsupervised Domain Adaptation in 3D Object Detection

Authors: Darren Tsai, Julie Stephany Berrio, Mao Shan, Eduardo Nebot, Stewart Worrall

Abstract: We introduce Multi-Source 3D (MS3D), a new self-training pipeline for unsupervised domain adaptation in 3D object detection. Despite the remarkable accuracy of 3D detectors, they often overfit to specific domain biases, leading to suboptimal performance in various sensor setups and environments. Existing methods typically focus on adapting a single detector to the target domain, overlooking the fact that different detectors possess distinct expertise on different unseen domains. MS3D leverages this by combining different pre-trained detectors from multiple source domains and incorporating temporal information to produce high-quality pseudo-labels for fine-tuning. Our proposed Kernel-Density Estimation (KDE) Box Fusion method fuses box proposals from multiple domains to obtain pseudo-labels that surpass the performance of the best source domain detectors. MS3D exhibits greater robustness to domain shift and produces accurate pseudo-labels over greater distances, making it well-suited for high-to-low beam domain adaptation and vice versa. Our method achieved state-of-the-art performance on all evaluated datasets, and we demonstrate that the pre-trained detector's source dataset has minimal impact on the fine-tuned result, making MS3D suitable for real-world applications.

Submitted 8 May, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: Our code is available at https://github.com/darrenjkt/MS3D

arXiv:2209.06407 [pdf, other]

Viewer-Centred Surface Completion for Unsupervised Domain Adaptation in 3D Object Detection

Authors: Darren Tsai, Julie Stephany Berrio, Mao Shan, Eduardo Nebot, Stewart Worrall

Abstract: Every autonomous driving dataset has a different configuration of sensors, originating from distinct geographic regions and covering various scenarios. As a result, 3D detectors tend to overfit the datasets they are trained on. This causes a drastic decrease in accuracy when the detectors are trained on one dataset and tested on another. We observe that lidar scan pattern differences form a large component of this reduction in performance. We address this in our approach, SEE-VCN, by designing a novel viewer-centred surface completion network (VCN) to complete the surfaces of objects of interest within an unsupervised domain adaptation framework, SEE. With SEE-VCN, we obtain a unified representation of objects across datasets, allowing the network to focus on learning geometry, rather than overfitting on scan patterns. By adopting a domain-invariant representation, SEE-VCN can be classed as a multi-target domain adaptation approach where no annotations or re-training is required to obtain 3D detections for new scan patterns. Through extensive experiments, we show that our approach outperforms previous domain adaptation methods in multiple domain adaptation settings. Our code and data are available at https://github.com/darrenjkt/SEE-VCN.

Submitted 14 September, 2022; originally announced September 2022.

arXiv:2208.13329 [pdf, other]

Critical concrete scenario generation using scenario-based falsification

Authors: Dhanoop Karunakaran, Julie Stephany Berrio, Stewart Worrall, Eduardo Nebot

Abstract: Autonomous vehicles have the potential to lower the accident rate when compared to human driving. Moreover, it is the driving force of the automated vehicles' rapid development over the last few years. In the higher Society of Automotive Engineers (SAE) automation level, the vehicle's and passengers' safety responsibility is transferred from the driver to the automated system, so thoroughly validating such a system is essential. Recently, academia and industry have embraced scenario-based evaluation as the complementary approach to road testing, reducing the overall testing effort required. It is essential to determine the system's flaws before deploying it on public roads as there is no safety driver to guarantee the reliability of such a system. This paper proposes a Reinforcement Learning (RL) based scenario-based falsification method to search for a high-risk scenario in a pedestrian crossing traffic situation. We define a scenario as risky when a system under testing (SUT) does not satisfy the requirement. The reward function for our RL approach is based on Intel's Responsibility Sensitive Safety (RSS), Euclidean distance, and distance to a potential collision.

Submitted 28 August, 2022; originally announced August 2022.

Comments: Submitted to RASSE 2022

ACM Class: I.2

arXiv:2206.09744 [pdf, other]

Parameterisation of lane-change scenarios from real-world data

Authors: Dhanoop Karunakaran, Julie Stephany Berrio, Stewart Worrall, Eduardo Nebot

Abstract: Recent Autonomous Vehicles (AV) technology includes machine learning and probabilistic techniques that add significant complexity to the traditional verification and validation methods. The research community and industry have widely accepted scenario-based testing in the last few years. As it is focused directly on the relevant crucial road situations, it can reduce the effort required in testing. Encoding real-world traffic participants' behaviour is essential to efficiently assess the System Under Test (SUT) in scenario-based testing. So, it is necessary to capture the scenario parameters from the real-world data that can model scenarios realistically in simulation. The primary emphasis of the paper is to identify the list of meaningful parameters that adequately model real-world lane-change scenarios. With these parameters, it is possible to build a parameter space capable of generating a range of challenging scenarios for AV testing efficiently. We validate our approach using Root Mean Square Error (RMSE) to compare the scenarios generated using the proposed parameters against the real-world trajectory data. In addition to that, we demonstrate that adding a slight disturbance to a few scenario parameters can generate different scenarios and utilise the Responsibility-Sensitive Safety (RSS) metric to measure the scenarios' risk.

Submitted 20 June, 2022; originally announced June 2022.

Comments: Accepted to IEEE ITSC 2022 conference

arXiv:2203.07521 [pdf, other]

Automatic lane change scenario extraction and generation of scenarios in OpenX format from real-world data

Authors: Dhanoop Karunakaran, Julie Stephany Berrio, Stewart Worrall, Eduardo Nebot

Abstract: Autonomous Vehicles (AV)'s wide-scale deployment appears imminent despite many safety challenges yet to be resolved. The modern autonomous vehicles will undoubtedly include machine learning and probabilistic techniques that add significant complexity to the traditional verification and validation methods. Road testing is essential before the deployment, but scenarios are repeatable, and it's hard to collect challenging events. Exploring numerous, diverse and crucial scenarios is a time-consuming and expensive approach. The research community and industry have widely accepted scenario-based testing in the last few years. As it is focused directly on the relevant critical road situations, it can reduce the effort required in testing. The scenario-based testing in simulation requires the realistic behaviour of the traffic participants to assess the System Under Test (SUT). It is essential to capture the scenarios from the real world to encode the behaviour of actual traffic participants. This paper proposes a novel scenario extraction method to capture the lane change scenarios using point-cloud data and object tracking information. This method enables fully automatic scenario extraction compared to similar approaches in this area. The generated scenarios are represented in OpenX format to reuse them in the SUT evaluation easily. The motivation of this framework is to build a validation dataset to generate many critical concrete scenarios. The code is available online at https://github.com/dkarunakaran/scenario_extraction_framework.

Submitted 14 March, 2022; originally announced March 2022.

Comments: Submitted to IEEE IV 22 conference

ACM Class: I.6.4

arXiv:2111.09450 [pdf, other]

doi 10.1109/LRA.2022.3185783

See Eye to Eye: A Lidar-Agnostic 3D Detection Framework for Unsupervised Multi-Target Domain Adaptation

Authors: Darren Tsai, Julie Stephany Berrio, Mao Shan, Stewart Worrall, Eduardo Nebot

Abstract: Sampling discrepancies between different manufacturers and models of lidar sensors result in inconsistent representations of objects. This leads to performance degradation when 3D detectors trained for one lidar are tested on other types of lidars. Remarkable progress in lidar manufacturing has brought about advances in mechanical, solid-state, and recently, adjustable scan pattern lidars. For the latter, existing works often require fine-tuning the model each time scan patterns are adjusted, which is infeasible. We explicitly deal with the sampling discrepancy by proposing a novel unsupervised multi-target domain adaptation framework, SEE, for transferring the performance of state-of-the-art 3D detectors across both fixed and flexible scan pattern lidars without requiring fine-tuning of models by end-users. Our approach interpolates the underlying geometry and normalizes the scan pattern of objects from different lidars before passing them to the detection network. We demonstrate the effectiveness of SEE on public datasets, achieving state-of-the-art results, and additionally provide quantitative results on a novel high-resolution lidar to prove the industry applications of our framework.

Submitted 10 April, 2023; v1 submitted 17 November, 2021; originally announced November 2021.

Comments: Published in RAL and presented in IROS 2022. Code is available at https://github.com/darrenjkt/SEE-MTDA

Journal ref: IEEE Robotics and Automation Letters (2022)

arXiv:2008.12449 [pdf, other]

Long-term map maintenance pipeline for autonomous vehicles

Authors: Julie Stephany Berrio, Stewart Worrall, Mao Shan, Eduardo Nebot

Abstract: For autonomous vehicles to operate persistently in a typical urban environment, it is essential to have high accuracy position information. This requires a mapping and localisation system that can adapt to changes over time. A localisation approach based on a single-survey map will not be suitable for long-term operation as it does not incorporate variations in the environment. In this paper, we present new algorithms to maintain a feature-based map. A map maintenance pipeline is proposed that can continuously update a map with the most relevant features taking advantage of the changes in the surroundings. Our pipeline detects and removes transient features based on their geometrical relationships with the vehicle's pose. Newly identified features become part of a new feature map and are assessed by the pipeline as candidates for the localisation map. By purging out-of-date features and adding newly detected features, we continually update the prior map to more accurately represent the most recent environment. We have validated our approach using the USyd Campus Dataset, which includes more than 18 months of data. The results presented demonstrate that our maintenance pipeline produces a resilient map which can provide sustained localisation performance over time.

Submitted 27 August, 2020; originally announced August 2020.

Comments: Paper submitted to IEEE ITS Transactions

MSC Class: 00-02 ACM Class: I.4

arXiv:2007.05490 [pdf, other]

Camera-Lidar Integration: Probabilistic sensor fusion for semantic mapping

Authors: Julie Stephany Berrio, Mao Shan, Stewart Worrall, Eduardo Nebot

Abstract: An automated vehicle operating in an urban environment must be able to perceive and recognise objects/obstacles in a three-dimensional world while navigating in a constantly changing environment. In order to plan and execute accurate sophisticated driving maneuvers, a high-level contextual understanding of the surroundings is essential. Due to the recent progress in image processing, it is now possible to obtain high definition semantic information in 2D from monocular cameras, though cameras cannot reliably provide the highly accurate 3D information provided by lasers. The fusion of these two sensor modalities can overcome the shortcomings of each individual sensor, though there are a number of important challenges that need to be addressed in a probabilistic manner. In this paper, we address the common, yet challenging, lidar/camera/semantic fusion problems which are seldom approached in a wholly probabilistic manner. Our approach is capable of using a multi-sensor platform to build a three-dimensional semantic voxelized map that considers the uncertainty of all of the processes involved. We present a probabilistic pipeline that incorporates uncertainties from the sensor readings (cameras, lidar, IMU and wheel encoders), compensation for the motion of the vehicle, and heuristic label probabilities for the semantic images. We also present a novel and efficient viewpoint validation algorithm to check for occlusions from the camera frames. A probabilistic projection is performed from the camera images to the lidar point cloud. Each labelled lidar scan then feeds into an octree map building algorithm that updates the class probabilities of the map voxels every time a new observation is available. We validate our approach using a set of qualitative and quantitative experimental tests on the USyd Dataset.

Submitted 9 July, 2020; originally announced July 2020.

Comments: 15 pages. arXiv admin note: text overlap with arXiv:2003.01871

arXiv:2003.03954 [pdf, other]

Probabilistic Egocentric Motion Correction of Lidar Point Cloud and Projection to Camera Images for Moving Platforms

Authors: Mao Shan, Julie Stephany Berrio, Stewart Worrall, Eduardo Nebot

Abstract: The fusion of sensor data from heterogeneous sensors is crucial for robust perception in various robotics applications that involve moving platforms, for instance, autonomous vehicle navigation. In particular, combining camera and lidar sensors enables the projection of precise range information of the surrounding environment onto visual images. It also makes it possible to label each lidar point with visual segmentation/classification results for 3D mapping, which facilitates a higher level understanding of the scene. The task is however considered non-trivial due to intrinsic and extrinsic sensor calibration, and the distortion of lidar points resulting from the ego-motion of the platform. Despite the existence of many lidar ego-motion correction methods, the errors in the correction process due to uncertainty in ego-motion estimation are not possible to remove completely. It is thus essential to consider the problem a probabilistic process where the ego-motion estimation uncertainty is modelled and considered consistently. The paper investigates the probabilistic lidar ego-motion correction and lidar-to-camera projection, where both the uncertainty in the ego-motion estimation and time jitter in sensory measurements are incorporated. The proposed approach is validated both in simulation and using real-world data collected from an electric vehicle retrofitted with wide-angle cameras and a 16-beam scanning lidar.

Submitted 9 March, 2020; originally announced March 2020.

Comments: 8 pages, 9 figures, submitted to ITSC 2020 for review

ACM Class: I.4.0

arXiv:2003.01871 [pdf, other]

Semantic sensor fusion: from camera to sparse lidar information

Authors: Julie Stephany Berrio, Mao Shan, Stewart Worrall, James Ward, Eduardo Nebot

Abstract: To navigate through urban roads, an automated vehicle must be able to perceive and recognize objects in a three-dimensional environment. A high-level contextual understanding of the surroundings is necessary to plan and execute accurate driving maneuvers. This paper presents an approach to fuse different sensory information, Light Detection and Ranging (lidar) scans and camera images. The output of a convolutional neural network (CNN) is used as a classifier to obtain the labels of the environment. The transference of semantic information between the labelled image and the lidar point cloud is performed in four steps: initially, we use heuristic methods to associate probabilities to all the semantic classes contained in the labelled images. Then, the lidar points are corrected to compensate for the vehicle's motion given the difference between the timestamps of each lidar scan and camera image. In a third step, we calculate the pixel coordinate for the corresponding camera image. In the last step we perform the transfer of semantic information from the heuristic probability images to the lidar frame, while removing the lidar information that is not visible to the camera. We tested our approach on the USyd Dataset, obtaining qualitative and quantitative results that demonstrate the validity of our probabilistic sensory fusion approach.

Submitted 3 March, 2020; originally announced March 2020.

Comments: 8 pages, this paper was submitted to ITSC 2020

MSC Class: 00-02 ACM Class: I.4

arXiv:1904.12433 [pdf, other]

Automatic extrinsic calibration between a camera and a 3D Lidar using 3D point and plane correspondences

Authors: Surabhi Verma, Julie Stephany Berrio, Stewart Worrall, Eduardo Nebot

Abstract: This paper proposes an automated method to obtain the extrinsic calibration parameters between a camera and a 3D lidar with as few as 16 beams. We use a checkerboard as a reference to obtain features of interest in both sensor frames. The calibration board centre point and normal vector are automatically extracted from the lidar point cloud by exploiting the geometry of the board. The corresponding features in the camera image are obtained from the camera's extrinsic matrix. We explain the reasons behind selecting these features, and why they are more robust compared to other possibilities. To obtain the optimal extrinsic parameters, we choose a genetic algorithm to address the highly non-linear state space. The process is automated after defining the bounds of the 3D experimental region relative to the lidar, and the true board dimensions. In addition, the camera is assumed to be intrinsically calibrated. Our method requires a minimum of 3 checkerboard poses, and the calibration accuracy is demonstrated by evaluating our algorithm using real world and simulated features.

Submitted 28 April, 2019; originally announced April 2019.

arXiv:1810.10193 [pdf, other]

doi 10.1109/TITS.2019.2909066

Automated Evaluation of Semantic Segmentation Robustness for Autonomous Driving

Authors: Wei Zhou, Julie Stephany Berrio, Stewart Worrall, Eduardo Nebot

Abstract: One of the fundamental challenges in the design of perception systems for autonomous vehicles is validating the performance of each algorithm under a comprehensive variety of operating conditions. In the case of vision-based semantic segmentation, there are known issues when encountering new scenarios that are sufficiently different to the training data. In addition, even small variations in environmental conditions such as illumination and precipitation can affect the classification performance of the segmentation model. Given the reliance on visual information, these effects often translate into poor semantic pixel classification which can potentially lead to catastrophic consequences when driving autonomously. This paper presents a novel method for analysing the robustness of semantic segmentation models and provides a number of metrics to evaluate the classification performance over a variety of environmental conditions. The process incorporates an additional sensor (lidar) to automate the process, eliminating the need for labour-intensive hand labelling of validation data. The system integrity can be monitored as the performance of the vision sensors is validated against a different sensor modality. This is necessary for detecting failures that are inherent to vision technology. Experimental results are presented based on multiple datasets collected at different times of the year with different environmental conditions. These results show that the semantic segmentation performance varies depending on the weather, camera parameters, existence of shadows, etc. The results also demonstrate how the metrics can be used to compare and validate the performance after making improvements to a model, and compare the performance of different networks.

Submitted 24 October, 2018; originally announced October 2018.

arXiv:1809.09774 [pdf, other]

Identifying robust landmarks in feature-based maps

Authors: Julie Stephany Berrio, James Ward, Stewart Worrall, Eduardo Nebot

Abstract: To operate in an urban environment, an automated vehicle must be capable of accurately estimating its position within a global map reference frame. This is necessary for optimal path planning and safe navigation. To accomplish this over an extended period of time, the global map requires long-term maintenance. This includes the addition of newly observable features and the removal of transient features belonging to dynamic objects. The latter is especially important for the long-term use of the map as matching against a map with features that no longer exist can result in incorrect data associations, and consequently erroneous localisation. This paper addresses the problem of removing features from the map that correspond to objects that are no longer observable/present in the environment. This is achieved by assigning a single score which depends on the geometric distribution and characteristics when the features are re-detected (or not) on different occasions. Our approach not only eliminates ephemeral features, but also can be used as a reduction algorithm for highly dense maps. We tested our approach using half a year of weekly drives over the same 500-metre section of road in an urban environment. The results presented demonstrate the validity of the long-term approach to map maintenance.

Submitted 25 September, 2018; originally announced September 2018.

Comments: Submitted to ICRA2019

MSC Class: 62J02; 62J07;

Showing 1–20 of 20 results for author: Berrio, J S