Search | arXiv e-print repository

doi 10.1109/TIV.2024.3441527

MS3D++: Ensemble of Experts for Multi-Source Unsupervised Domain Adaptation in 3D Object Detection

Authors: Darren Tsai, Julie Stephany Berrio, Mao Shan, Eduardo Nebot, Stewart Worrall

Abstract: Deploying 3D detectors in unfamiliar domains has been demonstrated to result in a significant 70-90% drop in detection rate due to variations in lidar, geography, or weather from their training dataset. This domain gap leads to missing detections for densely observed objects, misaligned confidence scores, and increased high-confidence false positives, rendering the detector highly unreliable. To address this, we introduce MS3D++, a self-training framework for multi-source unsupervised domain adaptation in 3D object detection. MS3D++ generates high-quality pseudo-labels, allowing 3D detectors to achieve high performance on a range of lidar types, regardless of their density. Our approach effectively fuses predictions of an ensemble of multi-frame pre-trained detectors from different source domains to improve domain generalization. We subsequently refine predictions temporally to ensure temporal consistency in box localization and object classification. Furthermore, we present an in-depth study into the performance and idiosyncrasies of various 3D detector components in a cross-domain context, providing valuable insights for improved cross-domain detector ensembling. Experimental results on Waymo, nuScenes and Lyft demonstrate that detectors trained with MS3D++ pseudo-labels achieve state-of-the-art performance, comparable to training with human-annotated labels in Bird's Eye View (BEV) evaluation for both low and high density lidar. Code is available at https://github.com/darrenjkt/MS3D

Submitted 4 September, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2304.02431

MS3D: Leveraging Multiple Detectors for Unsupervised Domain Adaptation in 3D Object Detection

Authors: Darren Tsai, Julie Stephany Berrio, Mao Shan, Eduardo Nebot, Stewart Worrall

Abstract: We introduce Multi-Source 3D (MS3D), a new self-training pipeline for unsupervised domain adaptation in 3D object detection. Despite the remarkable accuracy of 3D detectors, they often overfit to specific domain biases, leading to suboptimal performance in various sensor setups and environments. Existing methods typically focus on adapting a single detector to the target domain, overlooking the fact that different detectors possess distinct expertise on different unseen domains. MS3D leverages this by combining different pre-trained detectors from multiple source domains and incorporating temporal information to produce high-quality pseudo-labels for fine-tuning. Our proposed Kernel-Density Estimation (KDE) Box Fusion method fuses box proposals from multiple domains to obtain pseudo-labels that surpass the performance of the best source domain detectors. MS3D exhibits greater robustness to domain shift and produces accurate pseudo-labels over greater distances, making it well-suited for high-to-low beam domain adaptation and vice versa. Our method achieved state-of-the-art performance on all evaluated datasets, and we demonstrate that the pre-trained detector's source dataset has minimal impact on the fine-tuned result, making MS3D suitable for real-world applications.

Submitted 8 May, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: Our code is available at https://github.com/darrenjkt/MS3D

arXiv:2211.09283

Active Learning with Expected Error Reduction

Authors: Stephen Mussmann, Julia Reisler, Daniel Tsai, Ehsan Mousavi, Shayne O'Brien, Moises Goldszmidt

Abstract: Active learning has been studied extensively as a method for efficient data collection. Among the many approaches in literature, Expected Error Reduction (EER) (Roy and McCallum) has been shown to be an effective method for active learning: select the candidate sample that, in expectation, maximally decreases the error on an unlabeled set. However, EER requires the model to be retrained for every candidate sample and thus has not been widely used for modern deep neural networks due to this large computational cost. In this paper we reformulate EER under the lens of Bayesian active learning and derive a computationally efficient version that can use any Bayesian parameter sampling method (such as arXiv:1506.02142). We then compare the empirical performance of our method using Monte Carlo dropout for parameter sampling against state of the art methods in the deep active learning literature. Experiments are performed on four standard benchmark datasets and three WILDS datasets (arXiv:2012.07421). The results indicate that our method outperforms all other methods except one in the data shift scenario: a model dependent, non-information theoretic method that requires an order of magnitude higher computational cost (arXiv:1906.03671).

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2209.06407

Viewer-Centred Surface Completion for Unsupervised Domain Adaptation in 3D Object Detection

Authors: Darren Tsai, Julie Stephany Berrio, Mao Shan, Eduardo Nebot, Stewart Worrall

Abstract: Every autonomous driving dataset has a different configuration of sensors, originating from distinct geographic regions and covering various scenarios. As a result, 3D detectors tend to overfit the datasets they are trained on. This causes a drastic decrease in accuracy when the detectors are trained on one dataset and tested on another. We observe that lidar scan pattern differences form a large component of this reduction in performance. We address this in our approach, SEE-VCN, by designing a novel viewer-centred surface completion network (VCN) to complete the surfaces of objects of interest within an unsupervised domain adaptation framework, SEE. With SEE-VCN, we obtain a unified representation of objects across datasets, allowing the network to focus on learning geometry, rather than overfitting on scan patterns. By adopting a domain-invariant representation, SEE-VCN can be classed as a multi-target domain adaptation approach where no annotations or re-training is required to obtain 3D detections for new scan patterns. Through extensive experiments, we show that our approach outperforms previous domain adaptation methods in multiple domain adaptation settings. Our code and data are available at https://github.com/darrenjkt/SEE-VCN.

Submitted 14 September, 2022; originally announced September 2022.

arXiv:2205.04612

Reconfigurable Robots for Scaling Reef Restoration

Authors: Serena Mou, Dorian Tsai, Matthew Dunbabin

Abstract: Coral reefs are under increasing threat from the impacts of climate change. Whilst current restoration approaches are effective, they require significant human involvement and equipment, and have limited deployment scale. Harvesting wild coral spawn from mass spawning events, rearing them to the larval stage and releasing the larvae onto degraded reefs is an emerging solution for reef restoration known as coral reseeding. This paper presents a reconfigurable autonomous surface vehicle system that can eliminate risky diving, cover greater areas with coral larvae, has a sensory suite for additional data measurement, and requires minimal non-technical expert training. A key feature is an on-board real-time benthic substrate classification model that predicts when to release larvae to increase settlement rate and ultimately, survivability. The presented robot design is reconfigurable, light weight, scalable, and easy to transport. Results from restoration deployments at Lizard Island demonstrate improved coral larvae release onto appropriate coral substrate, while also achieving 21.8 times more area coverage compared to manual methods.

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2111.09450

doi 10.1109/LRA.2022.3185783

See Eye to Eye: A Lidar-Agnostic 3D Detection Framework for Unsupervised Multi-Target Domain Adaptation

Authors: Darren Tsai, Julie Stephany Berrio, Mao Shan, Stewart Worrall, Eduardo Nebot

Abstract: Sampling discrepancies between different manufacturers and models of lidar sensors result in inconsistent representations of objects. This leads to performance degradation when 3D detectors trained for one lidar are tested on other types of lidars. Remarkable progress in lidar manufacturing has brought about advances in mechanical, solid-state, and recently, adjustable scan pattern lidars. For the latter, existing works often require fine-tuning the model each time scan patterns are adjusted, which is infeasible. We explicitly deal with the sampling discrepancy by proposing a novel unsupervised multi-target domain adaptation framework, SEE, for transferring the performance of state-of-the-art 3D detectors across both fixed and flexible scan pattern lidars without requiring fine-tuning of models by end-users. Our approach interpolates the underlying geometry and normalizes the scan pattern of objects from different lidars before passing them to the detection network. We demonstrate the effectiveness of SEE on public datasets, achieving state-of-the-art results, and additionally provide quantitative results on a novel high-resolution lidar to prove the industry applications of our framework.

Submitted 10 April, 2023; v1 submitted 17 November, 2021; originally announced November 2021.

Comments: Published in RAL and presented in IROS 2022. Code is available at https://github.com/darrenjkt/SEE-MTDA

Journal ref: IEEE Robotics and Automation Letters (2022)

arXiv:2103.15349

Refractive Light-Field Features for Curved Transparent Objects in Structure from Motion

Authors: Dorian Tsai, Peter Corke, Thierry Peynot, Donald G. Dansereau

Abstract: Curved refractive objects are common in the human environment, and have a complex visual appearance that can cause robotic vision algorithms to fail. Light-field cameras allow us to address this challenge by capturing the view-dependent appearance of such objects in a single exposure. We propose a novel image feature for light fields that detects and describes the patterns of light refracted through curved transparent objects. We derive characteristic points based on these features allowing them to be used in place of conventional 2D features. Using our features, we demonstrate improved structure-from-motion performance in challenging scenes containing refractive objects, including quantitative evaluations that show improved camera pose estimates and 3D reconstructions. Additionally, our methods converge 15-35% more frequently than the state-of-the-art. Our method is a critical step towards allowing robots to operate around refractive objects, with applications in manufacturing, quality assurance, pick-and-place, and domestic robots working with acrylic, glass and other transparent materials.

Submitted 17 April, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: submitted to IROS-RAL 2021. 8 pages, 9 figures, 2 tables

arXiv:2103.12287

Optimising the selection of samples for robust lidar camera calibration

Authors: Darren Tsai, Stewart Worrall, Mao Shan, Anton Lohr, Eduardo Nebot

Abstract: We propose a robust calibration pipeline that optimises the selection of calibration samples for the estimation of calibration parameters that fit the entire scene. We minimise user error by automating the data selection process according to a metric, called Variability of Quality (VOQ) that gives a score to each calibration set of samples. We show that this VOQ score is correlated with the estimated calibration parameter's ability to generalise well to the entire scene, thereby overcoming the overfitting problems of existing calibration algorithms. Our approach has the benefits of simplifying the calibration process for practitioners of any calibration expertise level and providing an objective measure of the quality for our calibration pipeline's input and output data. We additionally use a novel method of assessing the accuracy of the calibration parameters. It involves computing reprojection errors for the entire scene to ensure that the parameters are well fitted to all features in the scene. Our proposed calibration pipeline takes 90s, and obtains an average reprojection error of 1-1.2cm, with standard deviation of 0.4-0.5cm over 46 poses evenly distributed in a scene. This process has been validated by experimentation on a high resolution, software definable lidar, Baraja Spectrum-Scan; and a low, fixed resolution lidar, Velodyne VLP-16. We have shown that despite the vast differences in lidar technologies, our proposed approach manages to estimate robust calibration parameters for both. Our code and data set used for this paper are made available as open-source.

Submitted 22 September, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

Comments: ITSC2021

MSC Class: 68U10 ACM Class: I.4.8

arXiv:2003.10128

Soteria: A Provably Compliant User Right Manager Using a Novel Two-Layer Blockchain Technology

Authors: Wei-Kang Fu, Yi-Shan Lin, Giovanni Campagna, De-Yi Tsai, Chun-Ting Liu, Chung-Huan Mei, Edward Y. Chang, Monica S. Lam, Shih-Wei Liao

Abstract: Soteria is a user right management system designed to safeguard user-data privacy in a transparent and provable manner in compliance with regulations such as GDPR and CCPA. Soteria represents user data rights as formal executable sharing agreements, which can automatically be translated into a human readable form and enforced as data are queried. To support revocation and to prove compliance, an indelible, audited trail of the hash of data access and sharing agreements is stored on a two-layer distributed ledger. The main chain ensures partition tolerance and availability (PA) properties while side chains ensure consistency and availability (CA), thus providing the three properties of the CAP (consistency, availability, and partition tolerance) theorem. Besides depicting the two-layer architecture of Soteria, this paper evaluates representative consensus protocols and reports performance statistics.

Submitted 24 March, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

Comments: 12 pages, 6 figures, 2 tables

arXiv:1910.14540

Team NCTU: Toward AI-Driving for Autonomous Surface Vehicles -- From Duckietown to RobotX

Authors: Yi-Wei Huang, Tzu-Kuan Chuang, Ni-Ching Lin, Yu-Chieh Hsiao, Pin-Wei Chen, Ching-Tang Hung, Shih-Hsing Liu, Hsiao-Sheng Chen, Ya-Hsiu Hsieh, Ching-Tang Hung, Yen-Hsiang Huang, Yu-Xuan Chen, Kuan-Lin Chen, Ya-Jou Lan, Chao-Chun Hsu, Chun-Yi Lin, Jhih-Ying Li, Jui-Te Huang, Yu-Jen Menn, Sin-Kiat Lim, Kim-Boon Lua, Chia-Hung Dylan Tsai, Chi-Fang Chen, Hsueh-Cheng Wang

Abstract: Robotic software and hardware systems of autonomous surface vehicles have been developed in transportation, military, and ocean research for decades. Previous efforts in RobotX Challenges 2014 and 2016 facilitated development for important tasks such as obstacle avoidance and docking. Team NCTU is motivated by the AI Driving Olympics (AI-DO) developed by the Duckietown community, and adopts its principles for the RobotX challenge. With containerization (Docker) and a uniform AI agent (with observations and actions), we could better 1) integrate solutions developed in different middlewares (ROS and MOOS), 2) develop essential functionalities from simulation (Gazebo) to real robots (either miniaturized or full-sized WAM-V), and 3) compare different approaches, either classic model-based or learning-based. Finally, we set up an outdoor on-surface platform with localization services for evaluation. Some preliminary results will be presented for Team NCTU's participation in the RobotX competition in Hawaii in 2018.

Submitted 31 October, 2019; originally announced October 2019.

arXiv:1809.07896

3D Move to See: Multi-perspective visual servoing for improving object views with semantic segmentation

Authors: Chris Lehnert, Dorian Tsai, Anders Eriksson, Chris McCool

Abstract: In this paper, we present a new approach to visual servoing for robotics, referred to as 3D Move to See (3DMTS), based on the principle of finding the next best view using a 3D camera array and a robotic manipulator to obtain multiple samples of the scene from different perspectives. The method uses semantic vision and an objective function applied to each perspective to sample a gradient representing the direction of the next best view. The method is demonstrated within simulation and on a real robotic platform containing a custom 3D camera array for the challenging scenario of robotic harvesting in a highly occluded and unstructured environment. It was shown on a real robotic platform that moving the end effector using the gradient of an objective function leads to a locally optimal view of the object of interest, even amongst occlusions. The overall performance of the 3DMTS method obtained a mean increase in target size by 29.3% compared to a baseline method using a single RGB-D camera, which obtained 9.17%. The results demonstrate qualitatively and quantitatively that the 3DMTS method performed better in most scenarios, and yielded three times the target size compared to the baseline method. The increased target size in the final view will improve the detection of key features of the object of interest for further manipulation, such as grasping and harvesting.

Submitted 20 September, 2018; originally announced September 2018.

arXiv:1806.07375

Distinguishing Refracted Features using Light Field Cameras with Application to Structure from Motion

Authors: Dorian Tsai, Donald G Dansereau, Thierry Peynot, Peter Corke

Abstract: Robots must reliably interact with refractive objects in many applications; however, refractive objects can cause many robotic vision algorithms to become unreliable or even fail, particularly feature-based matching applications, such as structure-from-motion. We propose a method to distinguish between refracted and Lambertian image features using a light field camera. Specifically, we propose to use textural cross-correlation to characterise apparent feature motion in a single light field, and compare this motion to its Lambertian equivalent based on 4D light field geometry. Our refracted feature distinguisher has a 34.3% higher rate of detection compared to state-of-the-art for light fields captured with large baselines relative to the refractive object. Our method also applies to light field cameras with much smaller baselines than previously considered, yielding up to 2 times better detection for 2D-refractive objects, such as a sphere, and up to 8 times better for 1D-refractive objects, such as a cylinder. For structure from motion, we demonstrate that rejecting refracted features using our distinguisher yields up to 42.4% lower reprojection error, and lower failure rate when the robot is approaching refractive objects. Our method leads to more robust robot vision in the presence of refractive objects.

Submitted 31 May, 2018; originally announced June 2018.

Comments: 8 pages, 8 figures, submission to IROS 2018

arXiv:1801.04541

Cooperative Multi-Agent Reinforcement Learning for Low-Level Wireless Communication

Authors: Colin de Vrieze, Shane Barratt, Daniel Tsai, Anant Sahai

Abstract: Traditional radio systems are strictly co-designed on the lower levels of the OSI stack for compatibility and efficiency. Although this has enabled the success of radio communications, it has also introduced lengthy standardization processes and imposed static allocation of the radio spectrum. Various initiatives have been undertaken by the research community to tackle the problem of artificial spectrum scarcity by both making frequency allocation more dynamic and building flexible radios to replace the static ones. There is reason to believe that just as computer vision and control have been overhauled by the introduction of machine learning, wireless communication can also be improved by utilizing similar techniques to increase the flexibility of wireless networks. In this work, we pose the problem of discovering low-level wireless communication schemes ex-nihilo between two agents in a fully decentralized fashion as a reinforcement learning problem. Our proposed approach uses policy gradients to learn an optimal bi-directional communication scheme and shows surprisingly sophisticated and intelligent learning behavior. We present the results of extensive experiments and an analysis of the fidelity of our approach.

Submitted 14 January, 2018; originally announced January 2018.

arXiv:1612.05335

Mirrored Light Field Video Camera Adapter

Authors: Dorian Tsai, Donald G. Dansereau, Steve Martin, Peter Corke

Abstract: This paper proposes the design of a custom mirror-based light field camera adapter that is cheap, simple in construction, and accessible. Mirrors of different shape and orientation reflect the scene into an upwards-facing camera to create an array of virtual cameras with overlapping field of view at specified depths, and deliver video frame rate light fields. We describe the design, construction, decoding and calibration processes of our mirror-based light field camera adapter in preparation for an open-source release to benefit the robotic vision community.

Submitted 15 December, 2016; originally announced December 2016.

Comments: tech report, v0.5, 15 pages, 6 figures

arXiv:1505.06807

MLlib: Machine Learning in Apache Spark

Authors: Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, Ameet Talwalkar

Abstract: Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced a rapid growth due to its vibrant open-source community of over 140 contributors, and includes extensive documentation to support further growth and to let users quickly get up to speed.

Submitted 26 May, 2015; originally announced May 2015.

Showing 1–15 of 15 results for author: Tsai, D