-
Open-Vocabulary SAM3D: Understand Any 3D Scene
Authors:
Hanchen Tai,
Qingdong He,
Jiangning Zhang,
Yijie Qian,
Zhenyu Zhang,
Xiaobin Hu,
Yabiao Wang,
Yong Liu
Abstract:
Open-vocabulary 3D scene understanding presents a significant challenge in the field. Recent advancements have sought to transfer knowledge embedded in vision language models from the 2D domain to 3D domain. However, these approaches often require learning prior knowledge from specific 3D scene datasets, which limits their applicability in open-world scenarios. The Segment Anything Model (SAM) has…
▽ More
Open-vocabulary 3D scene understanding presents a significant challenge in the field. Recent advancements have sought to transfer knowledge embedded in vision language models from the 2D domain to 3D domain. However, these approaches often require learning prior knowledge from specific 3D scene datasets, which limits their applicability in open-world scenarios. The Segment Anything Model (SAM) has demonstrated remarkable zero-shot segmentation capabilities, prompting us to investigate its potential for comprehending 3D scenes without the need for training. In this paper, we introduce OV-SAM3D, a universal framework for open-vocabulary 3D scene understanding. This framework is designed to perform understanding tasks for any 3D scene without requiring prior knowledge of the scene. Specifically, our method is composed of two key sub-modules: First, we initiate the process by generating superpoints as the initial 3D prompts and refine these prompts using segment masks derived from SAM. Moreover, we then integrate a specially designed overlapping score table with open tags from the Recognize Anything Model (RAM) to produce final 3D instances with open-world label. Empirical evaluations conducted on the ScanNet200 and nuScenes datasets demonstrate that our approach surpasses existing open-vocabulary methods in unknown open-world environments.
△ Less
Submitted 21 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Towards Seamless Serverless Computing Across an Edge-Cloud Continuum
Authors:
Emilian Simion,
Yuandou Wang,
Hsiang-ling Tai,
Uraz Odyurt,
Zhiming Zhao
Abstract:
Serverless computing has emerged as an attractive paradigm due to the efficiency of development and the ease of deployment without managing any underlying infrastructure. Nevertheless, serverless computing approaches face numerous challenges to unlock their full potential in hybrid environments. To gain a deeper understanding and firsthand knowledge of serverless computing in edge-cloud deployment…
▽ More
Serverless computing has emerged as an attractive paradigm due to the efficiency of development and the ease of deployment without managing any underlying infrastructure. Nevertheless, serverless computing approaches face numerous challenges to unlock their full potential in hybrid environments. To gain a deeper understanding and firsthand knowledge of serverless computing in edge-cloud deployments, we review the current state of open-source serverless platforms and compare them based on predefined requirements. We then design and implement a serverless computing platform with a novel edge orchestration technique that seamlessly deploys serverless functions across the edge and cloud environments on top of the Knative serverless platform. Moreover, we propose an offloading strategy for edge environments and four different functions for experimentation and showcase the performance benefits of our solution. Our results demonstrate that such an approach can efficiently utilize both cloud and edge resources by dynamically offloading functions from the edge to the cloud during high activity, while reducing the overall application latency and increasing request throughput compared to an edge-only deployment.
△ Less
Submitted 4 January, 2024;
originally announced January 2024.
-
Neighbour Consistency Guided Pseudo-Label Refinement for Unsupervised Person Re-Identification
Authors:
De Cheng,
Haichun Tai,
Nannan Wang,
Zhen Wang,
Xinbo Gao
Abstract:
Unsupervised person re-identification (ReID) aims at learning discriminative identity features for person retrieval without any annotations. Recent advances accomplish this task by leveraging clustering-based pseudo labels, but these pseudo labels are inevitably noisy which deteriorate model performance. In this paper, we propose a Neighbour Consistency guided Pseudo Label Refinement (NCPLR) frame…
▽ More
Unsupervised person re-identification (ReID) aims at learning discriminative identity features for person retrieval without any annotations. Recent advances accomplish this task by leveraging clustering-based pseudo labels, but these pseudo labels are inevitably noisy which deteriorate model performance. In this paper, we propose a Neighbour Consistency guided Pseudo Label Refinement (NCPLR) framework, which can be regarded as a transductive form of label propagation under the assumption that the prediction of each example should be similar to its nearest neighbours'. Specifically, the refined label for each training instance can be obtained by the original clustering result and a weighted ensemble of its neighbours' predictions, with weights determined according to their similarities in the feature space. In addition, we consider the clustering-based unsupervised person ReID as a label-noise learning problem. Then, we proposed an explicit neighbour consistency regularization to reduce model susceptibility to over-fitting while improving the training stability. The NCPLR method is simple yet effective, and can be seamlessly integrated into existing clustering-based unsupervised algorithms. Extensive experimental results on five ReID datasets demonstrate the effectiveness of the proposed method, and showing superior performance to state-of-the-art methods by a large margin.
△ Less
Submitted 30 November, 2022;
originally announced November 2022.
-
Structured Knowledge Distillation Towards Efficient and Compact Multi-View 3D Detection
Authors:
Linfeng Zhang,
Yukang Shi,
Hung-Shuo Tai,
Zhipeng Zhang,
Yuan He,
Ke Wang,
Kaisheng Ma
Abstract:
Detecting 3D objects from multi-view images is a fundamental problem in 3D computer vision. Recently, significant breakthrough has been made in multi-view 3D detection tasks. However, the unprecedented detection performance of these vision BEV (bird's-eye-view) detection models is accompanied with enormous parameters and computation, which make them unaffordable on edge devices. To address this pr…
▽ More
Detecting 3D objects from multi-view images is a fundamental problem in 3D computer vision. Recently, significant breakthrough has been made in multi-view 3D detection tasks. However, the unprecedented detection performance of these vision BEV (bird's-eye-view) detection models is accompanied with enormous parameters and computation, which make them unaffordable on edge devices. To address this problem, in this paper, we propose a structured knowledge distillation framework, aiming to improve the efficiency of modern vision-only BEV detection models. The proposed framework mainly includes: (a) spatial-temporal distillation which distills teacher knowledge of information fusion from different timestamps and views, (b) BEV response distillation which distills teacher response to different pillars, and (c) weight-inheriting which solves the problem of inconsistent inputs between students and teacher in modern transformer architectures. Experimental results show that our method leads to an average improvement of 2.16 mAP and 2.27 NDS on the nuScenes benchmark, outperforming multiple baselines by a large margin.
△ Less
Submitted 21 November, 2022; v1 submitted 14 November, 2022;
originally announced November 2022.
-
PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection
Authors:
Linfeng Zhang,
Runpei Dong,
Hung-Shuo Tai,
Kaisheng Ma
Abstract:
The remarkable breakthroughs in point cloud representation learning have boosted their usage in real-world applications such as self-driving cars and virtual reality. However, these applications usually have an urgent requirement for not only accurate but also efficient 3D object detection. Recently, knowledge distillation has been proposed as an effective model compression technique, which transf…
▽ More
The remarkable breakthroughs in point cloud representation learning have boosted their usage in real-world applications such as self-driving cars and virtual reality. However, these applications usually have an urgent requirement for not only accurate but also efficient 3D object detection. Recently, knowledge distillation has been proposed as an effective model compression technique, which transfers the knowledge from an over-parameterized teacher to a lightweight student and achieves consistent effectiveness in 2D vision. However, due to point clouds' sparsity and irregularity, directly applying previous image-based knowledge distillation methods to point cloud detectors usually leads to unsatisfactory performance. To fill the gap, this paper proposes PointDistiller, a structured knowledge distillation framework for point clouds-based 3D detection. Concretely, PointDistiller includes local distillation which extracts and distills the local geometric structure of point clouds with dynamic graph convolution and reweighted learning strategy, which highlights student learning on the crucial points or voxels to improve knowledge distillation efficiency. Extensive experiments on both voxels-based and raw points-based detectors have demonstrated the effectiveness of our method over seven previous knowledge distillation methods. For instance, our 4X compressed PointPillars student achieves 2.8 and 3.4 mAP improvements on BEV and 3D object detection, outperforming its teacher by 0.9 and 1.8 mAP, respectively. Codes have been released at https://github.com/RunpeiDong/PointDistiller.
△ Less
Submitted 23 May, 2022;
originally announced May 2022.
-
A lightweight design for serverless Function-as-a-Service
Authors:
Ju Long,
Hung-Ying Tai,
Shen-Ta Hsieh,
Michael Juntao Yuan
Abstract:
FaaS (Function as a Service) allows developers to upload and execute code in the cloud without managing servers. FaaS offerings from leading public cloud providers are based on system microVM or application container technologies such as Firecracker or Docker. In this paper, we demonstrate that lightweight high-level runtimes, such as WebAssembly, could offer performance and scaling advantages ove…
▽ More
FaaS (Function as a Service) allows developers to upload and execute code in the cloud without managing servers. FaaS offerings from leading public cloud providers are based on system microVM or application container technologies such as Firecracker or Docker. In this paper, we demonstrate that lightweight high-level runtimes, such as WebAssembly, could offer performance and scaling advantages over existing solutions, and could enable finely-grained pay-as-you-use business models. We compared widely used performance benchmarks between Docker native and WebAssembly implementations of the same algorithms. We also discuss the barriers for WebAssembly adoption in serverless computing, such as the lack of tooling support.
△ Less
Submitted 13 October, 2020;
originally announced October 2020.
-
Benchmarking Minimax Linkage
Authors:
Xiao Hui Tai,
Kayla Frisoli
Abstract:
Minimax linkage was first introduced by Ao et al. [3] in 2004, as an alternative to standard linkage methods used in hierarchical clustering. Minimax linkage relies on distances to a prototype for each cluster; this prototype can be thought of as a representative object in the cluster, hence improving the interpretability of clustering results. Bien and Tibshirani analyzed properties of this metho…
▽ More
Minimax linkage was first introduced by Ao et al. [3] in 2004, as an alternative to standard linkage methods used in hierarchical clustering. Minimax linkage relies on distances to a prototype for each cluster; this prototype can be thought of as a representative object in the cluster, hence improving the interpretability of clustering results. Bien and Tibshirani analyzed properties of this method in 2011 [2], popularizing the method within the statistics community. Additionally, they performed comparisons of minimax linkage to standard linkage methods, making use of five data sets and two different evaluation metrics (distance to prototype and misclassification rate). In an effort to expand upon their work and evaluate minimax linkage more comprehensively, our benchmark study focuses on thorough method evaluation via multiple performance metrics on several well-described data sets. We also make all code and data publicly available through an R package, for full reproducibility. Similarly to [2], we find that minimax linkage often produces the smallest maximum minimax radius of all linkage methods, meaning that minimax linkage produces clusters where objects in a cluster are tightly clustered around their prototype. This is true across a range of values for the total number of clusters (k). However, this is not always the case, and special attention should be paid to the case when k is the true known value. For true k, minimax linkage does not always perform the best in terms of all the evaluation metrics studied, including maximum minimax radius. This paper was motivated by the IFCS Cluster Benchmarking Task Force's call for clustering benchmark studies and the white paper [5], which put forth guidelines and principles for comprehensive benchmarking in clustering. Our work is designed to be a neutral benchmark study of minimax linkage.
△ Less
Submitted 7 June, 2019;
originally announced June 2019.
-
Hand Segmentation for Hand-Object Interaction from Depth map
Authors:
Byeongkeun Kang,
Kar-Han Tan,
Nan Jiang,
Hung-Shuo Tai,
Daniel Tretter,
Truong Q. Nguyen
Abstract:
Hand segmentation for hand-object interaction is a necessary preprocessing step in many applications such as augmented reality, medical application, and human-robot interaction. However, typical methods are based on color information which is not robust to objects with skin color, skin pigment difference, and light condition variations. Thus, we propose hand segmentation method for hand-object int…
▽ More
Hand segmentation for hand-object interaction is a necessary preprocessing step in many applications such as augmented reality, medical application, and human-robot interaction. However, typical methods are based on color information which is not robust to objects with skin color, skin pigment difference, and light condition variations. Thus, we propose hand segmentation method for hand-object interaction using only a depth map. It is challenging because of the small depth difference between a hand and objects during an interaction. To overcome this challenge, we propose the two-stage random decision forest (RDF) method consisting of detecting hands and segmenting hands. To validate the proposed method, we demonstrate results on the publicly available dataset of hand segmentation for hand-object interaction. The proposed method achieves high accuracy in short processing time comparing to the other state-of-the-art methods.
△ Less
Submitted 9 January, 2018; v1 submitted 7 March, 2016;
originally announced March 2016.
-
An inventory model for group-buying auction
Authors:
Allen H. Tai
Abstract:
Group-buying auction has become a popular marketing strategy in the last decade. In this paper, a stochastic model is developed for an inventory system subjects to demands from group-buying auctions. The model discussed here takes into the account of the costs of inventory, transportation, dispatching and re-order as well as the penalty cost of non-successful auctions. Since a new cycle begins whe…
▽ More
Group-buying auction has become a popular marketing strategy in the last decade. In this paper, a stochastic model is developed for an inventory system subjects to demands from group-buying auctions. The model discussed here takes into the account of the costs of inventory, transportation, dispatching and re-order as well as the penalty cost of non-successful auctions. Since a new cycle begins whenever there is a replenishment of products, the long-run average costs of the model can be obtained by using the renewal theory. A closed form solution of the optimal replenishment quantity is also derived.
△ Less
Submitted 14 December, 2012;
originally announced December 2012.