Search | arXiv e-print repository

Stochastic Compositional Minimax Optimization with Provable Convergence Guarantees

Authors: Yuyang Deng, Fuli Qiao, Mehrdad Mahdavi

Abstract: Stochastic compositional minimax problems are prevalent in machine learning, yet there are only limited established on the convergence of this class of problems. In this paper, we propose a formal definition of the stochastic compositional minimax problem, which involves optimizing a minimax loss with a compositional structure either in primal , dual, or both primal and dual variables. We introduc… ▽ More Stochastic compositional minimax problems are prevalent in machine learning, yet there are only limited established on the convergence of this class of problems. In this paper, we propose a formal definition of the stochastic compositional minimax problem, which involves optimizing a minimax loss with a compositional structure either in primal , dual, or both primal and dual variables. We introduce a simple yet effective algorithm, stochastically Corrected stOchastic gradient Descent Ascent (CODA), which is a descent ascent type algorithm with compositional correction steps, and establish its convergence rate in aforementioned three settings. In the presence of the compositional structure in primal, the objective function typically becomes nonconvex in primal due to function composition. Thus, we consider the nonconvex-strongly-concave and nonconvex-concave settings and show that CODA can efficiently converge to a stationary point. In the case of composition on the dual, the objective function becomes nonconcave in the dual variable, and we demonstrate convergence in the strongly-convex-nonconcave and convex-nonconcave setting. In the case of composition on both variables, the primal and dual variables may lose convexity and concavity, respectively. Therefore, we anaylze the convergence in weakly-convex-weakly-concave setting. We also give a variance reduction version algorithm, CODA+, which achieves the best known rate on nonconvex-strongly-concave and nonconvex-concave compositional minimax problem. This work initiates the theoretical study of the stochastic compositional minimax problem on various settings and may inform modern machine learning scenarios such as domain adaptation or robust model-agnostic meta-learning. △ Less

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.01653 [pdf, other]

MCPDepth: Omnidirectional Depth Estimation via Stereo Matching from Multi-Cylindrical Panoramas

Authors: Feng Qiao, Zhexiao Xiong, Xinge Zhu, Yuexin Ma, Qiumeng He, Nathan Jacobs

Abstract: We introduce Multi-Cylindrical Panoramic Depth Estimation (MCPDepth), a two-stage framework for omnidirectional depth estimation via stereo matching between multiple cylindrical panoramas. MCPDepth uses cylindrical panoramas for initial stereo matching and then fuses the resulting depth maps across views. A circular attention module is employed to overcome the distortion along the vertical axis. M… ▽ More We introduce Multi-Cylindrical Panoramic Depth Estimation (MCPDepth), a two-stage framework for omnidirectional depth estimation via stereo matching between multiple cylindrical panoramas. MCPDepth uses cylindrical panoramas for initial stereo matching and then fuses the resulting depth maps across views. A circular attention module is employed to overcome the distortion along the vertical axis. MCPDepth exclusively utilizes standard network components, simplifying deployment to embedded devices and outperforming previous methods that require custom kernels. We theoretically and experimentally compare spherical and cylindrical projections for stereo matching, highlighting the advantages of the cylindrical projection. MCPDepth achieves state-of-the-art performance with an 18.8% reduction in mean absolute error (MAE) for depth on the outdoor synthetic dataset Deep360 and a 19.9% reduction on the indoor real-scene dataset 3D60. △ Less

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.12240 [pdf, other]

Adaptive Cascading Network for Continual Test-Time Adaptation

Authors: Kien X. Nguyen, Fengchun Qiao, Xi Peng

Abstract: We study the problem of continual test-time adaption where the goal is to adapt a source pre-trained model to a sequence of unlabelled target domains at test time. Existing methods on test-time training suffer from several limitations: (1) Mismatch between the feature extractor and classifier; (2) Interference between the main and self-supervised tasks; (3) Lack of the ability to quickly adapt to… ▽ More We study the problem of continual test-time adaption where the goal is to adapt a source pre-trained model to a sequence of unlabelled target domains at test time. Existing methods on test-time training suffer from several limitations: (1) Mismatch between the feature extractor and classifier; (2) Interference between the main and self-supervised tasks; (3) Lack of the ability to quickly adapt to the current distribution. In light of these challenges, we propose a cascading paradigm that simultaneously updates the feature extractor and classifier at test time, mitigating the mismatch between them and enabling long-term model adaptation. The pre-training of our model is structured within a meta-learning framework, thereby minimizing the interference between the main and self-supervised tasks and encouraging fast adaptation in the presence of limited unlabelled data. Additionally, we introduce innovative evaluation metrics, average accuracy and forward transfer, to effectively measure the model's adaptation capabilities in dynamic, real-world scenarios. Extensive experiments and ablation studies demonstrate the superiority of our approach in a range of tasks including image classification, text classification, and speech recognition. △ Less

Submitted 16 July, 2024; originally announced July 2024.

ACM Class: I.5.1; I.5.2

arXiv:2406.07400 [pdf, other]

Guiding LLM Temporal Logic Generation with Explicit Separation of Data and Control

Authors: William Murphy, Nikolaus Holzer, Nathan Koenig, Leyi Cui, Raven Rothkopf, Feitong Qiao, Mark Santolucito

Abstract: Temporal logics are powerful tools that are widely used for the synthesis and verification of reactive systems. The recent progress on Large Language Models (LLMs) has the potential to make the process of writing such specifications more accessible. However, writing specifications in temporal logics remains challenging for all but the most expert users. A key question in using LLMs for temporal lo… ▽ More Temporal logics are powerful tools that are widely used for the synthesis and verification of reactive systems. The recent progress on Large Language Models (LLMs) has the potential to make the process of writing such specifications more accessible. However, writing specifications in temporal logics remains challenging for all but the most expert users. A key question in using LLMs for temporal logic specification engineering is to understand what kind of guidance is most helpful to the LLM and the users to easily produce specifications. Looking specifically at the problem of reactive program synthesis, we explore the impact of providing an LLM with guidance on the separation of control and data--making explicit for the LLM what functionality is relevant for the specification, and treating the remaining functionality as an implementation detail for a series of pre-defined functions and predicates. We present a benchmark set and find that this separation of concerns improves specification generation. Our benchmark provides a test set against which to verify future work in LLM generation of temporal logic specifications. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2402.15632 [pdf, other]

Statically Inferring Usage Bounds for Infrastructure as Code

Authors: Feitong Qiao, Aryana Mohammadi, Jürgen Cito, Mark Santolucito

Abstract: Infrastructure as Code (IaC) has enabled cloud customers to have more agility in creating and modifying complex deployments of cloud-provisioned resources. By writing a configuration in IaC languages such as CloudFormation, users can declaratively specify their infrastructure and CloudFormation will handle the creation of the resources. However, understanding the complexity of IaC deployments has… ▽ More Infrastructure as Code (IaC) has enabled cloud customers to have more agility in creating and modifying complex deployments of cloud-provisioned resources. By writing a configuration in IaC languages such as CloudFormation, users can declaratively specify their infrastructure and CloudFormation will handle the creation of the resources. However, understanding the complexity of IaC deployments has emerged as an unsolved issue. In particular, estimating the cost of an IaC deployment requires estimating the future usage and pricing models of every cloud resource in the deployment. Gaining transparency into predicted usage/costs is a leading challenge in cloud management. Existing work either relies on historical usage metrics to predict cost or on coarse-grain static analysis that ignores interactions between resources. Our key insight is that the topology of an IaC deployment imposes constraints on the usage of each resource, and we can formalize and automate the reasoning on constraints by using an SMT solver. This allows customers to have formal guarantees on the bounds of their cloud usage. We propose a tool for fine-grained static usage analysis that works by modeling the inter-resource interactions in an IaC deployment as a set of SMT constraints, and evaluate our tool on a benchmark of over 1000 real world IaC configurations. △ Less

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2310.08820 [pdf, other]

Learning to Adapt SAM for Segmenting Cross-domain Point Clouds

Authors: Xidong Peng, Runnan Chen, Feng Qiao, Lingdong Kong, Youquan Liu, Tai Wang, Xinge Zhu, Yuexin Ma

Abstract: Unsupervised domain adaptation (UDA) in 3D segmentation tasks presents a formidable challenge, primarily stemming from the sparse and unordered nature of point cloud data. Especially for LiDAR point clouds, the domain discrepancy becomes obvious across varying capture scenes, fluctuating weather conditions, and the diverse array of LiDAR devices in use. While previous UDA methodologies have often… ▽ More Unsupervised domain adaptation (UDA) in 3D segmentation tasks presents a formidable challenge, primarily stemming from the sparse and unordered nature of point cloud data. Especially for LiDAR point clouds, the domain discrepancy becomes obvious across varying capture scenes, fluctuating weather conditions, and the diverse array of LiDAR devices in use. While previous UDA methodologies have often sought to mitigate this gap by aligning features between source and target domains, this approach falls short when applied to 3D segmentation due to the substantial domain variations. Inspired by the remarkable generalization capabilities exhibited by the vision foundation model, SAM, in the realm of image segmentation, our approach leverages the wealth of general knowledge embedded within SAM to unify feature representations across diverse 3D domains and further solves the 3D domain adaptation problem. Specifically, we harness the corresponding images associated with point clouds to facilitate knowledge transfer and propose an innovative hybrid feature augmentation methodology, which significantly enhances the alignment between the 3D feature space and SAM's feature space, operating at both the scene and instance levels. Our method is evaluated on many widely-recognized datasets and achieves state-of-the-art performance. △ Less

Submitted 13 December, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2309.01842 [pdf, other]

StereoFlowGAN: Co-training for Stereo and Flow with Unsupervised Domain Adaptation

Authors: Zhexiao Xiong, Feng Qiao, Yu Zhang, Nathan Jacobs

Abstract: We introduce a novel training strategy for stereo matching and optical flow estimation that utilizes image-to-image translation between synthetic and real image domains. Our approach enables the training of models that excel in real image scenarios while relying solely on ground-truth information from synthetic images. To facilitate task-agnostic domain adaptation and the training of task-specific… ▽ More We introduce a novel training strategy for stereo matching and optical flow estimation that utilizes image-to-image translation between synthetic and real image domains. Our approach enables the training of models that excel in real image scenarios while relying solely on ground-truth information from synthetic images. To facilitate task-agnostic domain adaptation and the training of task-specific components, we introduce a bidirectional feature warping module that handles both left-right and forward-backward directions. Experimental results show competitive performance over previous domain translation-based methods, which substantiate the efficacy of our proposed framework, effectively leveraging the benefits of unsupervised domain adaptation, stereo matching, and optical flow estimation. △ Less

Submitted 4 September, 2023; originally announced September 2023.

Comments: Accepted by BMVC 2023

arXiv:2307.13943 [pdf, other]

Topology-aware Robust Optimization for Out-of-distribution Generalization

Authors: Fengchun Qiao, Xi Peng

Abstract: Out-of-distribution (OOD) generalization is a challenging machine learning problem yet highly desirable in many high-stake applications. Existing methods suffer from overly pessimistic modeling with low generalization confidence. As generalizing to arbitrary test distributions is impossible, we hypothesize that further structure on the topology of distributions is crucial in developing strong OOD… ▽ More Out-of-distribution (OOD) generalization is a challenging machine learning problem yet highly desirable in many high-stake applications. Existing methods suffer from overly pessimistic modeling with low generalization confidence. As generalizing to arbitrary test distributions is impossible, we hypothesize that further structure on the topology of distributions is crucial in developing strong OOD resilience. To this end, we propose topology-aware robust optimization (TRO) that seamlessly integrates distributional topology in a principled optimization framework. More specifically, TRO solves two optimization objectives: (1) Topology Learning which explores data manifold to uncover the distributional topology; (2) Learning on Topology which exploits the topology to constrain robust optimization for tightly-bounded generalization risks. We theoretically demonstrate the effectiveness of our approach and empirically show that it significantly outperforms the state of the arts in a wide range of tasks including classification, regression, and semantic segmentation. Moreover, we empirically find the data-driven distributional topology is consistent with domain knowledge, enhancing the explainability of our approach. △ Less

Submitted 25 July, 2023; originally announced July 2023.

Comments: In ICLR 2023 (17 pages including appendix). The source code and pre-trained models are publicly available at: https://github.com/joffery/TRO

Journal ref: International Conference on Learning Representations 2023

arXiv:2304.05821 [pdf, other]

DUFormer: Solving Power Line Detection Task in Aerial Images using Semantic Segmentation

Authors: Deyu An, Qiang Zhang, Jianshu Chao, Ting Li, Feng Qiao, Yong Deng, Zhenpeng Bian

Abstract: Unmanned aerial vehicles (UAVs) are frequently used for inspecting power lines and capturing high-resolution aerial images. However, detecting power lines in aerial images is difficult,as the foreground data(i.e, power lines) is small and the background information is abundant.To tackle this problem, we introduce DUFormer, a semantic segmentation algorithm explicitly designed to detect power lines… ▽ More Unmanned aerial vehicles (UAVs) are frequently used for inspecting power lines and capturing high-resolution aerial images. However, detecting power lines in aerial images is difficult,as the foreground data(i.e, power lines) is small and the background information is abundant.To tackle this problem, we introduce DUFormer, a semantic segmentation algorithm explicitly designed to detect power lines in aerial images. We presuppose that it is advantageous to train an efficient Transformer model with sufficient feature extraction using a convolutional neural network(CNN) with a strong inductive bias.With this goal in mind, we introduce a heavy token encoder that performs overlapping feature remodeling and tokenization. The encoder comprises a pyramid CNN feature extraction module and a power line feature enhancement module.After successful local feature extraction for power lines, feature fusion is conducted.Then,the Transformer block is used for global modeling. The final segmentation result is achieved by amalgamating local and global features in the decode head.Moreover, we demonstrate the importance of the joint multi-weight loss function in power line segmentation. Our experimental results show that our proposed method outperforms all state-of-the-art methods in power line segmentation on the publicly accessible TTPLA dataset. △ Less

Submitted 31 August, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

arXiv:2303.16390 [pdf, other]

Are Data-driven Explanations Robust against Out-of-distribution Data?

Authors: Tang Li, Fengchun Qiao, Mengmeng Ma, Xi Peng

Abstract: As black-box models increasingly power high-stakes applications, a variety of data-driven explanation methods have been introduced. Meanwhile, machine learning models are constantly challenged by distributional shifts. A question naturally arises: Are data-driven explanations robust against out-of-distribution data? Our empirical results show that even though predict correctly, the model might sti… ▽ More As black-box models increasingly power high-stakes applications, a variety of data-driven explanation methods have been introduced. Meanwhile, machine learning models are constantly challenged by distributional shifts. A question naturally arises: Are data-driven explanations robust against out-of-distribution data? Our empirical results show that even though predict correctly, the model might still yield unreliable explanations under distributional shifts. How to develop robust explanations against out-of-distribution data? To address this problem, we propose an end-to-end model-agnostic learning framework Distributionally Robust Explanations (DRE). The key idea is, inspired by self-supervised learning, to fully utilizes the inter-distribution information to provide supervisory signals for the learning of explanations without human annotation. Can robust explanations benefit the model's generalization capability? We conduct extensive experiments on a wide range of tasks and data types, including classification and regression on image and scientific tabular data. Our results demonstrate that the proposed method significantly improves the model's performance in terms of explanation and prediction robustness against distributional shifts. △ Less

Submitted 28 March, 2023; originally announced March 2023.

Comments: In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023

arXiv:2211.16988 [pdf, other]

QuadFormer: Quadruple Transformer for Unsupervised Domain Adaptation in Power Line Segmentation of Aerial Images

Authors: Pratyaksh Prabhav Rao, Feng Qiao, Weide Zhang, Yiliang Xu, Yong Deng, Guangbin Wu, Qiang Zhang

Abstract: Accurate segmentation of power lines in aerial images is essential to ensure the flight safety of aerial vehicles. Acquiring high-quality ground truth annotations for training a deep learning model is a laborious process. Therefore, developing algorithms that can leverage knowledge from labelled synthetic data to unlabelled real images is highly demanded. This process is studied in Unsupervised do… ▽ More Accurate segmentation of power lines in aerial images is essential to ensure the flight safety of aerial vehicles. Acquiring high-quality ground truth annotations for training a deep learning model is a laborious process. Therefore, developing algorithms that can leverage knowledge from labelled synthetic data to unlabelled real images is highly demanded. This process is studied in Unsupervised domain adaptation (UDA). Recent approaches to self-training have achieved remarkable performance in UDA for semantic segmentation, which trains a model with pseudo labels on the target domain. However, the pseudo labels are noisy due to a discrepancy in the two data distributions. We identify that context dependency is important for bridging this domain gap. Motivated by this, we propose QuadFormer, a novel framework designed for domain adaptive semantic segmentation. The hierarchical quadruple transformer combines cross-attention and self-attention mechanisms to adapt transferable context. Based on cross-attentive and self-attentive feature representations, we introduce a pseudo label correction scheme to online denoise the pseudo labels and reduce the domain gap. Additionally, we present two datasets - ARPLSyn and ARPLReal to further advance research in unsupervised domain adaptive powerline segmentation. Finally, experimental results indicate that our method achieves state-of-the-art performance for the domain adaptive power line segmentation on ARPLSyn$\rightarrow$TTTPLA and ARPLSyn$\rightarrow$ARPLReal. △ Less

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2204.01026 [pdf, other]

STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes

Authors: Peishan Cong, Xinge Zhu, Feng Qiao, Yiming Ren, Xidong Peng, Yuenan Hou, Lan Xu, Ruigang Yang, Dinesh Manocha, Yuexin Ma

Abstract: Accurately detecting and tracking pedestrians in 3D space is challenging due to large variations in rotations, poses and scales. The situation becomes even worse for dense crowds with severe occlusions. However, existing benchmarks either only provide 2D annotations, or have limited 3D annotations with low-density pedestrian distribution, making it difficult to build a reliable pedestrian percepti… ▽ More Accurately detecting and tracking pedestrians in 3D space is challenging due to large variations in rotations, poses and scales. The situation becomes even worse for dense crowds with severe occlusions. However, existing benchmarks either only provide 2D annotations, or have limited 3D annotations with low-density pedestrian distribution, making it difficult to build a reliable pedestrian perception system especially in crowded scenes. To better evaluate pedestrian perception algorithms in crowded scenarios, we introduce a large-scale multimodal dataset,STCrowd. Specifically, in STCrowd, there are a total of 219 K pedestrian instances and 20 persons per frame on average, with various levels of occlusion. We provide synchronized LiDAR point clouds and camera images as well as their corresponding 3D labels and joint IDs. STCrowd can be used for various tasks, including LiDAR-only, image-only, and sensor-fusion based pedestrian detection and tracking. We provide baselines for most of the tasks. In addition, considering the property of sparse global distribution and density-varying local distribution of pedestrians, we further propose a novel method, Density-aware Hierarchical heatmap Aggregation (DHA), to enhance pedestrian perception in crowded scenes. Extensive experiments show that our new method achieves state-of-the-art performance for pedestrian detection on various datasets. △ Less

Submitted 3 April, 2022; originally announced April 2022.

Comments: accepted at CVPR2022

arXiv:2203.13046 [pdf, other]

Facial Action Unit Recognition With Multi-models Ensembling

Authors: Wenqiang Jiang, Yannan Wu, Fengsheng Qiao, Liyu Meng, Yuanyuan Deng, Chuanhe Liu

Abstract: The Affective Behavior Analysis in-the-wild (ABAW) 2022 Competition gives Affective Computing a large promotion. In this paper, we present our method of AU challenge in this Competition. We use improved IResnet100 as backbone. Then we train AU dataset in Aff-Wild2 on three pertained models pretrained by our private au and expression dataset, and Glint360K respectively. Finally, we ensemble the res… ▽ More The Affective Behavior Analysis in-the-wild (ABAW) 2022 Competition gives Affective Computing a large promotion. In this paper, we present our method of AU challenge in this Competition. We use improved IResnet100 as backbone. Then we train AU dataset in Aff-Wild2 on three pertained models pretrained by our private au and expression dataset, and Glint360K respectively. Finally, we ensemble the results of our models. We achieved F1 score (macro) 0.731 on AU validation set. △ Less

Submitted 24 March, 2022; originally announced March 2022.

arXiv:2108.02888 [pdf, other]

Out-of-Domain Generalization from a Single Source: An Uncertainty Quantification Approach

Authors: Xi Peng, Fengchun Qiao, Long Zhao

Abstract: We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training. We propose Meta-Learning based Adversarial Domain Augmentation to solve this Out-of-Domain generalization problem. The key idea is to leverage adversarial training to create "fictitious" yet "challen… ▽ More We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training. We propose Meta-Learning based Adversarial Domain Augmentation to solve this Out-of-Domain generalization problem. The key idea is to leverage adversarial training to create "fictitious" yet "challenging" populations, from which a model can learn to generalize with theoretical guarantees. To facilitate fast and desirable domain augmentation, we cast the model training in a meta-learning scheme and use a Wasserstein Auto-Encoder to relax the widely used worst-case constraint. We further improve our method by integrating uncertainty quantification for efficient domain generalization. Extensive experiments on multiple benchmark datasets indicate its superior performance in tackling single domain generalization. △ Less

Submitted 16 June, 2022; v1 submitted 5 August, 2021; originally announced August 2021.

Comments: 13 pages, 11 figures, accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: substantial text overlap with arXiv:2003.13216

arXiv:2103.12579 [pdf, other]

MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

Authors: Shuang Li, Kaixiong Gong, Chi Harold Liu, Yulin Wang, Feng Qiao, Xinjing Cheng

Abstract: Real-world training data usually exhibits long-tailed distribution, where several majority classes have a significantly larger number of samples than the remaining minority classes. This imbalance degrades the performance of typical supervised learning algorithms designed for balanced training sets. In this paper, we address this issue by augmenting minority classes with a recently proposed implic… ▽ More Real-world training data usually exhibits long-tailed distribution, where several majority classes have a significantly larger number of samples than the remaining minority classes. This imbalance degrades the performance of typical supervised learning algorithms designed for balanced training sets. In this paper, we address this issue by augmenting minority classes with a recently proposed implicit semantic data augmentation (ISDA) algorithm, which produces diversified augmented samples by translating deep features along many semantically meaningful directions. Importantly, given that ISDA estimates the class-conditional statistics to obtain semantic directions, we find it ineffective to do this on minority classes due to the insufficient training data. To this end, we propose a novel approach to learn transformed semantic directions with meta-learning automatically. In specific, the augmentation strategy during training is dynamically optimized, aiming to minimize the loss on a small balanced validation set, which is approximated via a meta update step. Extensive empirical results on CIFAR-LT-10/100, ImageNet-LT, and iNaturalist 2017/2018 validate the effectiveness of our method. △ Less

Submitted 7 April, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: Accepted at CVPR 2021

arXiv:2103.07531 [pdf, other]

Uncertainty-guided Model Generalization to Unseen Domains

Authors: Fengchun Qiao, Xi Peng

Abstract: We study a worst-case scenario in generalization: Out-of-domain generalization from a single source. The goal is to learn a robust model from a single source and expect it to generalize over many unknown distributions. This challenging problem has been seldom investigated while existing solutions suffer from various limitations. In this paper, we propose a new solution. The key idea is to augment… ▽ More We study a worst-case scenario in generalization: Out-of-domain generalization from a single source. The goal is to learn a robust model from a single source and expect it to generalize over many unknown distributions. This challenging problem has been seldom investigated while existing solutions suffer from various limitations. In this paper, we propose a new solution. The key idea is to augment the source capacity in both input and label spaces, while the augmentation is guided by uncertainty assessment. To the best of our knowledge, this is the first work to (1) access the generalization uncertainty from a single source and (2) leverage it to guide both input and label augmentation for robust generalization. The model training and deployment are effectively organized in a Bayesian meta-learning framework. We conduct extensive comparisons and ablation study to validate our approach. The results prove our superior performance in a wide scope of tasks including image classification, semantic segmentation, text classification, and speech recognition. △ Less

Submitted 12 March, 2021; originally announced March 2021.

Comments: In CVPR 2021 (13 pages including supplementary material)

arXiv:2012.00234 [pdf, other]

RaP-Net: A Region-wise and Point-wise Weighting Network to Extract Robust Features for Indoor Localization

Authors: Dongjiang Li, Jinyu Miao, Xuesong Shi, Yuxin Tian, Qiwei Long, Tianyu Cai, Ping Guo, Hongfei Yu, Wei Yang, Haosong Yue, Qi Wei, Fei Qiao

Abstract: Feature extraction plays an important role in visual localization. Unreliable features on dynamic objects or repetitive regions will interfere with feature matching and challenge indoor localization greatly. To address the problem, we propose a novel network, RaP-Net, to simultaneously predict region-wise invariability and point-wise reliability, and then extract features by considering both of th… ▽ More Feature extraction plays an important role in visual localization. Unreliable features on dynamic objects or repetitive regions will interfere with feature matching and challenge indoor localization greatly. To address the problem, we propose a novel network, RaP-Net, to simultaneously predict region-wise invariability and point-wise reliability, and then extract features by considering both of them. We also introduce a new dataset, named OpenLORIS-Location, to train the proposed network. The dataset contains 1553 images from 93 indoor locations. Various appearance changes between images of the same location are included and can help the model to learn the invariability in typical indoor scenes. Experimental results show that the proposed RaP-Net trained with OpenLORIS-Location dataset achieves excellent performance in the feature matching task and significantly outperforms state-of-the-arts feature algorithms in indoor localization. The RaP-Net code and dataset are available at https://github.com/ivipsourcecode/RaP-Net. △ Less

Submitted 22 August, 2021; v1 submitted 30 November, 2020; originally announced December 2020.

Comments: IROS 2021

arXiv:2008.05416 [pdf, other]

DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features

Authors: Dongjiang Li, Xuesong Shi, Qiwei Long, Shenghui Liu, Wei Yang, Fangshi Wang, Qi Wei, Fei Qiao

Abstract: A robust and efficient Simultaneous Localization and Mapping (SLAM) system is essential for robot autonomy. For visual SLAM algorithms, though the theoretical framework has been well established for most aspects, feature extraction and association is still empirically designed in most cases, and can be vulnerable in complex environments. This paper shows that feature extraction with deep convoluti… ▽ More A robust and efficient Simultaneous Localization and Mapping (SLAM) system is essential for robot autonomy. For visual SLAM algorithms, though the theoretical framework has been well established for most aspects, feature extraction and association is still empirically designed in most cases, and can be vulnerable in complex environments. This paper shows that feature extraction with deep convolutional neural networks (CNNs) can be seamlessly incorporated into a modern SLAM framework. The proposed SLAM system utilizes a state-of-the-art CNN to detect keypoints in each image frame, and to give not only keypoint descriptors, but also a global descriptor of the whole image. These local and global features are then used by different SLAM modules, resulting in much more robustness against environmental changes and viewpoint changes compared with using hand-crafted features. We also train a visual vocabulary of local features with a Bag of Words (BoW) method. Based on the local features, global features, and the vocabulary, a highly reliable loop closure detection method is built. Experimental results show that all the proposed modules significantly outperforms the baseline, and the full system achieves much lower trajectory errors and much higher correct rates on all evaluated data. Furthermore, by optimizing the CNN with Intel OpenVINO toolkit and utilizing the Fast BoW library, the system benefits greatly from the SIMD (single-instruction-multiple-data) techniques in modern CPUs. The full system can run in real-time without any GPU or other accelerators. The code is public at https://github.com/ivipsourcecode/dxslam. △ Less

Submitted 12 August, 2020; originally announced August 2020.

Comments: 8 pages, 5 figures, to be published in IROS 2020

arXiv:2003.13216 [pdf, other]

Learning to Learn Single Domain Generalization

Authors: Fengchun Qiao, Long Zhao, Xi Peng

Abstract: We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training. We propose a new method named adversarial domain augmentation to solve this Out-of-Distribution (OOD) generalization problem. The key idea is to leverage adversarial training to create "fictitious" y… ▽ More We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training. We propose a new method named adversarial domain augmentation to solve this Out-of-Distribution (OOD) generalization problem. The key idea is to leverage adversarial training to create "fictitious" yet "challenging" populations, from which a model can learn to generalize with theoretical guarantees. To facilitate fast and desirable domain augmentation, we cast the model training in a meta-learning scheme and use a Wasserstein Auto-Encoder (WAE) to relax the widely used worst-case constraint. Detailed theoretical analysis is provided to testify our formulation, while extensive experiments on multiple benchmark datasets indicate its superior performance in tackling single domain generalization. △ Less

Submitted 30 March, 2020; originally announced March 2020.

Comments: In CVPR 2020 (13 pages including supplementary material). The source code and pre-trained models are publicly available at: https://github.com/joffery/M-ADA

arXiv:1911.06487 [pdf, other]

OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning

Authors: Qi She, Fan Feng, Xinyue Hao, Qihan Yang, Chuanlin Lan, Vincenzo Lomonaco, Xuesong Shi, Zhengwei Wang, Yao Guo, Yimin Zhang, Fei Qiao, Rosa H. M. Chan

Abstract: The recent breakthroughs in computer vision have benefited from the availability of large representative datasets (e.g. ImageNet and COCO) for training. Yet, robotic vision poses unique challenges for applying visual algorithms developed from these standard computer vision datasets due to their implicit assumption over non-varying distributions for a fixed set of tasks. Fully retraining models eac… ▽ More The recent breakthroughs in computer vision have benefited from the availability of large representative datasets (e.g. ImageNet and COCO) for training. Yet, robotic vision poses unique challenges for applying visual algorithms developed from these standard computer vision datasets due to their implicit assumption over non-varying distributions for a fixed set of tasks. Fully retraining models each time a new task becomes available is infeasible due to computational, storage and sometimes privacy issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. It is crucial for the robots to operate continuously under open-set and detrimental conditions with adaptive visual perceptual systems, where lifelong learning is a fundamental capability. However, very few datasets and benchmarks are available to evaluate and compare emerging techniques. To fill this gap, we provide a new lifelong robotic vision dataset ("OpenLORIS-Object") collected via RGB-D cameras. The dataset embeds the challenges faced by a robot in the real-life application and provides new benchmarks for validating lifelong object recognition algorithms. Moreover, we have provided a testbed of $9$ state-of-the-art lifelong learning algorithms. Each of them involves $48$ tasks with $4$ evaluation metrics over the OpenLORIS-Object dataset. The results demonstrate that the object recognition task in the ever-changing difficulty environments is far from being solved and the bottlenecks are at the forward/backward transfer designs. Our dataset and benchmark are publicly available at at \href{https://lifelong-robotic-vision.github.io/dataset/object}{\underline{https://lifelong-robotic-vision.github.io/dataset/object}}. △ Less

Submitted 6 March, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

Comments: 7 pages, 7 figures, 4 tables

arXiv:1911.05603 [pdf, other]

Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM

Authors: Xuesong Shi, Dongjiang Li, Pengpeng Zhao, Qinbin Tian, Yuxin Tian, Qiwei Long, Chunhao Zhu, Jingwei Song, Fei Qiao, Le Song, Yangquan Guo, Zhigang Wang, Yimin Zhang, Baoxing Qin, Wei Yang, Fangshi Wang, Rosa H. M. Chan, Qi She

Abstract: Service robots should be able to operate autonomously in dynamic and daily changing environments over an extended period of time. While Simultaneous Localization And Mapping (SLAM) is one of the most fundamental problems for robotic autonomy, most existing SLAM works are evaluated with data sequences that are recorded in a short period of time. In real-world deployment, there can be out-of-sight s… ▽ More Service robots should be able to operate autonomously in dynamic and daily changing environments over an extended period of time. While Simultaneous Localization And Mapping (SLAM) is one of the most fundamental problems for robotic autonomy, most existing SLAM works are evaluated with data sequences that are recorded in a short period of time. In real-world deployment, there can be out-of-sight scene changes caused by both natural factors and human activities. For example, in home scenarios, most objects may be movable, replaceable or deformable, and the visual features of the same place may be significantly different in some successive days. Such out-of-sight dynamics pose great challenges to the robustness of pose estimation, and hence a robot's long-term deployment and operation. To differentiate the forementioned problem from the conventional works which are usually evaluated in a static setting in a single run, the term \textit{lifelong SLAM} is used here to address SLAM problems in an ever-changing environment over a long period of time. To accelerate lifelong SLAM research, we release the OpenLORIS-Scene datasets. The data are collected in real-world indoor scenes, for multiple times in each place to include scene changes in real life. We also design benchmarking metrics for lifelong SLAM, with which the robustness and accuracy of pose estimation are evaluated separately. The datasets and benchmark are available online at https://lifelong-robotic-vision.github.io/dataset/scene. △ Less

Submitted 13 March, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

Comments: To be published on ICRA 2020; 7 pages, 3 figures; v2 fixed a number in Table III

arXiv:1804.00651 [pdf, other]

doi 10.2352/ISSN.2470-1173.2018.2.VIPC-251

Interactive Hand Pose Estimation: Boosting accuracy in localizing extended finger joints

Authors: Cairong Zhang, Guijin Wang, Hengkai Guo, Xinghao Chen, Fei Qiao, Huazhong Yang

Abstract: Accurate 3D hand pose estimation plays an important role in Human Machine Interaction (HMI). In the reality of HMI, joints in fingers stretching out, especially corresponding fingertips, are much more important than other joints. We propose a novel method to refine stretching-out finger joint locations after obtaining rough hand pose estimation. It first detects which fingers are stretching out, t… ▽ More Accurate 3D hand pose estimation plays an important role in Human Machine Interaction (HMI). In the reality of HMI, joints in fingers stretching out, especially corresponding fingertips, are much more important than other joints. We propose a novel method to refine stretching-out finger joint locations after obtaining rough hand pose estimation. It first detects which fingers are stretching out, then neighbor pixels of certain joint vote for its new location based on random forests. The algorithm is tested on two public datasets: MSRA15 and ICVL. After the refinement stage of stretching-out fingers, errors of predicted HMI finger joint locations are significantly reduced. Mean error of all fingertips reduces around 5mm (relatively more than 20%). Stretching-out fingertip locations are even more precise, which in MSRA15 reduces 10.51mm (relatively 41.4%). △ Less

Submitted 25 July, 2018; v1 submitted 2 April, 2018; originally announced April 2018.

Comments: Original publication available on https://doi.org/10.2352/ISSN.2470-1173.2018.2.VIPC-251

Journal ref: Electronic Imaging, Visual Information Processing and Communication IX (2018), pp. 251-1-251-6(6)

arXiv:1802.01822 [pdf, other]

Geometry-Contrastive GAN for Facial Expression Transfer

Authors: Fengchun Qiao, Naiming Yao, Zirui Jiao, Zhihao Li, Hui Chen, Hongan Wang

Abstract: In this paper, we propose a Geometry-Contrastive Generative Adversarial Network (GC-GAN) for transferring continuous emotions across different subjects. Given an input face with certain emotion and a target facial expression from another subject, GC-GAN can generate an identity-preserving face with the target expression. Geometry information is introduced into cGANs as continuous conditions to gui… ▽ More In this paper, we propose a Geometry-Contrastive Generative Adversarial Network (GC-GAN) for transferring continuous emotions across different subjects. Given an input face with certain emotion and a target facial expression from another subject, GC-GAN can generate an identity-preserving face with the target expression. Geometry information is introduced into cGANs as continuous conditions to guide the generation of facial expressions. In order to handle the misalignment across different subjects or emotions, contrastive learning is used to transform geometry manifold into an embedded semantic manifold of facial expressions. Therefore, the embedded geometry is injected into the latent space of GANs and control the emotion generation effectively. Experimental results demonstrate that our proposed method can be applied in facial expression transfer even there exist big differences in facial shapes and expressions between different subjects. △ Less

Submitted 22 October, 2018; v1 submitted 6 February, 2018; originally announced February 2018.

arXiv:1702.02447 [pdf, ps, other]

doi 10.1109/ICIP.2017.8297136

Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation

Authors: Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang, Fei Qiao, Huazhong Yang

Abstract: Hand pose estimation from monocular depth images is an important and challenging problem for human-computer interaction. Recently deep convolutional networks (ConvNet) with sophisticated design have been employed to address it, but the improvement over traditional methods is not so apparent. To promote the performance of directly 3D coordinate regression, we propose a tree-structured Region Ensemb… ▽ More Hand pose estimation from monocular depth images is an important and challenging problem for human-computer interaction. Recently deep convolutional networks (ConvNet) with sophisticated design have been employed to address it, but the improvement over traditional methods is not so apparent. To promote the performance of directly 3D coordinate regression, we propose a tree-structured Region Ensemble Network (REN), which partitions the convolution outputs into regions and integrates the results from multiple regressors on each regions. Compared with multi-model ensemble, our model is completely end-to-end training. The experimental results demonstrate that our approach achieves the best performance among state-of-the-arts on two public datasets. △ Less

Submitted 8 May, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

Comments: Accepted to ICIP 2017. Project: https://github.com/guohengkai/region-ensemble-network

arXiv:1603.01954 [pdf]

A Real-Time and Energy-Efficient Implementation of Difference-of-Gaussian with Flexible Thin-Film Transistors

Authors: Nan Wu, Zheyu Liu, Fei Qiao, Xiaojun Guo, Qi Wei, Yuan Xie, Huazhong Yang

Abstract: With many advantageous features, softness and better biocompatibility, flexible electronic devices have developed rapidly and increasingly attracted attention. Many currently applications with flexible devices are sensors and drivers, while there is nearly no utilization aiming at complex computation since flexible devices have lower electron mobility, simple structure and large process variation.… ▽ More With many advantageous features, softness and better biocompatibility, flexible electronic devices have developed rapidly and increasingly attracted attention. Many currently applications with flexible devices are sensors and drivers, while there is nearly no utilization aiming at complex computation since flexible devices have lower electron mobility, simple structure and large process variation. In this paper, we proposed an innovative method that enabled flexible devices to implement real-time and energy-efficient Difference-of-Gaussian, which illustrated feasibility and potentials for them to achieve complicated real-time computation in future generation products. △ Less

Submitted 7 March, 2016; originally announced March 2016.

arXiv:1503.02354 [pdf, other]

A General Scheme for Noise-Tolerant Logic Design Based on Probabilistic and DCVS Approaches

Authors: Xinghua Yang, Fei Qiao, Qi Wei, Huazhong Yang

Abstract: In this paper, a general circuit scheme for noise-tolerant logic design based on Markov Random Field theory and differential Cascade Voltage Switch technique has been proposed, which is an extension of the work in [1-3], [4]. A block with only four transistors has been successfully inserted to the original circuit scheme from [3] and extensive simulation results show that our proposed design can o… ▽ More In this paper, a general circuit scheme for noise-tolerant logic design based on Markov Random Field theory and differential Cascade Voltage Switch technique has been proposed, which is an extension of the work in [1-3], [4]. A block with only four transistors has been successfully inserted to the original circuit scheme from [3] and extensive simulation results show that our proposed design can operate correctly with the input signal of 1 dB signal-noise-ratio. When using the evaluation parameter from [5], the output value of our design decreases by 76.5% on average than [3] which means that superior noise-immunity could be obtained through our work. △ Less

Submitted 8 March, 2015; originally announced March 2015.

Comments: 4 pages, 10 figures

arXiv:1408.2289 [pdf]

Physical Computing With No Clock to Implement the Gaussian Pyramid of SIFT Algorithm

Authors: Yi Li, Qi Wei, Fei Qiao, Huazhong Yang

Abstract: Physical computing is a technology utilizing the nature of electronic devices and circuit topology to cope with computing tasks. In this paper, we propose an active circuit network to implement multi-scale Gaussian filter, which is also called Gaussian Pyramid in image preprocessing. Various kinds of methods have been tried to accelerate the key stage in image feature extracting algorithm these ye… ▽ More Physical computing is a technology utilizing the nature of electronic devices and circuit topology to cope with computing tasks. In this paper, we propose an active circuit network to implement multi-scale Gaussian filter, which is also called Gaussian Pyramid in image preprocessing. Various kinds of methods have been tried to accelerate the key stage in image feature extracting algorithm these years. Compared with existing technologies, GPU parallel computing and FPGA accelerating technology, physical computing has great advantage on processing speed as well as power consumption. We have verified that processing time to implement the Gaussian pyramid of the SIFT algorithm stands on nanosecond level through the physical computing technology, while other existing methods all need at least hundreds of millisecond. With an estimate on the stray capacitance of the circuit, the power consumption is around 670pJ to filter a 256x256 image. To the best of our knowledge, this is the most fast processing technology to accelerate the SIFT algorithm, and it is also a rather energy-efficient method, thanks to the proposed physical computing technology. △ Less

Submitted 10 August, 2014; originally announced August 2014.

Comments: 6

arXiv:1405.5948 [pdf]

Low-complexity video encoder for smart eyes based on underdetermined blind signal separation

Authors: Jing Liu, Fei Qiao, Zhijian Ou, Huazhong Yang

Abstract: This paper presents a low complexity video coding method based on Underdetermined Blind Signal Separation (UBSS). The detailed coding framework is designed. Three key techniques are proposed to enhance the compression ratio and the quality of the decoded frames. The experiments validate that the proposed method costs 30ms encoding time less than DISCOVER. The simulation shows that this new method… ▽ More This paper presents a low complexity video coding method based on Underdetermined Blind Signal Separation (UBSS). The detailed coding framework is designed. Three key techniques are proposed to enhance the compression ratio and the quality of the decoded frames. The experiments validate that the proposed method costs 30ms encoding time less than DISCOVER. The simulation shows that this new method can save 50% energy compared with H.264. △ Less

Submitted 22 May, 2014; originally announced May 2014.

arXiv:1311.1419 [pdf]

Increasing Compression Ratio of Low Complexity Compressive Sensing Video Encoder with Application-Aware Configurable Mechanism

Authors: Shuang Yu, Fei Qiao, Li Luo, Huazhong Yang

Abstract: With the development of embedded video acquisition nodes and wireless video surveillance systems, traditional video coding methods could not meet the needs of less computing complexity any more, as well as the urgent power consumption. So, a low-complexity compressive sensing video encoder framework with application-aware configurable mechanism is proposed in this paper, where novel encoding metho… ▽ More With the development of embedded video acquisition nodes and wireless video surveillance systems, traditional video coding methods could not meet the needs of less computing complexity any more, as well as the urgent power consumption. So, a low-complexity compressive sensing video encoder framework with application-aware configurable mechanism is proposed in this paper, where novel encoding methods are exploited based on the practical purposes of the real applications to reduce the coding complexity effectively and improve the compression ratio (CR). Moreover, the group of processing (GOP) size and the measurement matrix size can be configured on the encoder side according to the post-analysis requirements of an application example of object tracking to increase the CR of encoder as best as possible. Simulations show the proposed framework of encoder could achieve 60X of CR when the tracking successful rate (SR) is still keeping above 90%. △ Less

Submitted 6 November, 2013; originally announced November 2013.

Comments: 5 pages with 6figures and 1 table,conference

arXiv:1310.3356 [pdf]

A Novel Reconfigurable Computing Architecture for Image Signal Processing Using Circuit-Switched NoC and Synchronous Dataflow Model

Authors: Feitian Li, Fei Qiao, Qi Wei, Huazhong Yang

Abstract: In this paper, a novel reconfigurable architecture is proposed for multifunctional image signal processing systems. A circuit-switched NoC is used to provide interconnection because the non-TMD links ensure fixed throughput, which is a desirable behavior for computational intensive image processing algorithms compared with packet-switched NoC. Image processing algorithms are modeled as synchronous… ▽ More In this paper, a novel reconfigurable architecture is proposed for multifunctional image signal processing systems. A circuit-switched NoC is used to provide interconnection because the non-TMD links ensure fixed throughput, which is a desirable behavior for computational intensive image processing algorithms compared with packet-switched NoC. Image processing algorithms are modeled as synchronous dataflow graphs which provide a unified model for general computing procedure. An image processing system is considered as several temporally mutually exclusive algorithms. Thus, their dataflow graph representations could be considered as a group and a merging algorithm could be applied to generate a union graph while eliminating spatial redundancy for area consumption optimization. After the union graph have been mapped and routed on the NoC, the reconfigurable system could be configured to any of its target image processing algorithms by properly setting the NoC topology. Experiments show the demo reconfigurable system with two image processing applications cost 26.4% less hardware resource, compared with the non-reconfigurable implementations. △ Less

Submitted 12 October, 2013; originally announced October 2013.

Comments: ISQED 2014,6 pages,7 figures

arXiv:1205.4572 [pdf]

A Novel Video Compression Approach Based on Underdetermined Blind Source Separation

Authors: Jing Liu, Fei Qiao, Qi Wei, Huazhong Yang

Abstract: This paper develops a new video compression approach based on underdetermined blind source separation. Underdetermined blind source separation, which can be used to efficiently enhance the video compression ratio, is combined with various off-the-shelf codecs in this paper. Combining with MPEG-2, video compression ratio could be improved slightly more than 33%. As for combing with H.264, 4X~12X mo… ▽ More This paper develops a new video compression approach based on underdetermined blind source separation. Underdetermined blind source separation, which can be used to efficiently enhance the video compression ratio, is combined with various off-the-shelf codecs in this paper. Combining with MPEG-2, video compression ratio could be improved slightly more than 33%. As for combing with H.264, 4X~12X more compression ratio could be achieved with acceptable PSNR, according to different kinds of video sequences. △ Less

Submitted 21 May, 2012; originally announced May 2012.

Comments: 4 pages with 4 figures and 1 table

ACM Class: H.5.1; H.4.3

arXiv:0910.3736 [pdf, other]

A Fault-tolerant Structure for Reliable Multi-core Systems Based on Hardware-Software Co-design

Authors: Bingbing Xia, Fei Qiao, Huazhong Yang, Hui Wang

Abstract: To cope with the soft errors and make full use of the multi-core system, this paper gives an efficient fault-tolerant hardware and software co-designed architecture for multi-core systems. And with a not large number of test patterns, it will use less than 33% hardware resources compared with the traditional hardware redundancy (TMR) and it will take less than 50% time compared with the traditio… ▽ More To cope with the soft errors and make full use of the multi-core system, this paper gives an efficient fault-tolerant hardware and software co-designed architecture for multi-core systems. And with a not large number of test patterns, it will use less than 33% hardware resources compared with the traditional hardware redundancy (TMR) and it will take less than 50% time compared with the traditional software redundancy (time redundant).Therefore, it will be a good choice for the fault-tolerant architecture for the future high-reliable multi-core systems. △ Less

Submitted 20 October, 2009; originally announced October 2009.

Comments: 7 pages, 5 figures

Showing 1–32 of 32 results for author: Qiao, F