Search | arXiv e-print repository

arXiv:2404.19403 [pdf, other]

doi 10.1109/LRA.2024.3450305

Transformer-Enhanced Motion Planner: Attention-Guided Sampling for State-Specific Decision Making

Authors: Lei Zhuang, Jingdong Zhao, Yuntao Li, Zichun Xu, Liangliang Zhao, Hong Liu

Abstract: Sampling-based motion planning (SBMP) algorithms are renowned for their robust global search capabilities. However, the inherent randomness in their sampling mechanisms often result in inconsistent path quality and limited search efficiency. In response to these challenges, this work proposes a novel deep learning-based motion planning framework, named Transformer-Enhanced Motion Planner (TEMP), w… ▽ More Sampling-based motion planning (SBMP) algorithms are renowned for their robust global search capabilities. However, the inherent randomness in their sampling mechanisms often result in inconsistent path quality and limited search efficiency. In response to these challenges, this work proposes a novel deep learning-based motion planning framework, named Transformer-Enhanced Motion Planner (TEMP), which synergizes an Environmental Information Semantic Encoder (EISE) with a Motion Planning Transformer (MPT). EISE converts environmental data into semantic environmental information (SEI), providing MPT with an enriched environmental comprehension. MPT leverages an attention mechanism to dynamically recalibrate its focus on SEI, task objectives, and historical planning data, refining the sampling node generation. To demonstrate the capabilities of TEMP, we train our model using a dataset comprised of planning results produced by the RRT*. EISE and MPT are collaboratively trained, enabling EISE to autonomously learn and extract patterns from environmental data, thereby forming semantic representations that MPT could more effectively interpret and utilize for motion planning. Subsequently, we conducted a systematic evaluation of TEMP's efficacy across diverse task dimensions, which demonstrates that TEMP achieves exceptional performance metrics and a heightened degree of generalizability compared to state-of-the-art SBMPs. △ Less

Submitted 30 April, 2024; originally announced April 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2403.06289 [pdf, other]

Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning

Authors: Zijun Long, Lipeng Zhuang, George Killick, Richard McCreadie, Gerardo Aragon Camarasa, Paul Henderson

Abstract: Human-annotated vision datasets inevitably contain a fraction of human mislabelled examples. While the detrimental effects of such mislabelling on supervised learning are well-researched, their influence on Supervised Contrastive Learning (SCL) remains largely unexplored. In this paper, we show that human-labelling errors not only differ significantly from synthetic label errors, but also pose uni… ▽ More Human-annotated vision datasets inevitably contain a fraction of human mislabelled examples. While the detrimental effects of such mislabelling on supervised learning are well-researched, their influence on Supervised Contrastive Learning (SCL) remains largely unexplored. In this paper, we show that human-labelling errors not only differ significantly from synthetic label errors, but also pose unique challenges in SCL, different to those in traditional supervised learning methods. Specifically, our results indicate they adversely impact the learning process in the ~99% of cases when they occur as false positive samples. Existing noise-mitigating methods primarily focus on synthetic label errors and tackle the unrealistic setting of very high synthetic noise rates (40-80%), but they often underperform on common image datasets due to overfitting. To address this issue, we introduce a novel SCL objective with robustness to human-labelling errors, SCL-RHE. SCL-RHE is designed to mitigate the effects of real-world mislabelled examples, typically characterized by much lower noise rates (<5%). We demonstrate that SCL-RHE consistently outperforms state-of-the-art representation learning and noise-mitigating methods across various vision benchmarks, by offering improved resilience against human-labelling errors. △ Less

Submitted 10 March, 2024; originally announced March 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2311.16481

arXiv:2402.14551 [pdf, other]

CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion

Authors: Zijun Long, George Killick, Lipeng Zhuang, Gerardo Aragon-Camarasa, Zaiqiao Meng, Richard Mccreadie

Abstract: State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss~(CE). However, it has been demonstrated that CE can compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by enhancing… ▽ More State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss~(CE). However, it has been demonstrated that CE can compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by enhancing the quality of embeddings and producing better decision boundaries, they often overlook the importance of hard negative mining and rely on resource intensive and slow training using large sample batches. To counter these issues, we introduce a novel approach named CLCE, which integrates Label-Aware Contrastive Learning with CE. Our approach not only maintains the strengths of both loss functions but also leverages hard negative mining in a synergistic way to enhance performance. Experimental results demonstrate that CLCE significantly outperforms CE in Top-1 accuracy across twelve benchmarks, achieving gains of up to 3.52% in few-shot learning scenarios and 3.41% in transfer learning settings with the BEiT-3 model. Importantly, our proposed CLCE approach effectively mitigates the dependency of contrastive learning on large batch sizes such as 4096 samples per batch, a limitation that has previously constrained the application of contrastive learning in budget-limited hardware environments. △ Less

Submitted 22 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.14893

arXiv:2401.11544 [pdf, other]

Hierarchical Prompts for Rehearsal-free Continual Learning

Authors: Yukun Zuo, Hantao Yao, Lu Yu, Liansheng Zhuang, Changsheng Xu

Abstract: Continual learning endeavors to equip the model with the capability to integrate current task knowledge while mitigating the forgetting of past task knowledge. Inspired by prompt tuning, prompt-based methods maintain a frozen backbone and train with slight learnable prompts to minimize the catastrophic forgetting that arises due to updating a large number of backbone parameters. Nonetheless, these… ▽ More Continual learning endeavors to equip the model with the capability to integrate current task knowledge while mitigating the forgetting of past task knowledge. Inspired by prompt tuning, prompt-based methods maintain a frozen backbone and train with slight learnable prompts to minimize the catastrophic forgetting that arises due to updating a large number of backbone parameters. Nonetheless, these learnable prompts tend to concentrate on the discriminatory knowledge of the current task while ignoring past task knowledge, leading to that learnable prompts still suffering from catastrophic forgetting. This paper introduces a novel rehearsal-free paradigm for continual learning termed Hierarchical Prompts (H-Prompts), comprising three categories of prompts -- class prompt, task prompt, and general prompt. To effectively depict the knowledge of past classes, class prompt leverages Bayesian Distribution Alignment to model the distribution of classes in each task. To reduce the forgetting of past task knowledge, task prompt employs Cross-task Knowledge Excavation to amalgamate the knowledge encapsulated in the learned class prompts of past tasks and current task knowledge. Furthermore, general prompt utilizes Generalized Knowledge Exploration to deduce highly generalized knowledge in a self-supervised manner. Evaluations on two benchmarks substantiate the efficacy of the proposed H-Prompts, exemplified by an average accuracy of 87.8% in Split CIFAR-100 and 70.6% in Split ImageNet-R. △ Less

Submitted 21 January, 2024; originally announced January 2024.

Comments: Submitted to TPAMI

arXiv:2401.06287 [pdf, other]

Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition

Authors: Yukun Zuo, Hantao Yao, Liansheng Zhuang, Changsheng Xu

Abstract: Audio-visual video recognition (AVVR) aims to integrate audio and visual clues to categorize videos accurately. While existing methods train AVVR models using provided datasets and achieve satisfactory results, they struggle to retain historical class knowledge when confronted with new classes in real-world situations. Currently, there are no dedicated methods for addressing this problem, so this… ▽ More Audio-visual video recognition (AVVR) aims to integrate audio and visual clues to categorize videos accurately. While existing methods train AVVR models using provided datasets and achieve satisfactory results, they struggle to retain historical class knowledge when confronted with new classes in real-world situations. Currently, there are no dedicated methods for addressing this problem, so this paper concentrates on exploring Class Incremental Audio-Visual Video Recognition (CIAVVR). For CIAVVR, since both stored data and learned model of past classes contain historical knowledge, the core challenge is how to capture past data knowledge and past model knowledge to prevent catastrophic forgetting. We introduce Hierarchical Augmentation and Distillation (HAD), which comprises the Hierarchical Augmentation Module (HAM) and Hierarchical Distillation Module (HDM) to efficiently utilize the hierarchical structure of data and models, respectively. Specifically, HAM implements a novel augmentation strategy, segmental feature augmentation, to preserve hierarchical model knowledge. Meanwhile, HDM introduces newly designed hierarchical (video-distribution) logical distillation and hierarchical (snippet-video) correlative distillation to capture and maintain the hierarchical intra-sample knowledge of each data and the hierarchical inter-sample knowledge between data, respectively. Evaluations on four benchmarks (AVE, AVK-100, AVK-200, and AVK-400) demonstrate that the proposed HAD effectively captures hierarchical information in both data and models, resulting in better preservation of historical class knowledge and improved performance. Furthermore, we provide a theoretical analysis to support the necessity of the segmental feature augmentation strategy. △ Less

Submitted 6 June, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: Accepted by TPAMI

arXiv:2312.13788 [pdf, other]

Open-Source Reinforcement Learning Environments Implemented in MuJoCo with Franka Manipulator

Authors: Zichun Xu, Yuntao Li, Xiaohang Yang, Zhiyuan Zhao, Lei Zhuang, Jingdong Zhao

Abstract: This paper presents three open-source reinforcement learning environments developed on the MuJoCo physics engine with the Franka Emika Panda arm in MuJoCo Menagerie. Three representative tasks, push, slide, and pick-and-place, are implemented through the Gymnasium Robotics API, which inherits from the core of Gymnasium. Both the sparse binary and dense rewards are supported, and the observation sp… ▽ More This paper presents three open-source reinforcement learning environments developed on the MuJoCo physics engine with the Franka Emika Panda arm in MuJoCo Menagerie. Three representative tasks, push, slide, and pick-and-place, are implemented through the Gymnasium Robotics API, which inherits from the core of Gymnasium. Both the sparse binary and dense rewards are supported, and the observation space contains the keys of desired and achieved goals to follow the Multi-Goal Reinforcement Learning framework. Three different off-policy algorithms are used to validate the simulation attributes to ensure the fidelity of all tasks, and benchmark results are also given. Each environment and task are defined in a clean way, and the main parameters for modifying the environment are preserved to reflect the main difference. The repository, including all environments, is available at https://github.com/zichunxx/panda_mujoco_gym. △ Less

Submitted 28 July, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

arXiv:2311.16481 [pdf, other]

Elucidating and Overcoming the Challenges of Label Noise in Supervised Contrastive Learning

Authors: Zijun Long, George Killick, Lipeng Zhuang, Richard McCreadie, Gerardo Aragon Camarasa, Paul Henderson

Abstract: Image classification datasets exhibit a non-negligible fraction of mislabeled examples, often due to human error when one class superficially resembles another. This issue poses challenges in supervised contrastive learning (SCL), where the goal is to cluster together data points of the same class in the embedding space while distancing those of disparate classes. While such methods outperform tho… ▽ More Image classification datasets exhibit a non-negligible fraction of mislabeled examples, often due to human error when one class superficially resembles another. This issue poses challenges in supervised contrastive learning (SCL), where the goal is to cluster together data points of the same class in the embedding space while distancing those of disparate classes. While such methods outperform those based on cross-entropy, they are not immune to labeling errors. However, while the detrimental effects of noisy labels in supervised learning are well-researched, their influence on SCL remains largely unexplored. Hence, we analyse the effect of label errors and examine how they disrupt the SCL algorithm's ability to distinguish between positive and negative sample pairs. Our analysis reveals that human labeling errors manifest as easy positive samples in around 99% of cases. We, therefore, propose D-SCL, a novel Debiased Supervised Contrastive Learning objective designed to mitigate the bias introduced by labeling errors. We demonstrate that D-SCL consistently outperforms state-of-the-art techniques for representation learning across diverse vision benchmarks, offering improved robustness to label errors. △ Less

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2310.15931 [pdf, other]

GO-FEAP: Global Optimal UAV Planner Using Frontier-Omission-Aware Exploration and Altitude-Stratified Planning

Authors: Weiye Zhang, Wenshuai Yu, Licong Zhuang, Xiaoyi Zhang, Zhi Zeng, Jiasong Zhu

Abstract: Autonomous exploration is a fundamental problem for various applications of unmanned aerial vehicles(UAVs). Existing methods, however, are demonstrated to static local optima and two-dimensional exploration. To address these challenges, this paper introduces GO-FEAP (Global Optimal UAV Planner Using Frontier-Omission-Aware Exploration and Altitude-Stratified Planning), aiming to achieve efficient… ▽ More Autonomous exploration is a fundamental problem for various applications of unmanned aerial vehicles(UAVs). Existing methods, however, are demonstrated to static local optima and two-dimensional exploration. To address these challenges, this paper introduces GO-FEAP (Global Optimal UAV Planner Using Frontier-Omission-Aware Exploration and Altitude-Stratified Planning), aiming to achieve efficient and complete three-dimensional exploration. Frontier-Omission-Aware Exploration module presented in this work takes into account multiple pivotal factors, encompassing frontier distance, nearby frontier count, frontier duration, and frontier categorization, for a comprehensive assessment of frontier importance. Furthermore, to tackle scenarios with substantial vertical variations, we introduce the Altitude-Stratified Planning strategy, which stratifies the three-dimensional space based on altitude, conducting global-local planning for each stratum. The objective of global planning is to identify the most optimal frontier for exploration, followed by viewpoint selection and local path optimization based on frontier type, ultimately generating dynamically feasible three-dimensional spatial exploration trajectories. We present extensive benchmark and real-world tests, in which our method completes the exploration tasks with unprecedented completeness compared to state-of-the-art approaches. △ Less

Submitted 24 October, 2023; originally announced October 2023.

Comments: 7 pages,29 figures

arXiv:2305.10662 [pdf, other]

Private Gradient Estimation is Useful for Generative Modeling

Authors: Bochao Liu, Pengju Wang, Weijia Guo, Yong Li, Liansheng Zhuang, Weiping Wang, Shiming Ge

Abstract: While generative models have proved successful in many domains, they may pose a privacy leakage risk in practical deployment. To address this issue, differentially private generative model learning has emerged as a solution to train private generative models for different downstream tasks. However, existing private generative modeling approaches face significant challenges in generating high-dimen… ▽ More While generative models have proved successful in many domains, they may pose a privacy leakage risk in practical deployment. To address this issue, differentially private generative model learning has emerged as a solution to train private generative models for different downstream tasks. However, existing private generative modeling approaches face significant challenges in generating high-dimensional data due to the inherent complexity involved in modeling such data. In this work, we present a new private generative modeling approach where samples are generated via Hamiltonian dynamics with gradients of the private dataset estimated by a well-trained network. In the approach, we achieve differential privacy by perturbing the projection vectors in the estimation of gradients with sliced score matching. In addition, we enhance the reconstruction ability of the model by incorporating a residual enhancement module during the score matching. For sampling, we perform Hamiltonian dynamics with gradients estimated by the well-trained network, allowing the sampled data close to the private dataset's manifold step by step. In this way, our model is able to generate data with a resolution of 256x256. Extensive experiments and analysis clearly demonstrate the effectiveness and rationality of the proposed approach. △ Less

Submitted 26 August, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: accepted by ACM MM 2024 Oral

arXiv:2305.02567 [pdf, other]

LayoutDM: Transformer-based Diffusion Model for Layout Generation

Authors: Shang Chai, Liansheng Zhuang, Fengying Yan

Abstract: Automatic layout generation that can synthesize high-quality layouts is an important tool for graphic design in many applications. Though existing methods based on generative models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) have progressed, they still leave much room for improving the quality and diversity of the results. Inspired by the recent success of… ▽ More Automatic layout generation that can synthesize high-quality layouts is an important tool for graphic design in many applications. Though existing methods based on generative models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) have progressed, they still leave much room for improving the quality and diversity of the results. Inspired by the recent success of diffusion models in generating high-quality images, this paper explores their potential for conditional layout generation and proposes Transformer-based Layout Diffusion Model (LayoutDM) by instantiating the conditional denoising diffusion probabilistic model (DDPM) with a purely transformer-based architecture. Instead of using convolutional neural networks, a transformer-based conditional Layout Denoiser is proposed to learn the reverse diffusion process to generate samples from noised layout data. Benefitting from both transformer and DDPM, our LayoutDM is of desired properties such as high-quality generation, strong sample diversity, faithful distribution coverage, and stationary training in comparison to GANs and VAEs. Quantitative and qualitative experimental results show that our method outperforms state-of-the-art generative models in terms of quality and diversity. △ Less

Submitted 4 May, 2023; originally announced May 2023.

Comments: Accepted by CVPR 2023

arXiv:2204.11695 [pdf, other]

Estimation of Reliable Proposal Quality for Temporal Action Detection

Authors: Junshan Hu, Chaoxu guo, Liansheng Zhuang, Biao Wang, Tiezheng Ge, Yuning Jiang, Houqiang Li

Abstract: Temporal action detection (TAD) aims to locate and recognize the actions in an untrimmed video. Anchor-free methods have made remarkable progress which mainly formulate TAD into two tasks: classification and localization using two separate branches. This paper reveals the temporal misalignment between the two tasks hindering further progress. To address this, we propose a new method that gives ins… ▽ More Temporal action detection (TAD) aims to locate and recognize the actions in an untrimmed video. Anchor-free methods have made remarkable progress which mainly formulate TAD into two tasks: classification and localization using two separate branches. This paper reveals the temporal misalignment between the two tasks hindering further progress. To address this, we propose a new method that gives insights into moment and region perspectives simultaneously to align the two tasks by acquiring reliable proposal quality. For the moment perspective, Boundary Evaluate Module (BEM) is designed which focuses on local appearance and motion evolvement to estimate boundary quality and adopts a multi-scale manner to deal with varied action durations. For the region perspective, we introduce Region Evaluate Module (REM) which uses a new and efficient sampling method for proposal feature representation containing more contextual information compared with point feature to refine category score and proposal boundary. The proposed Boundary Evaluate Module and Region Evaluate Module (BREM) are generic, and they can be easily integrated with other anchor-free TAD methods to achieve superior performance. In our experiments, BREM is combined with two different frameworks and improves the performance on THUMOS14 by 3.6% and 1.0% respectively, reaching a new state-of-the-art (63.6% average mAP). Meanwhile, a competitive result of 36.2% average mAP is achieved on ActivityNet-1.3 with the consistent improvement of BREM. The codes are released at https://github.com/Junshan233/BREM. △ Less

Submitted 21 November, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

Comments: Accepted to ACM Multimedia 2022

arXiv:2109.13967 [pdf, other]

One-shot Key Information Extraction from Document with Deep Partial Graph Matching

Authors: Minghong Yao, Zhiguang Liu, Liangwei Wang, Houqiang Li, Liansheng Zhuang

Abstract: Automating the Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios such as rapid indexing and archiving. Many existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents. However, collecting and labeling a large dataset is time… ▽ More Automating the Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios such as rapid indexing and archiving. Many existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents. However, collecting and labeling a large dataset is time-consuming and is not a user-friendly requirement for many cloud platforms. To overcome these challenges, we propose a deep end-to-end trainable network for one-shot KIE using partial graph matching. Contrary to previous methods that the learning of similarity and solving are optimized separately, our method enables the learning of the two processes in an end-to-end framework. Existing one-shot KIE methods are either template or simple attention-based learning approach that struggle to handle texts that are shifted beyond their desired positions caused by printers, as illustrated in Fig.1. To solve this problem, we add one-to-(at most)-one constraint such that we will find the globally optimized solution even if some texts are drifted. Further, we design a multimodal context ensemble block to boost the performance through fusing features of spatial, textual, and aspect representations. To promote research of KIE, we collected and annotated a one-shot document KIE dataset named DKIE with diverse types of images. The DKIE dataset consists of 2.5K document images captured by mobile phones in natural scenes, and it is the largest available one-shot KIE dataset up to now. The results of experiments on DKIE show that our method achieved state-of-the-art performance compared with recent one-shot and supervised learning approaches. The dataset and proposed one-shot KIE model will be released soo △ Less

Submitted 26 September, 2021; originally announced September 2021.

arXiv:2109.08879 [pdf, other]

FastHyMix: Fast and Parameter-free Hyperspectral Image Mixed Noise Removal

Authors: Lina Zhuang, Michael K. Ng

Abstract: Hyperspectral imaging with high spectral resolution plays an important role in finding objects, identifying materials, or detecting processes. The decrease of the widths of spectral bands leads to a decrease in the signal-to-noise ratio (SNR) of measurements. The decreased SNR reduces the reliability of measured features or information extracted from HSIs. Furthermore, the image degradations linke… ▽ More Hyperspectral imaging with high spectral resolution plays an important role in finding objects, identifying materials, or detecting processes. The decrease of the widths of spectral bands leads to a decrease in the signal-to-noise ratio (SNR) of measurements. The decreased SNR reduces the reliability of measured features or information extracted from HSIs. Furthermore, the image degradations linked with various mechanisms also result in different types of noise, such as Gaussian noise, impulse noise, deadlines, and stripes. This paper introduces a fast and parameter-free hyperspectral image mixed noise removal method (termed FastHyMix), which characterizes the complex distribution of mixed noise by using a Gaussian mixture model and exploits two main characteristics of hyperspectral data, namely low-rankness in the spectral domain and high correlation in the spatial domain. The Gaussian mixture model enables us to make a good estimation of Gaussian noise intensity and the location of sparse noise. The proposed method takes advantage of the low-rankness using subspace representation and the spatial correlation of HSIs by adding a powerful deep image prior, which is extracted from a neural denoising network. An exhaustive array of experiments and comparisons with state-of-the-art denoisers were carried out. The experimental results show significant improvement in both synthetic and real datasets. A MATLAB demo of this work will be available at https://github.com/LinaZhuang for the sake of reproducibility. △ Less

Submitted 18 September, 2021; originally announced September 2021.

arXiv:2103.16204 [pdf, other]

doi 10.1109/TGRS.2021.3065990

Using Low-rank Representation of Abundance Maps and Nonnegative Tensor Factorization for Hyperspectral Nonlinear Unmixing

Authors: Lianru Gao, Zhicheng Wang, Lina Zhuang, Haoyang Yu, Bing Zhang, Jocelyn Chanussot

Abstract: Tensor-based methods have been widely studied to attack inverse problems in hyperspectral imaging since a hyperspectral image (HSI) cube can be naturally represented as a third-order tensor, which can perfectly retain the spatial information in the image. In this article, we extend the linear tensor method to the nonlinear tensor method and propose a nonlinear low-rank tensor unmixing algorithm to… ▽ More Tensor-based methods have been widely studied to attack inverse problems in hyperspectral imaging since a hyperspectral image (HSI) cube can be naturally represented as a third-order tensor, which can perfectly retain the spatial information in the image. In this article, we extend the linear tensor method to the nonlinear tensor method and propose a nonlinear low-rank tensor unmixing algorithm to solve the generalized bilinear model (GBM). Specifically, the linear and nonlinear parts of the GBM can both be expressed as tensors. Furthermore, the low-rank structures of abundance maps and nonlinear interaction abundance maps are exploited by minimizing their nuclear norm, thus taking full advantage of the high spatial correlation in HSIs. Synthetic and real-data experiments show that the low rank of abundance maps and nonlinear interaction abundance maps exploited in our method can improve the performance of the nonlinear unmixing. A MATLAB demo of this work will be available at https://github.com/LinaZhuang for the sake of reproducibility. △ Less

Submitted 30 March, 2021; originally announced March 2021.

arXiv:2103.07437 [pdf, other]

doi 10.1109/TGRS.2020.3040221

Hyperspectral Image Denoising and Anomaly Detection Based on Low-rank and Sparse Representations

Authors: Lina Zhuang, Lianru Gao, Bing Zhang, Xiyou Fu, Jose M. Bioucas-Dias

Abstract: Hyperspectral imaging measures the amount of electromagnetic energy across the instantaneous field of view at a very high resolution in hundreds or thousands of spectral channels. This enables objects to be detected and the identification of materials that have subtle differences between them. However, the increase in spectral resolution often means that there is a decrease in the number of photon… ▽ More Hyperspectral imaging measures the amount of electromagnetic energy across the instantaneous field of view at a very high resolution in hundreds or thousands of spectral channels. This enables objects to be detected and the identification of materials that have subtle differences between them. However, the increase in spectral resolution often means that there is a decrease in the number of photons received in each channel, which means that the noise linked to the image formation process is greater. This degradation limits the quality of the extracted information and its potential applications. Thus, denoising is a fundamental problem in hyperspectral image (HSI) processing. As images of natural scenes with highly correlated spectral channels, HSIs are characterized by a high level of self-similarity and can be well approximated by low-rank representations. These characteristics underlie the state-of-the-art methods used in HSI denoising. However, where there are rarely occurring pixel types, the denoising performance of these methods is not optimal, and the subsequent detection of these pixels may be compromised. To address these hurdles, in this article, we introduce RhyDe (Robust hyperspectral Denoising), a powerful HSI denoiser, which implements explicit low-rank representation, promotes self-similarity, and, by using a form of collaborative sparsity, preserves rare pixels. The denoising and detection effectiveness of the proposed robust HSI denoiser is illustrated using semireal and real data. △ Less

Submitted 12 March, 2021; originally announced March 2021.

arXiv:2103.06842 [pdf, other]

doi 10.1109/JSTARS.2018.2796570

Fast Hyperspectral Image Denoising and Inpainting Based on Low-Rank and Sparse Representations

Authors: Lina Zhuang, Jose M. Bioucas-Dias

Abstract: This paper introduces two very fast and competitive hyperspectral image (HSI) restoration algorithms: fast hyperspectral denoising (FastHyDe), a denoising algorithm able to cope with Gaussian and Poissonian noise, and fast hyperspectral inpainting (FastHyIn), an inpainting algorithm to restore HSIs where some observations from known pixels in some known bands are missing. FastHyDe and FastHyIn ful… ▽ More This paper introduces two very fast and competitive hyperspectral image (HSI) restoration algorithms: fast hyperspectral denoising (FastHyDe), a denoising algorithm able to cope with Gaussian and Poissonian noise, and fast hyperspectral inpainting (FastHyIn), an inpainting algorithm to restore HSIs where some observations from known pixels in some known bands are missing. FastHyDe and FastHyIn fully exploit extremely compact and sparse HSI representations linked with their low-rank and self-similarity characteristics. In a series of experiments with simulated and real data, the newly introduced FastHyDe and FastHyIn compete with the state-of-the-art methods, with much lower computational complexity. △ Less

Submitted 11 March, 2021; originally announced March 2021.

arXiv:2002.10616 [pdf]

How many infections of COVID-19 there will be in the "Diamond Princess"-Predicted by a virus transmission model based on the simulation of crowd flow

Authors: Zhiming Fang, Zhongyi Huang, Xiaolian Li, Jun Zhang, Wei Lv, Lei Zhuang, Xingpeng Xu, Nan Huang

Abstract: Objectives: Simulate the transmission process of COVID-19 in a cruise ship, and then to judge how many infections there will be in the 3711 people in the "Diamond Princess" and analyze measures that could have prevented mass transmission. Methods: Based on the crowd flow model, the virus transmission rule between pedestrians is established, to simulate the spread of the virus caused by the close… ▽ More Objectives: Simulate the transmission process of COVID-19 in a cruise ship, and then to judge how many infections there will be in the 3711 people in the "Diamond Princess" and analyze measures that could have prevented mass transmission. Methods: Based on the crowd flow model, the virus transmission rule between pedestrians is established, to simulate the spread of the virus caused by the close contact during pedestrians' daily activities on the cruise ship. Measurements and main results: Three types of simulation scenarios are designed, the Basic scenario focus on the process of virus transmission caused by a virus carrier and the effect of the personal protective measure against the virus. The condition that the original virus carriers had disembarked halfway and more and more people strengthen self-protection are considered in the Self-protection scenario, which would comparatively accord with the actual situation of "Diamond princess" cruise. Control scenario are set to simulate the effect of taking recommended or mandatory measures on virus transmission Conclusions: There are 850~1009 persons (with large probability) who have been infected with COVID-19 during the voyage of "Diamond Princess". The crowd infection percentage would be controlled effectively if the recommended or mandatory measures can be taken immediately during the alert phase of COVID-19 outbreaks. △ Less

Submitted 24 February, 2020; originally announced February 2020.

arXiv:2002.02089 [pdf, other]

Soft Hindsight Experience Replay

Authors: Qiwei He, Liansheng Zhuang, Houqiang Li

Abstract: Efficient learning in the environment with sparse rewards is one of the most important challenges in Deep Reinforcement Learning (DRL). In continuous DRL environments such as robotic arms control, Hindsight Experience Replay (HER) has been shown an effective solution. However, due to the brittleness of deterministic methods, HER and its variants typically suffer from a major challenge for stabilit… ▽ More Efficient learning in the environment with sparse rewards is one of the most important challenges in Deep Reinforcement Learning (DRL). In continuous DRL environments such as robotic arms control, Hindsight Experience Replay (HER) has been shown an effective solution. However, due to the brittleness of deterministic methods, HER and its variants typically suffer from a major challenge for stability and convergence, which significantly affects the final performance. This challenge severely limits the applicability of such methods to complex real-world domains. To tackle this challenge, in this paper, we propose Soft Hindsight Experience Replay (SHER), a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL), combining the failed experiences reuse and maximum entropy probabilistic inference model. We evaluate SHER on Open AI Robotic manipulation tasks with sparse rewards. Experimental results show that, in contrast to HER and its variants, our proposed SHER achieves state-of-the-art performance, especially in the difficult HandManipulation tasks. Furthermore, our SHER method is more stable, achieving very similar performance across different random seeds. △ Less

Submitted 5 February, 2020; originally announced February 2020.

Comments: 7 pages, 5 figures, 1 table, submitted to IJCAI2020

arXiv:1909.11252 [pdf, other]

Neighborhood-Enhanced and Time-Aware Model for Session-based Recommendation

Authors: Yang Lv, Liangsheng Zhuang, Pengyu Luo

Abstract: Session based recommendation has become one of the research hotpots in the field of recommendation systems due to its highly practical value.Previous deep learning methods mostly focus on the sequential characteristics within the current session,and neglect the context similarity and temporal similarity between sessions which contain abundant collaborative information.In this paper,we propose a no… ▽ More Session based recommendation has become one of the research hotpots in the field of recommendation systems due to its highly practical value.Previous deep learning methods mostly focus on the sequential characteristics within the current session,and neglect the context similarity and temporal similarity between sessions which contain abundant collaborative information.In this paper,we propose a novel neural networks framework,namely Neighborhood Enhanced and Time Aware Recommendation Machine(NETA) for session based recommendation. Firstly,we introduce an efficient neighborhood retrieve mechanism to find out similar sessions which includes collaborative information.Then we design a guided attention with time-aware mechanism to extract collaborative representation from neighborhood sessions.Especially,temporal recency between sessions is considered separately.Finally, we design a simple co-attention mechanism to determine the importance of complementary collaborative representation when predicting the next item.Extensive experiments conducted on two real-world datasets demonstrate the effectiveness of our proposed model. △ Less

Submitted 1 October, 2019; v1 submitted 24 September, 2019; originally announced September 2019.

arXiv:1807.08048 [pdf, other]

Baidu Apollo EM Motion Planner

Authors: Haoyang Fan, Fan Zhu, Changchun Liu, Liangliang Zhang, Li Zhuang, Dong Li, Weicheng Zhu, Jiangtao Hu, Hongye Li, Qi Kong

Abstract: In this manuscript, we introduce a real-time motion planning system based on the Baidu Apollo (open source) autonomous driving platform. The developed system aims to address the industrial level-4 motion planning problem while considering safety, comfort and scalability. The system covers multilane and single-lane autonomous driving in a hierarchical manner: (1) The top layer of the system is a mu… ▽ More In this manuscript, we introduce a real-time motion planning system based on the Baidu Apollo (open source) autonomous driving platform. The developed system aims to address the industrial level-4 motion planning problem while considering safety, comfort and scalability. The system covers multilane and single-lane autonomous driving in a hierarchical manner: (1) The top layer of the system is a multilane strategy that handles lane-change scenarios by comparing lane-level trajectories computed in parallel. (2) Inside the lane-level trajectory generator, it iteratively solves path and speed optimization based on a Frenet frame. (3) For path and speed optimization, a combination of dynamic programming and spline-based quadratic programming is proposed to construct a scalable and easy-to-tune framework to handle traffic rules, obstacle decisions and smoothness simultaneously. The planner is scalable to both highway and lower-speed city driving scenarios. We also demonstrate the algorithm through scenario illustrations and on-road test results. The system described in this manuscript has been deployed to dozens of Baidu Apollo autonomous driving vehicles since Apollo v1.5 was announced in September 2017. As of May 16th, 2018, the system has been tested under 3,380 hours and approximately 68,000 kilometers (42,253 miles) of closed-loop autonomous driving under various urban scenarios. The algorithm described in this manuscript is available at https://github.com/ApolloAuto/apollo/tree/master/modules/planning. △ Less

Submitted 20 July, 2018; originally announced July 2018.

arXiv:1709.04344

Flexible Network Binarization with Layer-wise Priority

Authors: Lixue Zhuang, Yi Xu, Bingbing Ni, Hongteng Xu

Abstract: How to effectively approximate real-valued parameters with binary codes plays a central role in neural network binarization. In this work, we reveal an important fact that binarizing different layers has a widely-varied effect on the compression ratio of network and the loss of performance. Based on this fact, we propose a novel and flexible neural network binarization method by introducing the co… ▽ More How to effectively approximate real-valued parameters with binary codes plays a central role in neural network binarization. In this work, we reveal an important fact that binarizing different layers has a widely-varied effect on the compression ratio of network and the loss of performance. Based on this fact, we propose a novel and flexible neural network binarization method by introducing the concept of layer-wise priority which binarizes parameters in inverse order of their layer depth. In each training step, our method selects a specific network layer, minimizes the discrepancy between the original real-valued weights and its binary approximations, and fine-tunes the whole network accordingly. During the iteration of the above process, it is significant that we can flexibly decide whether to binarize the remaining floating layers or not and explore a trade-off between the loss of performance and the compression ratio of model. The resulting binary network is applied for efficient pedestrian detection. Extensive experimental results on several benchmarks show that under the same compression ratio, our method achieves much lower miss rate and faster detection speed than the state-of-the-art neural network binarization method. △ Less

Submitted 16 February, 2018; v1 submitted 13 September, 2017; originally announced September 2017.

Comments: More experiments on image classification are planned

arXiv:1608.05150 [pdf]

Experimental demonstration of Layered/Enhanced ACO-OFDM in short haul optical fiber transmission link

Authors: Binhuang Song, Chen Zhu, Bill Corcoran, Qibing Wang, Leimeng Zhuang, Arthur J. Lowery

Abstract: Asymmetrically clipped optical orthogonal frequency division multiplexing (ACO-OFDM) is theoretically more power efficient but less spectrally efficient than DC-bias OFDM (DCO-OFDM), with less power allocating to the informationless bias component by only using odd index sub-carriers. Layered/Enhanced asymmetrically clipped optical orthogonal frequency division multiplexing (L/E-ACO-OFDM) has been… ▽ More Asymmetrically clipped optical orthogonal frequency division multiplexing (ACO-OFDM) is theoretically more power efficient but less spectrally efficient than DC-bias OFDM (DCO-OFDM), with less power allocating to the informationless bias component by only using odd index sub-carriers. Layered/Enhanced asymmetrically clipped optical orthogonal frequency division multiplexing (L/E-ACO-OFDM) has been proposed to increase the spectral efficiency of ACO-OFDM. In this letter, we experimentally demonstrate a 30-km single mode fiber transmission using L/E-ACO-OFDM at 4.375 Gbits/s. Using a Volterra filter based equalizer, 2-dB and 1.5-dB Q-factor improvements for L/E-ACO-OFDM comparing with DCO-OFDM can be obtained in back-to-back and 30-km fiber transmission respectively. △ Less

Submitted 17 August, 2016; originally announced August 2016.

arXiv:1607.02539

Graph Construction with Label Information for Semi-Supervised Learning

Authors: Liansheng Zhuang, Zihan Zhou, Jingwen Yin, Shenghua Gao, Zhouchen Lin, Yi Ma, Nenghai Yu

Abstract: In the literature, most existing graph-based semi-supervised learning (SSL) methods only use the label information of observed samples in the label propagation stage, while ignoring such valuable information when learning the graph. In this paper, we argue that it is beneficial to consider the label information in the graph learning stage. Specifically, by enforcing the weight of edges between lab… ▽ More In the literature, most existing graph-based semi-supervised learning (SSL) methods only use the label information of observed samples in the label propagation stage, while ignoring such valuable information when learning the graph. In this paper, we argue that it is beneficial to consider the label information in the graph learning stage. Specifically, by enforcing the weight of edges between labeled samples of different classes to be zero, we explicitly incorporate the label information into the state-of-the-art graph learning methods, such as the Low-Rank Representation (LRR), and propose a novel semi-supervised graph learning method called Semi-Supervised Low-Rank Representation (SSLRR). This results in a convex optimization problem with linear constraints, which can be solved by the linearized alternating direction method. Though we take LRR as an example, our proposed method is in fact very general and can be applied to any self-representation graph learning methods. Experiment results on both synthetic and real datasets demonstrate that the proposed graph learning method can better capture the global geometric structure of the data, and therefore is more effective for semi-supervised learning tasks. △ Less

Submitted 12 February, 2017; v1 submitted 8 July, 2016; originally announced July 2016.

Comments: This paper is withdrawn by the authors for some errors

arXiv:1409.0964 [pdf, other]

doi 10.1109/TIP.2015.2441632

Constructing a Non-Negative Low Rank and Sparse Graph with Data-Adaptive Features

Authors: Liansheng Zhuang, Shenghua Gao, Jinhui Tang, Jingjing Wang, Zhouchen Lin, Yi Ma

Abstract: This paper aims at constructing a good graph for discovering intrinsic data structures in a semi-supervised learning setting. Firstly, we propose to build a non-negative low-rank and sparse (referred to as NNLRS) graph for the given data representation. Specifically, the weights of edges in the graph are obtained by seeking a nonnegative low-rank and sparse matrix that represents each data sample… ▽ More This paper aims at constructing a good graph for discovering intrinsic data structures in a semi-supervised learning setting. Firstly, we propose to build a non-negative low-rank and sparse (referred to as NNLRS) graph for the given data representation. Specifically, the weights of edges in the graph are obtained by seeking a nonnegative low-rank and sparse matrix that represents each data sample as a linear combination of others. The so-obtained NNLRS-graph can capture both the global mixture of subspaces structure (by the low rankness) and the locally linear structure (by the sparseness) of the data, hence is both generative and discriminative. Secondly, as good features are extremely important for constructing a good graph, we propose to learn the data embedding matrix and construct the graph jointly within one framework, which is termed as NNLRS with embedded features (referred to as NNLRS-EF). Extensive experiments on three publicly available datasets demonstrate that the proposed method outperforms the state-of-the-art graph construction method by a large margin for both semi-supervised classification and discriminative analysis, which verifies the effectiveness of our proposed method. △ Less

Submitted 3 September, 2014; originally announced September 2014.

arXiv:1402.1879 [pdf, other]

Sparse Illumination Learning and Transfer for Single-Sample Face Recognition with Image Corruption and Misalignment

Authors: Liansheng Zhuang, Tsung-Han Chan, Allen Y. Yang, S. Shankar Sastry, Yi Ma

Abstract: Single-sample face recognition is one of the most challenging problems in face recognition. We propose a novel algorithm to address this problem based on a sparse representation based classification (SRC) framework. The new algorithm is robust to image misalignment and pixel corruption, and is able to reduce required gallery images to one sample per class. To compensate for the missing illuminatio… ▽ More Single-sample face recognition is one of the most challenging problems in face recognition. We propose a novel algorithm to address this problem based on a sparse representation based classification (SRC) framework. The new algorithm is robust to image misalignment and pixel corruption, and is able to reduce required gallery images to one sample per class. To compensate for the missing illumination information traditionally provided by multiple gallery images, a sparse illumination learning and transfer (SILT) technique is introduced. The illumination in SILT is learned by fitting illumination examples of auxiliary face images from one or more additional subjects with a sparsely-used illumination dictionary. By enforcing a sparse representation of the query image in the illumination dictionary, the SILT can effectively recover and transfer the illumination and pose information from the alignment stage to the recognition stage. Our extensive experiments have demonstrated that the new algorithms significantly outperform the state of the art in the single-sample regime and with less restrictions. In particular, the single-sample face alignment accuracy is comparable to that of the well-known Deformable SRC algorithm using multiple gallery images per class. Furthermore, the face recognition accuracy exceeds those of the SRC and Extended SRC algorithms using hand labeled alignment initialization. △ Less

Submitted 8 February, 2014; originally announced February 2014.

Showing 1–25 of 25 results for author: Zhuang, L