Search | arXiv e-print repository

arXiv:2404.19412 [pdf]

Enhancing Robotic Adaptability: Integrating Unsupervised Trajectory Segmentation and Conditional ProMPs for Dynamic Learning Environments

Authors: Tianci Gao

Abstract: We propose a novel framework for enhancing robotic adaptability and learning efficiency, which integrates unsupervised trajectory segmentation with adaptive probabilistic movement primitives (ProMPs). By employing a cutting-edge deep learning architecture that combines autoencoders and Recurrent Neural Networks (RNNs), our approach autonomously pinpoints critical transitional points in continuous,… ▽ More We propose a novel framework for enhancing robotic adaptability and learning efficiency, which integrates unsupervised trajectory segmentation with adaptive probabilistic movement primitives (ProMPs). By employing a cutting-edge deep learning architecture that combines autoencoders and Recurrent Neural Networks (RNNs), our approach autonomously pinpoints critical transitional points in continuous, unlabeled motion data, thus significantly reducing dependence on extensively labeled datasets. This innovative method dynamically adjusts motion trajectories using conditional variables, significantly enhancing the flexibility and accuracy of robotic actions under dynamic conditions while also reducing the computational overhead associated with traditional robotic programming methods. Our experimental validation demonstrates superior learning efficiency and adaptability compared to existing techniques, paving the way for advanced applications in industrial and service robotics. △ Less

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2403.11091 [pdf, other]

Multitask frame-level learning for few-shot sound event detection

Authors: Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang

Abstract: This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods methods in few-shot SED predominantly rely on segment-level predictions, which often providing detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been… ▽ More This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods methods in few-shot SED predominantly rely on segment-level predictions, which often providing detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been proposed to overcome these limitations, these strategies commonly face difficulties with prediction truncation caused by background noise. To alleviate this issue, we introduces an innovative multitask frame-level SED framework. In addition, we introduce TimeFilterAug, a linear timing mask for data augmentation, to increase the model's robustness and adaptability to diverse acoustic environments. The proposed method achieves a F-score of 63.8%, securing the 1st rank in the few-shot bioacoustic event detection category of the Detection and Classification of Acoustic Scenes and Events Challenge 2023. △ Less

Submitted 17 March, 2024; originally announced March 2024.

Comments: 6 pages, 4 figures, conference

arXiv:2312.10593 [pdf, other]

A Novel RFID Authentication Protocol Based on A Block-Order-Modulus Variable Matrix Encryption Algorithm

Authors: Yan Wang, Ruiqi Liu, Tong Gao, Feng Shu, Xuemei Lei, Guan Gui, Jiangzhou Wang

Abstract: In this paper, authentication for mobile radio frequency identification (RFID) systems with low-cost tags is studied. Firstly, an adaptive modulus (AM) encryption algorithm is proposed. Subsequently, in order to enhance the security without additional storage of new key matrices, a self-updating encryption order (SUEO) algorithm is designed. Furthermore, a diagonal block local transpose key matrix… ▽ More In this paper, authentication for mobile radio frequency identification (RFID) systems with low-cost tags is studied. Firstly, an adaptive modulus (AM) encryption algorithm is proposed. Subsequently, in order to enhance the security without additional storage of new key matrices, a self-updating encryption order (SUEO) algorithm is designed. Furthermore, a diagonal block local transpose key matrix (DBLTKM) encryption algorithm is presented, which effectively expands the feasible domain of the key space. Based on the above three algorithms, a novel joint AM-SUEO-DBLTKM encryption algorithm is constructed. Making full use of the advantages of the proposed joint algorithm, a two-way RFID authentication protocol, named AM-SUEO-DBLTKM-RFID, is proposed for mobile RFID systems. In addition, the Burrows-Abadi-Needham (BAN) logic and security analysis indicate that the proposed AM-SUEO-DBLTKM-RFID protocol can effectively combat various typical attacks. Numerical results demonstrate that the proposed AM-SUEO-DBLTKM algorithm can save 99.59\% of tag storage over traditional algorithms. Finally, the low computational complexity as well as the low storage cost of the proposed AM-SUEO-DBLTKM-RFID protocol facilitates deployment within low-cost RFID tags. △ Less

Submitted 9 May, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

arXiv:2309.11745 [pdf, other]

PIE: Simulating Disease Progression via Progressive Image Editing

Authors: Kaizhao Liang, Xu Cao, Kuei-Da Liao, Tianren Gao, Wenqian Ye, Zhengyu Chen, Jianguo Cao, Tejas Nama, Jimeng Sun

Abstract: Disease progression simulation is a crucial area of research that has significant implications for clinical diagnosis, prognosis, and treatment. One major challenge in this field is the lack of continuous medical imaging monitoring of individual patients over time. To address this issue, we develop a novel framework termed Progressive Image Editing (PIE) that enables controlled manipulation of dis… ▽ More Disease progression simulation is a crucial area of research that has significant implications for clinical diagnosis, prognosis, and treatment. One major challenge in this field is the lack of continuous medical imaging monitoring of individual patients over time. To address this issue, we develop a novel framework termed Progressive Image Editing (PIE) that enables controlled manipulation of disease-related image features, facilitating precise and realistic disease progression simulation. Specifically, we leverage recent advancements in text-to-image generative models to simulate disease progression accurately and personalize it for each patient. We theoretically analyze the iterative refining process in our framework as a gradient descent with an exponentially decayed learning rate. To validate our framework, we conduct experiments in three medical imaging domains. Our results demonstrate the superiority of PIE over existing methods such as Stable Diffusion Walk and Style-Based Manifold Extrapolation based on CLIP score (Realism) and Disease Classification Confidence (Alignment). Our user study collected feedback from 35 veteran physicians to assess the generated progressions. Remarkably, 76.2% of the feedback agrees with the fidelity of the generated progressions. To our best knowledge, PIE is the first of its kind to generate disease progression images meeting real-world standards. It is a promising tool for medical research and clinical practice, potentially allowing healthcare providers to model disease trajectories over time, predict future treatment responses, and improve patient outcomes. △ Less

Submitted 5 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: Code and checkpoints for replicating our results can be found at https://github.com/IrohXu/PIE and https://huggingface.co/IrohXu/stable-diffusion-mimic-cxr-v0.1

arXiv:2308.14638 [pdf, other]

The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

Authors: Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

Abstract: This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios. Additionally, it also evaluates the efficiency of systems in handling diverse array devices. To address these issues, we implemented an end-to-end speaker diarization system and introduced a rectification strategy base… ▽ More This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios. Additionally, it also evaluates the efficiency of systems in handling diverse array devices. To address these issues, we implemented an end-to-end speaker diarization system and introduced a rectification strategy based on multi-channel spatial information. This approach significantly diminished the word error rates (WER). In terms of recognition, we utilized publicly available pre-trained models as the foundational models to train our end-to-end speech recognition models. Our system attained a Macro-averaged diarization-attributed WER (DA-WER) of 21.01% on the CHiME-7 evaluation set, which signifies a relative improvement of 62.04% over the official baseline system. △ Less

Submitted 10 October, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted by 2023 CHiME Workshop, Oral

arXiv:2305.16616 [pdf, other]

Channel Measurement, Modeling, and Simulation for 6G: A Survey and Tutorial

Authors: Jianhua Zhang, Jiaxin Lin, Pan Tang, Yuxiang Zhang, Huixin Xu, Tianyang Gao, Haiyang Miao, Zeyong Chai, Zhengfu Zhou, Yi Li, Huiwen Gong, Yameng Liu, Zhiqiang Yuan, Lei Tian, Shaoshi Yang, Liang Xia, Guangyi Liu, Ping Zhang

Abstract: The sixth generation (6G) mobile communications have attracted substantial attention in the global research community of information and communication technologies (ICT). 6G systems are expected to support not only extended 5G usage scenarios, but also new usage scenarios, such as integrated sensing and communication (ISAC), integrated artificial intelligence (AI) and communication, and communicat… ▽ More The sixth generation (6G) mobile communications have attracted substantial attention in the global research community of information and communication technologies (ICT). 6G systems are expected to support not only extended 5G usage scenarios, but also new usage scenarios, such as integrated sensing and communication (ISAC), integrated artificial intelligence (AI) and communication, and communication and ubiquitous connectivity. To realize this goal, channel characteristics must be comprehensively studied and properly exploited, so as to promote the design, standardization, and optimization of 6G systems. In this paper, we first summarize the requirements and challenges in 6G channel research. Our focus is on channels for five promising technologies enabling 6G, including terahertz (THz), extreme MIMO (E-MIMO), ISAC, reconfigurable intelligent surface (RIS), and space-air-ground integrated network (SAGIN). Then, a survey of the progress of the 6G channel research regarding the above five promising technologies is presented in terms of the latest measurement campaigns, new characteristics, modeling methods, and research prospects. Moreover, a tutorial on the 6G channel simulations is presented. We introduce the BUPTCMG- 6G, a 6G link-level channel simulator, developed based on the ITU/3GPP 3D geometry-based stochastic model (GBSM) methodology. The simulator supports the channel simulation of the aforementioned 6G potential technologies. To facilitate the use of the simulator, the tutorial encompasses the design framework, user guidelines, and application examples. This paper offers in-depth, hands-on insights into the best practices of channel measurements, modeling, and simulations for the evaluation of 6G technologies, the development of 6G standards, and the implementation and optimization of 6G systems. △ Less

Submitted 28 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 41 pages,52 figures

arXiv:2304.12615 [pdf]

STM-UNet: An Efficient U-shaped Architecture Based on Swin Transformer and Multi-scale MLP for Medical Image Segmentation

Authors: Lei Shi, Tianyu Gao, Zheng Zhang, Junxing Zhang

Abstract: Automated medical image segmentation can assist doctors to diagnose faster and more accurate. Deep learning based models for medical image segmentation have made great progress in recent years. However, the existing models fail to effectively leverage Transformer and MLP for improving U-shaped architecture efficiently. In addition, the multi-scale features of the MLP have not been fully extracted… ▽ More Automated medical image segmentation can assist doctors to diagnose faster and more accurate. Deep learning based models for medical image segmentation have made great progress in recent years. However, the existing models fail to effectively leverage Transformer and MLP for improving U-shaped architecture efficiently. In addition, the multi-scale features of the MLP have not been fully extracted in the bottleneck of U-shaped architecture. In this paper, we propose an efficient U-shaped architecture based on Swin Transformer and multi-scale MLP, namely STM-UNet. Specifically, the Swin Transformer block is added to skip connection of STM-UNet in form of residual connection, which can enhance the modeling ability of global features and long-range dependency. Meanwhile, a novel PCAS-MLP with parallel convolution module is designed and placed into the bottleneck of our architecture to contribute to the improvement of segmentation performance. The experimental results on ISIC 2016 and ISIC 2018 demonstrate the effectiveness of our proposed method. Our method also outperforms several state-of-the-art methods in terms of IoU and Dice. Our method has achieved a better trade-off between high segmentation accuracy and low model complexity. △ Less

Submitted 25 April, 2023; originally announced April 2023.

Comments: 6 pages,5 figures,2 tables

arXiv:2207.08902 [pdf, other]

Layered Cost-Map-Based Traffic Management for Multiple Automated Mobile Robots via a Data Distribution Service

Authors: Seungwoo Jeong, Taekwon Ga, Inhwan Jeong, Jongkyu Oh, Jongeun Choi

Abstract: This letter proposes traffic management for multiple automated mobile robots (AMRs) based on a layered cost map. Multiple AMRs communicate via a data distribution service (DDS), which is shared by topics in the same DDS domain. The cost of each layer is manipulated by topics. The traffic management server in the domain sends or receives topics to each of AMRs. Using the layered cost map, the new c… ▽ More This letter proposes traffic management for multiple automated mobile robots (AMRs) based on a layered cost map. Multiple AMRs communicate via a data distribution service (DDS), which is shared by topics in the same DDS domain. The cost of each layer is manipulated by topics. The traffic management server in the domain sends or receives topics to each of AMRs. Using the layered cost map, the new concept of prohibition filter, lane filter, fleet layer, and region filter are proposed and implemented. The prohibition filter can help a user set an area that would prohibit an AMR from trespassing. The lane filter can help set one-way directions based on an angle image. The fleet layer can help AMRs share their locations via the traffic management server. The region filter requests for or receives an exclusive area, which can be occupied by only one AMR, from the traffic management server. All the layers are experimentally validated with real-world AMRs. Each area can be configured with user-defined images or text-based parameter files. △ Less

Submitted 18 July, 2022; originally announced July 2022.

Comments: 8 pages, 13 figures

arXiv:2205.03553 [pdf, other]

doi 10.1016/j.patcog.2023.110205

From heavy rain removal to detail restoration: A faster and better network

Authors: Yuanbo Wen, Tao Gao, Jing Zhang, Kaihao Zhang, Ting Chen

Abstract: The profound accumulation of precipitation during intense rainfall events can markedly degrade the quality of images, leading to the erosion of textural details. Despite the improvements observed in existing learning-based methods specialized for heavy rain removal, it is discerned that a significant proportion of these methods tend to overlook the precise reconstruction of the intricate details.… ▽ More The profound accumulation of precipitation during intense rainfall events can markedly degrade the quality of images, leading to the erosion of textural details. Despite the improvements observed in existing learning-based methods specialized for heavy rain removal, it is discerned that a significant proportion of these methods tend to overlook the precise reconstruction of the intricate details. In this work, we introduce a simple dual-stage progressive enhancement network, denoted as DPENet, aiming to achieve effective deraining while preserving the structural accuracy of rain-free images. This approach comprises two key modules, a rain streaks removal network (R$^2$Net) focusing on accurate rain removal, and a details reconstruction network (DRNet) designed to recover the textural details of rain-free images. Firstly, we introduce a dilated dense residual block (DDRB) within R$^2$Net, enabling the aggregation of high-level and low-level features. Secondly, an enhanced residual pixel-wise attention block (ERPAB) is integrated into DRNet to facilitate the incorporation of contextual information. To further enhance the fidelity of our approach, we employ a comprehensive loss function that accentuates both the marginal and regional accuracy of rain-free images. Extensive experiments conducted on publicly available benchmarks demonstrates the noteworthy efficiency and effectiveness of our proposed DPENet. The source code and pre-trained models are currently available at \url{https://github.com/chdwyb/DPENet}. △ Less

Submitted 18 December, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

Comments: Accepted by Pattern Recognition

arXiv:2112.14555 [pdf, other]

Onsite Non-Line-of-Sight Imaging via Online Calibrations

Authors: Zhengqing Pan, Ruiqian Li, Tian Gao, Zi Wang, Ping Liu, Siyuan Shen, Tao Wu, Jingyi Yu, Shiying Li

Abstract: There has been an increasing interest in deploying non-line-of-sight (NLOS) imaging systems for recovering objects behind an obstacle. Existing solutions generally pre-calibrate the system before scanning the hidden objects. Onsite adjustments of the occluder, object and scanning pattern require re-calibration. We present an online calibration technique that directly decouples the acquired transie… ▽ More There has been an increasing interest in deploying non-line-of-sight (NLOS) imaging systems for recovering objects behind an obstacle. Existing solutions generally pre-calibrate the system before scanning the hidden objects. Onsite adjustments of the occluder, object and scanning pattern require re-calibration. We present an online calibration technique that directly decouples the acquired transients at onsite scanning into the LOS and hidden components. We use the former to directly (re)calibrate the system upon changes of scene/obstacle configurations, scanning regions, and scanning patterns whereas the latter for hidden object recovery via spatial, frequency or learning based techniques. Our technique avoids using auxiliary calibration apparatus such as mirrors or checkerboards and supports both laboratory validations and real-world deployments. △ Less

Submitted 29 December, 2021; originally announced December 2021.

arXiv:2111.01430 [pdf, other]

CycleGAN with Dual Adversarial Loss for Bone-Conducted Speech Enhancement

Authors: Qing Pan, Teng Gao, Jian Zhou, Huabin Wang, Liang Tao, Hon Keung Kwan

Abstract: Compared with air-conducted speech, bone-conducted speech has the unique advantage of shielding background noise. Enhancement of bone-conducted speech helps to improve its quality and intelligibility. In this paper, a novel CycleGAN with dual adversarial loss (CycleGAN-DAL) is proposed for bone-conducted speech enhancement. The proposed method uses an adversarial loss and a cycle-consistent loss s… ▽ More Compared with air-conducted speech, bone-conducted speech has the unique advantage of shielding background noise. Enhancement of bone-conducted speech helps to improve its quality and intelligibility. In this paper, a novel CycleGAN with dual adversarial loss (CycleGAN-DAL) is proposed for bone-conducted speech enhancement. The proposed method uses an adversarial loss and a cycle-consistent loss simultaneously to learn forward and cyclic mapping, in which the adversarial loss is replaced with the classification adversarial loss and the defect adversarial loss to consolidate the forward mapping. Compared with conventional baseline methods, it can learn feature mapping between bone-conducted speech and target speech without additional air-conducted speech assistance. Moreover, the proposed method also avoids the oversmooth problem which is occurred commonly in conventional statistical based models. Experimental results show that the proposed method outperforms baseline methods such as CycleGAN, GMM, and BLSTM. Keywords: Bone-conducted speech enhancement, dual adversarial loss, Parallel CycleGAN, high frequency speech reconstruction △ Less

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2111.01342 [pdf, other]

Attention-Guided Generative Adversarial Network for Whisper to Normal Speech Conversion

Authors: Teng Gao, Jian Zhou, Huabin Wang, Liang Tao, Hon Keung Kwan

Abstract: Whispered speech is a special way of pronunciation without using vocal cord vibration. A whispered speech does not contain a fundamental frequency, and its energy is about 20dB lower than that of a normal speech. Converting a whispered speech into a normal speech can improve speech quality and intelligibility. In this paper, a novel attention-guided generative adversarial network model incorporati… ▽ More Whispered speech is a special way of pronunciation without using vocal cord vibration. A whispered speech does not contain a fundamental frequency, and its energy is about 20dB lower than that of a normal speech. Converting a whispered speech into a normal speech can improve speech quality and intelligibility. In this paper, a novel attention-guided generative adversarial network model incorporating an autoencoder, a Siamese neural network, and an identity mapping loss function for whisper to normal speech conversion (AGAN-W2SC) is proposed. The proposed method avoids the challenge of estimating the fundamental frequency of the normal voiced speech converted from a whispered speech. Specifically, the proposed model is more amendable to practical applications because it does not need to align speech features for training. Experimental results demonstrate that the proposed AGAN-W2SC can obtain improved speech quality and intelligibility compared with dynamic-time-warping-based methods. △ Less

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2103.16827 [pdf, other]

Integer-only Zero-shot Quantization for Efficient Speech Recognition

Authors: Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer

Abstract: End-to-end neural network models achieve improved performance on various automatic speech recognition (ASR) tasks. However, these models perform poorly on edge hardware due to large memory and computation requirements. While quantizing model weights and/or activations to low-precision can be a promising solution, previous research on quantizing ASR models is limited. In particular, the previous ap… ▽ More End-to-end neural network models achieve improved performance on various automatic speech recognition (ASR) tasks. However, these models perform poorly on edge hardware due to large memory and computation requirements. While quantizing model weights and/or activations to low-precision can be a promising solution, previous research on quantizing ASR models is limited. In particular, the previous approaches use floating-point arithmetic during inference and thus they cannot fully exploit efficient integer processing units. Moreover, they require training and/or validation data during quantization, which may not be available due to security or privacy concerns. To address these limitations, we propose an integer-only, zero-shot quantization scheme for ASR models. In particular, we generate synthetic data whose runtime statistics resemble the real data, and we use it to calibrate models during quantization. We apply our method to quantize QuartzNet, Jasper, and Conformer and show negligible WER degradation as compared to the full-precision baseline models, even without using any data. Moreover, we achieve up to 2.35x speedup on a T4 GPU and 4x compression rate, with a modest WER degradation of <1% with INT8 quantization. △ Less

Submitted 30 January, 2022; v1 submitted 31 March, 2021; originally announced March 2021.

Journal ref: ICASSP 2022

arXiv:2103.10661 [pdf, other]

USTC-NELSLIP System Description for DIHARD-III Challenge

Authors: Yuxuan Wang, Maokui He, Shutong Niu, Lei Sun, Tian Gao, Xin Fang, Jia Pan, Jun Du, Chin-Hui Lee

Abstract: This system description describes our submission system to the Third DIHARD Speech Diarization Challenge. Besides the traditional clustering based system, the innovation of our system lies in the combination of various front-end techniques to solve the diarization problem, including speech separation and target-speaker based voice activity detection (TS-VAD), combined with iterative data purificat… ▽ More This system description describes our submission system to the Third DIHARD Speech Diarization Challenge. Besides the traditional clustering based system, the innovation of our system lies in the combination of various front-end techniques to solve the diarization problem, including speech separation and target-speaker based voice activity detection (TS-VAD), combined with iterative data purification. We also adopted audio domain classification to design domain-dependent processing. Finally, we performed post processing to do system fusion and selection. Our best system achieved DERs of 11.30% in track 1 and 16.78% in track 2 on evaluation set, respectively. △ Less

Submitted 19 March, 2021; originally announced March 2021.

arXiv:2101.00373 [pdf, other]

doi 10.1109/TPAMI.2021.3076062

Non-line-of-Sight Imaging via Neural Transient Fields

Authors: Siyuan Shen, Zi Wang, Ping Liu, Zhengqing Pan, Ruiqian Li, Tian Gao, Shiying Li, Jingyi Yu

Abstract: We present a neural modeling framework for Non-Line-of-Sight (NLOS) imaging. Previous solutions have sought to explicitly recover the 3D geometry (e.g., as point clouds) or voxel density (e.g., within a pre-defined volume) of the hidden scene. In contrast, inspired by the recent Neural Radiance Field (NeRF) approach, we use a multi-layer perceptron (MLP) to represent the neural transient field or… ▽ More We present a neural modeling framework for Non-Line-of-Sight (NLOS) imaging. Previous solutions have sought to explicitly recover the 3D geometry (e.g., as point clouds) or voxel density (e.g., within a pre-defined volume) of the hidden scene. In contrast, inspired by the recent Neural Radiance Field (NeRF) approach, we use a multi-layer perceptron (MLP) to represent the neural transient field or NeTF. However, NeTF measures the transient over spherical wavefronts rather than the radiance along lines. We therefore formulate a spherical volume NeTF reconstruction pipeline, applicable to both confocal and non-confocal setups. Compared with NeRF, NeTF samples a much sparser set of viewpoints (scanning spots) and the sampling is highly uneven. We thus introduce a Monte Carlo technique to improve the robustness in the reconstruction. Comprehensive experiments on synthetic and real datasets demonstrate NeTF provides higher quality reconstruction and preserves fine details largely missing in the state-of-the-art. △ Less

Submitted 13 September, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

arXiv:2010.04611 [pdf, other]

Hyperspectral Unmixing via Nonnegative Matrix Factorization with Handcrafted and Learnt Priors

Authors: Min Zhao, Tiande Gao, Jie Chen, Wei Chen

Abstract: Nowadays, nonnegative matrix factorization (NMF) based methods have been widely applied to blind spectral unmixing. Introducing proper regularizers to NMF is crucial for mathematically constraining the solutions and physically exploiting spectral and spatial properties of images. Generally, properly handcrafting regularizers and solving the associated complex optimization problem are non-trivial t… ▽ More Nowadays, nonnegative matrix factorization (NMF) based methods have been widely applied to blind spectral unmixing. Introducing proper regularizers to NMF is crucial for mathematically constraining the solutions and physically exploiting spectral and spatial properties of images. Generally, properly handcrafting regularizers and solving the associated complex optimization problem are non-trivial tasks. In our work, we propose an NMF based unmixing framework which jointly uses a handcrafting regularizer and a learnt regularizer from data. we plug learnt priors of abundances where the associated subproblem can be addressed using various image denoisers, and we consider an l_2,1-norm regularizer to the abundance matrix to promote sparse unmixing results. The proposed framework is flexible and extendable. Both synthetic data and real airborne data are conducted to confirm the effectiveness of our method. △ Less

Submitted 9 October, 2020; originally announced October 2020.

arXiv:2001.05685 [pdf, other]

SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

Authors: Bohan Zhai, Tianren Gao, Flora Xue, Daniel Rothchild, Bichen Wu, Joseph E. Gonzalez, Kurt Keutzer

Abstract: Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGl… ▽ More Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x - 214x fewer MACs. Code, trained models, and generated audio are publicly available at https://github.com/tianrengao/SqueezeWave. △ Less

Submitted 16 January, 2020; originally announced January 2020.

arXiv:1906.01082 [pdf, other]

Representation Theoretic Patterns in Multi-Frequency Class Averaging for Three-Dimensional Cryo-Electron Microscopy

Authors: Yifeng Fan, Tingran Gao, Zhizhen Zhao

Abstract: We develop in this paper a novel intrinsic classification algorithm -- multi-frequency class averaging (MFCA) -- for classifying noisy projection images obtained from three-dimensional cryo-electron microscopy (cryo-EM) by the similarity among their viewing directions. This new algorithm leverages multiple irreducible representations of the unitary group to introduce additional redundancy into the… ▽ More We develop in this paper a novel intrinsic classification algorithm -- multi-frequency class averaging (MFCA) -- for classifying noisy projection images obtained from three-dimensional cryo-electron microscopy (cryo-EM) by the similarity among their viewing directions. This new algorithm leverages multiple irreducible representations of the unitary group to introduce additional redundancy into the representation of the optimal in-plane rotational alignment, extending and outperforming the existing class averaging algorithm that uses only a single representation. The formal algebraic model and representation theoretic patterns of the proposed MFCA algorithm extend the framework of Hadani and Singer to arbitrary irreducible representations of the unitary group. We conceptually establish the consistency and stability of MFCA by inspecting the spectral properties of a generalized local parallel transport operator through the lens of Wigner $D$-matrices. We demonstrate the efficacy of the proposed algorithm with numerical experiments. △ Less

Submitted 5 July, 2021; v1 submitted 31 May, 2019; originally announced June 2019.

Comments: 38 pages, 17 figures

MSC Class: 20G05; 33C45; 33C55; 55R25 ACM Class: I.4.5; I.4.10

arXiv:1901.07951 [pdf, other]

Modeling and Simulation of UAV Carrier Landings

Authors: Gaurav Misra, Tianyu Gao, Xiaoli Bai

Abstract: With UAVs promising capabilities to increase operation flexibility and reduce mission cost, we are exploiting the automated carrier-landing performance advancement that can be achieved by fixed-wing UAVs. To demonstrate such potentials, in this paper, we investigate two key metrics, namely, flight path control performance, and reduced approach speeds for UAVs based on the F/A-18 High Angle of Atta… ▽ More With UAVs promising capabilities to increase operation flexibility and reduce mission cost, we are exploiting the automated carrier-landing performance advancement that can be achieved by fixed-wing UAVs. To demonstrate such potentials, in this paper, we investigate two key metrics, namely, flight path control performance, and reduced approach speeds for UAVs based on the F/A-18 High Angle of Attack (HARV) model. The landing control architecture consists of an auto-throttle, a stability augmentation system, glideslope and approach track controllers. The performance of the control model is tested using Monte Carlo simulations under a range of environmental uncertainties including atmospheric turbulence consisting of wind shear, discrete and continuous wind gusts, and carrier airwakes. Realistic deck motion is considered where the standard deck motion time histories under the Systematic Characterization of the Naval Environment (SCONE) program released by the Office of Naval Research (ONR) are used. We numerically demonstrate the limiting approach conditions which allow for successful carrier landings and factors affecting it's performance. △ Less

Submitted 23 January, 2019; originally announced January 2019.

Showing 1–19 of 19 results for author: Gao, T