
Showing 1–45 of 45 results for author: Deng, L

Searching in archive eess.
  1. arXiv:2408.06790 [pdf, other]

    eess.SY

    Residual Deep Reinforcement Learning for Inverter-based Volt-Var Control

    Authors: Qiong Liu, Ye Guo, Lirong Deng, Haotian Liu, Dongyu Li, Hongbin Sun

    Abstract: A residual deep reinforcement learning (RDRL) approach is proposed by integrating DRL with model-based optimization for inverter-based volt-var control in active distribution networks when the accurate power flow model is unknown. RDRL learns a residual action with a reduced residual action space, based on the action of the model-based approach with an approximate model. RDRL inherits the control…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2210.07360

  2. arXiv:2407.02826 [pdf, other]

    eess.AS

    SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

    Authors: Jingru Lin, Meng Ge, Junyi Ao, Liqun Deng, Haizhou Li

    Abstract: It was shown that pre-trained models with self-supervised learning (SSL) techniques are effective in various downstream speech tasks. However, most such models are trained on single-speaker speech data, limiting their effectiveness in mixture speech. This motivates us to explore pre-training on mixture speech. This work presents SA-WavLM, a novel pre-trained model for mixture speech. Specifically,…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: InterSpeech 2024

  3. arXiv:2406.09676 [pdf, other]

    eess.AS cs.CL

    Optimizing Byte-level Representation for End-to-end ASR

    Authors: Roger Hsiao, Liuhui Deng, Erik McDermott, Ruchir Travadi, Xiaodan Zhuang

    Abstract: We propose a novel approach to optimizing a byte-level representation for end-to-end automatic speech recognition (ASR). Byte-level representation is often used by large scale multilingual ASR systems when the character set of the supported languages is large. The compactness and universality of byte-level representation allow the ASR models to use smaller output vocabularies and therefore, provid…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure

  4. arXiv:2406.03875 [pdf, other]

    eess.SY

    Energy-storing analysis and fishtail stiffness optimization for a wire-driven elastic robotic fish

    Authors: Xiaocun Liao, Chao Zhou, Junfeng Fan, Zhuoliang Zhang, Zhaoran Yin, Liangwei Deng

    Abstract: The robotic fish with high propulsion efficiency and good maneuverability achieves underwater fishlike propulsion by commonly adopting the motor to drive the fishtail, causing the significant fluctuations of the motor power due to the uneven swing speed of the fishtail in one swing cycle. Hence, we propose a wire-driven robotic fish with a spring-steel-based active-segment elastic spine. This bion…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 14 pages, 19 figures

  5. arXiv:2406.02430 [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  6. arXiv:2404.11537 [pdf, other]

    cs.CV eess.IV

    SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening

    Authors: Yu Zhong, Xiao Wu, Liang-Jian Deng, Zihan Cao

    Abstract: Pansharpening is a significant image fusion technique that merges the spatial content and spectral characteristics of remote sensing images to generate high-resolution multispectral images. Recently, denoising diffusion probabilistic models have been gradually applied to visual tasks, enhancing controllable image generation through low-rank adaptation (LoRA). In this paper, we introduce a spatial-…

    Submitted 17 April, 2024; originally announced April 2024.

  7. arXiv:2404.07932 [pdf, other]

    cs.CV eess.IV

    FusionMamba: Efficient Image Fusion with State Space Model

    Authors: Siran Peng, Xiangyu Zhu, Haoyu Deng, Zhen Lei, Liang-Jian Deng

    Abstract: Image fusion aims to generate a high-resolution multi/hyper-spectral image by combining a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Current deep learning (DL)-based methods for image fusion primarily rely on CNNs or Transformers to extract features and merge different types of data. While CNNs are efficient, their receptive fiel…

    Submitted 10 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  8. arXiv:2404.07543 [pdf, other]

    cs.CV eess.IV

    Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening

    Authors: Yule Duan, Xiao Wu, Haoyu Deng, Liang-Jian Deng

    Abstract: Currently, machine learning-based methods for remote sensing pansharpening have progressed rapidly. However, existing pansharpening methods often do not fully exploit differentiating regional information in non-local spaces, thereby limiting the effectiveness of the methods and resulting in redundant learning parameters. In this paper, we introduce a so-called content-adaptive non-local convolutio…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  9. arXiv:2404.01121 [pdf, other]

    cs.CV eess.IV

    CMT: Cross Modulation Transformer with Hybrid Loss for Pansharpening

    Authors: Wen-Jie Shu, Hong-Xia Dou, Rui Wen, Xiao Wu, Liang-Jian Deng

    Abstract: Pansharpening aims to enhance remote sensing image (RSI) quality by merging high-resolution panchromatic (PAN) with multispectral (MS) images. However, prior techniques struggled to optimally fuse PAN and MS images for enhanced spatial and spectral information, due to a lack of a systematic framework capable of effectively coordinating their individual strengths. In response, we present the Cross…

    Submitted 1 April, 2024; originally announced April 2024.

  10. arXiv:2402.08934 [pdf, other]

    eess.IV cs.CV

    Extreme Video Compression with Pre-trained Diffusion Models

    Authors: Bohan Li, Yiming Liu, Xueyan Niu, Bo Bai, Lei Deng, Deniz Gündüz

    Abstract: Diffusion models have achieved remarkable success in generating high quality image and video data. More recently, they have also been used for image compression with high perceptual quality. In this paper, we present a novel approach to extreme video compression leveraging the predictive power of diffusion-based generative models at the decoder. The conditional diffusion model takes several neural…

    Submitted 13 February, 2024; originally announced February 2024.

  11. arXiv:2312.06197 [pdf, other]

    cs.SD cs.MM eess.AS

    MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer

    Authors: Dong Yao, Jieming Zhu, Jiahao Xun, Shengyu Zhang, Zhou Zhao, Liqun Deng, Wenqiao Zhang, Zhenhua Dong, Xin Jiang

    Abstract: Recent research in self-supervised contrastive learning of music representations has demonstrated remarkable results across diverse downstream tasks. However, a prevailing trend in existing methods involves representing equally-sized music clips in either waveform or spectrogram formats, often overlooking the intrinsic part-whole hierarchies within music. In our quest to comprehend the bottom-up s…

    Submitted 19 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Short paper accepted by WWW 2024. This is revised and condensed based on the previous version titled "Music-PAW: Learning Music Representations via Hierarchical Part-whole Interaction and Contrast". For more experimental details and discussions, please refer to the original long paper at arXiv:2312.06197v1

  12. arXiv:2310.14823 [pdf, other]

    eess.AS eess.SP

    Prompt-driven Target Speech Diarization

    Authors: Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li

    Abstract: We introduce a novel task named 'target speech diarization', which seeks to determine 'when target event occurred' within an audio signal. We devise a neural architecture called Prompt-driven Target Speech Diarization (PTSD), that works with diverse prompts that specify the target speech events of interest. We train and evaluate PTSD using sim2spk, sim3spk and sim4spk datasets, which are derived f…

    Submitted 8 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP 2024

  13. arXiv:2309.15889 [pdf, other]

    eess.IV cs.CV cs.IT cs.LG cs.MM

    High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models

    Authors: Selim F. Yilmaz, Xueyan Niu, Bo Bai, Wei Han, Lei Deng, Deniz Gunduz

    Abstract: We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver. Specifically, we are interested in the perception-distortion trade-off in the practical finite block length regime, in which separate source and channel coding can be highly suboptimal. W…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: 6 pages, 4 figures

  14. arXiv:2309.02835 [pdf]

    physics.optics eess.IV

    A flexible and accurate total variation and cascaded denoisers-based image reconstruction algorithm for hyperspectrally compressed ultrafast photography

    Authors: Zihan Guo, Jiali Yao, Dalong Qi, Pengpeng Ding, Chengzhi Jin, Ning Xu, Zhiling Zhang, Yunhua Yao, Lianzhong Deng, Zhiyong Wang, Zhenrong Sun, Shian Zhang

    Abstract: Hyperspectrally compressed ultrafast photography (HCUP) based on compressed sensing and the time- and spectrum-to-space mappings can simultaneously realize the temporal and spectral imaging of non-repeatable or difficult-to-repeat transient events passively in a single exposure. It possesses an incredibly high frame rate of tens of trillions of frames per second and a sequence depth of several hun…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 25 pages, 5 figures and 1 table

  15. arXiv:2307.09775 [pdf, other]

    cs.IR cs.SD eess.AS

    DisCover: Disentangled Music Representation Learning for Cover Song Identification

    Authors: Jiahao Xun, Shengyu Zhang, Yanting Yang, Jieming Zhu, Liqun Deng, Zhou Zhao, Zhenhua Dong, Ruiqi Li, Lichao Zhang, Fei Wu

    Abstract: In the field of music information retrieval (MIR), cover song identification (CSI) is a challenging task that aims to identify cover versions of a query song from a massive collection. Existing works still suffer from high intra-song variances and inter-song correlations, due to the entangled nature of version-specific and version-invariant factors in their modeling. In this work, we set the goal…

    Submitted 19 July, 2023; originally announced July 2023.

  16. Game Theory and Coverage Optimization Based Multihop Routing Protocol for Network Lifetime in Wireless Sensor Networks

    Authors: Yindi Yao, Xiong Li, Yanpeng Cui, Lang Deng, Chen Wang

    Abstract: Wireless sensor networks (WSNs) are self-organizing monitoring networks with a large number of randomly deployed microsensor nodes to collect various physical information to realize tasks such as intelligent perception, efficient control, and decision-making. However, WSN nodes are powered by batteries, so they will run out of energy after a certain time. This energy limitation will greatly constr…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: 14 pages, 13 figures, 3 tables

    Journal ref: in IEEE Sensors Journal, vol. 22, no. 13, pp. 13739-13752, July, 2022

  17. arXiv:2306.02541 [pdf, other]

    eess.AS

    OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition

    Authors: Li Fu, Siqi Li, Qingtao Li, Fangzhu Li, Liping Deng, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He

    Abstract: Self-Supervised Learning (SSL) Automatic Speech Recognition (ASR) models have shown great promise over Supervised Learning (SL) ones in low-resource settings. However, the advantages of SSL are gradually weakened when the amount of labeled data increases in many industrial applications. To further improve the ASR performance when abundant labels are available, we first explore the potential of com…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted by Interspeech 2023

  18. arXiv:2305.13652 [pdf, ps, other]

    cs.CL eess.AS

    Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers

    Authors: Jan Silovsky, Liuhui Deng, Arturo Argueta, Tresi Arvizo, Roger Hsiao, Sasha Kuznietsov, Yiu-Chang Lin, Xiaoqiang Xiao, Yuanyuan Zhang

    Abstract: Voice technology has become ubiquitous recently. However, the accuracy, and hence experience, in different languages varies significantly, which makes the technology not equally inclusive. The availability of data for different languages is one of the key factors affecting accuracy, especially in training of all-neural end-to-end automatic speech recognition systems. Cross-lingual knowledge tran…

    Submitted 22 May, 2023; originally announced May 2023.

  19. arXiv:2304.04774 [pdf, other]

    cs.CV cs.AI eess.IV

    DDRF: Denoising Diffusion Model for Remote Sensing Image Fusion

    Authors: ZiHan Cao, ShiQi Cao, Xiao Wu, JunMing Hou, Ran Ran, Liang-Jian Deng

    Abstract: Denoising diffusion model, as a generative model, has received a lot of attention in the field of image generation recently, thanks to its powerful generation capability. However, diffusion models have not yet received sufficient research in the field of image fusion. In this article, we introduce diffusion model to the image fusion field, treating the image fusion task as image-to-image translatio…

    Submitted 10 April, 2023; originally announced April 2023.

  20. U2Net: A General Framework with Spatial-Spectral-Integrated Double U-Net for Image Fusion

    Authors: Siran Peng, Chenhao Guo, Xiao Wu, Liang-Jian Deng

    Abstract: In image fusion tasks, images obtained from different sources exhibit distinct properties. Consequently, treating them uniformly with a single-branch network can lead to inadequate feature extraction. Additionally, numerous works have demonstrated that multi-scaled networks capture information more sufficiently than single-scaled models in pixel-level computer vision problems. Considering these fa…

    Submitted 2 October, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted by the 31st ACM International Conference on Multimedia (ACM MM '23)

  21. arXiv:2210.14515 [pdf, other]

    eess.AS cs.SD

    UFO2: A unified pre-training framework for online and offline speech recognition

    Authors: Li Fu, Siqi Li, Qingtao Li, Liping Deng, Fangzhu Li, Lu Fan, Meng Chen, Xiaodong He

    Abstract: In this paper, we propose a Unified pre-training Framework for Online and Offline (UFO2) Automatic Speech Recognition (ASR), which 1) simplifies the two separate training workflows for online and offline modes into one process, and 2) improves the Word Error Rate (WER) performance with limited utterance annotating. Specifically, we extend the conventional offline-mode Self-Supervised Learning (SSL…

    Submitted 3 April, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted by ICASSP 2023

  22. arXiv:2210.12214 [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation

    Authors: Thien Nguyen, Nathalie Tran, Liuhui Deng, Thiago Fraga da Silva, Matthew Radzihovsky, Roger Hsiao, Henry Mason, Stefan Braun, Erik McDermott, Dogan Can, Pawel Swietojanski, Lyan Verwimp, Sibel Oyman, Tresi Arvizo, Honza Silovsky, Arnab Ghoshal, Mathieu Martel, Bharat Ram Ambati, Mohamed Ali

    Abstract: Code-switching describes the practice of using more than one language in the same sentence. In this study, we investigate how to optimize a neural transducer based bilingual automatic speech recognition (ASR) model for code-switching speech. Focusing on the scenario where the ASR model is trained without supervised code-switching data, we found that semi-supervised training and synthetic code-swit…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: 5 pages, 1 figure, submitted to ICASSP 2023, *: equal contributions

  23. arXiv:2210.07360 [pdf, other]

    eess.SY cs.AI cs.LG

    Reducing Action Space: Reference-Model-Assisted Deep Reinforcement Learning for Inverter-based Volt-Var Control

    Authors: Qiong Liu, Ye Guo, Lirong Deng, Haotian Liu, Dongyu Li, Hongbin Sun

    Abstract: Reference-model-assisted deep reinforcement learning (DRL) for inverter-based Volt-Var Control (IB-VVC) in active distribution networks is proposed. We investigate that a large action space increases the learning difficulties of DRL and degrades the optimization performance in the process of generating data and training neural networks. To reduce the action space of DRL, we design a reference-mode…

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: 10 pages, 9 figures

  24. arXiv:2205.00485 [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Bilingual End-to-End ASR with Byte-Level Subwords

    Authors: Liuhui Deng, Roger Hsiao, Arnab Ghoshal

    Abstract: In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). We study different representations including character-level, byte-level, byte pair encoding (BPE), and byte-level byte pair encoding (BBPE) representations, and analyze their strengths and weaknesses. We focus on developing a single end-to-end model…

    Submitted 1 May, 2022; originally announced May 2022.

    Comments: 5 pages, to be published in IEEE ICASSP 2022

  25. arXiv:2204.05460 [pdf, other]

    eess.AS cs.CL cs.SD

    CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction

    Authors: Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

    Abstract: This study proposes a fully automated system for speech correction and accent reduction. Consider the application scenario that a recorded speech audio contains certain errors, e.g., inappropriate words, mispronunciations, that need to be corrected. The proposed system, named CorrectSpeech, performs the correction in three steps: recognizing the recorded speech and converting it into time-stamped s…

    Submitted 13 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted by ISCSLP 2022

  26. arXiv:2203.04402 [pdf]

    eess.SP physics.comp-ph

    High Noise Immune Time-domain Inversion via Cascade Network (TICaN) for Complex Scatterers

    Authors: Hongyu Gao, Yinpeng Wang, Qiang Ren, Zixi Wang, Liangcheng Deng, Chenyu Shi

    Abstract: In this paper, a high noise immune time-domain inversion cascade network (TICaN) is proposed to reconstruct scatterers from the measured electromagnetic fields. The TICaN is comprised of a denoising block aiming at improving the signal-to-noise ratio, and an inversion block to reconstruct the electromagnetic properties from the raw time-domain measurements. The scatterers investigated in this stud…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: 9 pages, 11 figures

  27. arXiv:2201.12155 [pdf, other]

    cs.CL cs.SD eess.AS

    Reducing language context confusion for end-to-end code-switching automatic speech recognition

    Authors: Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng

    Abstract: Code-switching deals with alternative languages in communication process. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially challenging as code-switching training data are always insufficient to combat the increased multilingual context confusion due to the presence of more than one language. We propose a language-related attention mechanism to r…

    Submitted 29 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: text overlap with arXiv:2010.14798; the paper has been accepted by Interspeech 2022

  28. arXiv:2112.02237 [pdf, other]

    cs.CV eess.IV

    A Triple-Double Convolutional Neural Network for Panchromatic Sharpening

    Authors: Tian-Jing Zhang, Liang-Jian Deng, Ting-Zhu Huang, Jocelyn Chanussot, Gemine Vivone

    Abstract: Pansharpening refers to the fusion of a panchromatic image with a high spatial resolution and a multispectral image with a low spatial resolution, aiming to obtain a high spatial resolution multispectral image. In this paper, we propose a novel deep neural network architecture with level-domain based loss function for pansharpening by taking into account the following double-type structures, …

    Submitted 3 December, 2021; originally announced December 2021.

  29. arXiv:2111.08191 [pdf, other]

    cs.CL cs.SD eess.AS

    CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis

    Authors: Nianzu Zheng, Liqun Deng, Wenyong Huang, Yu Ting Yeung, Baohua Xu, Yuanyuan Guo, Yasheng Wang, Xiao Chen, Xin Jiang, Qun Liu

    Abstract: Mispronunciation detection and diagnosis (MDD) is a popular research focus in computer-aided pronunciation training (CAPT) systems. End-to-end (e2e) approaches are becoming dominant in MDD. However, an e2e MDD model usually requires entire speech utterances as input context, which leads to significant time latency especially for long paragraphs. We propose a streaming e2e MDD model called CoCA-MDD. …

    Submitted 29 June, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 5 pages, 4 figures, Accepted by INTERSPEECH 2022

  30. arXiv:2108.03176 [pdf, ps, other]

    eess.SY

    Dynamic Control for Random Access in Deadline-Constrained Broadcasting

    Authors: Aoyu Gong, Lei Deng, Fang Liu, Yijin Zhang

    Abstract: This paper considers random access in deadline-constrained broadcasting with frame-synchronized traffic. To enhance the maximum achievable timely delivery ratio (TDR), we define a dynamic control scheme that allows each active node to determine the transmission probability with certainty based on the current delivery urgency and the knowledge of current contention intensity. For an idealized envir…

    Submitted 6 August, 2021; originally announced August 2021.

  31. arXiv:2107.11617 [pdf, other]

    cs.CV eess.IV

    LAConv: Local Adaptive Convolution for Image Fusion

    Authors: Zi-Rong Jin, Liang-Jian Deng, Tai-Xiang Jiang, Tian-Jing Zhang

    Abstract: The convolution operation is a powerful tool for feature extraction and plays a prominent role in the field of computer vision. However, when targeting the pixel-wise tasks like image fusion, it would not fully perceive the particularity of each pixel in the image if the uniform convolution kernel is used on different patches. In this paper, we propose a local adaptive convolution (LAConv), which…

    Submitted 24 July, 2021; originally announced July 2021.

  32. arXiv:2107.01554 [pdf, other]

    eess.AS cs.SD

    EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion

    Authors: Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

    Abstract: This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation in speech quality and naturalness. The EditSpeech system is developed upon a neural text-to-speech (NTTS) synthesis framework. Partial inference and bi…

    Submitted 7 October, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

    Comments: Accepted by ASRU 2021

  33. arXiv:2106.10132 [pdf, other]

    eess.AS cs.CL cs.MM cs.SD eess.SP

    VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

    Authors: Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

    Abstract: One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement. Existing work generally ignores the correlation between different speech representations during training, which causes leakage of content information into the speaker representation and t…

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021. Code, pre-trained models and demo are available at https://github.com/Wendison/VQMIVC

  34. arXiv:2106.10127 [pdf, other]

    eess.AS cs.CL cs.SD eess.SP

    Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization

    Authors: Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

    Abstract: Dysarthric speech detection (DSD) systems aim to detect characteristics of the neuromotor disorder from speech. Such systems are particularly susceptible to domain mismatch where the training and testing data come from the source and target domains respectively, but the two domains may differ in terms of speech stimuli, disease etiology, etc. It is hard to acquire labelled data in the target domai…

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021

  35. arXiv:2011.13657 [pdf]

    eess.SY

    Community Energy Storage Management for Welfare Optimization Using a Markov Decision Process

    Authors: Lirong Deng, Xuan Zhang, Tianshu Yang, Hongbin Sun, Shmuel S. Oren

    Abstract: In this paper, we address an optimal management problem of community energy storage in the real-time electricity market under a stochastic renewable environment. In a real-time electricity market, complete market information may not be accessible for a strategic participant, hence we propose a paradigm that uses partial information including the forecast of real-time prices and slopes of the aggre…

    Submitted 27 November, 2020; originally announced November 2020.

  36. arXiv:2011.13652 [pdf]

    math.OC eess.SY

    Optimal Planning of Integrated Heat and Electricity Systems: a Tightening McCormick Approach

    Authors: Lirong Deng, Xuan Zhang, Tianshu Yang, Hongbin Sun

    Abstract: In this paper, we propose a convex planning model of integrated heat and electricity systems considering variable mass flow rates. The main challenge comes from the non-convexity of the bilinear terms in the district heating network, i.e., the product of mass flow rate and nodal temperature. To resolve this issue, we first reformulate the district heating network model through equivalent transform…

    Submitted 27 November, 2020; originally announced November 2020.

  37. arXiv:2007.04289 [pdf]

    eess.SY

    A Quadratic Convex Approximation of Optimal Power Flow in Distribution System with Application in Loss Allocation

    Authors: Tianshu Yang, Ye Guo, Lirong Deng, Hongbin Sun, Wenchuan Wu

    Abstract: In this paper, a novel quadratic convex optimal power flow model, namely, MDOPF, is proposed to determine the optimal dispatches of distributed generators. Based on the results of MDOPF, two price mechanisms, distribution locational marginal price (DLMP) and distribution locational price (DLP), are analyzed. For DLMP, an explicit method is developed to calculate the marginal loss that does not req…

    Submitted 4 September, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

  38. A Linear Branch Flow Model for Radial Distribution Networks and its Application to Reactive Power Optimization and Network Reconfiguration

    Authors: Tianshu Yang, Ye Guo, Lirong Deng, Hongbin Sun, Wenchuan Wu

    Abstract: This paper presents a cold-start linear branch flow model named modified DistFlow. In modified DistFlow, the active and reactive power are replaced by their ratios to voltage magnitude as state variables, so that errors introduced by conventional branch flow linearization approaches due to their complete ignoring of the quadratic term are reduced. Based on the path-branch incidence matrix, branch…

    Submitted 22 November, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

  39. arXiv:2005.14400 [pdf, other]

    eess.IV cs.CV

    Hyperspectral Image Super-resolution via Deep Spatio-spectral Convolutional Neural Networks

    Authors: Jin-Fan Hu, Ting-Zhu Huang, Liang-Jian Deng, Tai-Xiang Jiang, Gemine Vivone, Jocelyn Chanussot

    Abstract: Hyperspectral images are of crucial importance in order to better understand features of different materials. To reach this goal, they leverage on a high number of spectral bands. However, this interesting characteristic is often paid by a reduced spatial resolution compared with traditional multispectral image systems. In order to alleviate this issue, in this work, we propose a simple and effici…

    Submitted 29 May, 2020; originally announced May 2020.

  40. arXiv:2005.02183 [pdf, other]

    cs.CV cs.NE eess.IV

    Comparing SNNs and RNNs on Neuromorphic Vision Datasets: Similarities and Differences

    Authors: Weihua He, YuJie Wu, Lei Deng, Guoqi Li, Haoyu Wang, Yang Tian, Wei Ding, Wenhui Wang, Yuan Xie

    Abstract: Neuromorphic data, recording frameless spike events, have attracted considerable attention for the spatiotemporal information components and the event-driven processing fashion. Spiking neural networks (SNNs) represent a family of event-driven models with spatiotemporal dynamics for neuromorphic computing, which are widely benchmarked on neuromorphic data. Interestingly, researchers in the machine…

    Submitted 2 May, 2020; originally announced May 2020.

  41. arXiv:2001.01587 [pdf, other]

    cs.NE cs.CR cs.LG eess.SP

    Exploring Adversarial Attack in Spiking Neural Networks with Spike-Compatible Gradient

    Authors: Ling Liang, Xing Hu, Lei Deng, Yujie Wu, Guoqi Li, Yufei Ding, Peng Li, Yuan Xie

    Abstract: Recently, backpropagation through time inspired learning algorithms are widely introduced into SNNs to improve the performance, which brings the possibility to attack the models accurately given Spatio-temporal gradient maps. We propose two approaches to address the challenges of gradient input incompatibility and gradient vanishing. Specifically, we design a gradient to spike converter to convert…

    Submitted 30 September, 2020; v1 submitted 1 January, 2020; originally announced January 2020.

  42. arXiv:1912.12419 [pdf, other]

    eess.IV cs.LG stat.ML

    Transfer Learning in General Lensless Imaging through Scattering Media

    Authors: Yukuan Yang, Lei Deng, Peng Jiao, Yansong Chua, Jing Pei, Cheng Ma, Guoqi Li

    Abstract: Recently deep neural networks (DNNs) have been successfully introduced to the field of lensless imaging through scattering media. By solving an inverse problem in computational imaging, DNNs can overcome several shortcomings in the conventional lensless imaging through scattering media methods, namely, high cost, poor quality, complex control, and poor anti-interference. However, for training, a l…

    Submitted 28 December, 2019; originally announced December 2019.

  43. arXiv:1911.00822 [pdf, other]

    cs.NE cs.LG eess.SP

    Comprehensive SNN Compression Using ADMM Optimization and Activity Regularization

    Authors: Lei Deng, Yujie Wu, Yifan Hu, Ling Liang, Guoqi Li, Xing Hu, Yufei Ding, Peng Li, Yuan Xie

    Abstract: As is well known, the huge memory and compute costs of both artificial neural networks (ANNs) and spiking neural networks (SNNs) greatly hinder their deployment on edge devices with high efficiency. Model compression has been proposed as a promising technique to improve the running efficiency via parameter and operation reduction. Whereas, this technique is mainly practiced in ANNs rather than SNNs. …

    Submitted 20 August, 2020; v1 submitted 3 November, 2019; originally announced November 2019.

    Comments: Under review

  44. arXiv:1810.11390 [pdf, other]

    eess.SP cs.IT

    Joint Estimation of DOA and Frequency with Sub-Nyquist Sampling in a Binary Array Radar System

    Authors: Zhan Zhang, Ping Wei, Lijuan Deng, Huaguo Zhang

    Abstract: Recently, several array radar structures combined with sub-Nyquist techniques and corresponding algorithms have been extensively studied. Carrier frequency and direction-of-arrival (DOA) estimations of multiple narrow-band signals received by array radars at the sub-Nyquist rates are considered in this paper. We propose a new sub-Nyquist array radar architecture (a binary array radar separately co…

    Submitted 26 October, 2018; originally announced October 2018.

    Comments: 6 pages, 2 figures, conference

  45. arXiv:1509.03044 [pdf, other]

    cs.LG cs.AI eess.SY

    Recurrent Reinforcement Learning: A Hybrid Approach

    Authors: Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, Ji He

    Abstract: Successful applications of reinforcement learning in real-world problems often require dealing with partially observable states. It is in general very challenging to construct and infer hidden states as they often depend on the agent's entire interaction history and may require substantial domain knowledge. In this work, we investigate a deep-learning approach to learning the representation of sta…

    Submitted 19 November, 2015; v1 submitted 10 September, 2015; originally announced September 2015.

    Comments: 11 pages, 6 figures