
Showing 1–50 of 106 results for author: Cheng, N

Searching in archive eess.
  1. arXiv:2408.08593  [pdf, other]

    cs.LG eess.SY

    RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction

    Authors: Xiucheng Wang, Keda Tao, Nan Cheng, Zhisheng Yin, Zan Li, Yuan Zhang, Xuemin Shen

    Abstract: Radio map (RM) is a promising technology that can obtain pathloss based on only location, which is significant for 6G network applications to reduce the communication costs for pathloss estimation. However, traditional RM construction is either computationally intensive or depends on costly sampling-based pathloss measurements. Although the neural network (NN)-based method can efficientl…

    Submitted 16 August, 2024; originally announced August 2024.

  2. arXiv:2407.13123  [pdf, other]

    cs.LG cs.DC cs.NI eess.SP

    Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation

    Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

    Abstract: Vehicular edge computing (VEC) is an emerging technology with significant potential in the field of internet of vehicles (IoV), enabling vehicles to perform intensive computational tasks locally or offload them to nearby edge devices. However, the quality of communication links may be severely deteriorated due to obstacles such as buildings, impeding the offloading process. To address this challen…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at https://github.com/qiongwu86/DDPG-RIS-MADDPG-POWER. arXiv admin note: text overlap with arXiv:2406.11318

  3. arXiv:2407.11875  [pdf, ps, other]

    eess.SP

    Cramer-Rao Bound Minimization for Movable Antenna-Assisted Multiuser Integrated Sensing and Communications

    Authors: Haoran Qin, Wen Chen, Qingqing Wu, Ziheng Zhang, Zhendong Li, Nan Cheng

    Abstract: This paper investigates a movable antenna (MA)-assisted multiuser integrated sensing and communication (ISAC) system, where the base station (BS) and communication users are all equipped with MA for improving both the sensing and communication performance. We employ the Cramer-Rao bound (CRB) as the performance metric of sensing, thus a joint beamforming design and MAs' position optimizing problem…

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2407.06767  [pdf, other]

    cs.IT eess.SP

    Enhancing Robustness and Security in ISAC Network Design: Leveraging Transmissive Reconfigurable Intelligent Surface with RSMA

    Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Qiong Wu, Nan Cheng

    Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement…

    Submitted 9 July, 2024; originally announced July 2024.

  5. arXiv:2407.05331  [pdf, ps, other]

    eess.SY

    Channel Characterization of IRS-assisted Resonant Beam Communication Systems

    Authors: Wen Fang, Wen Chen, Qingqing Wu, Xusheng Zhu, Qiong Wu, Nan Cheng

    Abstract: To meet the growing demand for data traffic, spectrum-rich optical wireless communication (OWC) has emerged as a key technological driver for the development of 6G. The resonant beam communication (RBC) system, which employs spatially separated laser cavities as the transmitter and receiver, is a high-speed OWC technology capable of self-alignment without tracking. However, its transmission throug…

    Submitted 15 August, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

  6. arXiv:2407.03668  [pdf, other]

    cs.LG eess.SY

    Reliable Projection Based Unsupervised Learning for Semi-Definite QCQP with Application of Beamforming Optimization

    Authors: Xiucheng Wang, Qi Qiu, Nan Cheng

    Abstract: In this paper, we investigate a special class of quadratic-constrained quadratic programming (QCQP) with semi-definite constraints. Traditionally, since such a problem is non-convex and NP-hard, the neural network (NN) is regarded as a promising method to obtain a high-performing solution. However, due to the inherent prediction error, it is challenging to ensure every solution output by the NN is fe…

    Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  7. arXiv:2406.13145  [pdf, other]

    eess.SY cs.LG

    Constructing and Evaluating Digital Twins: An Intelligent Framework for DT Development

    Authors: Longfei Ma, Nan Cheng, Xiucheng Wang, Jiong Chen, Yinjun Gao, Dongxiao Zhang, Jun-Jie Zhang

    Abstract: The development of Digital Twins (DTs) represents a transformative advance for simulating and optimizing complex systems in a controlled digital space. Despite their potential, the challenge of constructing DTs that accurately replicate and predict the dynamics of real-world systems remains substantial. This paper introduces an intelligent framework for the construction and evaluation of DTs, spec…

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.11318  [pdf, other]

    cs.MA cs.DC cs.LG cs.NI eess.SP

    Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning

    Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Qiang Fan, Jiangzhou Wang

    Abstract: Vehicular edge computing (VEC) is an emerging technology that enables vehicles to perform high-intensity tasks by executing tasks locally or offloading them to nearby edge devices. However, obstacles such as buildings may degrade the communications and incur communication interruptions, and thus the vehicle may not meet the requirement for task offloading. Reconfigurable intelligent surfaces (RIS)…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/RIS-VEC-MARL.git

  9. arXiv:2406.11245  [pdf, other]

    cs.LG cs.DC cs.NI eess.SP

    Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks

    Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: Reconfigurable Intelligent Surface (RIS) is a pivotal technology in communication, offering an alternative path that significantly enhances the link quality in wireless communication environments. In this paper, we propose a RIS-assisted internet of vehicles (IoV) network, considering the vehicle-to-everything (V2X) communication method. In addition, in order to improve the timeliness of vehicle-t…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at https://github.com/qiongwu86/RIS-RB-AoI-V2X-DRL.git

  10. arXiv:2406.09846  [pdf, ps, other]

    cs.IT eess.SP

    Multiple Intelligent Reflecting Surfaces Collaborative Wireless Localization System

    Authors: Ziheng Zhang, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Jingfeng Chen, Nan Cheng

    Abstract: This paper studies a multiple intelligent reflecting surfaces (IRSs) collaborative localization system where multiple semi-passive IRSs are deployed in the network to locate one or more targets based on time-of-arrival. It is assumed that each semi-passive IRS is equipped with reflective elements and sensors, which are used to establish the line-of-sight links from the base station (BS) to multipl…

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 8 figures

  11. arXiv:2406.08835  [pdf, other]

    cs.SD eess.AS

    EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed

    Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Ming Fang, Tao Wei, Zijian Li, Ning Cheng, Wei Hu, Shaojun Wang, Jing Xiao

    Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. In this paper, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EffectiveASR. It uses an Index Mappin…

    Submitted 28 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Submitted to ICASSP 2025

  12. arXiv:2406.07996  [pdf, other]

    cs.NI eess.SP

    Semantic-Aware Resource Allocation Based on Deep Reinforcement Learning for 5G-V2X HetNets

    Authors: Zhiyu Shao, Qiong Wu, Pingyi Fan, Nan Cheng, Qiang Fan, Jiangzhou Wang

    Abstract: This letter proposes a semantic-aware resource allocation (SARA) framework with flexible duty cycle (DC) coexistence mechanism (SARADC) for 5G-V2X Heterogeneous Network (HetNets) based on deep reinforcement learning (DRL) proximal policy optimization (PPO). Specifically, we investigate V2X networks within a two-tiered HetNets structure. In response to the needs of high-speed vehicular networking i…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Letter. The source code has been released at: https://github.com/qiongwu86/Semantic-Aware-Resource-Allocation-Based-on-Deep-Reinforcement-Learning-for-5G-V2X-HetNets

  13. arXiv:2406.07857  [pdf, other]

    eess.SY cs.LG cs.NI

    Toward Enhanced Reinforcement Learning-Based Resource Management via Digital Twin: Opportunities, Applications, and Challenges

    Authors: Nan Cheng, Xiucheng Wang, Zan Li, Zhisheng Yin, Tom Luan, Xuemin Shen

    Abstract: This article presents a digital twin (DT)-enhanced reinforcement learning (RL) framework aimed at optimizing performance and reliability in network resource management, since the traditional RL methods face several unified challenges when applied to physical networks, including limited exploration efficiency, slow convergence, poor long-term performance, and safety concerns during the exploration…

    Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures

  14. arXiv:2406.06998  [pdf, other]

    eess.SP

    Movable Antenna Enhanced NOMA Short-Packet Transmission

    Authors: Xinyuan He, Wen Chen, Qingqing Wu, Xusheng Zhu, Nan Cheng

    Abstract: This letter investigates a short-packet downlink transmission system using non-orthogonal multiple access (NOMA) enhanced via movable antenna (MA). We focus on maximizing the effective throughput for a core user while ensuring reliable communication for an edge user by optimizing the MAs' coordinates and the power and rate allocations from the access point (AP). The optimization challenge is app…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures

  15. arXiv:2405.18692  [pdf, other]

    cs.IT eess.SP

    Movable Antenna Empowered Downlink NOMA Systems: Power Allocation and Antenna Position Optimization

    Authors: Yufeng Zhou, Wen Chen, Qingqing Wu, Xusheng Zhu, Nan Cheng

    Abstract: This paper investigates a novel communication paradigm employing movable antennas (MAs) within a multiple-input single-output (MISO) non-orthogonal multiple access (NOMA) downlink framework, where users are equipped with MAs. Initially, leveraging the far-field response, we delineate the channel characteristics concerning both the power allocation coefficient and positions of MAs. Subsequently, we…

    Submitted 7 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  16. arXiv:2405.17028  [pdf, other]

    cs.SD eess.AS

    RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

    Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao

    Abstract: Although current Text-To-Speech (TTS) models are able to generate high-quality speech samples, there are still challenges in developing emotion intensity controllable TTS. Most existing TTS models achieve emotion intensity control by extracting intensity information from reference speeches. Unfortunately, limited by the lack of modeling for intra-class emotion intensity and the model's information…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

  17. arXiv:2405.00930  [pdf, other]

    cs.SD eess.AS

    MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion

    Authors: Pengcheng Li, Jianzong Wang, Xulong Zhang, Yong Zhang, Jing Xiao, Ning Cheng

    Abstract: One-shot voice conversion aims to change the timbre of any source speech to match that of the unseen target speaker with only one speech sample. Existing methods face difficulties in satisfactory speech representation disentanglement and suffer from sizable networks as some of them leverage numerous complex modules for disentanglement. In this paper, we propose a model named MAIN-VC to effectively…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  18. arXiv:2405.00603  [pdf, other]

    cs.SD eess.AS

    Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation

    Authors: Yimin Deng, Jianzong Wang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Voice conversion is the task of transforming voice characteristics of source speech while preserving content information. Nowadays, self-supervised representation learning models are increasingly utilized in content extraction. However, in these representations, a lot of hidden speaker information leads to timbre leakage while the prosodic information of hidden units is underused. To address these issue…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  19. arXiv:2404.19214  [pdf, other]

    cs.SD eess.AS

    EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

    Authors: Jianzong Wang, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their deployment poses challenges due to high computational and storage resource requirements. To address this issue, a lightweight model called EfficientASR is proposed in this paper, aiming to enhance the versatility of Transformer models. EfficientASR employs two primary modules: Shared…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  20. arXiv:2404.19212  [pdf, other]

    cs.SD eess.AS

    EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning

    Authors: Ziqi Liang, Jianzong Wang, Xulong Zhang, Yong Zhang, Ning Cheng, Jing Xiao

    Abstract: Using unsupervised learning to disentangle speech into content, rhythm, pitch, and timbre for voice conversion has become a hot research topic. Existing works generally take into account disentangling speech components through human-crafted bottleneck features which cannot achieve sufficient information disentangling, while pitch and rhythm may still be mixed together. There is a risk of informat…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  21. arXiv:2404.19187  [pdf, other]

    cs.SD eess.AS

    CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition

    Authors: Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Singing voice beautifying is a novel task that has application value in people's daily life, aiming to correct the pitch of the singing voice and improve the expressiveness without changing the original timbre and content. Existing methods rely on paired data or only concentrate on the correction of pitch. However, professional songs and amateur songs from the same person are hard to obtain, and s…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  22. arXiv:2402.01665  [pdf, other]

    cs.NI cs.LG eess.SP

    Knowledge-Driven Deep Learning Paradigms for Wireless Network Optimization in 6G

    Authors: Ruijin Sun, Nan Cheng, Changle Li, Fangjiong Chen, Wen Chen

    Abstract: In the sixth-generation (6G) networks, newly emerging diversified services of massive users in dynamic network environments are required to be satisfied by multi-dimensional heterogeneous resources. The resulting large-scale complicated network optimization problems are beyond the capability of model-based theoretical methods due to the overwhelming computational complexity and the long processing…

    Submitted 15 January, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures

  23. arXiv:2401.08166  [pdf, other]

    eess.AS cs.SD

    ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis

    Authors: Haobin Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

    Abstract: Existing emotional speech synthesis methods often utilize an utterance-level style embedding extracted from reference audio, neglecting the inherent multi-scale property of speech prosody. We introduce ED-TTS, a multi-scale emotional speech synthesis model that leverages Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER) to model emotions at different levels. Specifically, our p…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)

  24. arXiv:2401.08096  [pdf, other]

    cs.SD eess.AS

    Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval

    Authors: Yimin Deng, Huaizhen Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

    Abstract: Voice conversion refers to transferring speaker identity with well-preserved content. Better disentanglement of speech representations leads to better voice conversion. Recent studies have found that phonetic information from input audio has the potential ability to well represent content. Besides, the speaker-style modeling with pre-trained models makes the process more complex. To tackle these…

    Submitted 17 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)

  25. arXiv:2401.08049  [pdf, other]

    cs.CV cs.SD eess.AS

    EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model

    Authors: Bingyuan Zhang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao, Jianzong Wang

    Abstract: In recent years, the field of talking face generation has attracted considerable attention, with certain methods adept at generating virtual faces that convincingly imitate human expressions. However, existing methods face challenges related to limited generalization, particularly when dealing with challenging identities. Furthermore, methods for editing expressions are often confined to a singul…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)

  26. arXiv:2311.08670  [pdf, other]

    cs.SD eess.AS

    CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation

    Authors: Yimin Deng, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Better disentanglement of speech representation is essential to improve the quality of voice conversion. Recently, contrastive learning has been applied successfully to voice conversion based on speaker labels. However, model performance degrades in conversion between similar speakers. Hence, we propose an augmented negative sample selection to address the issue. Specifically, we create hard ne…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted by the 21st IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA 2023)

  27. arXiv:2311.07965  [pdf, other]

    cs.SD eess.AS

    DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation

    Authors: Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Most existing neural-based text-to-speech methods rely on extensive datasets and face challenges under low-resource conditions. In this paper, we introduce a novel semi-supervised text-to-speech synthesis model that learns from both paired and unpaired data to address this challenge. The key component of the proposed model is a dynamic quantized representation module, which is integrated into a seq…

    Submitted 2 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted by the 13th IEEE International Conference on Big Data and Cloud Computing (IEEE BDCloud 2023)

  28. arXiv:2311.03046  [pdf, other]

    cs.IT eess.SP

    Antenna Positioning and Beamforming Design for Fluid-Antenna Enabled Multi-user Downlink Communications

    Authors: Haoran Qin, Wen Chen, Zhendong Li, Qingqing Wu, Nan Cheng, Fangjiong Chen

    Abstract: This paper investigates a multiple input single output (MISO) downlink communication system in which users are equipped with fluid antennas (FAs). First, we adopt a field-response based channel model to characterize the downlink channel with respect to FAs' positions. Then, we aim to minimize the total transmit power by jointly optimizing the FAs' positions and beamforming matrix. To solve the res…

    Submitted 13 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  29. arXiv:2310.16302  [pdf, other]

    cs.LG eess.SY

    Imperfect Digital Twin Assisted Low Cost Reinforcement Training for Multi-UAV Networks

    Authors: Xiucheng Wang, Nan Cheng, Longfei Ma, Zhisheng Yin, Tom. Luan, Ning Lu

    Abstract: Deep Reinforcement Learning (DRL) is widely used to optimize the performance of multi-UAV networks. However, the training of DRL relies on the frequent interactions between the UAVs and the environment, which consumes lots of energy due to the flying and communication of UAVs in practical experiments. Inspired by the growing digital twin (DT) technology, which can simulate the performance of algor…

    Submitted 24 October, 2023; originally announced October 2023.

  30. arXiv:2310.05072  [pdf, other]

    cs.IT eess.SP

    Performance Analysis of RIS-Aided Double Spatial Scattering Modulation for mmWave MIMO Systems

    Authors: Xusheng Zhu, Wen Chen, Qingqing Wu, Jun Li, Nan Cheng, Fangjiong Chen, Changle Li

    Abstract: In this paper, we investigate a practical structure of reconfigurable intelligent surface (RIS)-based double spatial scattering modulation (DSSM) for millimeter-wave (mmWave) multiple-input multiple-output (MIMO) systems. A suboptimal detector is proposed, in which the beam direction is first demodulated according to the received beam strength, and then the remaining information is demodulated by…

    Submitted 8 October, 2023; originally announced October 2023.

  31. arXiv:2309.08839  [pdf, other]

    cs.SD cs.MM eess.AS

    Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval

    Authors: Kaiyi Luo, Xulong Zhang, Jianzong Wang, Huaxiong Li, Ning Cheng, Jing Xiao

    Abstract: Cross-modal retrieval (CMR) has been extensively applied in various domains, such as multimedia search engines and recommendation systems. Most existing CMR methods focus on image-to-text retrieval, whereas audio-to-text retrieval, a less explored domain, has posed a great challenge due to the difficulty of uncovering discriminative features from audio clips and texts. Existing studies are restricted…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted by The 35th IEEE International Conference on Tools with Artificial Intelligence. (ICTAI 2023)

  32. arXiv:2309.08838  [pdf, other]

    cs.CV eess.IV

    AOSR-Net: All-in-One Sandstorm Removal Network

    Authors: Yazhong Si, Xulong Zhang, Fan Yang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios. In addition, these approaches often adopt a strategy of color correction followed by dust removal, which makes the algorithm structure too complex. To solve the issue, we introduce a novel image restoration model, named all-in-one…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted by The 35th IEEE International Conference on Tools with Artificial Intelligence. (ICTAI 2023)

  33. arXiv:2309.08837  [pdf, other]

    cs.SD eess.AS

    FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

    Authors: Jianzong Wang, Xulong Zhang, Aolan Sun, Ning Cheng, Jing Xiao

    Abstract: This paper integrates graph-to-sequence into an end-to-end text-to-speech framework for syntax-aware modelling with syntactic information of input text. Specifically, the input text is parsed by a dependency parsing module to form a syntactic graph. The syntactic graph is then encoded by a graph encoder to extract the syntactic hidden information, which is concatenated with phoneme embedding and i…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted by The 35th IEEE International Conference on Tools with Artificial Intelligence. (ICTAI 2023)

  34. arXiv:2308.14348  [pdf, other]

    eess.SY cs.CR cs.LG

    Label-free Deep Learning Driven Secure Access Selection in Space-Air-Ground Integrated Networks

    Authors: Zhaowei Wang, Zhisheng Yin, Xiucheng Wang, Nan Cheng, Yuan Zhang, Tom H. Luan

    Abstract: In Space-air-ground integrated networks (SAGIN), the inherent openness and extensive broadcast coverage expose these networks to significant eavesdropping threats. Considering the inherent co-channel interference due to spectrum sharing among multi-tier access networks in SAGIN, it can be leveraged to assist the physical layer security among heterogeneous transmissions. However, it is challenging…

    Submitted 28 August, 2023; originally announced August 2023.

  35. arXiv:2308.14319  [pdf, other]

    cs.SD eess.AS

    Voice Conversion with Denoising Diffusion Probabilistic GAN Models

    Authors: Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Voice conversion is a method that allows for the transformation of speaking style while maintaining the integrity of linguistic information. There are many researchers using deep generative models for voice conversion tasks. Generative Adversarial Networks (GANs) can quickly generate high-quality samples, but the generated samples lack diversity. The samples generated by the Denoising Diffusion Pr…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by 19th International Conference on Advanced Data Mining and Applications. (ADMA 2023)

  36. arXiv:2308.14317  [pdf, other]

    cs.SD eess.AS

    Symbolic & Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music

    Authors: Kexin Zhu, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Music Emotion Recognition involves the automatic identification of emotional elements within music tracks, and it has garnered significant attention due to its broad applicability in the field of Music Information Retrieval. It can also be used as the upstream task of many other human-related tasks such as emotional music generation and music recommendation. Due to existing psychology research, mu…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by 19th International Conference on Advanced Data Mining and Applications. (ADMA 2023)

  37. arXiv:2308.13849  [pdf, other]

    cs.LG cs.AI eess.SY

    Effectively Heterogeneous Federated Learning: A Pairing and Split Learning Based Approach

    Authors: Jinglong Shen, Xiucheng Wang, Nan Cheng, Longfei Ma, Conghao Zhou, Yuan Zhang

    Abstract: As a promising paradigm, federated learning (FL) is widely used in privacy-preserving machine learning, which allows distributed devices to collaboratively train a model while avoiding data transmission among clients. Despite its immense potential, FL suffers from bottlenecks in training speed due to client heterogeneity, leading to escalated training latency and straggling server aggregation.…

    Submitted 26 August, 2023; originally announced August 2023.

  38. PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion

    Authors: Yimin Deng, Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Voice conversion, as the style transfer task applied to speech, refers to converting one person's speech into a new speech that sounds like another person's. Up to now, there has been a lot of research devoted to better implementation of VC tasks. However, a good voice conversion model should not only match the timbre information of the target speaker, but also expressive information such as prosod…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted by the 31st ACM International Conference on Multimedia (MM2023)

  39. arXiv:2308.07511  [pdf, other]

    cs.LG eess.SY

    Distilling Knowledge from Resource Management Algorithms to Neural Networks: A Unified Training Assistance Approach

    Authors: Longfei Ma, Nan Cheng, Xiucheng Wang, Zhisheng Yin, Haibo Zhou, Wei Quan

    Abstract: As a fundamental problem, numerous methods are dedicated to the optimization of signal-to-interference-plus-noise ratio (SINR), in a multi-user setting. Although traditional model-based optimization methods achieve strong performance, their high complexity has motivated research into neural network (NN) based approaches to trade off performance and complexity. To fully leverage the high performance o…

    Submitted 14 August, 2023; originally announced August 2023.

  40. arXiv:2308.02603  [pdf, other]

    cs.LG eess.SY

    Knowledge-Driven Multi-Agent Reinforcement Learning for Computation Offloading in Cybertwin-Enabled Internet of Vehicles

    Authors: Ruijin Sun, Xiao Yang, Nan Cheng, Xiucheng Wang, Changle Li

    Abstract: By offloading computation-intensive tasks of vehicles to roadside units (RSUs), mobile edge computing (MEC) in the Internet of Vehicles (IoV) can relieve the onboard computation burden. However, existing model-based task offloading methods suffer from heavy computational complexity with the increase of vehicles and data-driven methods lack interpretability. To address these challenges, in this pap…

    Submitted 8 September, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

  41. arXiv:2307.12307  [pdf, other]

    cs.IT eess.SP

    Robust Weighted Sum-Rate Maximization for Transmissive RIS Transmitter Enabled RSMA Networks

    Authors: Bojiang Li, Wen Chen, Zhendong Li, Qingqing Wu, Nan Cheng, Changle Li, Linglong Dai

    Abstract: Motivated by the low power consumption and low cost of the transmissive reconfigurable intelligent surface (RIS), in this paper we propose a downlink multi-user rate-splitting multiple access (RSMA) architecture based on the transmissive RIS transmitter, where the channel state information (CSI) is only partially acquired. We investigate the weighted sum-rate maximization problem by jointly optimizi…

    Submitted 23 July, 2023; originally announced July 2023.

  42. arXiv:2307.05882  [pdf, other]

    eess.SY

    Knowledge-Driven Resource Allocation for D2D Networks: A WMMSE Unrolled Graph Neural Network Approach

    Authors: Hao Yang, Nan Cheng, Ruijin Sun, Wei Quan, Rong Chai, Khalid Aldubaikhy, Abdullah Alqasir, Xuemin Shen

    Abstract: This paper proposes a novel knowledge-driven approach for resource allocation in device-to-device (D2D) networks using a graph neural network (GNN) architecture. To meet the millisecond-level timeliness and scalability required for the dynamic network environment, our proposed approach incorporates the deep unrolling of the weighted minimum mean square error (WMMSE) algorithm, referred to as doma…

    Submitted 11 July, 2023; originally announced July 2023.

  43. arXiv:2307.02002  [pdf, other]

    eess.SY

    Interpretable and Secure Trajectory Optimization for UAV-Assisted Communication

    Authors: Yunhao Quan, Nan Cheng, Xiucheng Wang, Jinglong Shen, Longfei Ma, Zhisheng Yin

    Abstract: Unmanned aerial vehicles (UAVs) have gained popularity due to their flexible mobility, on-demand deployment, and the ability to establish high probability line-of-sight wireless communication. As a result, UAVs have been extensively used as aerial base stations (ABSs) to supplement ground-based cellular networks for various applications. However, existing UAV-assisted communication schemes mainly…

    Submitted 4 July, 2023; originally announced July 2023.

  44. arXiv:2306.08938  [pdf, other]

    eess.SY cs.LG

    Scalable Resource Management for Dynamic MEC: An Unsupervised Link-Output Graph Neural Network Approach

    Authors: Xiucheng Wang, Nan Cheng, Lianhao Fu, Wei Quan, Ruijin Sun, Yilong Hui, Tom Luan, Xuemin Shen

    Abstract: Deep learning has been successfully adopted in mobile edge computing (MEC) to optimize task offloading and resource allocation. However, the dynamics of edge networks raise two challenges for neural network (NN)-based optimization methods: low scalability and high training costs. Although conventional node-output graph neural networks (GNNs) can extract features of edge nodes when the network scales…

    Submitted 19 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

  45. arXiv:2306.00648  [pdf, other]

    cs.SD eess.AS

    EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

    Authors: Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: There has been significant progress in emotional Text-To-Speech (TTS) synthesis technology in recent years. However, existing methods primarily focus on the synthesis of a limited number of emotion types and have achieved unsatisfactory performance in intensity control. To address these limitations, we propose EmoMix, which can generate emotional speech with specified intensity or a mixture of emo…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted by 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023)

  46. arXiv:2305.16333  [pdf, ps, other]

    cs.CL cs.AI cs.LG eess.AS

    Text Generation with Speech Synthesis for ASR Data Augmentation

    Authors: Zhuangqun Huang, Gil Keren, Ziran Jiang, Shashank Jain, David Goss-Grubbs, Nelson Cheng, Farnaz Abtahi, Duc Le, David Zhang, Antony D'Avirro, Ethan Campbell-Taylor, Jessie Salas, Irina-Elena Veliche, Xi Chen

    Abstract: Aiming at reducing the reliance on expensive human annotations, data synthesis for Automatic Speech Recognition (ASR) has remained an active area of research. While prior work mainly focuses on synthetic speech generation for ASR data augmentation, its combination with text generation methods is considerably less explored. In this work, we explore text augmentation for ASR using large-scale pre-tr…

    Submitted 22 May, 2023; originally announced May 2023.

  47. arXiv:2304.11547  [pdf, other]

    cs.SD eess.AS

    SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model

    Authors: Jianzong Wang, Xulong Zhang, Haobin Tang, Aolan Sun, Ning Cheng, Jing Xiao

    Abstract: In recent Text-to-Speech (TTS) systems, a neural vocoder often generates speech samples by conditioning solely on acoustic features predicted by an acoustic model. However, the predicted acoustic features are always distorted relative to the ground truth, especially in the common case of poor acoustic modeling due to low-quality training data. To overcome such limits…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted by IJCNN2023. 2023 International Joint Conference on Neural Networks (IJCNN2023)

  48. arXiv:2303.11421  [pdf, other]

    eess.SP cs.AI cs.LG

    Improving EEG-based Emotion Recognition by Fusing Time-frequency And Spatial Representations

    Authors: Kexin Zhu, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Using deep learning methods to classify EEG signals can accurately identify people's emotions. However, existing studies have rarely considered the application of the information in another domain's representations to feature selection in the time-frequency domain. We propose a classification network of EEG signals based on the cross-domain feature fusion method, which makes the network more focus…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023 - The 48th IEEE International Conference on Acoustics, Speech, & Signal Processing

  49. arXiv:2303.07687  [pdf, other]

    cs.SD cs.CL eess.AS

    Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

    Authors: Xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

    Abstract: Because they predict all target tokens in parallel, non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models. In this work, we present dynamic alignment Mask CTC, introducing two methods: (1) Aligned Cross Entropy (AXE), finding the monotonic alignment that minimizes the cross-entropy loss through dynamic progr…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  50. arXiv:2303.07682  [pdf, other]

    cs.SD cs.CL eess.AS

    QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

    Authors: Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation are neglected. In this paper, we propose QI-TTS, which aims to better transfer and control intonation to further deliver the speaker's questioning intention while transferring emotion from reference speech. We propose a multi-style extractor to extract style embeddin…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023