RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction

Authors: Xiucheng Wang, Keda Tao, Nan Cheng, Zhisheng Yin, Zan Li, Yuan Zhang, Xuemin Shen

Abstract: Radio map (RM) is a promising technology that can obtain pathloss based only on location, which is significant for 6G network applications to reduce the communication costs for pathloss estimation. However, traditional RM construction is either computationally intensive or depends on costly sampling-based pathloss measurements. Although neural network (NN)-based methods can efficiently construct the RM without sampling, their performance is still suboptimal. This is primarily due to the misalignment between the generative characteristics of the RM construction problem and the discriminative modeling exploited by existing NN-based methods. Thus, to enhance RM construction performance, in this paper, the sampling-free RM construction is modeled as a conditional generative problem, where a denoised diffusion-based method, named RadioDiff, is proposed to achieve high-quality RM construction. In addition, to enhance the diffusion model's capability of extracting features from dynamic environments, an attention U-Net with an adaptive fast Fourier transform module is employed as the backbone network. Meanwhile, the decoupled diffusion model is utilized to further enhance the construction performance of RMs. Moreover, a comprehensive theoretical analysis of why the RM construction is a generative problem is provided for the first time, from the perspectives of both data features and NN training methods. Experimental results show that the proposed RadioDiff achieves state-of-the-art performance in all three metrics of accuracy, structural similarity, and peak signal-to-noise ratio. The code is available at https://github.com/UNIC-Lab/RadioDiff.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2407.13123 [pdf, other]

Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation

Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

Abstract: Vehicular edge computing (VEC) is an emerging technology with significant potential in the field of internet of vehicles (IoV), enabling vehicles to perform intensive computational tasks locally or offload them to nearby edge devices. However, the quality of communication links may be severely deteriorated due to obstacles such as buildings, impeding the offloading process. To address this challenge, we introduce the use of Reconfigurable Intelligent Surfaces (RIS), which provide alternative communication pathways to assist vehicular communication. By dynamically adjusting the phase-shift of the RIS, the performance of VEC systems can be substantially improved. In this work, we consider a RIS-assisted VEC system, and design an optimal scheme for local execution power, offloading power, and RIS phase-shift, where random task arrivals and channel variations are taken into account. To address the scheme, we propose an innovative deep reinforcement learning (DRL) framework that combines the Deep Deterministic Policy Gradient (DDPG) algorithm for optimizing RIS phase-shift coefficients and the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm for optimizing the power allocation of vehicle user (VU). Simulation results show that our proposed scheme outperforms the traditional centralized DDPG, Twin Delayed Deep Deterministic Policy Gradient (TD3) and some typical stochastic schemes.

Submitted 17 July, 2024; originally announced July 2024.

Comments: This paper has been submitted to IEEE Journal. The source code has been released at https://github.com/qiongwu86/DDPG-RIS-MADDPG-POWER. arXiv admin note: text overlap with arXiv:2406.11318

arXiv:2407.11875 [pdf, ps, other]

Cramer-Rao Bound Minimization for Movable Antenna-Assisted Multiuser Integrated Sensing and Communications

Authors: Haoran Qin, Wen Chen, Qingqing Wu, Ziheng Zhang, Zhendong Li, Nan Cheng

Abstract: This paper investigates a movable antenna (MA)-assisted multiuser integrated sensing and communication (ISAC) system, where the base station (BS) and communication users are all equipped with MAs for improving both the sensing and communication performance. We employ the Cramer-Rao bound (CRB) as the performance metric of sensing, and formulate a joint beamforming design and MA position optimization problem to minimize the CRB. However, the resulting optimization problem is NP-hard and the variables are highly coupled. To tackle this problem, we propose an alternating optimization (AO) framework by adopting semidefinite relaxation (SDR) and successive convex approximation (SCA) techniques. Numerical results reveal that the proposed MA-assisted ISAC system achieves lower estimation CRB compared to the fixed-position antenna (FPA) counterpart.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.06767 [pdf, other]

Enhancing Robustness and Security in ISAC Network Design: Leveraging Transmissive Reconfigurable Intelligent Surface with RSMA

Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Qiong Wu, Nan Cheng

Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement rate-splitting multiple access (RSMA), where the common stream is designed as a useful signal and an artificial noise, while taking into account the imperfect channel state information and modeling the channel for the illegal users in a fine-grained manner as well as giving an upper bound on the error. We introduce the secrecy outage probability and construct an optimization problem with secrecy sum-rate as the objective function to optimize the common stream beamforming matrix, the private stream beamforming matrix and the timeslot duration variable. Due to the coupling of the optimization variables and the infinity of the error set, the proposed problem is a nonconvex optimization problem that cannot be solved directly. In order to address the above challenges, the block coordinate descent-based second-order cone programming algorithm is used to decouple the optimization variables and solve the problem. Specifically, the problem is decoupled into two subproblems concerning the common stream beamforming matrix, the private stream beamforming matrix, and the timeslot duration variable, which are solved by alternating optimization until convergence is reached. To solve the problem, S-procedure, Bernstein's inequality and successive convex approximation are employed to deal with the objective function and non-convex constraints. Numerical simulation results verify the superiority of the proposed scheme in improving the secrecy energy efficiency and the Cramér-Rao boundary.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.05331 [pdf, ps, other]

Channel Characterization of IRS-assisted Resonant Beam Communication Systems

Authors: Wen Fang, Wen Chen, Qingqing Wu, Xusheng Zhu, Qiong Wu, Nan Cheng

Abstract: To meet the growing demand for data traffic, spectrum-rich optical wireless communication (OWC) has emerged as a key technological driver for the development of 6G. The resonant beam communication (RBC) system, which employs spatially separated laser cavities as the transmitter and receiver, is a high-speed OWC technology capable of self-alignment without tracking. However, its transmission through the air is susceptible to losses caused by obstructions. In this paper, we propose an intelligent reflecting surface (IRS) assisted RBC system with the optical frequency doubling method, where the resonant beam in frequency-fundamental and frequency-doubled is transmitted through both direct line-of-sight (LoS) and IRS-assisted channels to maintain steady-state oscillation and enable communication without echo-interference, respectively. Then, we establish the channel model based on Fresnel diffraction theory under the near-field optical propagation to analyze the transmission loss and frequency-doubled power analytically. Furthermore, communication power can be maximized by dynamically controlling the beam-splitting ratio between the two channels according to the loss levels encountered over air. Numerical results validate that the IRS-assisted channel can compensate for the losses in the obstructed LoS channel and misaligned receivers, ensuring that communication performance reaches an optimal value with dynamic ratio adjustments.

Submitted 15 August, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.03668 [pdf, other]

Reliable Projection Based Unsupervised Learning for Semi-Definite QCQP with Application of Beamforming Optimization

Authors: Xiucheng Wang, Qi Qiu, Nan Cheng

Abstract: In this paper, we investigate a special class of quadratic-constrained quadratic programming (QCQP) with semi-definite constraints. Traditionally, since such a problem is non-convex and NP-hard, the neural network (NN) is regarded as a promising method to obtain a high-performing solution. However, due to the inherent prediction error, it is challenging to ensure that every solution output by the NN is feasible. Although some existing works propose naive methods, they only focus on reducing the constraint violation probability, so not all solutions are guaranteed to be feasible. To deal with the above challenge, in this paper a computationally efficient and reliable projection is proposed, where all solutions output by the NN are ensured to be feasible. Moreover, unsupervised learning is used, so the NN can be trained effectively and efficiently without labels. Theoretically, the solution of the NN after projection is proven to be feasible, and we also prove the projection method can enhance the convergence performance and speed of the NN. To evaluate our proposed method, the quality of service (QoS)-contained beamforming scenario is studied, where the simulation results show the proposed method can achieve high performance that is competitive with the lower bound.

Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.13145 [pdf, other]

Constructing and Evaluating Digital Twins: An Intelligent Framework for DT Development

Authors: Longfei Ma, Nan Cheng, Xiucheng Wang, Jiong Chen, Yinjun Gao, Dongxiao Zhang, Jun-Jie Zhang

Abstract: The development of Digital Twins (DTs) represents a transformative advance for simulating and optimizing complex systems in a controlled digital space. Despite their potential, the challenge of constructing DTs that accurately replicate and predict the dynamics of real-world systems remains substantial. This paper introduces an intelligent framework for the construction and evaluation of DTs, specifically designed to enhance the accuracy and utility of DTs in testing algorithmic performance. We propose a novel construction methodology that integrates deep learning-based policy gradient techniques to dynamically tune the DT parameters, ensuring high fidelity in the digital replication of physical systems. Moreover, the Mean STate Error (MSTE) is proposed as a robust metric for evaluating the performance of algorithms within these digital spaces. The efficacy of our framework is demonstrated through extensive simulations that show our DT not only accurately mirrors the physical reality but also provides a reliable platform for algorithm evaluation. This work lays a foundation for future research into DT technologies, highlighting pathways for both theoretical enhancements and practical implementations in various industries.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11318 [pdf, other]

Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning

Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Qiang Fan, Jiangzhou Wang

Abstract: Vehicular edge computing (VEC) is an emerging technology that enables vehicles to perform high-intensity tasks by executing tasks locally or offloading them to nearby edge devices. However, obstacles such as buildings may degrade the communications and incur communication interruptions, and thus the vehicle may not meet the requirement for task offloading. Reconfigurable intelligent surfaces (RIS) are introduced to support vehicle communication and provide an alternative communication path. The system performance can be improved by flexibly adjusting the phase-shift of the RIS. For a RIS-assisted VEC system where tasks arrive randomly, we design a control scheme that considers offloading power, local power allocation and phase-shift optimization. To solve this non-convex problem, we propose a new deep reinforcement learning (DRL) framework that employs a modified multi-agent deep deterministic policy gradient (MADDPG) approach to optimize the power allocation for vehicle users (VUs) and a block coordinate descent (BCD) algorithm to optimize the phase-shift of the RIS. Simulation results show that our proposed scheme outperforms the centralized deep deterministic policy gradient (DDPG) scheme and random scheme.

Submitted 17 June, 2024; originally announced June 2024.

Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/RIS-VEC-MARL.git

arXiv:2406.11245 [pdf, other]

Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks

Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

Abstract: Reconfigurable Intelligent Surface (RIS) is a pivotal technology in communication, offering an alternative path that significantly enhances the link quality in wireless communication environments. In this paper, we propose a RIS-assisted internet of vehicles (IoV) network, considering the vehicle-to-everything (V2X) communication method. In addition, in order to improve the timeliness of vehicle-to-infrastructure (V2I) links and the stability of vehicle-to-vehicle (V2V) links, we introduce the age of information (AoI) model and the payload transmission probability model. Therefore, with the objective of minimizing the AoI of V2I links and prioritizing transmission of V2V links payload, we construct this optimization problem as a Markov decision process (MDP) problem in which the BS serves as an agent to allocate resources and control phase-shift for the vehicles using the soft actor-critic (SAC) algorithm, which gradually converges and maintains a high stability. An AoI-aware joint vehicular resource allocation and RIS phase-shift control scheme based on the SAC algorithm is proposed, and simulation results show that its convergence speed, cumulative reward, AoI performance, and payload transmission probability outperform those of proximal policy optimization (PPO), deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3) and stochastic algorithms.

Submitted 17 June, 2024; originally announced June 2024.

Comments: This paper has been submitted to IEEE Journal. The source code has been released at https://github.com/qiongwu86/RIS-RB-AoI-V2X-DRL.git

arXiv:2406.09846 [pdf, ps, other]

Multiple Intelligent Reflecting Surfaces Collaborative Wireless Localization System

Authors: Ziheng Zhang, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Jingfeng Chen, Nan Cheng

Abstract: This paper studies a multiple intelligent reflecting surfaces (IRSs) collaborative localization system where multiple semi-passive IRSs are deployed in the network to locate one or more targets based on time-of-arrival. It is assumed that each semi-passive IRS is equipped with reflective elements and sensors, which are used to establish the line-of-sight links from the base station (BS) to multiple targets and process echo signals, respectively. Based on the above model, we derive the Fisher information matrix of the echo signal with respect to the time delay. By employing the chain rule and exploiting the geometric relationship between time delay and position, the Cramer-Rao bound (CRB) for estimating the target's Cartesian coordinate position is derived. Then, we propose a two-stage algorithmic framework to minimize CRB in single- and multi-target localization systems by jointly optimizing active beamforming at BS, passive beamforming at multiple IRSs and IRS selection. For the single-target case, we derive the optimal closed-form solution for multiple IRSs coefficients design and propose a low-complexity algorithm based on alternating direction method of multipliers to obtain the optimal solution for active beaming design. For the multi-target case, alternating optimization is used to transform the original problem into two subproblems where semi-definite relaxation and successive convex approximation are applied to tackle the quadraticity and indefiniteness in the CRB expression, respectively. Finally, numerical simulation results validate the effectiveness of the proposed algorithm for multiple IRSs collaborative localization system compared to other benchmark schemes as well as the significant performance gains.

Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: 13 pages, 8 figures

arXiv:2406.08835 [pdf, other]

EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed

Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Ming Fang, Tao Wei, Zijian Li, Ning Cheng, Wei Hu, Shaojun Wang, Jing Xiao

Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. In this paper, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EffectiveASR. It uses an Index Mapping Vector (IMV) based alignment generator to generate alignments during training, and an alignment predictor to learn the alignments for inference. It can be trained end-to-end (E2E) with cross-entropy loss combined with alignment loss. The proposed EffectiveASR achieves competitive results on the AISHELL-1 and AISHELL-2 Mandarin benchmarks compared to the leading models. Specifically, it achieves character error rates (CER) of 4.26%/4.62% on the AISHELL-1 dev/test dataset, which outperforms the AR Conformer with about 30x inference speedup.

Submitted 28 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: Submitted to ICASSP 2025

arXiv:2406.07996 [pdf, other]

Semantic-Aware Resource Allocation Based on Deep Reinforcement Learning for 5G-V2X HetNets

Authors: Zhiyu Shao, Qiong Wu, Pingyi Fan, Nan Cheng, Qiang Fan, Jiangzhou Wang

Abstract: This letter proposes a semantic-aware resource allocation (SARA) framework with flexible duty cycle (DC) coexistence mechanism (SARADC) for 5G-V2X Heterogeneous Networks (HetNets) based on deep reinforcement learning (DRL) proximal policy optimization (PPO). Specifically, we investigate V2X networks within a two-tiered HetNets structure. In response to the needs of high-speed vehicular networking in urban environments, we design a semantic communication system and introduce two resource allocation metrics: high-speed semantic transmission rate (HSR) and semantic spectrum efficiency (HSSE). Our main goal is to maximize HSSE. Additionally, we address the coexistence of vehicular users and WiFi users in 5G New Radio Unlicensed (NR-U) networks. To tackle this complex challenge, we propose a novel approach that jointly optimizes the flexible DC coexistence mechanism and the allocation of resources and base stations (BSs). Unlike traditional bit transmission methods, our approach integrates the semantic communication paradigm into the communication system. Experimental results demonstrate that our proposed solution outperforms traditional bit transmission methods with the traditional DC coexistence mechanism in terms of HSSE and semantic throughput (ST) for both vehicular and WiFi users.

Submitted 12 June, 2024; originally announced June 2024.

Comments: This paper has been submitted to IEEE Letter. The source code has been released at: https://github.com/qiongwu86/Semantic-Aware-Resource-Allocation-Based-on-Deep-Reinforcement-Learning-for-5G-V2X-HetNets

arXiv:2406.07857 [pdf, other]

Toward Enhanced Reinforcement Learning-Based Resource Management via Digital Twin: Opportunities, Applications, and Challenges

Authors: Nan Cheng, Xiucheng Wang, Zan Li, Zhisheng Yin, Tom Luan, Xuemin Shen

Abstract: This article presents a digital twin (DT)-enhanced reinforcement learning (RL) framework aimed at optimizing performance and reliability in network resource management, since the traditional RL methods face several unified challenges when applied to physical networks, including limited exploration efficiency, slow convergence, poor long-term performance, and safety concerns during the exploration phase. To deal with the above challenges, a comprehensive DT-based framework is proposed to enhance the convergence speed and performance for unified RL-based resource management. The proposed framework provides safe action exploration, more accurate estimates of long-term returns, faster training convergence, higher convergence performance, and real-time adaptation to varying network conditions. Then, two case studies on ultra-reliable and low-latency communication (URLLC) services and multiple unmanned aerial vehicles (UAV) network are presented, demonstrating improvements of the proposed framework in performance, convergence speed, and training cost reduction both on traditional RL and neural network based Deep RL (DRL). Finally, the article identifies and explores some of the research challenges and open issues in this rapidly evolving field.

Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 7 pages, 6 figures

arXiv:2406.06998 [pdf, other]

Movable Antenna Enhanced NOMA Short-Packet Transmission

Authors: Xinyuan He, Wen Chen, Qingqing Wu, Xusheng Zhu, Nan Cheng

Abstract: This letter investigates a short-packet downlink transmission system using non-orthogonal multiple access (NOMA) enhanced via movable antenna (MA). We focus on maximizing the effective throughput for a core user while ensuring reliable communication for an edge user by optimizing the MAs' coordinates and the power and rate allocations from the access point (AP). The optimization challenge is approached by decomposing it into two subproblems, utilizing successive convex approximation (SCA) to handle the highly non-concave nature of channel gains. Numerical results confirm that the proposed solution offers substantial improvements in effective throughput compared to NOMA short-packet communication with fixed position antennas (FPAs).

Submitted 11 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures

arXiv:2405.18692 [pdf, other]

Movable Antenna Empowered Downlink NOMA Systems: Power Allocation and Antenna Position Optimization

Authors: Yufeng Zhou, Wen Chen, Qingqing Wu, Xusheng Zhu, Nan Cheng

Abstract: This paper investigates a novel communication paradigm employing movable antennas (MAs) within a multiple-input single-output (MISO) non-orthogonal multiple access (NOMA) downlink framework, where users are equipped with MAs. Initially, leveraging the far-field response, we delineate the channel characteristics concerning both the power allocation coefficient and positions of MAs. Subsequently, we endeavor to maximize the channel capacity by jointly optimizing power allocation and antenna positions. To tackle the resultant non-convex problem, we propose an alternating optimization (AO) scheme underpinned by successive convex approximation (SCA) to converge towards a stationary point. Through numerical simulations, our findings substantiate the superiority of the MA-assisted NOMA system over both orthogonal multiple access (OMA) and conventional NOMA configurations in terms of average sum rate and outage probability.

Submitted 7 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17028 [pdf, other]

RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao

Abstract: Although current Text-To-Speech (TTS) models are able to generate high-quality speech samples, there are still challenges in developing emotion intensity controllable TTS. Most existing TTS models achieve emotion intensity control by extracting intensity information from reference speeches. Unfortunately, limited by the lack of modeling for intra-class emotion intensity and the model's information decoupling capability, the generated speech cannot achieve fine-grained emotion intensity control and suffers from information leakage issues. In this paper, we propose an emotion transfer TTS model, which defines a remapping-based sorting method to model intra-class relative intensity information, combined with Mutual Information (MI) to decouple speaker and emotion information, and synthesizes expressive speeches with perceptible intensity differences. Experiments show that our model achieves fine-grained emotion control while preserving speaker information.

Submitted 27 May, 2024; originally announced May 2024.

Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

arXiv:2405.00930 [pdf, other]

MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion

Authors: Pengcheng Li, Jianzong Wang, Xulong Zhang, Yong Zhang, Jing Xiao, Ning Cheng

Abstract: One-shot voice conversion aims to change the timbre of any source speech to match that of the unseen target speaker with only one speech sample. Existing methods face difficulties in satisfactory speech representation disentanglement and suffer from sizable networks as some of them leverage numerous complex modules for disentanglement. In this paper, we propose a model named MAIN-VC to effectively disentangle via a concise neural network. The proposed model utilizes Siamese encoders to learn clean representations, further enhanced by the designed mutual information estimator. The Siamese structure and the newly designed convolution module contribute to the lightweight of our model while ensuring performance in diverse voice conversion tasks. The experimental results show that the proposed model achieves comparable subjective scores and exhibits improvements in objective metrics compared to existing methods in a one-shot voice conversion scenario.

Submitted 1 May, 2024; originally announced May 2024.