
Showing 1–50 of 109 results for author: Deng, Y

Searching in archive eess.
  1. arXiv:2408.09534

    eess.SY

    Safe Adaptive Control for Uncertain Systems with Complex Input Constraints

    Authors: Yaosheng Deng, Yang Bai, Yujie Wang, Masaki Ogura, Mir Feroskhan

    Abstract: In this paper, we propose a novel adaptive Control Barrier Function (CBF) based controller for nonlinear systems with complex, time-varying input constraints. Conventional CBF approaches often struggle with feasibility issues and stringent assumptions when addressing input constraints. Unlike these methods, our approach converts the input-constraint problem into an output-constraint CBF design. Th…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 8 pages, 2 figures

  2. arXiv:2408.04358

    eess.SY

    Goal-Oriented UAV Communication Design and Optimization for Target Tracking: A Machine Learning Approach

    Authors: Wenchao Wu, Yanning Wu, Yuanqing Yang, Yansha Deng

    Abstract: To accomplish various tasks, safe and smooth control of unmanned aerial vehicles (UAVs) needs to be guaranteed, which cannot be met by existing ultra-reliable low latency communications (URLLC). This has attracted the attention of the communication field, where most existing work mainly focused on optimizing communication performance (i.e., delay) and ignored the performance of the task (i.e., tra…

    Submitted 8 August, 2024; originally announced August 2024.

  3. arXiv:2408.03646

    eess.SY

    Goal-oriented Semantic Communication for the Metaverse Application

    Authors: Zhe Wang, Nan Li, Yansha Deng

    Abstract: With the emergence of the metaverse and its role in enabling real-time simulation and analysis of real-world counterparts, an increasing number of personalized metaverse scenarios are being created to influence entertainment experiences and social behaviors. However, compared to traditional image and video entertainment applications, the exact transmission of the vast amount of metaverse-associate…

    Submitted 7 August, 2024; originally announced August 2024.

  4. arXiv:2408.00428

    eess.IV

    Goal-Oriented Semantic Communication for Wireless Image Transmission via Stable Diffusion

    Authors: Nan Li, Yansha Deng

    Abstract: Efficient image transmission is essential for seamless communication and collaboration within the visually-driven digital landscape. To achieve low latency and high-quality image reconstruction over a bandwidth-constrained noisy wireless channel, we propose a stable diffusion (SD)-based goal-oriented semantic communication (GSC) framework. In this framework, we design a semantic autoencoder that e…

    Submitted 1 August, 2024; originally announced August 2024.

  5. arXiv:2408.00407

    eess.SY

    Task-oriented and Semantics-aware Communications for Augmented Reality

    Authors: Zhe Wang, Yansha Deng

    Abstract: Upon the advent of the emerging metaverse and its related applications in Augmented Reality (AR), the current bit-oriented network struggles to support real-time changes for the vast amount of associated information, creating a significant bottleneck in its development. To address the above problem, we present a novel task-oriented and semantics-aware communication framework for augmented reality…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.15470

  6. arXiv:2407.14894

    eess.SY

    A Holistic Optimization Framework for Energy Efficient UAV-assisted Fog Computing: Attitude Control, Trajectory Planning and Task Assignment

    Authors: Shuaijun Liu, Jinqiu Du, Yaxin Zheng, Jiaying Yin, Yuhui Deng, Jingjin Wu

    Abstract: Unmanned Aerial Vehicles (UAVs) have significantly enhanced fog computing by acting as both flexible computation platforms and communication mobile relays. In this paper, we propose a holistic framework that jointly optimizes the total latency and energy consumption for UAV-assisted fog computing in a three-dimensional spatial domain with varying terrain elevations and dynamic task generations. Ou…

    Submitted 5 August, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: 14 pages, 10 figures

  7. arXiv:2406.11364

    cs.SD eess.AS

    AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

    Authors: Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

    Abstract: Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, res…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  8. arXiv:2406.03714

    cs.SD eess.AS

    Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

    Authors: Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent prompt-based text-to-speech (TTS) models can clone an unseen speaker using only a short speech prompt. They leverage a strong in-context ability to mimic the speech prompts, including speaker style, prosody, and emotion. Therefore, the selection of a speech prompt greatly influences the generated speech, akin to the importance of a prompt in large language models (LLMs). However, current pr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  9. arXiv:2406.03706

    cs.SD cs.CL eess.AS

    Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

    Authors: Jinlong Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

    Abstract: Recent advances in large language models (LLMs) and development of audio codecs greatly propel the zero-shot TTS. They can synthesize personalized speech with only a 3-second speech of an unseen speaker as acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we intr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  10. arXiv:2405.00603

    cs.SD eess.AS

    Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation

    Authors: Yimin Deng, Jianzong Wang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Voice conversion is the task to transform voice characteristics of source speech while preserving content information. Nowadays, self-supervised representation learning models are increasingly utilized in content extraction. However, in these representations, a lot of hidden speaker information leads to timbre leakage while the prosodic information of hidden units lacks use. To address these issue…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  11. arXiv:2404.01654

    cs.CV cs.AI eess.IV eess.SP

    AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease

    Authors: Xiang Xiang, Zihan Zhang, Jing Ma, Yao Deng

    Abstract: Parkinson's Disease (PD) is the second most common neurodegenerative disorder. The existing assessment method for PD is usually the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS) to assess the severity of various types of motor symptoms and disease progression. However, manual assessment suffers from high subjectivity, lack of consistency, and high cost and low ef…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Technical report for AI WALKUP, an APP winning 3rd Prize of 2022 HUST GS AI Innovation and Design Competition

  12. arXiv:2403.17392

    cs.RO eess.SY nlin.AO

    Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrain

    Authors: Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki Ogura

    Abstract: Navigating multi-robot systems in complex terrains has always been a challenging task. This is due to the inherent limitations of traditional robots in collision avoidance, adaptation to unknown environments, and sustained energy efficiency. In order to overcome these limitations, this research proposes a solution by integrating living insects with miniature electronic controllers to enable roboti…

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  13. arXiv:2402.16027

    cs.IT eess.SP

    Enhancing xURLLC with RSMA-Assisted Massive-MIMO Networks: Performance Analysis and Optimization

    Authors: Yuang Chen, Hancheng Lu, Chenwu Zhang, Yansha Deng, Arumugam Nallanathan

    Abstract: Massive interconnection has sparked people's envisioning for next-generation ultra-reliable and low-latency communications (xURLLC), prompting the design of customized next-generation advanced transceivers (NGAT). Rate-splitting multiple access (RSMA) has emerged as a pivotal technology for NGAT design, given its robustness to imperfect channel state information (CSI) and resilience to quality of…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: 14 pages, 11 figures, Submitted to IEEE for potential publication

  14. arXiv:2402.11478

    eess.SY

    Federated Reinforcement Learning for Uplink Centric Broadband Communication Optimization over Unlicensed Spectrum

    Authors: Hui Zhou, Yansha Deng

    Abstract: To provide Uplink Centric Broadband Communication (UCBC), New Radio Unlicensed (NR-U) network has been standardized to exploit the unlicensed spectrum using Listen Before Talk (LBT) scheme to fairly coexist with the incumbent Wireless Fidelity (WiFi) network. Existing access schemes over unlicensed spectrum are required to perform Clear Channel Assessment (CCA) before transmissions, where fixed En…

    Submitted 18 February, 2024; originally announced February 2024.

  15. arXiv:2401.08096

    cs.SD eess.AS

    Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval

    Authors: Yimin Deng, Huaizhen Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

    Abstract: Voice conversion refers to transferring speaker identity with well-preserved content. Better disentanglement of speech representations leads to better voice conversion. Recent studies have found that phonetic information from input audio has the potential ability to well represent content. Besides, the speaker-style modeling with pre-trained models making the process more complex. To tackle these…

    Submitted 17 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)

  16. arXiv:2401.01544

    cs.CV eess.SP

    Collaborative Perception for Connected and Autonomous Driving: Challenges, Possible Solutions and Opportunities

    Authors: Senkang Hu, Zhengru Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Autonomous driving has attracted significant attention from both academia and industries, which is expected to offer a safer and more efficient driving system. However, current autonomous driving systems are mostly based on a single vehicle, which has significant limitations which still poses threats to driving safety. Collaborative perception with connected and autonomous vehicles (CAVs) shows a…

    Submitted 3 January, 2024; originally announced January 2024.

  17. arXiv:2401.01044

    cs.SD cs.AI cs.CL eess.AS

    Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

    Authors: Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent advancements in diffusion models and large language models (LLMs) have significantly propelled the field of AIGC. Text-to-Audio (TTA), a burgeoning AIGC application designed to generate audio from natural language prompts, is attracting increasing attention. However, existing TTA studies often struggle with generation quality and text-audio alignment, especially for complex textual inputs.…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Demo and implementation at https://auffusion.github.io

  18. arXiv:2312.16383

    cs.SD cs.AI eess.AS

    Frame-level emotional state alignment method for speech emotion recognition

    Authors: Qifei Li, Yingming Gao, Cong Wang, Yayue Deng, Jinlong Xue, Yichen Han, Ya Li

    Abstract: Speech emotion recognition (SER) systems aim to recognize human emotional state during human-computer interaction. Most existing SER systems are trained based on utterance-level labels. However, not all frames in an audio have affective states consistent with utterance-level label, which makes it difficult for the model to distinguish the true emotion of the audio and perform poorly. To address th…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  19. arXiv:2312.13182

    cs.RO eess.SY

    Task-oriented Semantics-aware Communications for Robotic Waypoint Transmission: the Value and Age of Information Approach

    Authors: Wenchao Wu, Yuanqing Yang, Yansha Deng, A. Hamid Aghvami

    Abstract: The ultra-reliable and low-latency communication (URLLC) service of the fifth-generation (5G) mobile communication network struggles to support safe robot operation. Nowadays, the sixth-generation (6G) mobile communication network is proposed to provide hyper-reliable and low-latency communication to enable safer control for robots. However, current 5G/ 6G research mainly focused on improving comm…

    Submitted 20 December, 2023; originally announced December 2023.

  20. arXiv:2312.12358

    cs.IT eess.SP

    Localization and Discrete Beamforming with a Large Reconfigurable Intelligent Surface

    Authors: Baojia Luo, Yili Deng, Miaomiao Dong, Zhongyi Huang, Xiang Chen, Wei Han, Bo Bai

    Abstract: In millimeter-wave (mmWave) cellular systems, reconfigurable intelligent surfaces (RISs) are foreseeably deployed with a large number of reflecting elements to achieve high beamforming gains. The large-sized RIS will make radio links fall in the near-field localization regime with spatial non-stationarity issues. Moreover, the discrete phase restriction on the RIS reflection coefficient incurs exp…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 13 pages

  21. arXiv:2311.08670

    cs.SD eess.AS

    CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation

    Authors: Yimin Deng, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Better disentanglement of speech representation is essential to improve the quality of voice conversion. Recently contrastive learning is applied to voice conversion successfully based on speaker labels. However, the performance of model will reduce in conversion between similar speakers. Hence, we propose an augmented negative sample selection to address the issue. Specifically, we create hard ne…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted by the 21st IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA 2023)

  22. arXiv:2310.07062

    cs.SD cs.LG eess.AS

    Acoustic Model Fusion for End-to-end Speech Recognition

    Authors: Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu

    Abstract: Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, tr…

    Submitted 10 October, 2023; originally announced October 2023.

  23. PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion

    Authors: Yimin Deng, Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Voice conversion as the style transfer task applied to speech, refers to converting one person's speech into a new speech that sounds like another person's. Up to now, there has been a lot of research devoted to better implementation of VC tasks. However, a good voice conversion model should not only match the timbre information of the target speaker, but also expressive information such as prosod…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted by the 31st ACM International Conference on Multimedia (MM2023)

  24. arXiv:2306.14228

    eess.SY eess.SP

    Task-Oriented Semantics-Aware Communication for Wireless UAV Control and Command Transmission

    Authors: Yujie Xu, Zhou Hui, Yansha Deng

    Abstract: To guarantee the safety and smooth control of Unmanned Aerial Vehicle (UAV) operation, the new control and command (C&C) data type imposes stringent quality of service (QoS) requirements on the cellular network. However, the existing bit-oriented communication framework is already approaching the Shannon capacity limit, which can hardly guarantee the ultra-reliable low latency communications (URLL…

    Submitted 25 June, 2023; originally announced June 2023.

  25. DIAS: A Dataset and Benchmark for Intracranial Artery Segmentation in DSA sequences

    Authors: Wentao Liu, Tong Tian, Lemeng Wang, Weijin Xu, Lei Li, Haoyuan Li, Wenyi Zhao, Siyu Tian, Xipeng Pan, Huihua Yang, Feng Gao, Yiming Deng, Xin Yang, Ruisheng Su

    Abstract: The automated segmentation of Intracranial Arteries (IA) in Digital Subtraction Angiography (DSA) plays a crucial role in the quantification of vascular morphology, significantly contributing to computer-assisted stroke research and clinical practice. Current research primarily focuses on the segmentation of single-frame DSA using proprietary datasets. However, these methods face challenges due to…

    Submitted 13 June, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

  26. arXiv:2306.04980

    cs.CL cs.SD eess.AS

    Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models

    Authors: Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia, Yan Deng, Jonathan Tien

    Abstract: This work introduces approaches to assessing phrase breaks in ESL learners' speech using pre-trained language models (PLMs) and large language models (LLMs). There are two tasks: overall assessment of phrase break for a speech clip and fine-grained assessment of every possible phrase break position. To leverage NLP models, speech input is first force-aligned with texts, and then pre-processed into…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted by InterSpeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.16029

  27. arXiv:2305.08000

    cs.CV eess.IV

    DNN-Compressed Domain Visual Recognition with Feature Adaptation

    Authors: Yingpeng Deng, Lina J. Karam

    Abstract: Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned w…

    Submitted 26 July, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

  28. arXiv:2305.02269

    cs.SD cs.CL eess.AS

    M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis

    Authors: Jinlong Xue, Yayue Deng, Fengping Wang, Ya Li, Yingming Gao, Jianhua Tao, Jianqing Sun, Jiaen Liang

    Abstract: Conversational text-to-speech (TTS) aims to synthesize speech with proper prosody of reply based on the historical conversation. However, it is still a challenge to comprehensively model the conversation, and a majority of conversational TTS systems only focus on extracting global information and omit local prosody features, which contain important fine-grained information like keywords and emphas…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 5 pages, 1 figure, 2 tables. Accepted by ICASSP 2023

  29. arXiv:2303.17949

    cs.SD cs.LG eess.AS

    Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach

    Authors: Anbai Jiang, Wei-Qiang Zhang, Yufeng Deng, Pingyi Fan, Jia Liu

    Abstract: Automatic detection of machine anomaly remains challenging for machine learning. We believe the capability of generative adversarial network (GAN) suits the need of machine audio anomaly detection, yet rarely has this been investigated by previous work. In this paper, we propose AEGAN-AD, a totally unsupervised approach in which the generator (also an autoencoder) is trained to reconstruct input s…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  30. arXiv:2303.10398

    cs.NI cs.LG cs.MA eess.SP

    Energy-Efficient Cellular-Connected UAV Swarm Control Optimization

    Authors: Yang Su, Hui Zhou, Yansha Deng, Mischa Dohler

    Abstract: Cellular-connected unmanned aerial vehicle (UAV) swarm is a promising solution for diverse applications, including cargo delivery and traffic control. However, it is still challenging to communicate with and control the UAV swarm with high reliability, low latency, and high energy efficiency. In this paper, we propose a two-phase command and control (C&C) transmission scheme in a cellular-connecte…

    Submitted 18 March, 2023; originally announced March 2023.

  31. arXiv:2302.09332

    eess.SP

    Incipient Fault Detection in Power Distribution System: A Time-Frequency Embedded Deep Learning Based Approach

    Authors: Qiyue Li, Huan Luo, Hong Cheng, Yuxing Deng, Wei Sun, Weitao Li, Zhi Liu

    Abstract: Incipient fault detection in power distribution systems is crucial to improve the reliability of the grid. However, the non-stationary nature and the inadequacy of the training dataset due to the self-recovery of the incipient fault signal, make the incipient fault detection in power distribution systems a great challenge. In this paper, we focus on incipient fault detection in power distribution…

    Submitted 18 February, 2023; originally announced February 2023.

    Comments: 15 pages

  32. arXiv:2211.05295

    cs.CV cs.LG eess.IV

    Harmonizing output imbalance for defect segmentation on extremely-imbalanced photovoltaic module cells images

    Authors: Jianye Yi, Xiaopin Zhong, Weixiang Liu, Zongze Wu, Yuanlong Deng, Zhengguang Wu

    Abstract: The continuous development of the photovoltaic (PV) industry has raised high requirements for the quality of monocrystalline of PV module cells. When learning to segment defect regions in PV module cell images, Tiny Hidden Cracks (THC) lead to extremely-imbalanced samples. The ratio of defect pixels to normal pixels can be as low as 1:2000. This extreme imbalance makes it difficult to segment the…

    Submitted 24 October, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: 19 pages, 16 figures, 3 appendixes

  33. arXiv:2211.01676

    cs.AI eess.SY

    Repeatable Random Permutation Set

    Authors: Wenran Yang, Yong Deng

    Abstract: Random permutation set (RPS), as a recently proposed theory, enables powerful information representation by traversing all possible permutations. However, the repetition of items is not allowed in RPS while it is quite common in real life. To address this issue, we propose repeatable random permutation set ($\rm R^2PS$) which takes the repetition of items into consideration. The right and left jun…

    Submitted 4 November, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

  34. arXiv:2210.17016

    cs.SD eess.AS

    Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

    Authors: Hongji Wang, Chengdong Liang, Shuai Wang, Zhengyang Chen, Binbin Zhang, Xu Xiang, Yanlei Deng, Yanmin Qian

    Abstract: Speaker modeling is essential for many related tasks, such as speaker recognition and speaker diarization. The dominant modeling approach is fixed-dimensional vector representation, i.e., speaker embedding. This paper introduces a research and production oriented speaker embedding learning toolkit, Wespeaker. Wespeaker contains the implementation of scalable data management, state-of-the-art speak…

    Submitted 1 November, 2022; v1 submitted 30 October, 2022; originally announced October 2022.

  35. arXiv:2210.09372

    eess.SY eess.SP

    Goal-Oriented Semantic Communications for 6G Networks

    Authors: Hui Zhou, Yansha Deng, Xiaonan Liu, Nikolaos Pappas, Arumugam Nallanathan

    Abstract: Upon the arrival of emerging devices, including Extended Reality (XR) and Unmanned Aerial Vehicles (UAVs), the traditional communication framework is approaching Shannon's physical capacity limit and fails to guarantee the massive amount of transmission within latency requirements. By jointly exploiting the context of data and its importance to the task, an emerging communication paradigm shift to…

    Submitted 6 April, 2024; v1 submitted 17 October, 2022; originally announced October 2022.

  36. arXiv:2209.09411

    eess.SY

    Shepherding Control for Separating a Single Agent from a Swarm

    Authors: Yaosheng Deng, Masaki Ogura, Aiyi Li, Naoki Wakamiya

    Abstract: In this paper, we consider the swarm-control problem of spatially separating a specified target agent within the swarm from all the other agents, while maintaining the connectivity among the other agents. We specifically aim to achieve the separation by designing the movement algorithm of an external agent, called a shepherd, which exerts repulsive forces on the agents in the swarm. This problem h…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: 6 pages, 6 figures

  37. arXiv:2207.00908

    cs.NI eess.SY

    Interference Constrained Beam Alignment for Time-Varying Channels via Kernelized Bandits

    Authors: Yuntian Deng, Xingyu Zhou, Arnob Ghosh, Abhishek Gupta, Ness B. Shroff

    Abstract: To fully utilize the abundant spectrum resources in millimeter wave (mmWave), Beam Alignment (BA) is necessary for large antenna arrays to achieve large array gains. In practical dynamic wireless environments, channel modeling is challenging due to time-varying and multipath effects. In this paper, we formulate the beam alignment problem as a non-stationary online learning problem with the objecti…

    Submitted 2 July, 2022; originally announced July 2022.

  38. arXiv:2206.14150

    cs.DC eess.SY

    Autonomous Smart Grid Fault Detection

    Authors: Qiyue Li, Yuxing Deng, Xin Liu, Wei Sun, Weitao Li, Jie Li, Zhi Liu

    Abstract: Smart grid plays a crucial role for the smart society and the upcoming carbon neutral society. Achieving autonomous smart grid fault detection is critical for smart grid system state awareness, maintenance and operation. This paper focuses on fault monitoring in smart grid and discusses the inherent technical challenges and solutions. In particular, we first present the basic principles of smart g…

    Submitted 27 May, 2022; originally announced June 2022.

  39. arXiv:2204.12426

    cs.LG eess.SY

    Time-triggered Federated Learning over Wireless Networks

    Authors: Xiaokang Zhou, Yansha Deng, Huiyun Xia, Shaochuan Wu, Mehdi Bennis

    Abstract: The newly emerging federated learning (FL) framework offers a new way to train machine learning models in a privacy-preserving manner. However, traditional FL algorithms are based on an event-triggered aggregation, which suffers from stragglers and communication overhead issues. To address these issues, in this paper, we present a time-triggered FL algorithm (TT-Fed) over wireless networks, which…

    Submitted 2 May, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

  40. arXiv:2204.10461

    cs.CL cs.SD eess.AS

    WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment

    Authors: Lin Yao, Jianfei Song, Ruizhuo Xu, Yingfang Yang, Zijian Chen, Yafeng Deng

    Abstract: Historically lower-level tasks such as automatic speech recognition (ASR) and speaker identification are the main focus in the speech field. Interest has been growing in higher-level spoken language understanding (SLU) tasks recently, like sentiment analysis (SA). However, improving performances on SLU tasks remains a big challenge. Basically, there are two main methods for SLU tasks: (1) Two-stag…

    Submitted 21 April, 2022; originally announced April 2022.

  41. arXiv:2204.08169

    cs.NI eess.SP

    Actions at the Edge: Jointly Optimizing the Resources in Multi-access Edge Computing

    Authors: Yiqin Deng, Xianhao Chen, Guangyu Zhu, Yuguang Fang, Zhigang Chen, Xiaoheng Deng

    Abstract: Multi-access edge computing (MEC) is an emerging paradigm that pushes resources for sensing, communications, computing, storage and intelligence (SCCSI) to the premises closer to the end users, i.e., the edge, so that they could leverage the nearby rich resources to improve their quality of experience (QoE). Due to the growing emerging applications targeting at intelligentizing life-sustaining cyb…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: 7 pages, 2 figures, accepted by IEEE Wireless Communications

  42. arXiv:2203.10473

    cs.SD cs.LG eess.AS

    ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis

    Authors: Jinlong Xue, Yayue Deng, Yichen Han, Ya Li, Jianqing Sun, Jiaen Liang

    Abstract: In recent years, neural network based methods for multi-speaker text-to-speech synthesis (TTS) have made significant progress. However, the current speaker encoder models used in these methods still cannot capture enough speaker information. In this paper, we focus on accurate speaker encoder modeling and propose an end-to-end method that can generate high-quality speech and better similarity for…

    Submitted 26 March, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures, submitted to Interspeech 2022

  43. arXiv:2203.03004  [pdf, other]

    cs.IT eess.SP

    Low-Complexity Beamforming Design for IRS-Aided NOMA Communication System with Imperfect CSI

    Authors: Yasaman Omid, S. M. Mahdi Shahabi, Cunhua Pan, Yansha Deng, Arumugam Nallanathan

    Abstract: Intelligent reflecting surface (IRS), a promising technology for delivering high throughput in future communication systems, is compatible with various communication techniques such as non-orthogonal multiple access (NOMA). In this paper, the downlink transmission of IRS-assisted NOMA communication is considered under imperfect channel state information (CSI). Consequently, a robust IRS-aid…

    Submitted 16 March, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

  44. arXiv:2203.02098  [pdf, other]

    eess.IV cs.CV

    Universal Segmentation of 33 Anatomies

    Authors: Pengbo Liu, Yang Deng, Ce Wang, Yuan Hui, Qian Li, Jun Li, Shiwei Luo, Mengke Sun, Quan Quan, Shuxin Yang, You Hao, Honghu Xiao, Chunpeng Zhao, Xinbao Wu, S. Kevin Zhou

    Abstract: In this paper, we present an approach for learning a single model that universally segments 33 anatomical structures, including vertebrae, pelvic bones, and abdominal organs. Building such a model requires addressing the following challenges. Firstly, while it is ideal to learn such a model from a large-scale, fully-annotated dataset, it is practically hard to curate such a dataset. Thus, we resort to lear…

    Submitted 3 March, 2022; originally announced March 2022.

  45. arXiv:2112.09312  [pdf, other]

    cs.SD cs.LG eess.AS

    MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

    Authors: Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel

    Abstract: Musical expression requires control of both which notes are played and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP, a hierarchical model of musical instruments…

    Submitted 17 March, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted by International Conference on Learning Representations (ICLR) 2022

  46. arXiv:2111.09284  [pdf, other]

    eess.SY

    Optimization of Grant-Free NOMA with Multiple Configured-Grants for mURLLC

    Authors: Yan Liu, Yansha Deng, Maged Elkashlan, Arumugam Nallanathan, George K. Karagiannidis

    Abstract: Massive Ultra-Reliable and Low-Latency Communications (mURLLC), which integrates URLLC with massive access, is emerging as a new and important service class in the next generation (6G) for time-sensitive traffic and has recently received tremendous research attention. However, realizing efficient, delay-bounded, and reliable communications for a massive number of user equipments (UEs) in mURLLC,…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: 15 pages, 15 figures, submitted to IEEE JSAC SI on Next Generation Multiple Access. arXiv admin note: text overlap with arXiv:2101.00515

  47. arXiv:2111.00418  [pdf, other]

    cs.HC cs.LG eess.SP

    Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances

    Authors: Shibo Zhang, Yaxuan Li, Shen Zhang, Farzad Shahabi, Stephen Xia, Yu Deng, Nabil Alshurafa

    Abstract: Mobile and wearable devices have enabled numerous applications, including activity tracking, wellness monitoring, and human-computer interaction, that measure and improve our daily lives. Many of these applications are made possible by leveraging the rich collection of low-power sensors found in many mobile and wearable devices to perform human activity recognition (HAR). Recently, deep learning…

    Submitted 3 March, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

  48. arXiv:2108.00506  [pdf, other]

    eess.SP

    Scalable Multi-agent Reinforcement Learning Algorithm for Wireless Networks

    Authors: Fenghe Hu, Yansha Deng, A. Hamid Aghvami

    Abstract: Scalability is the key stepping stone toward applying cooperative intelligent algorithms in large-scale networks. Reinforcement learning (RL) is known as a model-free and highly efficient intelligent algorithm for communication problems and has proved useful in communication networks. However, when it comes to large-scale networks with limited centralization, it is not possible to employ a centrali…

    Submitted 4 November, 2021; v1 submitted 1 August, 2021; originally announced August 2021.

    Comments: 18 pages, 9 figures

  49. arXiv:2107.12943  [pdf, other]

    eess.SP

    Learning-based Prediction, Rendering and Transmission for Interactive Virtual Reality in RIS-Assisted Terahertz Networks

    Authors: Xiaonan Liu, Yansha Deng, Chong Han, Marco Di Renzo

    Abstract: The quality of experience (QoE) requirements of wireless Virtual Reality (VR) can only be satisfied by a high data rate, high reliability, and low VR interaction latency. Such a high data rate over short transmission distances may be achieved via the abundant bandwidth in the terahertz (THz) band. However, THz waves suffer from severe signal attenuation, which may be compensated by the reconfigurable in…

    Submitted 27 July, 2021; originally announced July 2021.

  50. arXiv:2106.04312  [pdf, other]

    eess.AS cs.SD

    Speech BERT Embedding For Improving Prosody in Neural TTS

    Authors: Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He

    Abstract: This paper presents a speech BERT model to extract embedded prosody information in speech segments for improving the prosody of synthesized speech in neural text-to-speech (TTS). As a pre-trained model, it can learn prosody attributes from a large amount of speech data, which can utilize more data than the original training data used by the target TTS. The embedding is extracted from the previous…

    Submitted 14 September, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Journal ref: ICASSP 2021