
Showing 1–50 of 199 results for author: Zhu, X

Searching in archive eess.
  1. arXiv:2408.14472  [pdf, other]

    cs.RO cs.AI eess.SY

    Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning

    Authors: Xinyang Gu, Yen-Jen Wang, Xiang Zhu, Chengming Shi, Yanjiang Guo, Yichen Liu, Jianyu Chen

    Abstract: Humanoid robots, with their human-like skeletal structure, are especially suited for tasks in human-centric environments. However, this structure is accompanied by additional challenges in locomotion controller design, especially in complex real-world environments. As a result, existing humanoid robots are limited to relatively simple terrains, either with model-based control or model-free reinfor…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Robotics: Science and Systems (RSS), 2024. (Best Paper Award Finalist)

  2. arXiv:2408.12897  [pdf, other]

    eess.IV cs.CV

    When Diffusion MRI Meets Diffusion Model: A Novel Deep Generative Model for Diffusion MRI Generation

    Authors: Xi Zhu, Wei Zhang, Yijie Li, Lauren J. O'Donnell, Fan Zhang

    Abstract: Diffusion MRI (dMRI) is an advanced imaging technique characterizing tissue microstructure and white matter structural connectivity of the human brain. The demand for high-quality dMRI data is growing, driven by the need for better resolution and improved tissue contrast. However, acquiring high-quality dMRI data is expensive and time-consuming. In this context, deep generative modeling emerges as…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 11 pages, 3 figures

  3. arXiv:2408.03265  [pdf, other]

    eess.IV

    BVI-AOM: A New Training Dataset for Deep Video Compression Optimization

    Authors: Jakub Nawała, Yuxuan Jiang, Fan Zhang, Xiaoqing Zhu, Joel Sole, David Bull

    Abstract: Deep learning is now playing an important role in enhancing the performance of conventional hybrid video codecs. These learning-based methods typically require diverse and representative training material for optimization in order to achieve model generalization and optimal coding performance. However, existing datasets either offer limited content variability or come with restricted licensing ter…

    Submitted 7 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: 6 pages, 5 figures. Swapped the PSNR-HVS plot in Fig. 3 for a PSNR-YUV plot

  4. arXiv:2407.17758  [pdf, other]

    eess.SP

    Speed-enhanced Subdomain Adaptation Regression for Long-term Stable Neural Decoding in Brain-computer Interfaces

    Authors: Jiyu Wei, Dazhong Rong, Xinyun Zhu, Qinming He, Yueming Wang

    Abstract: Brain-computer interfaces (BCIs) offer a means to convert neural signals into control signals, providing a potential restoration of movement for people with paralysis. Despite their promise, BCIs face a significant challenge in maintaining decoding accuracy over time due to neural nonstationarities. However, the decoding accuracy of BCI drops severely across days due to the neural data drift. Whil…

    Submitted 25 July, 2024; originally announced July 2024.

  5. arXiv:2407.06767  [pdf, other]

    cs.IT eess.SP

    Enhancing Robustness and Security in ISAC Network Design: Leveraging Transmissive Reconfigurable Intelligent Surface with RSMA

    Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Qiong Wu, Nan Cheng

    Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement…

    Submitted 9 July, 2024; originally announced July 2024.

  6. arXiv:2407.05331  [pdf, ps, other]

    eess.SY

    Channel Characterization of IRS-assisted Resonant Beam Communication Systems

    Authors: Wen Fang, Wen Chen, Qingqing Wu, Xusheng Zhu, Qiong Wu, Nan Cheng

    Abstract: To meet the growing demand for data traffic, spectrum-rich optical wireless communication (OWC) has emerged as a key technological driver for the development of 6G. The resonant beam communication (RBC) system, which employs spatially separated laser cavities as the transmitter and receiver, is a high-speed OWC technology capable of self-alignment without tracking. However, its transmission throug…

    Submitted 15 August, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

  7. arXiv:2407.04737  [pdf, other]

    eess.SP cs.AI

    Hierarchical Decoupling Capacitor Optimization for Power Distribution Network of 2.5D ICs with Co-Analysis of Frequency and Time Domains Based on Deep Reinforcement Learning

    Authors: Yuanyuan Duan, Haiyang Feng, Zhiping Yu, Hanming Wu, Leilai Shao, Xiaolei Zhu

    Abstract: With the growing need for higher memory bandwidth and computation density, 2.5D design, which involves integrating multiple chiplets onto an interposer, emerges as a promising solution. However, this integration introduces significant challenges due to increasing data rates and a large number of I/Os, necessitating advanced optimization of the power distribution networks (PDNs) both on-chip and on…

    Submitted 2 July, 2024; originally announced July 2024.

  8. arXiv:2407.03388  [pdf]

    physics.soc-ph eess.SY

    Passenger Route and Departure Time Guidance under Disruptions in Oversaturated Urban Rail Transit Networks

    Authors: Siyu Zhuo, Xiaoning Zhu, Pan Shang, Zhengke Liu

    Abstract: The urban rail transit (URT) system attracts many commuters with its punctuality and convenience. However, it is vulnerable to disruptions caused by factors like extreme weather and temporary equipment failures, which greatly impact passengers' journeys and diminish the system's service quality. In this study, we propose targeted travel guidance for passengers at different space-time locations by…

    Submitted 3 July, 2024; originally announced July 2024.

  9. arXiv:2406.09846  [pdf, ps, other]

    cs.IT eess.SP

    Multiple Intelligent Reflecting Surfaces Collaborative Wireless Localization System

    Authors: Ziheng Zhang, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Jingfeng Chen, Nan Cheng

    Abstract: This paper studies a multiple intelligent reflecting surfaces (IRSs) collaborative localization system where multiple semi-passive IRSs are deployed in the network to locate one or more targets based on time-of-arrival. It is assumed that each semi-passive IRS is equipped with reflective elements and sensors, which are used to establish the line-of-sight links from the base station (BS) to multipl…

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 8 figures

  10. arXiv:2406.09844  [pdf, other]

    cs.SD eess.AS

    Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

    Authors: Linhan Ma, Xinfa Zhu, Yuanjun Lv, Zhichao Wang, Ziqian Wang, Wendi He, Hongbin Zhou, Lei Xie

    Abstract: Zero-shot voice conversion (VC) aims to transform source speech into arbitrary unseen target voice while keeping the linguistic content unchanged. Recent VC methods have made significant progress, but semantic losses in the decoupling process as well as training-inference mismatch still hinder conversion performance. In this paper, we propose Vec-Tok-VC+, a novel prompt-based zero-shot VC model im…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  11. arXiv:2406.08920  [pdf, other]

    cs.SD cs.AI eess.AS

    AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

    Authors: Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

    Abstract: Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given a mono audio emitted by a sound source at a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing binaural audio. However, in addition to low efficiency originating from heavy NeRF rendering, these methods all have a limited ability…

    Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  12. arXiv:2406.07422  [pdf, other]

    eess.AS

    Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

    Authors: Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, Yuanjun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li

    Abstract: The multi-codebook speech codec enables the application of large language models (LLM) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence codec, which employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically-rich discrete sequence. Furthermor…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  13. arXiv:2406.06998  [pdf, other]

    eess.SP

    Movable Antenna Enhanced NOMA Short-Packet Transmission

    Authors: Xinyuan He, Wen Chen, Qingqing Wu, Xusheng Zhu, Nan Cheng

    Abstract: This letter investigates a short-packet downlink transmission system using non-orthogonal multiple access (NOMA) enhanced via movable antenna (MA). We focus on maximizing the effective throughput for a core user while ensuring reliable communication for an edge user by optimizing the MAs' coordinates and the power and rate allocations from the access point (AP). The optimization challenge is app…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures

  14. Multi-Objective Sizing Optimization Method of Microgrid Considering Cost and Carbon Emissions

    Authors: Xiang Zhu, Guangchun Ruan, Hua Geng, Honghai Liu, Mingfei Bai, Chao Peng

    Abstract: Microgrid serves as a promising solution to integrate and manage distributed renewable energy resources. In this paper, we establish a stochastic multi-objective sizing optimization (SMOSO) model for microgrid planning, which fully captures the battery degradation characteristics and the total carbon emissions. The microgrid operator aims to simultaneously maximize the economic benefits and minimi…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Industry Applications

  15. arXiv:2406.05976  [pdf, other]

    eess.SY

    Dynamic Virtual Power Plants With Frequency Regulation Capacity

    Authors: Xiang Zhu, Guangchun Ruan, Hua Geng

    Abstract: For integrating heterogeneous distributed energy resources to provide fast frequency regulation, this paper proposes a dynamic virtual power plant (DVPP) with frequency regulation capacity. A parameter anonymity-based approach is established for DVPP aggregating small-scaled inverter-based resources (IBRs) with privacy concerns. On this basis, a parameter-to-performance mapping is formulated to ev…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by IAS Annual Meeting 2024

  16. arXiv:2406.05672  [pdf, other]

    eess.AS

    Text-aware and Context-aware Expressive Audiobook Speech Synthesis

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Yongmao Zhang, Wenjie Tian, Lei Xie

    Abstract: Recent advances in text-to-speech have significantly improved the expressiveness of synthetic speech. However, a major challenge remains in generating speech that captures the diverse styles exhibited by professional narrators in audiobooks without relying on manually labeled data or reference speech. To address this problem, we propose a text-aware and context-aware (TACA) style modeling approach…

    Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  17. arXiv:2406.04111  [pdf, other]

    cs.CV eess.IV

    UrbanSARFloods: Sentinel-1 SLC-Based Benchmark Dataset for Urban and Open-Area Flood Mapping

    Authors: Jie Zhao, Zhitong Xiong, Xiao Xiang Zhu

    Abstract: Due to its cloud-penetrating capability and independence from solar illumination, satellite Synthetic Aperture Radar (SAR) is the preferred data source for large-scale flood mapping, providing global coverage and including various land cover classes. However, most studies on large-scale SAR-derived flood mapping using deep learning algorithms have primarily focused on flooded open areas, utilizing…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024 EarthVision Workshop

  18. arXiv:2405.18692  [pdf, other]

    cs.IT eess.SP

    Movable Antenna Empowered Downlink NOMA Systems: Power Allocation and Antenna Position Optimization

    Authors: Yufeng Zhou, Wen Chen, Qingqing Wu, Xusheng Zhu, Nan Cheng

    Abstract: This paper investigates a novel communication paradigm employing movable antennas (MAs) within a multiple-input single-output (MISO) non-orthogonal multiple access (NOMA) downlink framework, where users are equipped with MAs. Initially, leveraging the far-field response, we delineate the channel characteristics concerning both the power allocation coefficient and positions of MAs. Subsequently, we…

    Submitted 7 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  19. arXiv:2405.13901  [pdf, other]

    cs.CV cs.LG eess.SP

    DCT-Based Decorrelated Attention for Vision Transformers

    Authors: Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Koushik Biswas, Ahmet Enis Cetin, Ulas Bagci

    Abstract: Central to the Transformer architectures' effectiveness is the self-attention mechanism, a function that maps queries, keys, and values into a high-dimensional vector space. However, training the attention weights of queries, keys, and values is non-trivial from a state of random initialization. In this paper, we propose two methods. (i) We first address the initialization problem of Vision Transf…

    Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  20. arXiv:2405.04285  [pdf, other]

    cs.AI eess.SP

    On the Foundations of Earth and Climate Foundation Models

    Authors: Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, Yilei Shi

    Abstract: Foundation models have enormous potential in advancing Earth and climate sciences, however, current approaches may not be optimal as they focus on a few basic features of a desirable Earth and climate foundation model. Crafting the ideal Earth foundation model, we define eleven features which would allow such a foundation model to be beneficial for any geoscientific downstream application in an en…

    Submitted 7 May, 2024; originally announced May 2024.

  21. arXiv:2404.07932  [pdf, other]

    cs.CV eess.IV

    FusionMamba: Efficient Image Fusion with State Space Model

    Authors: Siran Peng, Xiangyu Zhu, Haoyu Deng, Zhen Lei, Liang-Jian Deng

    Abstract: Image fusion aims to generate a high-resolution multi/hyper-spectral image by combining a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Current deep learning (DL)-based methods for image fusion primarily rely on CNNs or Transformers to extract features and merge different types of data. While CNNs are efficient, their receptive fiel…

    Submitted 10 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  22. arXiv:2404.00872  [pdf, other]

    cs.IT eess.SP

    Performance Evaluation of RIS-Assisted Spatial Modulation for Downlink Transmission

    Authors: Xusheng Zhu, Qingqing Wu, Wen Chen

    Abstract: This paper explores the performance of reconfigurable intelligent surface (RIS) assisted spatial modulation (SM) downlink communication systems, focusing on the average bit error probability (ABEP). Notably, in scenarios with a large number of reflecting units, the composite channel can be approximated by a Gaussian distribution using the central limit theorem. The receiver utilizes a maximum like…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.02893

  23. arXiv:2403.19457  [pdf, other]

    cs.IT eess.SP

    Transmissive RIS Transmitter Enabled Spatial Modulation for MIMO Systems

    Authors: Xusheng Zhu, Qingqing Wu, Wen Chen

    Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface (TRIS) transmitter-enabled spatial modulation (SM) multiple-input multiple-output (MIMO) system. In the transmission phase, a column-wise activation strategy is implemented for the TRIS panel, where the specific column elements are activated per time slot. Concurrently, the receiver employs the maximum likelihood det…

    Submitted 28 March, 2024; originally announced March 2024.

  24. arXiv:2403.18846  [pdf, other]

    cs.IT cs.AI cs.LG eess.SP

    The Blind Normalized Stein Variational Gradient Descent-Based Detection for Intelligent Massive Random Access

    Authors: Xin Zhu, Ahmet Enis Cetin

    Abstract: The lack of an efficient preamble detection algorithm remains a challenge for solving preamble collision problems in intelligent massive random access (RA) in practical communication scenarios. To solve this problem, we present a novel early preamble detection scheme based on a maximum likelihood estimation (MLE) model at the first step of the grant-based RA procedure. A novel blind normalized Ste…

    Submitted 7 March, 2024; originally announced March 2024.

  25. arXiv:2403.17751  [pdf, other]

    cs.IT eess.SP

    Robust Analysis of Full-Duplex Two-Way Space Shift Keying With RIS Systems

    Authors: Xusheng Zhu, Wen Chen, Qingqing Wu, Wen Fang, Chaoying Huang, Jun Li

    Abstract: Reconfigurable intelligent surface (RIS)-assisted index modulation system schemes are considered a promising technology for sixth-generation (6G) wireless communication systems, which can enhance various system capabilities such as coverage and reliability. However, obtaining perfect channel state information (CSI) is challenging due to the lack of a radio frequency chain in RIS. In this paper, we…

    Submitted 26 March, 2024; originally announced March 2024.

  26. arXiv:2403.05435  [pdf, other]

    cs.CV eess.IV eess.SP

    OmniCount: Multi-label Object Counting with Semantic-Geometric Priors

    Authors: Anindya Mondal, Sauradip Nag, Xiatian Zhu, Anjan Dutta

    Abstract: Object counting is pivotal for understanding the composition of scenes. Previously, this task was dominated by class-specific methods, which have gradually evolved into more adaptable class-agnostic strategies. However, these strategies come with their own set of limitations, such as the need for manual exemplar input and multiple passes for multiple categories, resulting in significant inefficien…

    Submitted 20 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  27. arXiv:2403.05024  [pdf, other]

    eess.IV cs.CV cs.LG

    A Probabilistic Hadamard U-Net for MRI Bias Field Correction

    Authors: Xin Zhu, Hongyi Pan, Yury Velichko, Adam B. Murphy, Ashley Ross, Baris Turkbey, Ahmet Enis Cetin, Ulas Bagci

    Abstract: Magnetic field inhomogeneity correction remains a challenging task in MRI analysis. Most established techniques are designed for brain MRI by supposing that image intensities in the identical tissue follow a uniform distribution. Such an assumption cannot be easily applied to other organs, especially those that are small in size and heterogeneous in texture (large variations in intensity), such as…

    Submitted 7 March, 2024; originally announced March 2024.

  28. arXiv:2402.02893  [pdf, other]

    cs.IT eess.SP

    On the Performance of RIS-Aided Spatial Modulation for Downlink Transmission

    Authors: Xusheng Zhu, Qingqing Wu, Wen Chen

    Abstract: In this study, we explore the performance of a reconfigurable reflecting surface (RIS)-assisted transmit spatial modulation (SM) system for downlink transmission, wherein the deployment of RIS serves the purpose of blind area coverage within the channel. At the receiving end, we present three detectors, i.e., maximum likelihood (ML) detector, two-stage ML detection, and greedy detector to recover…

    Submitted 5 February, 2024; originally announced February 2024.

  29. arXiv:2401.10345  [pdf, other]

    eess.IV

    Attack and Defense Analysis of Learned Image Compression

    Authors: Tianyu Zhu, Heming Sun, Xiankui Xiong, Xuanpeng Zhu, Yong Gong, Minge jing, Yibo Fan

    Abstract: Learned image compression (LIC) is becoming more and more popular these years with its high efficiency and outstanding compression quality. Still, the practicality against modified inputs added with specific noise could not be ignored. White-box attacks such as FGSM and PGD use only gradient to compute adversarial images that mislead LIC models to output unexpected results. Our experiments compare…

    Submitted 27 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  30. arXiv:2312.16850  [pdf, other]

    cs.SD eess.AS

    Accent-VITS:accent transfer for end-to-end TTS

    Authors: Linhan Ma, Yongmao Zhang, Xinfa Zhu, Yi Lei, Ziqian Ning, Pengcheng Zhu, Lei Xie

    Abstract: Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker's voice. The main challenge is how to effectively disentangle speaker timbre and accent which are entangled in speech. This paper presents a VITS-based end-to-end accent transfer model named Accent-VITS. Based on the main structure of VITS, Accent-VITS makes substantial improvements to enable…

    Submitted 29 December, 2023; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by NCMMSC2023

  31. arXiv:2312.16064  [pdf, other]

    cs.NI eess.SP

    Goal-Oriented Integration of Sensing, Communication, Computing, and Control for Mission-Critical Internet-of-Things

    Authors: Jie Cao, Ernest Kurniawan, Amnart Boonkajay, Sumei Sun, Petar Popovski, Xu Zhu

    Abstract: Driven by the development goal of network paradigm and demand for various functions in the sixth-generation (6G) mission-critical Internet-of-Things (MC-IoT), we foresee a goal-oriented integration of sensing, communication, computing, and control (GIS3C) in this paper. We first provide an overview of the tasks, requirements, and challenges of MC-IoT. Then we introduce an end-to-end GIS3C architec…

    Submitted 1 January, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  32. arXiv:2312.15454  [pdf, other]

    cs.IT eess.SY

    Risk-Aware and Energy-Efficient AoI Optimization for Multi-Connectivity WNCS with Short Packet Transmissions

    Authors: Jie Cao, Xu Zhu, Sumei Sun, Ernest Kurniawan, Amnart Boonkajay

    Abstract: Age of Information (AoI) has been proposed to quantify the freshness of information for emerging real-time applications such as remote monitoring and control in wireless networked control systems (WNCSs). Minimization of the average AoI and its outage probability can ensure timely and stable transmission. Energy efficiency (EE) also plays an important role in WNCSs, as many devices are featured by…

    Submitted 1 January, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

  33. arXiv:2312.12581  [pdf]

    physics.med-ph eess.IV physics.ins-det

    Wireless, Customizable Coaxially-shielded Coils for Magnetic Resonance Imaging

    Authors: Ke Wu, Xia Zhu, Stephan W. Anderson, Xin Zhang

    Abstract: Anatomy-specific RF receive coil arrays routinely adopted in magnetic resonance imaging (MRI) for signal acquisition, are commonly burdened by their bulky, fixed, and rigid configurations, which may impose patient discomfort, bothersome positioning, and suboptimal sensitivity in certain situations. Herein, leveraging coaxial cables' inherent flexibility and electric field confining property, for t…

    Submitted 19 December, 2023; originally announced December 2023.

  34. arXiv:2312.10018  [pdf]

    physics.med-ph eess.IV

    Wearable Coaxially-shielded Metamaterial for Magnetic Resonance Imaging

    Authors: Xia Zhu, Ke Wu, Stephan W. Anderson, Xin Zhang

    Abstract: Recent advancements in metamaterials have yielded the possibility of a wireless solution to improve signal-to-noise ratio (SNR) in magnetic resonance imaging (MRI). Unlike traditional closely packed local coil arrays with rigid designs and numerous components, these lightweight, cost-effective metamaterials eliminate the need for radio frequency (RF) cabling, baluns, adapters, and interfaces. Howe…

    Submitted 15 December, 2023; originally announced December 2023.

  35. arXiv:2312.09747  [pdf, other]

    eess.AS eess.SP

    SELM: Speech Enhancement Using Discrete Tokens and Language Models

    Authors: Ziqian Wang, Xinfa Zhu, Zihan Zhang, YuanJun Lv, Ning Jiang, Guoqing Zhao, Lei Xie

    Abstract: Language models (LMs) have shown superior performances in various speech generation tasks recently, demonstrating their powerful ability for semantic context modeling. Given the intrinsic similarity between speech generation and speech enhancement, harnessing semantic information holds potential advantages for speech enhancement tasks. In light of this, we propose SELM, a novel paradigm for speech…

    Submitted 7 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  36. arXiv:2311.13254  [pdf, other]

    cs.CV cs.AI eess.IV

    Unified Domain Adaptive Semantic Segmentation

    Authors: Zhe Zhang, Gaochang Wu, Jing Zhang, Xiatian Zhu, Dacheng Tao, Tianyou Chai

    Abstract: Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS) aims to transfer the supervision from a labeled source domain to an unlabeled target domain. The majority of existing UDA-SS works typically consider images whilst recent attempts have extended further to tackle videos by modeling the temporal dimension. Although the two lines of research share the major challenges -- overcoming the under…

    Submitted 20 August, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 18 pages, 10 figures, 11 tables

  37. arXiv:2311.07179  [pdf, other]

    cs.SD eess.AS

    SponTTS: modeling and transferring spontaneous style for TTS

    Authors: Hanzhao Li, Xinfa Zhu, Liumeng Xue, Yang Song, Yunlin Chen, Lei Xie

    Abstract: Spontaneous speaking style exhibits notable differences from other speaking styles due to various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-verbal speech like a smile), posing challenges to modeling and prediction of spontaneous style. Moreover, the limitation of high-quality spontaneous d…

    Submitted 8 January, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, Accepted by ICASSP2024

  38. arXiv:2310.17101  [pdf, other]

    eess.AS cs.SD

    Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning

    Authors: Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie

    Abstract: This paper aims to build a multi-speaker expressive TTS system, synthesizing a target speaker's speech with multiple styles and emotions. To this end, we propose a novel contrastive learning-based TTS approach to transfer style and emotion across speakers. Specifically, contrastive learning from different levels, i.e. utterance and category level, is leveraged to extract the disentangled style, em…

    Submitted 25 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures; Accepted by ICME 2024

  39. arXiv:2310.09840  [pdf, ps, other]

    eess.SP

    Towards Structural Sparse Precoding: Dynamic Time, Frequency, Space, and Power Multistage Resource Programming

    Authors: Zhongxiang Wei, Ping Wang, Qingjiang Shi, Xu Zhu, Christos Masouros

    Abstract: In last decades, dynamic resource programming in partial resource domains has been extensively investigated for single time slot optimizations. However, with the emerging real-time media applications in fifth-generation communications, their new quality of service requirements are often measured in temporal dimension. This requires multistage optimization for full resource domain dynamic programmi…

    Submitted 15 October, 2023; originally announced October 2023.

  40. arXiv:2310.07246  [pdf, other]

    cs.SD eess.AS

    Vec-Tok Speech: speech vectorization and tokenization for neural speech generation

    Authors: Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Language models (LMs) have recently flourished in natural language processing and computer vision, generating high-fidelity texts or images in various tasks. In contrast, the current speech generative models are still struggling regarding speech quality and task generalization. This paper presents Vec-Tok Speech, an extensible framework that resembles multiple speech generation tasks, generating e…

    Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 15 pages, 2 figures

  41. arXiv:2310.05072  [pdf, other]

    cs.IT eess.SP

    Performance Analysis of RIS-Aided Double Spatial Scattering Modulation for mmWave MIMO Systems

    Authors: Xusheng Zhu, Wen Chen, Qingqing Wu, Jun Li, Nan Cheng, Fangjiong Chen, Changle Li

    Abstract: In this paper, we investigate a practical structure of reconfigurable intelligent surface (RIS)-based double spatial scattering modulation (DSSM) for millimeter-wave (mmWave) multiple-input multiple-output (MIMO) systems. A suboptimal detector is proposed, in which the beam direction is first demodulated according to the received beam strength, and then the remaining information is demodulated by…

    Submitted 8 October, 2023; originally announced October 2023.

  42. arXiv:2310.04004  [pdf, other]

    cs.SD eess.AS

    U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

    Authors: Tao Li, Zhichao Wang, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie

    Abstract: Zero-shot speaker cloning aims to synthesize speech for any target speaker unseen during TTS system building, given only a single speech reference of the speaker at hand. Although more practical in real applications, the current zero-shot methods still produce speech with undesirable naturalness and speaker similarity. Moreover, endowing the target speaker with arbitrary speaking styles in the zer…

    Submitted 6 October, 2023; originally announced October 2023.

  43. arXiv:2310.03963  [pdf, other]

    cs.SD eess.AS

    Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis

    Authors: Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie

    Abstract: Zero-shot emotion transfer in cross-lingual speech synthesis aims to transfer emotion from an arbitrary speech reference in the source language to the synthetic speech in the target language. Building such a system faces challenges of unnatural foreign accents and difficulty in modeling the shared emotional expressions of different languages. Building on the DelightfulTTS neural architecture, this…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  44. arXiv:2310.02862  [pdf, other]

    cs.LG cs.AI eess.SP

    A novel asymmetrical autoencoder with a sparsifying discrete cosine Stockwell transform layer for gearbox sensor data compression

    Authors: Xin Zhu, Daoguang Yang, Hongyi Pan, Hamid Reza Karimi, Didem Ozevin, Ahmet Enis Cetin

    Abstract: The lack of an efficient compression model remains a challenge for the wireless transmission of gearbox data in non-contact gear fault diagnosis problems. In this paper, we present a signal-adaptive asymmetrical autoencoder with a transform domain layer to compress sensor signals. First, a new discrete cosine Stockwell transform (DCST) layer is introduced to replace linear layers in a multi-layer…

    Submitted 4 October, 2023; originally announced October 2023.

  45. arXiv:2310.00153  [pdf]

    physics.med-ph eess.SP physics.app-ph

    Conformal Metamaterials with Active Tunability and Self-adaptivity for Magnetic Resonance Imaging

    Authors: Ke Wu, Xia Zhu, Xiaoguang Zhao, Stephan W. Anderson, Xin Zhang

    Abstract: Ongoing effort has been devoted to applying metamaterials to boost the imaging performance of magnetic resonance imaging owing to their unique capacity for electromagnetic field confinement and enhancement. However, there are still major obstacles to widespread clinical adoption of conventional metamaterials due to several notable restrictions, namely: their typically bulky and rigid structures, d…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: 21 pages, 7 figures

  46. arXiv:2309.16499  [pdf, other]

    cs.CV eess.IV

    Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks

    Authors: Danfeng Hong, Bing Zhang, Hao Li, Yuxuan Li, Jing Yao, Chenyu Li, Martin Werner, Jocelyn Chanussot, Alexander Zipf, Xiao Xiang Zhu

    Abstract: Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-modality-dominated remote sensing (RS) applications, especially with an emphasis on individual urban environments (e.g., single cities or regions). Yet these AI models tend to meet the performance bottleneck in the case studies across cities or regions, due to the lack of diverse RS information and cutting-ed…

    Submitted 3 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  47. arXiv:2309.16468  [pdf, other]

    eess.SP

    HyperLISTA-ABT: An Ultra-light Unfolded Network for Accurate Multi-component Differential Tomographic SAR Inversion

    Authors: Kun Qian, Yuanyuan Wang, Peter Jung, Yilei Shi, Xiao Xiang Zhu

    Abstract: Deep neural networks based on unrolled iterative algorithms have achieved remarkable success in sparse reconstruction applications, such as synthetic aperture radar (SAR) tomographic inversion (TomoSAR). However, the currently available deep learning-based TomoSAR algorithms are limited to three-dimensional (3D) reconstruction. The extension of deep learning-based algorithms to four-dimensional (4…

    Submitted 28 September, 2023; originally announced September 2023.

  48. arXiv:2309.13907  [pdf, other]

    cs.SD eess.AS

    HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie

    Abstract: Recent advances in text-to-speech, particularly those based on Graph Neural Networks (GNNs), have significantly improved the expressiveness of short-form synthetic speech. However, generating human-parity long-form speech with high dynamic prosodic variations is still challenging. To address this problem, we expand the capabilities of GNNs with a hierarchical prosody modeling approach, named HiGNN…

    Submitted 6 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted by ASRU2023

  49. arXiv:2309.12201  [pdf, other]

    eess.SP cs.AI cs.LG

    Electroencephalogram Sensor Data Compression Using An Asymmetrical Sparse Autoencoder With A Discrete Cosine Transform Layer

    Authors: Xin Zhu, Hongyi Pan, Shuaiang Rong, Ahmet Enis Cetin

    Abstract: Electroencephalogram (EEG) data compression is necessary for wireless recording applications to reduce the amount of data that needs to be transmitted. In this paper, an asymmetrical sparse autoencoder with a discrete cosine transform (DCT) layer is proposed to compress EEG signals. The encoder module of the autoencoder has a combination of a fully connected linear layer and the DCT layer to reduc…

    Submitted 15 September, 2023; originally announced September 2023.

  50. arXiv:2309.11109  [pdf, other]

    cs.CV eess.IV

    Self-supervised Domain-agnostic Domain Adaptation for Satellite Images

    Authors: Fahong Zhang, Yilei Shi, Xiao Xiang Zhu

    Abstract: Domain shift caused by, e.g., different geographical regions or acquisition conditions is a common issue in machine learning for global scale satellite image processing. A promising method to address this problem is domain adaptation, where the training and the testing datasets are split into two or multiple domains according to their distributions, and an adaptation method is applied to improve t…

    Submitted 25 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.