
Showing 1–50 of 274 results for author: Ma, Y

Searching in archive eess.
  1. arXiv:2408.14340  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Foundation Models for Music: A Survey

    Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, et al. (18 additional authors not shown)

    Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  2. arXiv:2408.12255  [pdf, ps, other]

    cs.IT eess.SP

    Fast Iterative ELAA-MIMO Detection Exploiting Static Channel Components

    Authors: Jiuyu Liu, Yi Ma, Rahim Tafazolli

    Abstract: Extremely large aperture array (ELAA) is a promising multiple-input multiple-output (MIMO) technique for next generation mobile networks. In this paper, we propose two novel approaches to accelerate the convergence of current iterative MIMO detectors in ELAA channels. Our approaches exploit the static components of the ELAA channel, which include line of sight (LoS) paths and deterministic non-LoS…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: This work has been accepted by the IEEE Information Theory Workshop (ITW) 2024. Copyright may be transferred without notice, after which this version may no longer be accessible

  3. arXiv:2408.08746  [pdf, other]

    cs.IT eess.SP

    Accelerating Iteratively Linear Detectors in Multi-User (ELAA-)MIMO Systems with UW-SVD

    Authors: Jiuyu Liu, Yi Ma, Jinfei Wang, Rahim Tafazolli

    Abstract: Current iterative multiple-input multiple-output (MIMO) detectors suffer from slow convergence when the wireless channel is ill-conditioned. The ill-conditioning is mainly caused by spatial correlation between channel columns corresponding to the same user equipment, known as intra-user interference. In addition, in the emerging MIMO systems using an extremely large aperture array (ELAA), spatial…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: This work has been accepted by IEEE Transactions on Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

  4. arXiv:2408.07592  [pdf, other]

    eess.SP

    Multi-periodicity dependency Transformer based on spectrum offset for radio frequency fingerprint identification

    Authors: Jing Xiao, Wenrui Ding, Zeqi Shao, Duona Zhang, Yanan Ma, Yufeng Wang, Jian Wang

    Abstract: Radio Frequency Fingerprint Identification (RFFI) has emerged as a pivotal task for reliable device authentication. Despite advancements in RFFI methods, background noise and intentional modulation features result in weak energy and subtle differences in the RFF features. These challenges diminish the capability of RFFI methods in feature representation, complicating the effective identification o…

    Submitted 14 August, 2024; originally announced August 2024.

  5. arXiv:2408.07325  [pdf, other]

    eess.IV cs.GR

    RoCoSDF: Row-Column Scanned Neural Signed Distance Fields for Freehand 3D Ultrasound Imaging Shape Reconstruction

    Authors: Hongbo Chen, Yuchong Gao, Shuhang Zhang, Jiangjie Wu, Yuexin Ma, Rui Zheng

    Abstract: The reconstruction of high-quality shape geometry is crucial for developing freehand 3D ultrasound imaging. However, the shape reconstruction of multi-view ultrasound data remains challenging due to the elevation distortion caused by thick transducer probes. In this paper, we present a novel learning-based framework RoCoSDF, which can effectively generate an implicit surface through continuous sha…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted by MICCAI 2024

  6. arXiv:2408.03393  [pdf, other]

    eess.IV cs.CV cs.GR

    Biomedical Image Segmentation: A Systematic Literature Review of Deep Learning Based Object Detection Methods

    Authors: Fazli Wahid, Yingliang Ma, Dawar Khan, Muhammad Aamir, Syed U. K. Bukhari

    Abstract: Biomedical image segmentation plays a vital role in diagnosis of diseases across various organs. Deep learning-based object detection methods are commonly used for such segmentation. There is extensive research on this topic. However, there is no standard review on this topic. Existing surveys often lack a standardized approach or focus on broader segmentation techniques. In this paper, we…

    Submitted 28 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  7. arXiv:2408.02943  [pdf, other]

    eess.SP

    Recent Advances in Data-driven Intelligent Control for Wireless Communication: A Comprehensive Survey

    Authors: Wei Huo, Huiwen Yang, Nachuan Yang, Zhaohua Yang, Jiuzhou Zhang, Fuhai Nan, Xingzhou Chen, Yifan Mao, Suyang Hu, Pengyu Wang, Xuanyu Zheng, Mingming Zhao, Ling Shi

    Abstract: The advent of next-generation wireless communication systems heralds an era characterized by high data rates, low latency, massive connectivity, and superior energy efficiency. These systems necessitate innovative and adaptive strategies for resource allocation and device behavior control in wireless networks. Traditional optimization-based methods have been found inadequate in meeting the complex…

    Submitted 6 August, 2024; originally announced August 2024.

  8. arXiv:2407.21531  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

    Authors: Ziya Zhou, Yuhang Wu, Zhiyue Wu, Xinyue Zhang, Ruibin Yuan, Yinghao Ma, Lu Wang, Emmanouil Benetos, Wei Xue, Yike Guo

    Abstract: Symbolic Music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) such as GPT-4 and Llama2 to the symbolic music domain including understanding and generation. Yet scant research explores the details of how these LLMs perform on advanced music understanding and conditioned generation, especially from the multi-step re…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ISMIR2024

  9. arXiv:2407.13703  [pdf, other]

    cs.IT cs.LG eess.SP

    Energy-Efficient Channel Decoding for Wireless Federated Learning: Convergence Analysis and Adaptive Design

    Authors: Linping Qu, Yuyi Mao, Shenghui Song, Chi-Ying Tsui

    Abstract: One of the most critical challenges for deploying distributed learning solutions, such as federated learning (FL), in wireless networks is the limited battery capacity of mobile clients. While it is a common belief that the major energy consumption of mobile clients comes from the uplink data transmission, this paper presents a novel finding, namely that the channel decoding operation also contributes…

    Submitted 19 July, 2024; v1 submitted 26 June, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE TWC for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  10. arXiv:2407.08948  [pdf, other]

    eess.IV cs.CV

    Symmetry Awareness Encoded Deep Learning Framework for Brain Imaging Analysis

    Authors: Yang Ma, Dongang Wang, Peilin Liu, Lynette Masters, Michael Barnett, Weidong Cai, Chenyu Wang

    Abstract: The heterogeneity of neurological conditions, ranging from structural anomalies to functional impairments, presents a significant challenge in medical imaging analysis tasks. Moreover, the limited availability of well-annotated datasets constrains the development of robust analysis models. Against this backdrop, this study introduces a novel approach leveraging the inherent anatomical symmetrical…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

    ACM Class: I.2.10; I.4.10

  11. arXiv:2407.06530  [pdf, ps, other]

    eess.SP

    RS-BNN: A Deep Learning Framework for the Optimal Beamforming Design of Rate-Splitting Multiple Access

    Authors: Yiwen Wang, Yijie Mao, Sijie Ji

    Abstract: Rate splitting multiple access (RSMA) relies on beamforming design for attaining spectral efficiency and energy efficiency gains over traditional multiple access schemes. While conventional optimization approaches such as weighted minimum mean square error (WMMSE) achieve suboptimal solutions for RSMA beamforming optimization, they are computationally demanding. A novel approach based on fractiona…

    Submitted 8 July, 2024; originally announced July 2024.

  12. arXiv:2407.05155  [pdf, other]

    cs.IT eess.SP

    Wi-Fi Beyond Communications: Experimental Evaluation of Respiration Monitoring and Motion Detection Using COTS Devices

    Authors: Jiuyu Liu, Yi Ma, Rahim Tafazolli

    Abstract: Wi-Fi sensing has become an attractive option for non-invasive monitoring of human activities and vital signs. This paper explores the feasibility of using state-of-the-art commercial off-the-shelf (COTS) devices for Wi-Fi sensing applications, particularly respiration monitoring and motion detection. We utilize the Intel AX210 network interface card (NIC) to transmit Wi-Fi signals in both 2.4 GHz…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: This work has been accepted by IEEE ICCC Workshop 2024. Copyright may be transferred without notice, after which this version may no longer be accessible

  13. arXiv:2407.03050  [pdf, other]

    eess.SP

    Semantic-Aware Power Allocation for Generative Semantic Communications with Foundation Models

    Authors: Chunmei Xu, Mahdi Boloursaz Mashhadi, Yi Ma, Rahim Tafazolli

    Abstract: Recent advancements in diffusion models have made a significant breakthrough in generative modeling. The combination of the generative model and semantic communication (SemCom) enables high-fidelity semantic information exchange at ultra-low rates. A novel generative SemCom framework for image tasks is proposed, wherein pre-trained foundation models serve as semantic encoders and decoders for sema…

    Submitted 3 July, 2024; originally announced July 2024.

  14. arXiv:2407.00196  [pdf, other]

    eess.SP

    Multi-Satellite MIMO Systems for Direct User-Satellite Communications: A Survey

    Authors: Zohre Mashayekh Bakhsh, Yasaman Omid, Gaojie Chen, Farbod Kayhan, Yi Ma, Rahim Tafazolli

    Abstract: Advancements in satellite technology have made direct-to-device connectivity a viable solution for ensuring global access. This method is designed to provide internet connectivity to remote, rural, or underserved areas where traditional cellular or broadband networks are lacking or insufficient. This paper is a survey providing an in-depth review of multi-satellite Multiple Input Multiple Output (…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 29 pages, 11 figures, 6 tables, IEEE Communications Surveys & Tutorials

  15. arXiv:2406.18549  [pdf]

    eess.IV cs.CV

    Advancements in Feature Extraction Recognition of Medical Imaging Systems Through Deep Learning Technique

    Authors: Qishi Zhan, Dan Sun, Erdi Gao, Yuhan Ma, Yaxin Liang, Haowei Yang

    Abstract: This study introduces a novel unsupervised medical image feature extraction method that employs spatial stratification techniques. An objective function based on weight is proposed to achieve the purpose of fast image recognition. The algorithm divides the pixels of the image into multiple subdomains and uses a quadtree to access the image. A technique for threshold optimization utilizing a simple…

    Submitted 23 May, 2024; originally announced June 2024.

    Comments: conference

  16. arXiv:2406.16323  [pdf, other]

    eess.SP

    Low-Complexity CSI Feedback for FDD Massive MIMO Systems via Learning to Optimize

    Authors: Yifan Ma, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: In frequency-division duplex (FDD) massive multiple-input multiple-output (MIMO) systems, the growing number of base station antennas leads to prohibitive feedback overhead for downlink channel state information (CSI). To address this challenge, state-of-the-art (SOTA) fully data-driven deep learning (DL)-based CSI feedback schemes have been proposed. However, the high computational complexity and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: submitted to IEEE for publication

  17. arXiv:2406.14333  [pdf, other]

    cs.IR cs.SD eess.AS

    LARP: Language Audio Relational Pre-training for Cold-Start Playlist Continuation

    Authors: Rebecca Salganik, Xiaohao Liu, Yunshan Ma, Jian Kang, Tat-Seng Chua

    Abstract: As online music consumption increasingly shifts towards playlist-based listening, the task of playlist continuation, in which an algorithm suggests songs to extend a playlist in a personalized and musically cohesive manner, has become vital to the success of music streaming. Currently, many existing playlist continuation approaches rely on collaborative filtering methods to perform recommendation.…

    Submitted 20 June, 2024; originally announced June 2024.

  18. arXiv:2406.14264  [pdf, other]

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, Jingyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  19. arXiv:2406.04740  [pdf, other]

    eess.IV

    Activation Map-based Vector Quantization for 360-degree Image Semantic Communication

    Authors: Yang Ma, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: In virtual reality (VR) applications, 360-degree images play a pivotal role in crafting immersive experiences and offering panoramic views, thus improving user Quality of Experience (QoE). However, the voluminous data generated by 360-degree images poses challenges in network storage and bandwidth. To address these challenges, we propose a novel Activation Map-based Vector Quantization (AM-VQ) fra…

    Submitted 7 June, 2024; originally announced June 2024.

  20. arXiv:2406.02483  [pdf, other]

    eess.AS cs.AI cs.SD

    How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?

    Authors: Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li

    Abstract: Partially manipulating a sentence can greatly change its meaning. Recent work shows that countermeasures (CMs) trained on partially spoofed audio can effectively detect such spoofing. However, the current understanding of the decision-making process of CMs is limited. We utilize Grad-CAM and introduce a quantitative analysis metric to interpret CMs' decisions. We find that CMs prioritize the artif…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  21. arXiv:2406.02009  [pdf, other]

    eess.AS cs.CL cs.SD

    Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

    Authors: Kun Zhou, Shengkui Zhao, Yukun Ma, Chong Zhang, Hao Wang, Dianwen Ng, Chongjia Ni, Nguyen Trung Hieu, Jia Qi Yip, Bin Ma

    Abstract: Recent language model-based text-to-speech (TTS) frameworks demonstrate scalability and in-context learning capabilities. However, they suffer from robustness issues due to the accumulation of errors in speech unit predictions during autoregressive language modeling. In this paper, we propose a phonetic enhanced language modeling method to improve the performance of TTS models. We leverage self-su…

    Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  22. arXiv:2406.00233  [pdf, other]

    eess.SP

    Plug-in UL-CSI-Assisted Precoder Upsampling Approach in Cellular FDD Systems

    Authors: Yu-Chien Lin, Yan Xin, Ta-Sung Lee, Charlie Zhang, Yibo Ma, Zhi Ding

    Abstract: Acquiring downlink channel state information (CSI) is crucial for optimizing performance in massive Multiple Input Multiple Output (MIMO) systems operating under Frequency-Division Duplexing (FDD). Most cellular wireless communication systems employ codebook-based precoder designs, which offer advantages such as simpler, more efficient feedback mechanisms and reduced feedback overhead. Common code…

    Submitted 31 May, 2024; originally announced June 2024.

  23. arXiv:2406.00085  [pdf, other]

    eess.IV cs.LG q-bio.NC

    Augmentation-based Unsupervised Cross-Domain Functional MRI Adaptation for Major Depressive Disorder Identification

    Authors: Yunling Ma, Chaojun Zhang, Xiaochuan Wang, Qianqian Wang, Liang Cao, Limei Zhang, Mingxia Liu

    Abstract: Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  24. arXiv:2405.09552  [pdf, other]

    eess.IV cs.AI cs.CV

    ODFormer: Semantic Fundus Image Segmentation Using Transformer for Optic Nerve Head Detection

    Authors: Jiayi Wang, Yi-An Mao, Xiaoyu Ma, Sicen Guo, Yuting Shao, Xiao Lv, Wenting Han, Mark Christopher, Linda M. Zangwill, Yanlong Bi, Rui Fan

    Abstract: Optic nerve head (ONH) detection has been a crucial area of study in ophthalmology for years. However, the significant discrepancy between fundus image datasets, each generated using a single type of fundus camera, poses challenges to the generalizability of ONH detection approaches developed based on semantic segmentation networks. Despite the numerous recent advancements in general-purpose seman…

    Submitted 2 June, 2024; v1 submitted 15 April, 2024; originally announced May 2024.

  25. arXiv:2405.08288  [pdf, other]

    eess.SP

    Orthogonal Delay-Doppler Division Multiplexing Modulation with Tomlinson-Harashima Precoding

    Authors: Yiyan Ma, Akram Shafie, Jinhong Yuan, Guoyu Ma, Zhangdui Zhong, Bo Ai

    Abstract: The orthogonal delay-Doppler (DD) division multiplexing (ODDM) modulation has been recently proposed as a promising modulation scheme for next-generation communication systems with high mobility. Despite its benefits, ODDM modulation and other DD domain modulation schemes face the challenge of excessive equalization complexity. To address this challenge, we propose time domain Tomlinson-Harashima p…

    Submitted 13 May, 2024; originally announced May 2024.

  26. arXiv:2405.00739  [pdf, other]

    cs.LG cs.CV eess.IV

    Why does Knowledge Distillation Work? Rethink its Attention and Fidelity Mechanism

    Authors: Chenqi Guo, Shiwei Zhong, Xiaofeng Liu, Qianli Feng, Yinglong Ma

    Abstract: Does Knowledge Distillation (KD) really work? Conventional wisdom viewed it as a knowledge transfer procedure where a perfect mimicry of the student to its teacher is desired. However, paradoxical studies indicate that closely replicating the teacher's behavior does not consistently improve student generalization, posing questions on its possible causes. Confronted with this gap, we hypothesize th…

    Submitted 29 April, 2024; originally announced May 2024.

  27. arXiv:2404.18081  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ComposerX: Multi-Agent Symbolic Music Composition with LLMs

    Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

    Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C…

    Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  28. arXiv:2404.12604  [pdf, ps, other]

    cs.IT eess.SP

    Transmitter Side Beyond-Diagonal RIS for mmWave Integrated Sensing and Communications

    Authors: Kexin Chen, Yijie Mao

    Abstract: This work initiates the study of a beyond-diagonal reconfigurable intelligent surface (BD-RIS)-aided transmitter architecture for integrated sensing and communication (ISAC) in the millimeter-wave (mmWave) frequency band. Deploying BD-RIS at the transmitter side not only alleviates the need for extensive fully digital radio frequency (RF) chains but also enhances both communication and sensing per…

    Submitted 25 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  29. arXiv:2404.12595  [pdf, other]

    eess.SP

    Deep Reinforcement Learning-aided Transmission Design for Energy-efficient Link Optimization in Vehicular Communications

    Authors: Zhengpeng Wang, Yanqun Tang, Yingzhe Mao, Tao Wang, Xiunan Huang

    Abstract: This letter presents a deep reinforcement learning (DRL) approach for transmission design to optimize the energy efficiency in vehicle-to-vehicle (V2V) communication links. Considering the dynamic environment of vehicular communications, the optimization problem is non-convex and mathematically difficult to solve. Hence, we propose scenario identification-based double and Dueling deep Q-Network (S…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 5 pages, 3 figures

  30. arXiv:2404.11383  [pdf, other]

    eess.SP

    Lower Limb Movements Recognition Based on Feature Recursive Elimination and Backpropagation Neural Network

    Authors: Yongkai Ma, Shili Liang, Zekun Chen

    Abstract: Surface electromyographic (sEMG) signals serve as a signal source commonly used for lower limb movement recognition, reflecting the intent of human movement. However, it has been a challenge in this research area to improve the movement recognition rate while using fewer features. In this paper, a method for lower limb movements recognition based on recursive feature elimination and backpr…

    Submitted 17 April, 2024; originally announced April 2024.

  31. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  32. arXiv:2404.07473  [pdf]

    eess.IV cs.CV cs.LG

    LUCF-Net: Lightweight U-shaped Cascade Fusion Network for Medical Image Segmentation

    Authors: Songkai Sun, Qingshan She, Yuliang Ma, Rihui Li, Yingchun Zhang

    Abstract: In this study, the performance of existing U-shaped neural network architectures was enhanced for medical image segmentation by adding Transformer. Although Transformer architectures are powerful at extracting global information, their ability to capture local information is limited due to their high complexity. To address this challenge, we proposed a new lightweight U-shaped cascade fusion network (…

    Submitted 11 April, 2024; originally announced April 2024.

  33. arXiv:2404.06393  [pdf, other]

    cs.SD cs.AI eess.AS

    MuPT: A Generative Symbolic Music Pretrained Transformer

    Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

    Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the chal…

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  34. arXiv:2404.04916  [pdf, other]

    eess.IV cs.CV cs.LG

    Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder

    Authors: Yiyang Ma, Wenhan Yang, Jiaying Liu

    Abstract: The images produced by diffusion models can attain excellent perceptual quality. However, it is challenging for diffusion models to guarantee distortion, hence the integration of diffusion models and image compression models still needs more comprehensive explorations. This paper presents a diffusion-based image compression method that employs a privileged end-to-end decoder model as correction, w…

    Submitted 2 May, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by ICML 2024

  35. arXiv:2404.01716  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    Effective internal language model training and fusion for factorized transducer model

    Authors: Jinxi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

    Abstract: The internal language model (ILM) of the neural transducer has been widely studied. In most prior work, it is mainly used for estimating the ILM score and is subsequently subtracted during inference to facilitate improved integration with external language models. Recently, various factorized transducer models have been proposed, which explicitly embrace a standalone internal language model for…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to ICASSP 2024

  36. arXiv:2403.19127  [pdf, ps, other]

    eess.SP cs.IT

    Decentralizing Coherent Joint Transmission Precoding via Fast ADMM with Deterministic Equivalents

    Authors: Xinyu Bian, Yuhao Liu, Yizhou Xu, Tianqi Hou, Wenjie Wang, Yuyi Mao, Jun Zhang

    Abstract: Inter-cell interference (ICI) suppression is critical for multi-cell multi-user networks. In this paper, we investigate advanced precoding techniques for coordinated multi-point (CoMP) with downlink coherent joint transmission, an effective approach for ICI suppression. Different from the centralized precoding schemes that require frequent information exchange among the cooperating base stations,…

    Submitted 27 March, 2024; originally announced March 2024.

  37. arXiv:2403.11155  [pdf, other]

    eess.IV cs.MM

    Interactive $360^{\circ}$ Video Streaming Using FoV-Adaptive Coding with Temporal Prediction

    Authors: Yixiang Mao, Liyang Sun, Yong Liu, Yao Wang

    Abstract: For $360^{\circ}$ video streaming, FoV-adaptive coding that allocates more bits for the predicted user's field of view (FoV) is an effective way to maximize the rendered video quality under the limited bandwidth. We develop a low-latency FoV-adaptive coding and streaming system for interactive applications that is robust to bandwidth variations and FoV prediction errors. To minimize the end-to-end…

    Submitted 17 March, 2024; originally announced March 2024.

  38. arXiv:2403.09958  [pdf, other]

    eess.SP cs.IT

    Decentralizing Coherent Joint Transmission Precoding via Deterministic Equivalents

    Authors: Yuhao Liu, Xinyu Bian, Yizhou Xu, Tianqi Hou, Wenjie Wang, Yuyi Mao, Jun Zhang

    Abstract: In order to control the inter-cell interference for a multi-cell multi-user multiple-input multiple-output network, we consider the precoder design for coordinated multi-point with downlink coherent joint transmission. To avoid costly information exchange among the cooperating base stations in a centralized precoding scheme, we propose a decentralized one by considering the power minimization prob…

    Submitted 14 March, 2024; originally announced March 2024.

  39. arXiv:2402.17996  [pdf, ps, other]

    eess.SP cs.IT

    Joint Activity-Delay Detection and Channel Estimation for Asynchronous Massive Random Access: A Free Probability Theory Approach

    Authors: Xinyu Bian, Yuyi Mao, Jun Zhang

    Abstract: Grant-free random access (RA) has been recognized as a promising solution to support massive connectivity due to the removal of the uplink grant request procedures. While most endeavours assume perfect synchronization among users and the base station, this paper investigates asynchronous grant-free massive RA, and develops efficient algorithms for joint user activity detection, synchronization dela…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.12372

  40. arXiv:2402.17487  [pdf, other]

    cs.CV cs.LG eess.IV

    Bit Rate Matching Algorithm Optimization in JPEG-AI Verification Model

    Authors: Panqi Jia, A. Burakhan Koyuncu, Jue Mao, Ze Cui, Yi Ma, Tiansheng Guo, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Jing Wang, Elena Alshina, Andre Kaup

    Abstract: The research on neural network (NN) based image compression has shown superior performance compared to classical compression frameworks. Unlike the hand-engineered transforms in the classical frameworks, NN-based models learn the non-linear transforms providing more compact bit representations, and achieve faster coding speed on parallel devices over their classical counterparts. Those properties…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted at (IEEE) PCS 2024; 6 pages

  41. arXiv:2402.16153  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu , et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

  42. arXiv:2402.15334  [pdf, other]

    cs.IT eess.SP

    Iterative Inversion of (ELAA-)MIMO Channels Using Symmetric Rank-$1$ Regularization

    Authors: Jinfei Wang, Yi Ma, Rahim Tafazolli

    Abstract: While iterative matrix inversion methods excel in computational efficiency, memory optimization, and support for parallel and distributed computing when managing large matrices, their limitations are also evident in multiple-input multiple-output (MIMO) fading channels. These methods encounter challenges related to slow convergence and diminished accuracy, especially in ill-conditioned scenarios,…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 13 pages, 12 figures

  43. arXiv:2402.15047  [pdf]

    cs.IT eess.SP

    Networked Collaborative Sensing using Multi-domain Measurements: Architectures, Performance Limits and Algorithms

    Authors: Yihua Ma, Shuqiang Xia, Chen Bai, Yuxin Wang, Zhongbin Wang, Songqian Li

    Abstract: As a promising 6G technology, integrated sensing and communication (ISAC) is attracting growing interest. ISAC provides integration gain by sharing spectrum, hardware, and software. However, concerns exist regarding its sensing performance when compared to dedicated radar systems. To address this issue, the advantages of widely deployed networks should be utilized, and this paper proposes networked colla…

    Submitted 22 February, 2024; originally announced February 2024.

  44. arXiv:2402.10071  [pdf, other]

    eess.SP cs.IT

    Approximate Message Passing-Enhanced Graph Neural Network for OTFS Data Detection

    Authors: Wenhao Zhuang, Yuyi Mao, Hengtao He, Lei Xie, Shenghui Song, Yao Ge, Zhi Ding

    Abstract: Orthogonal time frequency space (OTFS) modulation has emerged as a promising solution to support high-mobility wireless communications, for which cost-effective data detectors are critical. Although graph neural network (GNN)-based data detectors can achieve decent detection accuracy at reasonable computational cost, they fail to best harness prior information of transmitted data. To further mini…

    Submitted 14 April, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 8 pages, 7 figures, and 3 tables. Part of this article was submitted to IEEE for possible publication

  45. arXiv:2402.08445  [pdf, other]

    eess.SP

    $1$-Bit SubTHz RIS with Planar Tightly Coupled Dipoles: Beam Shaping and Prototypes

    Authors: Xianjun Ma, Yonggang Zhou, Qi Luo, Yihan Ma, Kyriakos Stylianopoulos, George C. Alexandropoulos

    Abstract: In this paper, a proof-of-concept study of a $1$-bit wideband reconfigurable intelligent surface (RIS) comprising planar tightly coupled dipoles (PTCD) is presented. The developed RIS operates at subTHz frequencies and a $3$-dB gain bandwidth of $27.4\%$ with the center frequency at $102$ GHz is shown to be obtainable via full-wave electromagnetic simulations. The binary phase shift offered by eac…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 5 pages, 11 figures; to be presented at the 18th European Conference on Antennas and Propagation (EuCAP)

  46. arXiv:2401.17014  [pdf, other]

    cs.IT eess.SP

    Near-Field Fading Channel Modeling for ELAAs: From Communication to ISAC

    Authors: Jiuyu Liu, Yi Ma, Ahmed Elzanaty, Rahim Tafazolli

    Abstract: Extremely large aperture array (ELAA) is anticipated to serve as a pivotal feature of future multiple-input multiple-output (MIMO) systems in 6G. Near-field (NF) fading channel models are essential for reliable link-level simulation and ELAA system design. In this article, we propose a framework designed to generate NF fading channels for both communication and integrated sensing and communication…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  47. arXiv:2401.15955  [pdf]

    eess.SP eess.SY

    A Novel Geometric Solution for Moving Target Localization through Multistatic Sensing in the ISAC System

    Authors: S. Zhuge, Y. Ma, Z. Lin, Y. Zeng

    Abstract: This paper proposes a novel geometric solution for tracking a moving target through multistatic sensing. In contrast to existing two-step weighted least square (2SWLS) methods which use the bistatic range (BR) and bistatic range rate (BRR) measurements, the proposed method incorporates an additional direction of arrival (DOA) measurement of the target obtained from a communication receiver in an i…

    Submitted 29 January, 2024; originally announced January 2024.

  48. arXiv:2401.15105  [pdf, other]

    eess.IV cs.CV cs.LG

    Diffusion Enhancement for Cloud Removal in Ultra-Resolution Remote Sensing Imagery

    Authors: Jialu Sui, Yiyang Ma, Wenhan Yang, Xiaokang Zhang, Man-On Pun, Jiaying Liu

    Abstract: The presence of cloud layers severely compromises the quality and effectiveness of optical remote sensing (RS) images. However, existing deep-learning (DL)-based Cloud Removal (CR) techniques encounter difficulties in accurately reconstructing the original visual authenticity and detailed semantic content of the images. To tackle this challenge, this work proposes to encompass enhancements at the…

    Submitted 25 January, 2024; originally announced January 2024.

  49. arXiv:2401.14978  [pdf, other]

    cs.HC cs.MM cs.SD eess.AS

    Robust Dual-Modal Speech Keyword Spotting for XR Headsets

    Authors: Zhuojiang Cai, Yuhan Ma, Feng Lu

    Abstract: While speech interaction finds widespread utility within the Extended Reality (XR) domain, conventional vocal speech keyword spotting systems continue to grapple with formidable challenges, including suboptimal performance in noisy environments, impracticality in situations requiring silence, and susceptibility to inadvertent activations when others speak nearby. These challenges, however, can pot…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE VR 2024

  50. arXiv:2401.10392  [pdf, other]

    physics.optics eess.SP

    Deep learning and random light structuring ensure robust free-space communications

    Authors: Xiaofei Li, Yu Wang, Xin Liu, Yuan Ma, Yangjian Cai, Sergey A. Ponomarenko, Xianlong Liu

    Abstract: Having shown early promise, free-space optical communications (FSO) face formidable challenges in the age of information explosion. The ever-growing demand for greater channel communication capacity is one of the challenges. The inter-channel crosstalk, which severely degrades the quality of transmitted information, creates another roadblock in the way of efficient FSO implementation. Here we adva…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 18 pages, 13 figures