Zum Hauptinhalt springen

Showing 1–50 of 196 results for author: Guo, J

Searching in archive eess. Search in all archives.
.
  1. arXiv:2408.11230  [pdf, other

    eess.SP

    Multi-User Continuous-Aperture Array Communications: How to Learn Current Distribution?

    Authors: Jia Guo, Yuanwei Liu, Arumugam Nallanathan

    Abstract: The continuous aperture array (CAPA) can provide higher degree-of-freedom and spatial resolution than the spatially discrete array (SDPA), where optimizing multi-user current distributions in CAPA systems is crucial but challenging. The challenge arises from solving non-convex functional optimization problems without closed-form objective functions and constraints. In this paper, we propose a deep… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 6 pages, 6 figures

  2. arXiv:2408.06381  [pdf, other

    eess.IV cs.AI cs.CV

    Assessment of Cell Nuclei AI Foundation Models in Kidney Pathology

    Authors: Junlin Guo, Siqi Lu, Can Cui, Ruining Deng, Tianyuan Yao, Zhewen Tao, Yizhe Lin, Marilyn Lionts, Quan Liu, Juming Xiong, Catie Chang, Mitchell Wilkes, Mengmeng Yin, Haichun Yang, Yuankai Huo

    Abstract: Cell nuclei instance segmentation is a crucial task in digital kidney pathology. Traditional automatic segmentation methods often lack generalizability when applied to unseen datasets. Recently, the success of foundation models (FMs) has provided a more generalizable solution, potentially enabling the segmentation of any cell type. In this study, we perform a large-scale evaluation of three widely… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

  3. arXiv:2408.06185  [pdf, other

    eess.SY cs.CY cs.GT cs.NI

    Hi-SAM: A high-scalable authentication model for satellite-ground Zero-Trust system using mean field game

    Authors: Xuesong Wu, Tianshuai Zheng, Runfang Wu, Jie Ren, Junyan Guo, Ye Du

    Abstract: As more and more Internet of Thing (IoT) devices are connected to satellite networks, the Zero-Trust Architecture brings dynamic security to the satellite-ground system, while frequent authentication creates challenges for system availability. To make the system's accommodate more IoT devices, this paper proposes a high-scalable authentication model (Hi-SAM). Hi-SAM introduces the Proof-of-Work id… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  4. arXiv:2408.04300  [pdf, other

    eess.IV cs.CV

    An Explainable Non-local Network for COVID-19 Diagnosis

    Authors: Jingfu Yang, Peng Huang, Jing Hu, Shu Hu, Siwei Lyu, Xin Wang, Jun Guo, Xi Wu

    Abstract: The CNN has achieved excellent results in the automatic classification of medical images. In this study, we propose a novel deep residual 3D attention non-local network (NL-RAN) to classify CT images included COVID-19, common pneumonia, and normal to perform rapid and explainable COVID-19 diagnosis. We built a deep residual 3D attention non-local network that could achieve end-to-end training. The… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

  5. arXiv:2407.21514  [pdf

    eess.SP

    Wireless Communications in Doubly Selective Channels with Domain Adaptivity

    Authors: J. Andrew Zhang, Hongyang Zhang, Kai Wu, Xiaojing Huang, Jinhong Yuan, Y. Jay Guo

    Abstract: Wireless communications are significantly impacted by the propagation environment, particularly in doubly selective channels with variations in both time and frequency domains. Orthogonal Time Frequency Space (OTFS) modulation has emerged as a promising solution; however, its high equalization complexity, if performed in the delay-Doppler domain, limits its universal application. This article expl… ▽ More

    Submitted 31 July, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: Magazine article, 7 pages, 4 figures, 2 tables

  6. arXiv:2407.16664  [pdf, other

    cs.CL eess.AS

    Towards scalable efficient on-device ASR with transfer learning

    Authors: Laxmi Pandey, Ke Li, Jinxi Guo, Debjyoti Paul, Arthur Guo, Jay Mahadeokar, Xuedong Zhang

    Abstract: Multilingual pretraining for transfer learning significantly boosts the robustness of low-resource monolingual ASR models. This study systematically investigates three main aspects: (a) the impact of transfer learning on model performance during initial training or fine-tuning, (b) the influence of transfer learning across dataset domains and languages, and (c) the effect on rare-word recognition… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  7. arXiv:2407.05905  [pdf, other

    eess.SP

    Deep Learning-based CSI Feedback in Wi-Fi Systems

    Authors: Fan Qi, Jiajia Guo, Yiming Cui, Xiangyi Li, Chao-Kai Wen, Shi Jin

    Abstract: In Wi-Fi systems, channel state information (CSI) plays a crucial role in enabling access points to execute beamforming operations. However, the feedback overhead associated with CSI significantly hampers the throughput improvements. Recent advancements in deep learning (DL) have transformed the approach to CSI feedback in cellular systems. Drawing inspiration from the successes witnessed in the r… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  8. arXiv:2407.05558  [pdf

    math.OC eess.SY

    Hidden Convexity-Based Distributed Operation of Integrated Electricity-Gas Systems

    Authors: Rong-Peng Liu, Yue Song, Junhong Liu, Xiaozhe Wang, Jinpeng Guo, Yunhe Hou

    Abstract: We propose a hidden convexity-based method to address distributed optimal energy flow (OEF) problems for transmission-level integrated electricity-gas systems. First, we develop a node-wise decoupling method to de-compose an OEF problem into multiple OEF subproblems. Then, we propose a hidden convexity-based method to equivalently reformulate nonconvex OEF subproblems as semi-definite programs. Th… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 7 pages

  9. arXiv:2407.02918  [pdf, other

    cs.CV eess.IV

    Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction

    Authors: Jiaxin Guo, Jiangliu Wang, Di Kang, Wenzhen Dong, Wenting Wang, Yun-hui Liu

    Abstract: Real-time 3D reconstruction of surgical scenes plays a vital role in computer-assisted surgery, holding a promise to enhance surgeons' visibility. Recent advancements in 3D Gaussian Splatting (3DGS) have shown great potential for real-time novel view synthesis of general scenes, which relies on accurate poses and point clouds generated by Structure-from-Motion (SfM) for initialization. However, 3D… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  10. arXiv:2407.02005  [pdf, other

    cs.CL cs.SD eess.AS

    An End-to-End Speech Summarization Using Large Language Model

    Authors: Hengchao Shang, Zongyao Li, Jiaxin Guo, Shaojun Li, Zhiqiang Rao, Yuanchang Luo, Daimeng Wei, Hao Yang

    Abstract: Abstractive Speech Summarization (SSum) aims to generate human-like text summaries from spoken content. It encounters difficulties in handling long speech input and capturing the intricate cross-modal mapping between long speech inputs and short text summaries. Research on large language models (LLMs) and multimodal information fusion has provided new insights for addressing these challenges. In t… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: InterSpeech 2024

  11. arXiv:2406.18538  [pdf, other

    cs.CV cs.AI eess.IV

    VideoQA-SC: Adaptive Semantic Communication for Video Question Answering

    Authors: Jiangyuan Guo, Wei Chen, Yuxuan Sun, Jialong Xu, Bo Ai

    Abstract: Although semantic communication (SC) has shown its potential in efficiently transmitting multi-modal data such as text, speeches and images, SC for videos has focused primarily on pixel-level reconstruction. However, these SC systems may be suboptimal for downstream intelligent tasks. Moreover, SC systems without pixel-level video reconstruction present advantages by achieving higher bandwidth eff… ▽ More

    Submitted 17 May, 2024; originally announced June 2024.

  12. arXiv:2406.09317  [pdf, other

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  13. arXiv:2406.04791  [pdf, other

    cs.SD eess.AS

    Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

    Authors: Shaojun Li, Daimeng Wei, Hengchao Shang, Jiaxin Guo, ZongYao Li, Zhanglin Wu, Zhiqiang Rao, Yuanchang Luo, Xianghui He, Hao Yang

    Abstract: Despite recent improvements in End-to-End Automatic Speech Recognition (E2E ASR) systems, the performance can degrade due to vocal characteristic mismatches between training and testing data, particularly with limited target speaker adaptation data. We propose a novel speaker adaptation approach Speaker-Smoothed kNN that leverages k-Nearest Neighbors (kNN) retrieval techniques to improve model out… ▽ More

    Submitted 1 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  14. arXiv:2406.02438  [pdf, other

    eess.AS cs.MM cs.SD

    CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

    Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, Zhiyao Duan

    Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesi… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  15. arXiv:2405.14590  [pdf, other

    eess.IV cs.CV

    MAMOC: MRI Motion Correction via Masked Autoencoding

    Authors: Lennart Alexander Van der Goten, Jingyu Guo, Kevin Smith

    Abstract: The presence of motion artifacts in magnetic resonance imaging (MRI) scans poses a significant challenge, where even minor patient movements can lead to artifacts that may compromise the scan's utility. This paper introduces Masked Motion Correction (MAMOC), a novel method designed to address the issue of Retrospective Artifact Correction (RAC) in motion-affected MRI brain scans. MAMOC uses masked… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  16. arXiv:2405.07023  [pdf, other

    eess.IV cs.CV

    Efficient Real-world Image Super-Resolution Via Adaptive Directional Gradient Convolution

    Authors: Long Peng, Yang Cao, Renjing Pei, Wenbo Li, Jiaming Guo, Xueyang Fu, Yang Wang, Zheng-Jun Zha

    Abstract: Real-SR endeavors to produce high-resolution images with rich details while mitigating the impact of multiple degradation factors. Although existing methods have achieved impressive achievements in detail recovery, they still fall short when addressing regions with complex gradient arrangements due to the intensity-based linear weighting feature extraction manner. Moreover, the stochastic artifact… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  17. arXiv:2404.16484  [pdf, other

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  18. arXiv:2404.14063  [pdf, other

    cs.SD cs.LG cs.NE eess.AS

    LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search

    Authors: Jinyue Guo, Anna-Maria Christodoulou, Balint Laczko, Kyrre Glette

    Abstract: Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, they have limitations: Evolutionary Algorithms require complicated designs, posing challenges in control and achieving realistic sound generation. Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a me… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to GECCO 24 Companion

  19. arXiv:2404.13568  [pdf

    cs.SD eess.AS

    Sparse Direction of Arrival Estimation Method Based on Vector Signal Reconstruction with a Single Vector Sensor

    Authors: Jiabin Guo

    Abstract: This study investigates the application of single vector hydrophones in underwater acoustic signal processing for Direction of Arrival (DOA) estimation. Addressing the limitations of traditional DOA estimation methods in multi-source environments and under noise interference, this research proposes a Vector Signal Reconstruction (VSR) technique. This technique transforms the covariance matrix of s… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 20 pages

  20. arXiv:2404.11313  [pdf, other

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  21. arXiv:2404.10558  [pdf, other

    cs.AR eess.SP

    Decade-Bandwidth RF-Input Pseudo-Doherty Load Modulated Balanced Amplifier using Signal-Flow-Based Phase Alignment Design

    Authors: Pingzhu Gong, Jiachen Guo, Niteesh Bharadwaj Vangipurapu, Kenle Chen

    Abstract: This paper reports a first-ever decade-bandwidth pseudo-Doherty load-modulated balanced amplifier (PD-LMBA), designed for emerging 4G/5G communications and multi-band operations. By revisiting the LMBA theory using the signal-flow graph, a frequency-agnostic phase-alignment condition is found that is critical for ensuring intrinsically broadband load modulation behavior. This unique design methodo… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted for publication by IEEE Microwave and Wireless Technology Letters (not published). The IEEE copyright receipt is attached

  22. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  23. arXiv:2404.01716  [pdf, other

    eess.AS cs.AI cs.CL cs.LG

    Effective internal language model training and fusion for factorized transducer model

    Authors: Jinxi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

    Abstract: The internal language model (ILM) of the neural transducer has been widely studied. In most prior work, it is mainly used for estimating the ILM score and is subsequently subtracted during inference to facilitate improved integration with external language models. Recently, various of factorized transducer models have been proposed, which explicitly embrace a standalone internal language model for… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to ICASSP 2024

  24. arXiv:2403.20289  [pdf, other

    cs.CL cs.SD eess.AS

    Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation

    Authors: Fangxu Yu, Junjie Guo, Zhen Wu, Xinyu Dai

    Abstract: Emotion Recognition in Conversation (ERC) involves detecting the underlying emotion behind each utterance within a conversation. Effectively generating representations for utterances remains a significant challenge in this task. Recent works propose various models to address this issue, but they still struggle with differentiating similar emotions such as excitement and happiness. To alleviate thi… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by Findings of NAACL 2024

  25. Linear Hybrid Asymmetrical Load-Modulated Balanced Amplifier with Multi-Band Reconfigurability and Antenna-VSWR Resilience

    Authors: Jiachen Guo, Yuchen Cao, Kenle Chen

    Abstract: This paper presents the first-ever highly linear and load-insensitive three-way load-modulation power amplifier (PA) based on reconfigurable hybrid asymmetrical load modulated balanced amplifier (H-ALMBA). Through proper amplitude and phase controls, the carrier, control amplifier (CA), and two peaking balanced amplifiers (BA1 and BA2) can form a linear high-order load modulation over wide bandwid… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  26. arXiv:2403.11806  [pdf, other

    cs.IT eess.SP

    Fluid Antenna for Mobile Edge Computing

    Authors: Yiping Zuo, Jiajia Guo, Biyun Sheng, Chen Dai, Fu Xiao, Shi Jin

    Abstract: In the evolving environment of mobile edge computing (MEC), optimizing system performance to meet the growing demand for low-latency computing services is a top priority. Integrating fluidic antenna (FA) technology into MEC networks provides a new approach to address this challenge. This letter proposes an FA-enabled MEC scheme that aims to minimize the total system delay by leveraging the mobilit… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  27. arXiv:2403.10518  [pdf, other

    cs.CV cs.GR cs.SD eess.AS

    Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

    Authors: Ronghui Li, YuXiang Zhang, Yachao Zhang, Hongwen Zhang, Jie Guo, Yan Zhang, Yebin Liu, Xiu Li

    Abstract: We propose Lodge, a network capable of generating extremely long dance sequences conditioned on given music. We design Lodge as a two-stage coarse to fine diffusion architecture, and propose the characteristic dance primitives that possess significant expressiveness as intermediate representations between two diffusion models. The first stage is global diffusion, which focuses on comprehending the… ▽ More

    Submitted 19 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024, Project page: https://li-ronghui.github.io/lodge

  28. arXiv:2403.07274  [pdf, other

    cs.IT eess.SP

    Achievable Rate Analysis and Optimization of Double-RIS Assisted Spatially Correlated MIMO with Statistical CSI

    Authors: Kaizhe Xu, Jiajia Guo, Jun Zhang, Shi Jin, Shaodan Ma

    Abstract: Reconfigurable intelligent surface (RIS) is a novel meta-material which can form a smart radio environment by dynamically altering reflection directions of the impinging electromagnetic waves. In the prior literature, the inter-RIS links which also contribute to the performance of the whole system are usually neglected when multiple RISs are deployed. In this paper we investigate a general double-… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  29. arXiv:2402.18332  [pdf, other

    eess.SP

    Recursive GNNs for Learning Precoding Policies with Size-Generalizability

    Authors: Jia Guo, Chenyang Yang

    Abstract: Graph neural networks (GNNs) have been shown promising in optimizing power allocation and link scheduling with good size generalizability and low training complexity. These merits are important for learning wireless policies under dynamic environments, which partially come from the matched permutation equivariance (PE) properties of the GNNs to the policies to be learned. Nonetheless, it has been… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 37 pages, 8 figures

  30. arXiv:2402.09048  [pdf, other

    eess.SP

    Sensing in Bi-Static ISAC Systems with Clock Asynchronism: A Signal Processing Perspective

    Authors: Kai Wu, Jacopo Pegoraro, Francesca Meneghello, J. Andrew Zhang, Jesus O. Lacruz, Joerg Widmer, Francesco Restuccia, Michele Rossi, Xiaojing Huang, Daqing Zhang, Giuseppe Caire, Y. Jay Guo

    Abstract: Integrated Sensing and Communication (ISAC) has been identified as a pillar usage scenario for the impending 6G era. Bi-static sensing, a major type of sensing in ISAC, is promising to expedite ISAC in the near future, as it requires minimal changes to the existing network infrastructure. However, a critical challenge for bi-static sensing is clock asynchronism due to the use of different clocks a… ▽ More

    Submitted 24 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 20 pages, 6 figures, 1 table

  31. arXiv:2401.09904  [pdf, ps, other

    eess.SP

    Distributed Task-Oriented Communication Networks with Multimodal Semantic Relay and Edge Intelligence

    Authors: Jie Guo, Hao Chen, Bin Song, Yuhao Chi, Chau Yuen, Fei Richard Yu, Geoffrey Ye Li, Dusit Niyato

    Abstract: In this article, we present a novel framework, named distributed task-oriented communication networks (DTCN), based on recent advances in multimodal semantic transmission and edge intelligence. In DTCN, the multimodal knowledge of semantic relays and the adaptive adjustment capability of edge intelligence can be integrated to improve task performance. Specifically, we propose the key techniques in… ▽ More

    Submitted 19 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 7 pages, 5 figures, 1 table, accepted by IEEE Communications Magazine

  32. arXiv:2401.09119  [pdf, other

    eess.SP

    Anchor-points Assisted Uplink Sensing in Perceptive Mobile Networks

    Authors: Yanmo Hu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

    Abstract: Uplink sensing in integrated sensing and communications (ISAC) systems, such as Perceptive Mobile Networks, is challenging due to the clock asynchronism between transmitter and receiver. Existing solutions typically require the presence of a dominating line-of-sight path and the knowledge of transmitter location at the receiver. In this paper, relaxing these requirements, we propose a novel and ef… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 14 pages, 12 figures, journal paper

  33. arXiv:2401.09064  [pdf, other

    cs.IT eess.SP

    Performance Bounds and Optimization for CSI-Ratio based Bi-static Doppler Sensing in ISAC Systems

    Authors: Yanmo Hu, Kai Wu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

    Abstract: Bi-static sensing is crucial for exploring the potential of networked sensing capabilities in integrated sensing and communications (ISAC). However, it suffers from the challenging clock asynchronism issue. CSI ratio-based sensing is an effective means to address the issue. Its performance bounds, particular for Doppler sensing, have not been fully understood yet. This work endeavors to fill the r… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 14 pages, 15 figures, journal paper

  34. arXiv:2401.06485  [pdf, other

    eess.AS cs.SD

    Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech

    Authors: Yu Xi, Baochen Yang, Hao Li, Jiaqi Guo, Kai Yu

    Abstract: Customizable keyword spotting (KWS) in continuous speech has attracted increasing attention due to its real-world application potential. While contrastive learning (CL) has been widely used to extract keyword representations, previous CL approaches all operate on pre-segmented isolated words and employ only audio-text representations matching strategy. However, for KWS in continuous speech, co-art… ▽ More

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP2024

  35. UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

    Authors: Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

    Abstract: Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and has strong dependency on Pseudo Paired Data and Original Paired Data. But when only pre-training on Pseudo Paired Data, previous models have negative effect on correction. While fine-tu… ▽ More

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted in ICASSP 2023

  36. arXiv:2401.05000  [pdf

    eess.SP

    Mapping Information in Feature Extraction Transformation for Chirp Signal

    Authors: Shuyi Gu, Zhenghua Luo, Lin Hu, Yilin Zhang, Junxiong Guo

    Abstract: Chirp signals have established diverse applications caused by the capable of producing time-dependent linear frequencies. Most feature extraction transformation methods for chirp signals focus on enhancing the performance of transform methods but neglecting the information derived from the transformation process. Consequently, they may fail to fully exploit the information from observations, resul… ▽ More

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 14 pages,10 figures

  37. Exploring Multi-Modal Control in Music-Driven Dance Generation

    Authors: Ronghui Li, Yuqin Dai, Yachao Zhang, Jun Li, Jian Yang, Jie Guo, Xiu Li

    Abstract: Existing music-driven 3D dance generation methods mainly concentrate on high-quality dance generation, but lack sufficient control during the generation process. To address these issues, we propose a unified framework capable of generating high-quality dance movements and supporting multi-modal control, including genre control, semantic control, and spatial control. First, we decouple the dance ge… ▽ More

    Submitted 1 January, 2024; originally announced January 2024.

  38. arXiv:2312.10892  [pdf, other

    eess.IV cs.CV q-bio.QM

    Deep Learning-based MRI Reconstruction with Artificial Fourier Transform (AFT)-Net

    Authors: Yanting Yang, Jeffery Siyuan Tian, Matthieu Dagommer, Jia Guo

    Abstract: The deep complex-valued neural network provides a powerful way to leverage complex number operations and representations, which has succeeded in several phase-based applications. However, most previously published networks have not fully accessed the impact of complex-valued networks in the frequency domain. Here, we introduced a unified complex-valued deep learning framework - artificial Fourier… ▽ More

    Submitted 17 December, 2023; originally announced December 2023.

  39. arXiv:2312.08343  [pdf

    eess.IV cs.CV q-bio.QM

    Enhancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework

    Authors: Zhuoyao Xin, Christopher Wu, Dong Liu, Chunming Gu, Jia Guo, Jun Hua

    Abstract: Image segmentation, real-value prediction, and cross-modal translation are critical challenges in medical imaging. In this study, we propose a versatile multi-task neural network framework, based on an enhanced Transformer U-Net architecture, capable of simultaneously, selectively, and adaptively addressing these medical image tasks. Validation is performed on a public repository of human brain MR… ▽ More

    Submitted 17 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 4 pages, 3 figures, 2 tables

  40. arXiv:2312.08267  [pdf, other

    eess.IV cs.CV q-bio.QM

    TABSurfer: a Hybrid Deep Learning Architecture for Subcortical Segmentation

    Authors: Aaron Cao, Vishwanatha M. Rao, Kejia Liu, Xinru Liu, Andrew F. Laine, Jia Guo

    Abstract: Subcortical segmentation remains challenging despite its important applications in quantitative structural analysis of brain MRI scans. The most accurate method, manual segmentation, is highly labor intensive, so automated tools like FreeSurfer have been adopted to handle this task. However, these traditional pipelines are slow and inefficient for processing large datasets. In this study, we propo… ▽ More

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, 2 tables

  41. arXiv:2312.05279  [pdf

    eess.IV cs.CV

    Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network

    Authors: Anbo Cao, Pin-Yu Le, Zhonghui Qie, Haseeb Hassan, Yingwei Guo, Asim Zaman, Jiaxi Lu, Xueqiang Zeng, Huihui Yang, Xiaoqiang Miao, Taiyu Han, Guangtao Huang, Yan Kang, Yu Luo, Jia Guo

    Abstract: Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. However, deep learning t… ▽ More

    Submitted 8 December, 2023; originally announced December 2023.

  42. arXiv:2312.05119  [pdf, other

    eess.IV cs.CV

    Quantifying white matter hyperintensity and brain volumes in heterogeneous clinical and low-field portable MRI

    Authors: Pablo Laso, Stefano Cerri, Annabel Sorby-Adams, Jennifer Guo, Farrah Mateen, Philipp Goebl, Jiaming Wu, Peirong Liu, Hongwei Li, Sean I. Young, Benjamin Billot, Oula Puonti, Gordon Sze, Sam Payabavash, Adam DeHavenon, Kevin N. Sheth, Matthew S. Rosen, John Kirsch, Nicola Strisciuglio, Jelmer M. Wolterink, Arman Eshaghi, Frederik Barkhof, W. Taylor Kimberly, Juan Eugenio Iglesias

    Abstract: Brain atrophy and white matter hyperintensity (WMH) are critical neuroimaging features for ascertaining brain injury in cerebrovascular disease and multiple sclerosis. Automated segmentation and quantification is desirable but existing methods require high-resolution MRI with good signal-to-noise ratio (SNR). This precludes application to clinical and low-field portable MRI (pMRI) scans, thus hamp… ▽ More

    Submitted 15 February, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

  43. arXiv:2311.02865  [pdf, other

    cs.IT eess.SP

    Geometrically-Shaped Constellation for Visible Light Communications at Short Blocklength

    Authors: Jia-Ning Guo, Ru-Han Chen, Jian Zhang, Longguang Li, Xu Yang, Jing Zhou

    Abstract: In this paper, we present a general framework of designing geometrically shaped constellations for short-packet visible light communications with a peak- and an average-intensity constraints. By leveraging tools from large deviation theory, we first characterize the second-order asymptotics of the optimal constellation shaping region under aforementioned intensity constraints, which serves as a go… ▽ More

    Submitted 28 April, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

  44. arXiv:2310.15548  [pdf, ps, other

    eess.SP

    Knowledge-driven Meta-learning for CSI Feedback

    Authors: Han Xiao, Wenqiang Tian, Wendong Liu, Jiajia Guo, Zhi Zhang, Shi Jin, Zhihua Shi, Li Guo, Jia Shen

    Abstract: Accurate and effective channel state information (CSI) feedback is a key technology for massive multiple-input and multiple-output systems. Recently, deep learning (DL) has been introduced for CSI feedback enhancement through massive collected training data and lengthy training time, which is quite costly and impractical for realistic deployment. In this article, a knowledge-driven meta-learning a… ▽ More

    Submitted 25 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: text overlap with arXiv:2301.13475

  45. arXiv:2309.13018  [pdf, other

    eess.AS cs.CL cs.SD

    Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

    Authors: Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

    Abstract: Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each language. In this work, we propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, each resulting in… ▽ More

    Submitted 11 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

  46. AI/ML for Beam Management in 5G-Advanced: A Standardization Perspective

    Authors: Qing Xue, Jiajia Guo, Binggui Zhou, Yongjun Xu, Zhidu Li, Shaodan Ma

    Abstract: In beamformed wireless cellular systems such as 5G New Radio (NR) networks, beam management (BM) is a crucial operation. In the second phase of 5G NR standardization, known as 5G-Advanced, which is being vigorously promoted, the key component is the use of artificial intelligence (AI) based on machine learning (ML) techniques. AI/ML for BM is selected as a representative use case. This article pro… ▽ More

    Submitted 24 July, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: accepted by IEEE Vehicular Technology Magazine

  47. arXiv:2309.06981  [pdf, other

    cs.CR cs.AI cs.LG cs.SD eess.AS

    MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems

    Authors: Hanqing Guo, Xun Chen, Junfeng Guo, Li Xiao, Qiben Yan

    Abstract: Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation… ▽ More

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by Mobicom 2023

  48. arXiv:2309.06681  [pdf

    eess.IV cs.AI

    A plug-and-play synthetic data deep learning for undersampled magnetic resonance image reconstruction

    Authors: Min Xiao, Zi Wang, Jiefeng Guo, Xiaobo Qu

    Abstract: Magnetic resonance imaging (MRI) plays an important role in modern medical diagnostic but suffers from prolonged scan time. Current deep learning methods for undersampled MRI reconstruction exhibit good performance in image de-aliasing which can be tailored to the specific k-space undersampling scenario. But it is very troublesome to configure different deep networks when the sampling setting chan… ▽ More

    Submitted 8 October, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 5 pages, 3 figures

  49. arXiv:2309.05423  [pdf, other

    eess.AS cs.AI cs.CL cs.SD

    Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

    Authors: Jinzuomu Zhong, Yang Li, Hui Huang, Korin Richmond, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu

    Abstract: In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly improve the naturalness and controllability of synthesised speech. However, manual prosody annotation is labor-intensive and inconsistent. To address this issue, a two-stage automatic annotation pipeline is novelly proposed in this paper. In the first stage, we use contrastive pretraining of Speech-Silenc… ▽ More

    Submitted 11 June, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

  50. arXiv:2309.04976  [pdf, other

    cs.LG cs.AI eess.SY

    AVARS -- Alleviating Unexpected Urban Road Traffic Congestion using UAVs

    Authors: Jiaying Guo, Michael R. Jones, Soufiene Djahel, Shen Wang

    Abstract: Reducing unexpected urban traffic congestion caused by en-route events (e.g., road closures, car crashes, etc.) often requires fast and accurate reactions to choose the best-fit traffic signals. Traditional traffic light control systems, such as SCATS and SCOOT, are not efficient as their traffic data provided by induction loops has a low update frequency (i.e., longer than 1 minute). Moreover, th… ▽ More

    Submitted 10 September, 2023; originally announced September 2023.