
Showing 1–50 of 1,552 results for author: Zhang, Y

Searching in archive eess.
  1. arXiv:2408.17397  [pdf, other]

    cs.IT eess.SP

    End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework

    Authors: Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

    Abstract: This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formu…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: major revision in IEEE JSAC

  2. arXiv:2408.16850  [pdf, other]

    eess.SP

    MPADA: Open source framework for multimodal time series antenna array measurements

    Authors: Yuyi Chang, Yingzhe Zhang, Asimina Kiourti, Emre Ertin

    Abstract: This paper presents an open-source framework for collecting time series S-parameter measurements across multiple antenna elements, dubbed MPADA: Multi-Port Antenna Data Acquisition. The core of MPADA relies on the standard SCPI protocol to be compatible with a wide range of hardware platforms. Time series measurements are enabled through the use of a high-precision real-time clock (RTC), allowing…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: AMTA 2024

  3. arXiv:2408.16315  [pdf, other]

    cs.HC cs.LG eess.SP

    Passenger hazard perception based on EEG signals for highly automated driving vehicles

    Authors: Ashton Yu Xuan Tan, Yingkai Yang, Xiaofei Zhang, Bowen Li, Xiaorong Gao, Sifa Zheng, Jianqiang Wang, Xinyu Gu, Jun Li, Yang Zhao, Yuxin Zhang, Tania Stathaki

    Abstract: Enhancing the safety of autonomous vehicles is crucial, especially given recent accidents involving automated systems. As passengers in these vehicles, humans' sensory perception and decision-making can be integrated with autonomous systems to improve safety. This study explores neural mechanisms in passenger-vehicle interactions, leading to the development of a Passenger Cognitive Model (PCM) and…

    Submitted 29 August, 2024; originally announced August 2024.

  4. arXiv:2408.16132  [pdf, other]

    eess.AS cs.MM cs.SD

    SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge

    Authors: You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, Zhiyao Duan

    Abstract: With the advancements in singing voice generation and the growing presence of AI singers on media platforms, the inaugural Singing Voice Deepfake Detection (SVDD) Challenge aims to advance research in identifying AI-generated singing voices from authentic singers. This challenge features two tracks: a controlled setting track (CtrSVDD) and an in-the-wild scenario track (WildSVDD). The CtrSVDD trac…

    Submitted 28 August, 2024; originally announced August 2024.

  5. arXiv:2408.14892  [pdf, other]

    cs.CL cs.SD eess.AS

    A Functional Trade-off between Prosodic and Semantic Cues in Conveying Sarcasm

    Authors: Zhu Li, Xiyuan Gao, Yuqing Zhang, Shekhar Nayak, Matt Coler

    Abstract: This study investigates the acoustic features of sarcasm and disentangles the interplay between the propensity of an utterance being used sarcastically and the presence of prosodic cues signaling sarcasm. Using a dataset of sarcastic utterances compiled from television shows, we analyze the prosodic features within utterances and key phrases belonging to three distinct sarcasm categories (embedded…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: accepted at Interspeech 2024

  6. arXiv:2408.14771  [pdf, other]

    eess.AS

    Impact of Noisy Labels on Sound Event Detection: Deletion Errors Are More Detrimental Than Insertion Errors

    Authors: Yuliang Zhang, Defeng Huang, Roberto Togneri

    Abstract: This study explores the critical but underexamined impact of label noise on Sound Event Detection (SED), which requires both sound identification and precise temporal localization. We categorize label noise into deletion, insertion, substitution, and subjective types and systematically evaluate their effects on SED using synthetic and real-life datasets. Our analysis shows that deletion noise sign…

    Submitted 27 August, 2024; originally announced August 2024.

  7. arXiv:2408.13832  [pdf, other]

    eess.IV cs.CV

    A Low-dose CT Reconstruction Network Based on TV-regularized OSEM Algorithm

    Authors: Ran An, Yinghui Zhang, Xi Chen, Lemeng Li, Ke Chen, Hongwei Li

    Abstract: Low-dose computed tomography (LDCT) offers significant advantages in reducing the potential harm to human bodies. However, reducing the X-ray dose in CT scanning often leads to severe noise and artifacts in the reconstructed images, which might adversely affect diagnosis. By utilizing the expectation maximization (EM) algorithm, statistical priors could be combined with artificial priors to improv…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 11 pages, 8 figures

    ACM Class: I.4.5

  8. arXiv:2408.13733  [pdf, other]

    eess.IV cs.CV

    Anatomical Consistency Distillation and Inconsistency Synthesis for Brain Tumor Segmentation with Missing Modalities

    Authors: Zheyu Zhang, Xinzhao Liu, Zheng Chen, Yueyi Zhang, Huanjing Yue, Yunwei Ou, Xiaoyan Sun

    Abstract: Multi-modal Magnetic Resonance Imaging (MRI) is imperative for accurate brain tumor segmentation, offering indispensable complementary information. Nonetheless, the absence of modalities poses significant challenges in achieving precise segmentation. Recognizing the shared anatomical structures between mono-modal and multi-modal representations, it is noteworthy that mono-modal images typically ex…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted paper at the European Conference on Artificial Intelligence (ECAI 2024)

  9. arXiv:2408.13549  [pdf, other]

    eess.SP

    A Superdirective Beamforming Approach based on MultiTransUNet-GAN

    Authors: Yali Zhang, Haifan Yin, Liangcheng Han

    Abstract: In traditional multiple-input multiple-output (MIMO) communication systems, the antenna spacing is often no smaller than half a wavelength. However, by exploiting the coupling between more closely-spaced antennas, a superdirective array may achieve a much higher beamforming gain than traditional MIMO. In this paper, we present a novel utilization of neural networks in the context of superdirective…

    Submitted 27 August, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

    Comments: 12 pages, 11 figures, 6 tables, to appear in IEEE Trans. Commun.

  10. arXiv:2408.13290  [pdf, ps, other]

    eess.IV cs.CV

    Multi-modal Intermediate Feature Interaction AutoEncoder for Overall Survival Prediction of Esophageal Squamous Cell Cancer

    Authors: Chengyu Wu, Yatao Zhang, Yaqi Wang, Qifeng Wang, Shuai Wang

    Abstract: Survival prediction for esophageal squamous cell cancer (ESCC) is crucial for doctors to assess a patient's condition and tailor treatment plans. The application and development of multi-modal deep learning in this field have attracted attention in recent years. However, the prognostically relevant features between cross-modalities have not been further explored in previous studies, which could hi…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by ISBI 2024

  11. arXiv:2408.12822  [pdf, other]

    cs.RO eess.SY

    Courteous MPC for Autonomous Driving with CBF-inspired Risk Assessment

    Authors: Yanze Zhang, Yiwei Lyu, Sude E. Demir, Xingyu Zhou, Yupeng Yang, Junmin Wang, Wenhao Luo

    Abstract: With more autonomous vehicles (AVs) sharing roadways with human-driven vehicles (HVs), ensuring safe and courteous maneuvers that respect HVs' behavior becomes increasingly important. To promote both safety and courtesy in AV's behavior, an extension of Control Barrier Functions (CBFs)-inspired risk evaluation framework is proposed in this paper by considering both noisy observed positions and vel…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 7 pages, accepted to ITSC 2024

  12. arXiv:2408.12535  [pdf, other]

    eess.SY

    Impact of the Inflation Reduction Act and Carbon Capture on Transportation Electrification for a Net-Zero Western U.S. Grid

    Authors: Samrat Acharya, Malini Ghosal, Travis Thurber, Ying Zhang, Casey D. Burleyson, Nathalie Voisin

    Abstract: The electrification of transportation is critical to mitigate Greenhouse Gas (GHG) emissions. The United States (U.S.) government's Inflation Reduction Act (IRA) of 2022 introduces policies to promote the electrification of transportation. In addition to electrifying transportation, clean energy technologies such as Carbon Capture and Storage (CCS) may play a major role in achieving a net-zero ene…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: This is a preprint. Its complete copyrighted version will be available on the publisher's website after publication

  13. arXiv:2408.12534  [pdf, other]

    eess.IV cs.AI cs.CV

    Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Ershuai Wang, Qin Zhou, Ziyan Huang, Pengju Lyu, Jian He, Bo Wang

    Abstract: Organ and cancer segmentation in abdomen Computed Tomography (CT) scans is the prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation by providing a lar…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024 FLARE Challenge Summary

  14. arXiv:2408.11290  [pdf, other]

    eess.SP cs.IT

    Privacy Preservation in Delay-Based Localization Systems: Artificial Noise or Artificial Multipath?

    Authors: Yuchen Zhang, Hui Chen, Henk Wymeersch

    Abstract: Localization plays an increasingly pivotal role in 5G/6G systems, enabling various applications. This paper focuses on the privacy concerns associated with delay-based localization, where unauthorized base stations attempt to infer the location of the end user. We propose a method to disrupt localization at unauthorized nodes by injecting artificial components into the pilot signal, exploiting mod…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 6 pages, conference paper

  15. arXiv:2408.10235  [pdf, other]

    eess.SP cs.HC cs.LG

    Multi-Source EEG Emotion Recognition via Dynamic Contrastive Domain Adaptation

    Authors: Yun Xiao, Yimeng Zhang, Xiaopeng Peng, Shuzheng Han, Xia Zheng, Dingyi Fang, Xiaojiang Chen

    Abstract: Electroencephalography (EEG) provides reliable indications of human cognition and mental states. Accurate emotion recognition from EEG remains challenging due to signal variations among individuals and across measurement sessions. To address these challenges, we introduce a multi-source dynamic contrastive domain adaptation method (MS-DCDA), which models coarse-grained inter-domain and fine-graine…

    Submitted 3 August, 2024; originally announced August 2024.

  16. arXiv:2408.09938  [pdf, other]

    eess.SY

    Minimal Sensor Placement for Generic State and Unknown Input Observability

    Authors: Ranbo Cheng, Yuan Zhang, Amin MD Al, Yuanqing Xia

    Abstract: This paper addresses the problem of selecting the minimum number of dedicated sensors to achieve observability in the presence of unknown inputs, namely, the state and input observability, for linear time-invariant systems. We assume that the only available information is the zero-nonzero structure of system matrices, and approach this problem within a structured system model. We revisit the conce…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 12 pages, 6 figures

  17. arXiv:2408.09491  [pdf, other]

    cs.SD eess.AS

    A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition

    Authors: Yangze Li, Xiong Wang, Songjun Cao, Yike Zhang, Long Ma, Lei Xie

    Abstract: Audio-LLM introduces audio modality into a large language model (LLM) to enable a powerful LLM to recognize, understand, and generate audio. However, during speech recognition in noisy environments, we observed the presence of illusions and repetition issues in audio-LLM, leading to substitution and insertion errors. This paper proposes a transcription prompt-based audio-LLM by introducing an ASR…

    Submitted 18 August, 2024; originally announced August 2024.

  18. arXiv:2408.08669  [pdf, other]

    cs.SD eess.AS

    HSDreport: Heart Sound Diagnosis with Echocardiography Reports

    Authors: Zihan Zhao, Pingjie Wang, Liudan Zhao, Yuchen Yang, Ya Zhang, Kun Sun, Xin Sun, Xin Zhou, Yu Wang, Yanfeng Wang

    Abstract: Heart sound auscultation holds significant importance in the diagnosis of congenital heart disease. However, existing methods for Heart Sound Diagnosis (HSD) tasks are predominantly limited to a few fixed categories, framing the HSD task as a rigid classification problem that does not fully align with medical practice and offers only limited information to physicians. Besides, such methods do not…

    Submitted 16 August, 2024; originally announced August 2024.

  19. arXiv:2408.08593  [pdf, other]

    cs.LG eess.SY

    RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction

    Authors: Xiucheng Wang, Keda Tao, Nan Cheng, Zhisheng Yin, Zan Li, Yuan Zhang, Xuemin Shen

    Abstract: Radio map (RM) is a promising technology that can obtain pathloss based on only location, which is significant for 6G network applications to reduce the communication costs for pathloss estimation. However, the construction of RM in traditional is either computationally intensive or depends on costly sampling-based pathloss measurements. Although the neural network (NN)-based method can efficientl…

    Submitted 16 August, 2024; originally announced August 2024.

  20. arXiv:2408.06558  [pdf, other]

    eess.SP

    Can Wireless Environmental Information Decrease Pilot Overhead: A CSI Prediction Example

    Authors: Lianzheng Shi, Jianhua Zhang, Li Yu, Yuxiang Zhang, Zhen Zhang, Yichen Cai, Guangyi Liu

    Abstract: Channel state information (CSI) is crucial for massive multi-input multi-output (MIMO) system. As the antenna scale increases, acquiring CSI results in significantly higher system overhead. In this letter, we propose a novel channel prediction method which utilizes wireless environmental information with pilot pattern optimization for CSI prediction (WEI-CSIP). Specifically, scatterers around the…

    Submitted 12 August, 2024; originally announced August 2024.

  21. arXiv:2408.05928  [pdf, other]

    cs.SD eess.AS

    Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation

    Authors: Xiaoxiao Miao, Yuxiang Zhang, Xin Wang, Natalia Tomashenko, Donny Cheng Lock Soh, Ian Mcloughlin

    Abstract: A general disentanglement-based speaker anonymization system typically separates speech into content, speaker, and prosody features using individual encoders. This paper explores how to adapt such a system when a new speech attribute, for example, emotion, needs to be preserved to a greater extent. While existing systems are good at anonymizing speaker embeddings, they are not designed to preserve…

    Submitted 12 August, 2024; originally announced August 2024.

  22. Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI

    Authors: Lei Zhou, Yuzhong Zhang, Jiadong Zhang, Xuejun Qian, Chen Gong, Kun Sun, Zhongxiang Ding, Xing Wang, Zhenhui Li, Zaiyi Liu, Dinggang Shen

    Abstract: Automated breast tumor segmentation on the basis of dynamic contrast-enhancement magnetic resonance imaging (DCE-MRI) has shown great promise in clinical practice, particularly for identifying the presence of breast disease. However, accurate segmentation of breast tumor is a challenging task, often necessitating the development of complex networks. To strike an optimal trade-off between computati…

    Submitted 11 August, 2024; originally announced August 2024.

    Journal ref: 2024, IEEE Transactions on Medical Imaging

  23. arXiv:2408.05777  [pdf, other]

    cs.CV cs.AI eess.IV

    Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task

    Authors: Hannuo Zhang, Huihui Li, Jiarui Lin, Yujie Zhang, Jianghua Fan, Hang Liu

    Abstract: Optical remote sensing and Synthetic Aperture Radar (SAR) remote sensing are crucial for earth observation, offering complementary capabilities. While optical sensors provide high-quality images, they are limited by weather and lighting conditions. In contrast, SAR sensors can operate effectively under adverse conditions. This letter proposes a GAN-based SAR-to-optical image translation method name…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  24. arXiv:2408.05185  [pdf, other]

    eess.SY math.OC

    Efficient Unit Commitment Constraint Screening under Uncertainty

    Authors: Xuan He, Honglin Wen, Yufan Zhang, Yize Chen, Danny H. K. Tsang

    Abstract: Day-ahead unit commitment (UC) is a fundamental task for power system operators, where generator statuses and power dispatch are determined based on the forecasted nodal net demands. The uncertainty inherent in renewables and load forecasting requires the use of techniques in optimization under uncertainty to find more resilient and reliable UC solutions. However, the solution procedure of such sp…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: In submission, 11 pages, 10 figures

  25. arXiv:2408.04967  [pdf, other]

    eess.AS cs.SD

    ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild

    Authors: Jiangyan Yi, Chu Yuan Zhang, Jianhua Tao, Chenglong Wang, Xinrui Yan, Yong Ren, Hao Gu, Junzuo Zhou

    Abstract: The growing prominence of the field of audio deepfake detection is driven by its wide range of applications, notably in protecting the public from potential fraud and other malicious activities, prompting the need for greater attention and research in this area. The ADD 2023 challenge goes beyond binary real/fake classification by emulating real-world scenarios, such as the identification of manip…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  26. arXiv:2408.03885  [pdf, other]

    cs.CV eess.IV

    Global-Local Progressive Integration Network for Blind Image Quality Assessment

    Authors: Xiaoqi Wang, Yun Zhang

    Abstract: Vision transformers (ViTs) excel in computer vision for modeling long-term dependencies, yet face two key challenges for image quality assessment (IQA): discarding fine details during patch embedding, and requiring extensive training data due to lack of inductive biases. In this study, we propose a Global-Local progressive INTegration network for IQA, called GlintIQA, to address these issues throu…

    Submitted 7 August, 2024; originally announced August 2024.

  27. arXiv:2408.02160  [pdf, ps, other]

    eess.SP

    Modeling and Design of RIS-Assisted Multi-cell Multi-band Networks with RSMA

    Authors: Abdelhamid Salem, Kai-Kit Wong, Chan-Byoung Chae, Yangyang Zhang

    Abstract: Reconfigurable intelligent surface (RIS) has been identified as a promising technology for future wireless communication systems due to its ability to manipulate the propagation environment intelligently. RIS is a frequency-selective device, thus it can only effectively manipulate the propagation of signals within a specific frequency band. This frequency selective characteristic can make deployin…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 13 pages

  28. arXiv:2408.01672  [pdf, ps, other]

    eess.SP cs.AI

    radarODE: An ODE-Embedded Deep Learning Model for Contactless ECG Reconstruction from Millimeter-Wave Radar

    Authors: Yuanyuan Zhang, Runwei Guan, Lingxiao Li, Rui Yang, Yutao Yue, Eng Gee Lim

    Abstract: Radar-based contactless cardiac monitoring has become a popular research direction recently, but the fine-grained electrocardiogram (ECG) signal is still hard to reconstruct from millimeter-wave radar signal. The key obstacle is to decouple the cardiac activities in the electrical domain (i.e., ECG) from that in the mechanical domain (i.e., heartbeat), and most existing research only uses pure dat…

    Submitted 3 August, 2024; originally announced August 2024.

  29. arXiv:2408.01648  [pdf]

    eess.IV cs.CV

    Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2

    Authors: Ange Lou, Yamin Li, Yike Zhang, Robert F. Labadie, Jack Noble

    Abstract: The Segment Anything Model 2 (SAM 2) is the latest generation foundation model for image and video segmentation. Trained on the expansive Segment Anything Video (SA-V) dataset, which comprises 35.5 million masks across 50.9K videos, SAM 2 advances its predecessor's capabilities by supporting zero-shot segmentation through various prompts (e.g., points, boxes, and masks). Its robust zero-shot perfo…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: The first work to evaluate the performance of SAM 2 in surgical videos

  30. arXiv:2408.00381  [pdf, other]

    cs.IT eess.SY

    Statistical AoI Guarantee Optimization for Supporting xURLLC in ISAC-enabled V2I Networks

    Authors: Yanxi Zhang, Mingwu Yao, Qinghai Yang, Dongqi Yan, Xu Zhang, Xu Bao, Muyu Mei

    Abstract: This paper addresses the critical challenge of supporting next-generation ultra-reliable and low-latency communication (xURLLC) within integrated sensing and communication (ISAC)-enabled vehicle-to-infrastructure (V2I) networks. We incorporate channel evaluation and retransmission mechanisms for real-time reliability enhancement. Using stochastic network calculus (SNC), we establish a theoretical…

    Submitted 1 August, 2024; originally announced August 2024.

  31. arXiv:2407.19253  [pdf, other]

    eess.SY

    Taylor-Expansion-Based Robust Power Flow in Unbalanced Distribution Systems: A Hybrid Data-Aided Method

    Authors: Sungjoo Chung, Ying Zhang, Zhaoyu Wang, Fei Ding

    Abstract: Traditional power flow methods often adopt certain assumptions designed for passive balanced distribution systems, thus lacking practicality for unbalanced operation. Moreover, their computation accuracy and efficiency are heavily subject to unknown errors and bad data in measurements or prediction data of distributed energy resources (DERs). To address these issues, this paper proposes a hybrid d…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: Physics-informed machine learning, unbalanced distribution systems, power flow, data-driven, distributed energy resources, outliers, regression

  32. arXiv:2407.19161  [pdf]

    eess.SP physics.app-ph

    Compact SPICE model for TeraFET resonant detectors

    Authors: Xueqing Liu, Yuhui Zhang, Trond Ytterdal, Michael Shur

    Abstract: This paper presents an improved compact model for TeraFETs employing a nonlinear transmission line approach to describe the non-uniform carrier density oscillations and electron inertia effects in the TeraFET channels. By calculating the equivalent components for each segment of the channel: conductance, capacitance, and inductance, based on the voltages at the segment's nodes, our model accommoda…

    Submitted 27 July, 2024; originally announced July 2024.

  33. arXiv:2407.18596  [pdf, ps, other]

    eess.SY

    Piecewise constant tuning gain based singularity-free MRAC with application to aircraft control systems

    Authors: Zhipeng Zhang, Yanjun Zhang, Jian Sun

    Abstract: This paper introduces an innovative singularity-free output feedback model reference adaptive control (MRAC) method applicable to a wide range of continuous-time linear time-invariant (LTI) systems with general relative degrees. Unlike existing solutions such as Nussbaum and multiple-model-based methods, which manage unknown high-frequency gains through persistent switching and repeated parameter…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 9 pages, 6 figures

    MSC Class: 93A10; 93B52; 93C40; 93D20

  34. arXiv:2407.16684  [pdf, other]

    eess.IV cs.CV q-bio.NC

    AutoRG-Brain: Grounded Report Generation for Brain MRI

    Authors: Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li

    Abstract: Radiologists are tasked with interpreting a large number of images in a daily base, with the responsibility of generating corresponding reports. This demanding workload elevates the risk of human error, potentially leading to treatment delays, increased healthcare costs, revenue loss, and operational inefficiencies. To address these challenges, we initiate a series of work on grounded Automatic Re…

    Submitted 29 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  35. arXiv:2407.15226  [pdf, other]

    eess.SP eess.SY

    Variation Bayesian Interference for Multiple Extended Targets or Unresolved Group Targets Tracking

    Authors: Yuanhao Cheng, Yunhe Cao, Tat-Soon Yeo, Yulin Zhang, Fu Jie

    Abstract: In this work, we propose a tracking method for multiple extended targets or unresolvable group targets based on the Variational Bayesian Inference (VBI). Firstly, based on the most commonly used Random Matrix Model (RMM), the joint states of a single target are modeled as a Gamma Gaussian Inverse Wishart (GGIW) distribution, and the multi-target joint association variables are involved in the esti…

    Submitted 6 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: 21 pages, 15 figures, 3 tables

  36. arXiv:2407.14775  [pdf, other]

    eess.SY

    Phase Re-service in Reinforcement Learning Traffic Signal Control

    Authors: Zhiyao Zhang, George Gunter, Marcos Quinones-Grueiro, Yuhang Zhang, William Barbour, Gautam Biswas, Daniel Work

    Abstract: This article proposes a novel approach to traffic signal control that combines phase re-service with reinforcement learning (RL). The RL agent directly determines the duration of the next phase in a pre-defined sequence. Before the RL agent's decision is executed, we use the shock wave theory to estimate queue expansion at the designated movement allowed for re-service and decide if phase re-servi…

    Submitted 2 August, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: Accepted to IEEE ITSC 2024

  37. arXiv:2407.14651  [pdf, other]

    eess.IV cs.AI cs.CV

    Improving Representation of High-frequency Components for Medical Foundation Models

    Authors: Yuetan Chu, Yilan Zhang, Zhongyi Han, Changchun Yang, Longxi Zhou, Gongning Luo, Xin Gao

    Abstract: Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models are demonstrated to exhibit great limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomic…

    Submitted 25 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  38. arXiv:2407.13491  [pdf, other]

    eess.SP cs.IT

    Performance Analysis and Low-Complexity Beamforming Design for Near-Field Physical Layer Security

    Authors: Yunpu Zhang, Yuan Fang, Xianghao Yu, Changsheng You, Ying-Jun Angela Zhang

    Abstract: Extremely large-scale arrays (XL-arrays) have emerged as a key enabler in achieving the unprecedented performance requirements of future wireless networks, leading to a significant increase in the range of the near-field region. This transition necessitates the spherical wavefront model for characterizing the wireless propagation rather than the far-field planar counterpart, thereby introducing ex…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 13 pages, 13 figures

  39. Trajectory and Power Optimization for Multi-UAV Enabled Emergency Wireless Communications Networks

    Authors: Yixin Zhang, Wenchi Cheng

    Abstract: Recently, unmanned aerial vehicle (UAV) has attracted much attention due to its flexible deployment and controllable mobility. As the general communication network cannot meet the emergency requirements, in this paper we study the multi-UAV enabled wireless emergency communication system. Our goal is to maximize the capacity with jointly optimizing trajectory and allocating power. To tackle this n…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures

    Journal ref: 2019 IEEE International Conference on Communications Workshops (ICC Workshops)

  40. arXiv:2407.10328  [pdf, other]

    cs.SD cs.AI eess.AS

    The Interpretation Gap in Text-to-Music Generation Models

    Authors: Yongyi Zang, Yixiao Zhang

    Abstract: Large-scale text-to-music generation models have significantly enhanced music creation capabilities, offering unprecedented creative freedom. However, their ability to collaborate effectively with human musicians remains limited. In this paper, we propose a framework to describe the musical interaction process, which includes expression, interpretation, and execution of controls. Following this fr…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Under review

  41. arXiv:2407.09026  [pdf, other]

    cs.CV cs.LG cs.MM eess.IV

    HPC: Hierarchical Progressive Coding Framework for Volumetric Video

    Authors: Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang

    Abstract: Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission. Current NeRF compression lacks the flexibility to adjust video quality and bitrate within a single model for various network and device capacities. To address these issues, we propose HPC, a novel hie…

    Submitted 2 August, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures, ACM Multimedia 24

  42. arXiv:2407.08167  [pdf, other]

    eess.IV cs.CV

    DSCENet: Dynamic Screening and Clinical-Enhanced Multimodal Fusion for MPNs Subtype Classification

    Authors: Yuan Zhang, Yaolei Qi, Xiaoming Qi, Yongyue Wei, Guanyu Yang

    Abstract: The precise subtype classification of myeloproliferative neoplasms (MPNs) based on multimodal information, which assists clinicians in diagnosis and long-term treatment plans, is of great clinical significance. However, it remains a great challenging task due to the lack of diagnostic representativeness for local patches and the absence of diagnostic-relevant features from a single modality. In th…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI2024

  43. arXiv:2407.07302

    eess.IV cs.CV

    Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution

    Authors: Yuehan Zhang, Seungjun Lee, Angela Yao

    Abstract: Standard single-image super-resolution creates paired training data from high-resolution images through fixed downsampling kernels. However, real-world super-resolution (RWSR) faces unknown degradations in the low-resolution inputs, all the while lacking paired training data. Existing methods approach this problem by learning blind general models through complex synthetic augmentations on training…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  44. arXiv:2407.06691

    cs.IT eess.SP

    OFDM Achieves the Lowest Ranging Sidelobe Under Random ISAC Signaling

    Authors: Fan Liu, Ying Zhang, Yifeng Xiong, Shuangyang Li, Weijie Yuan, Feifei Gao, Shi Jin, Giuseppe Caire

    Abstract: This paper aims to answer a fundamental question in the area of Integrated Sensing and Communications (ISAC): What is the optimal communication-centric ISAC waveform for ranging? Towards that end, we first established a generic framework to analyze the sensing performance of communication-centric ISAC waveforms built upon orthonormal signaling bases and random data symbols. Then, we evaluated thei…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 14 pages, 12 figures, submitted to IEEE for possible publication

  45. arXiv:2407.05758

    eess.IV cs.AI cs.CV

    Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

    Authors: Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

    Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecti…

    Submitted 8 July, 2024; originally announced July 2024.

  46. arXiv:2407.04675

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: A modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  47. arXiv:2407.04336

    eess.SP cs.AI

    AI-Based Beam-Level and Cell-Level Mobility Management for High Speed Railway Communications

    Authors: Wen Li, Wei Chen, Shiyue Wang, Yuanyuan Zhang, Michail Matthaiou, Bo Ai

    Abstract: High-speed railway (HSR) communications are pivotal for ensuring rail safety, operations, maintenance, and delivering passenger information services. The high speed of trains creates rapidly time-varying wireless channels, increases the signaling overhead, and reduces the system throughput, making it difficult to meet the growing and stringent needs of HSR applications. In this article, we explore…

    Submitted 5 July, 2024; originally announced July 2024.

  48. arXiv:2407.03885

    cs.CV eess.IV

    Perception-Guided Quality Metric of 3D Point Clouds Using Hybrid Strategy

    Authors: Yujie Zhang, Qi Yang, Yiling Xu, Shan Liu

    Abstract: Full-reference point cloud quality assessment (FR-PCQA) aims to infer the quality of distorted point clouds with available references. Most of the existing FR-PCQA metrics ignore the fact that the human visual system (HVS) dynamically tackles visual information according to different distortion levels (i.e., distortion detection for high-quality samples and appearance perception for low-quality sa…

    Submitted 4 July, 2024; originally announced July 2024.

  49. arXiv:2407.02744

    eess.IV cs.CV

    Highly Accelerated MRI via Implicit Neural Representation Guided Posterior Sampling of Diffusion Models

    Authors: Jiayue Chu, Chenhe Du, Xiyue Lin, Yuyao Zhang, Hongjiang Wei

    Abstract: Reconstructing high-fidelity magnetic resonance (MR) images from under-sampled k-space is a commonly used strategy to reduce scan time. The posterior sampling of diffusion models based on the real measurement data holds significant promise of improved reconstruction accuracy. However, traditional posterior sampling methods often lack effective data consistency guidance, leading to inaccurate and u…

    Submitted 2 July, 2024; originally announced July 2024.

  50. Coding-Enhanced Cooperative Jamming for Secret Communication in Fluid Antenna Systems

    Authors: Hao Xu, Kai-Kit Wong, Wee Kiat New, Guyue Li, Farshad Rostami Ghadi, Yongxu Zhu, Shi Jin, Chan-Byoung Chae, Yangyang Zhang

    Abstract: This letter investigates the secret communication problem for a fluid antenna system (FAS)-assisted wiretap channel, where the legitimate transmitter transmits an information-bearing signal to the legitimate receiver, and at the same time, transmits a jamming signal to interfere with the eavesdropper (Eve). Unlike the conventional jamming scheme, which usually transmits Gaussian noise that interfe…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures, this paper has been accepted by IEEE Communications Letters