
Showing 1–50 of 238 results for author: Zhang, M

Searching in archive eess.
  1. arXiv:2408.11289  [pdf, other]

    eess.IV cs.CV

    HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation

    Authors: Mingya Zhang, Limei Gu, Tingshen Ling, Xianping Tao

    Abstract: In the field of medical image segmentation, models based on both CNN and Transformer have been thoroughly investigated. However, CNNs have limited modeling capabilities for long-range dependencies, making it challenging to exploit the semantic information within images fully. On the other hand, the quadratic computational complexity poses a challenge for Transformers. State Space Models (SSMs), su…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.09157; text overlap with arXiv:2407.08083 by other authors

  2. arXiv:2408.03651  [pdf, other]

    eess.IV cs.CV

    SAM2-PATH: A better segment anything model for semantic segmentation in digital pathology

    Authors: Mingya Zhang, Liang Wang, Limei Gu, Zhao Li, Yaohui Wang, Tingshen Ling, Xianping Tao

    Abstract: The semantic segmentation task in pathology plays an indispensable role in assisting physicians in determining the condition of tissue lesions. Foundation models, such as the SAM (Segment Anything Model) and SAM2, exhibit exceptional performance in instance segmentation within everyday natural scenes. SAM-PATH has also achieved impressive results in semantic segmentation within the field of pathol…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

  3. arXiv:2407.20554  [pdf, other]

    math.AP eess.SY

    An anisotropic traffic flow model with look-ahead effect for mixed autonomy traffic

    Authors: Shouwei Hui, Michael Zhang

    Abstract: In this paper we extend the Aw-Rascle-Zhang (ARZ) non-equilibrium traffic flow model to take into account the look-ahead capability of connected and autonomous vehicles (CAVs), and the mixed flow dynamics of human driven and autonomous vehicles. The look-ahead effect of CAVs is captured by a non-local averaged density within a certain distance (the look-ahead distance). We show, using wave perturb…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Submitted to TRB Annual Meeting 2025

  4. arXiv:2407.18324  [pdf, other]

    cs.LG cs.CL eess.AS q-fin.CP q-fin.ST

    AMA-LSTM: Pioneering Robust and Fair Financial Audio Analysis for Stock Volatility Prediction

    Authors: Shengkun Wang, Taoran Ji, Jianfeng He, Mariam Almutairi, Dan Wang, Linhan Wang, Min Zhang, Chang-Tien Lu

    Abstract: Stock volatility prediction is an important task in the financial industry. Recent advancements in multimodal methodologies, which integrate both textual and auditory data, have demonstrated significant improvements in this domain, such as earnings calls (Earnings calls are publicly available and often involve the management team of a public company and interested parties to discuss the company's ea…

    Submitted 3 July, 2024; originally announced July 2024.

  5. arXiv:2407.11219  [pdf, other]

    cs.CV eess.IV

    TLRN: Temporal Latent Residual Networks For Large Deformation Image Registration

    Authors: Nian Wu, Jiarui Xing, Miaomiao Zhang

    Abstract: This paper presents a novel approach, termed Temporal Latent Residual Network (TLRN), to predict a sequence of deformation fields in time-series image registration. The challenge of registering time-series images often lies in the occurrence of large motions, especially when images differ significantly from a reference (e.g., the start of a cardiac cycle compared to the peak stretching phase…

    Submitted 23 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 10 pages. Accepted by MICCAI 2024

  6. arXiv:2407.08555  [pdf, other]

    eess.IV cs.CV

    SLoRD: Structural Low-Rank Descriptors for Shape Consistency in Vertebrae Segmentation

    Authors: Xin You, Yixin Lou, Minghui Zhang, Chuyan Zhang, Jie Yang, Yun Gu

    Abstract: Automatic and precise segmentation of vertebrae from CT images is crucial for various clinical applications. However, due to a lack of explicit and strict constraints, existing methods, especially single-stage methods, still suffer from the challenge of intra-vertebrae segmentation inconsistency, which refers to multiple label predictions inside a singular vertebra. For multi-stage methods, ver…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Under review

  7. arXiv:2407.07306  [pdf]

    physics.med-ph eess.SY

    Electrical Impedance Tomography Based Closed-loop Tumor Treating Fields in Dynamic Lung Tumors

    Authors: Minmin Wang, Xu Xie, Yuxi Guo, Liying Zhu, Yue Lan, Haitang Yang, Yun Pan, Guangdi Chen, Shaomin Zhang, Maomao Zhang

    Abstract: Tumor Treating Fields (TTFields) is a non-invasive anticancer modality that utilizes alternating electric fields to disrupt cancer cell division and growth. While generally well-tolerated with minimal side effects, traditional TTFields therapy for lung tumors faces challenges due to the influence of respiratory motion. We design a novel closed-loop TTFields strategy for lung tumors by incorporatin…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  8. arXiv:2407.05310  [pdf, other]

    eess.SP cs.NE cs.SD eess.AS

    Ternary Spike-based Neuromorphic Signal Processing System

    Authors: Shuai Wang, Dehao Zhang, Ammar Belatreche, Yichen Xiao, Hongyu Qing, Wenjie We, Malu Zhang, Yang Yang

    Abstract: Deep Neural Networks (DNNs) have been successfully implemented across various signal processing fields, resulting in significant enhancements in performance. However, DNNs generally require substantial computational resources, leading to significant economic costs and posing challenges for their deployment on resource-constrained edge devices. In this study, we take advantage of spiking neural net…

    Submitted 7 July, 2024; originally announced July 2024.

  9. arXiv:2406.18088  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    LLM-Driven Multimodal Opinion Expression Identification

    Authors: Bonian Jia, Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang

    Abstract: Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emotional subtleties beyond the capabilities of text. We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world s…

    Submitted 29 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, Accepted by Interspeech 2024

  10. arXiv:2406.17784  [pdf, other]

    eess.SP

    Scalable Near-Field Localization Based on Partitioned Large-Scale Antenna Array

    Authors: Xiaojun Yuan, Yuqing Zheng, Mingchen Zhang, Boyu Teng, Wenjun Jiang

    Abstract: This paper studies a passive localization system, where an extremely large-scale antenna array (ELAA) is deployed at the base station (BS) to locate a user equipment (UE) residing in its near-field (Fresnel) region. We propose a novel algorithm, named array partitioning-based location estimation (APLE), for scalable near-field localization. The APLE algorithm is developed based on the basic assump…

    Submitted 13 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.12342

  11. arXiv:2406.16871  [pdf, other]

    eess.SY

    Neural network based model predictive control of voltage for a polymer electrolyte fuel cell system with constraints

    Authors: Xiufei Li, Miao Yang, Yuanxin Qi, Miao Zhang

    Abstract: A fuel cell system must output a steady voltage as a power source in practical use. A neural network (NN) based model predictive control (MPC) approach is developed in this work to regulate the fuel cell output voltage with safety constraints. The developed NN MPC controller stabilizes the polymer electrolyte fuel cell system's output voltage by controlling the hydrogen and air flow rates at the s…

    Submitted 24 March, 2024; originally announced June 2024.

  12. arXiv:2406.16326  [pdf, other]

    eess.AS

    RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging

    Authors: Mingyang Zhang, Yi Zhou, Yi Ren, Chen Zhang, Xiang Yin, Haizhou Li

    Abstract: This paper proposes RefXVC, a method for cross-lingual voice conversion (XVC) that leverages reference information to improve conversion performance. Previous XVC works generally take an average speaker embedding to condition the speaker identity, which does not account for the changing timbre of speech that occurs with different pronunciations. To address this, our method uses both global and loc…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Manuscript under review by TASLP

  13. arXiv:2406.14186  [pdf, other]

    eess.IV cs.CV

    CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation

    Authors: Tingwei Liu, Miao Zhang, Leiye Liu, Jialong Zhong, Shuyao Wang, Yongri Piao, Huchuan Lu

    Abstract: Recently, the Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features effectively and to inject them efficiently into the diffusion backbone. Additionally, the domain gap between the image features and the diffusion…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted in MICCAI 2024

  14. arXiv:2406.13179  [pdf, other]

    cs.SD cs.AI cs.NE eess.AS

    Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting

    Authors: Shuai Wang, Dehao Zhang, Kexin Shi, Yuchen Wang, Wenjie Wei, Jibin Wu, Malu Zhang

    Abstract: Thanks to Deep Neural Networks (DNNs), the accuracy of Keyword Spotting (KWS) has made substantial progress. However, as KWS systems are usually implemented on edge devices, energy efficiency becomes a critical requirement besides performance. Here, we take advantage of spiking neural networks' energy efficiency and propose an end-to-end lightweight KWS model. The model consists of two innovative…

    Submitted 18 June, 2024; originally announced June 2024.

  15. arXiv:2406.10844  [pdf, other]

    eess.AS cs.SD

    Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis

    Authors: Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li

    Abstract: Synthesizing speech across different accents while preserving the speaker identity is essential for various real-world customer applications. However, the individual and accurate modeling of accents and speakers in a text-to-speech (TTS) system is challenging due to the complexity of accent variations and the intrinsic entanglement between the accent and speaker identity. In this paper, we present…

    Submitted 16 June, 2024; originally announced June 2024.

  16. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  17. arXiv:2406.07330  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    CTC-based Non-autoregressive Textless Speech-to-Speech Translation

    Authors: Qingkai Fang, Zhengrui Ma, Yan Zhou, Min Zhang, Yang Feng

    Abstract: Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences. Recently, some research has turned to non-autoregressive (NAR) models to expedite decoding, yet the translation quality typically lags behind autoregressive (AR) models significantly. In this paper, we investig…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

    ACM Class: I.2.7

  18. arXiv:2406.07289  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

    Authors: Qingkai Fang, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end model, yielding promising results. However, the training of these models still relies on parallel speech data, which is extremely challenging to collect. In contrast, S2TT and TTS have accumulated a large amount of data…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 main conference. Project Page: https://ictnlp.github.io/ComSpeech-Site/

    ACM Class: I.2.7

  19. arXiv:2406.06937  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

    Authors: Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang

    Abstract: Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translation. These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization betwee…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024; Codes and demos are at https://github.com/ictnlp/NAST-S2x

  20. arXiv:2406.03049  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

    Authors: Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, Simul-S2ST requires a policy to control the model to generate corresponding target speech at the opportune moment within speech inputs, thereby posing…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference, Project Page: https://ictnlp.github.io/StreamSpeech-site/

  21. arXiv:2405.18791  [pdf, other]

    eess.SY math.DS

    A new platooning model for connected and autonomous vehicles to improve string stability

    Authors: Shouwei Hui, Michael Zhang

    Abstract: This paper introduces a novel idea of coordinated vehicle platooning such that platoon followers inside the platoon communicate only with the platoon leader. A novel dynamic model is proposed to take driving safety into account when there is communication delay. Some general results of linear stability are proved mathematically, and numerical simulations are conducted to show the effect of model pa…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: preprint submitted to Physica A

  22. arXiv:2405.17441  [pdf, other]

    cs.NI cs.AI cs.CL eess.SY

    When Large Language Models Meet Optical Networks: Paving the Way for Automation

    Authors: Danshi Wang, Yidi Wang, Xiaotian Jiang, Yao Zhang, Yue Pang, Min Zhang

    Abstract: Since the advent of GPT, large language models (LLMs) have brought about revolutionary advancements in all walks of life. As a superior natural language processing (NLP) technology, LLMs have consistently achieved state-of-the-art performance in numerous areas. However, LLMs are considered to be general-purpose models for NLP tasks, which may encounter challenges when applied to complex tasks in s…

    Submitted 24 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  23. arXiv:2405.04253  [pdf]

    eess.SP

    Fermat Number Transform Based Chromatic Dispersion Compensation and Adaptive Equalization Algorithm

    Authors: Siyu Chen, Zheli Liu, Weihao Li, Zihe Hu, Mingming Zhang, Sheng Cui, Ming Tang

    Abstract: By introducing the Fermat number transform into chromatic dispersion compensation and adaptive equalization, the computational complexity has been reduced by 68% compared with the conventional implementation. Experimental results validate its transmission performance with only 0.8 dB receiver sensitivity penalty in a 75 km-40 GBaud-PDM-16QAM system.

    Submitted 7 May, 2024; originally announced May 2024.

  24. arXiv:2405.00734  [pdf, other]

    eess.SP cs.AI cs.LG

    EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations

    Authors: Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang

    Abstract: Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Mani…

    Submitted 13 August, 2024; v1 submitted 29 April, 2024; originally announced May 2024.

  25. arXiv:2404.18096  [pdf, other]

    eess.IV cs.CV

    Snake with Shifted Window: Learning to Adapt Vessel Pattern for OCTA Segmentation

    Authors: Xinrun Chen, Mei Shen, Haojian Ning, Mengzhan Zhang, Chengliang Wang, Shiying Li

    Abstract: Segmenting specific targets or structures in optical coherence tomography angiography (OCTA) images is fundamental for conducting further pathological studies. The retinal vascular layers are rich and intricate, and such vasculature with complex shapes can be captured by the widely-studied OCTA images. In this paper, we thus study how to use OCTA images with projection vascular layers to segment reti…

    Submitted 28 April, 2024; originally announced April 2024.

  26. arXiv:2404.17280  [pdf, other]

    cs.SD eess.AS

    Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks

    Authors: Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das

    Abstract: The most common spoofing attacks on automatic speaker verification systems are replay speech attacks. Detection of replay speech heavily relies on replay configuration information. Previous studies have shown that graph Fourier transform-derived features can effectively detect replay speech but ignore device and environmental noise effects. In this work, we propose a new feature, the graph frequen…

    Submitted 26 April, 2024; originally announced April 2024.

  27. Cepstral Analysis Based Artifact Detection, Recognition and Removal for Prefrontal EEG

    Authors: Siqi Han, Chao Zhang, Jiaxin Lei, Qingquan Han, Yuhui Du, Anhe Wang, Shuo Bai, Milin Zhang

    Abstract: This paper proposes to use cepstrum for artifact detection, recognition and removal in prefrontal EEG. This work focuses on the artifact caused by eye movement. A database containing artifact-free EEG and eye movement contaminated EEG from different subjects is established. A cepstral analysis-based feature extraction with support vector machine (SVM) based classifier is designed to identify the a…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 5 pages, 4 figures, published by TCAS-II

    Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023

  28. arXiv:2404.04904  [pdf, other]

    cs.SD cs.AI eess.AS

    Cross-Domain Audio Deepfake Detection: Dataset and Analysis

    Authors: Yuang Li, Min Zhang, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Hao Yang

    Abstract: Audio deepfake detection (ADD) is essential for preventing the misuse of synthetic voices that may infringe on personal rights and privacy. Recent zero-shot text-to-speech (TTS) models pose higher risks as they can clone voices with a single utterance. However, the existing ADD datasets are outdated, leading to suboptimal generalization of detection models. In this paper, we construct a new cross-…

    Submitted 7 April, 2024; originally announced April 2024.

  29. arXiv:2404.00132  [pdf, other]

    eess.IV cs.CV

    FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model

    Authors: Molin Zhang, Polina Golland, Patricia Ellen Grant, Elfar Adalsteinsson

    Abstract: The quality of fetal MRI is significantly affected by unpredictable and substantial fetal motion, leading to the introduction of artifacts even when fast acquisition sequences are employed. The development of 3D real-time fetal pose estimation approaches on volumetric EPI fetal MRI opens up a promising avenue for fetal motion monitoring and prediction. Challenges arise in fetal pose estimation due…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: 8 pages, 3 figures, 2 tables, submitted to MICCAI 2024, code available if accepted

  30. arXiv:2403.16170  [pdf, other]

    eess.SY

    Voltage Regulation in Polymer Electrolyte Fuel Cell Systems Using Gaussian Process Model Predictive Control

    Authors: Xiufei Li, Miao Zhang, Yuanxin Qi, Miao Yang

    Abstract: This study introduces a novel approach utilizing Gaussian process model predictive control (MPC) to stabilize the output voltage of a polymer electrolyte fuel cell (PEFC) system by simultaneously regulating hydrogen and airflow rates. Two Gaussian process models are developed to capture PEFC dynamics, taking into account constraints including hydrogen pressure and input change rates, thereby aidin…

    Submitted 24 March, 2024; originally announced March 2024.

  31. arXiv:2403.13615  [pdf, other]

    cs.IT eess.SP

    MIMO Channel as a Neural Function: Implicit Neural Representations for Extreme CSI Compression in Massive MIMO Systems

    Authors: Haotian Wu, Maojun Zhang, Yulin Shao, Krystian Mikolajczyk, Deniz Gündüz

    Abstract: Acquiring and utilizing accurate channel state information (CSI) can significantly improve transmission performance, thereby holding a crucial role in realizing the potential advantages of massive multiple-input multiple-output (MIMO) technology. Current prevailing CSI feedback approaches improve precision by employing advanced deep-learning methods to learn representative CSI features for a subse…

    Submitted 20 March, 2024; originally announced March 2024.

    MSC Class: 94A24; ACM Class: E.4

  32. arXiv:2403.11693  [pdf, other]

    cs.IT eess.SP

    Beamforming Design for Semantic-Bit Coexisting Communication System

    Authors: Maojun Zhang, Guangxu Zhu, Richeng Jin, Xiaoming Chen, Qingjiang Shi, Caijun Zhong, Kaibin Huang

    Abstract: Semantic communication (SemCom) is emerging as a key technology for future sixth-generation (6G) systems. Unlike traditional bit-level communication (BitCom), SemCom directly optimizes performance at the semantic level, leading to superior communication efficiency. Nevertheless, the task-oriented nature of SemCom renders it challenging to completely replace BitCom. Consequently, it is desired to c…

    Submitted 22 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE for possible publication

  33. arXiv:2403.09157  [pdf, ps, other]

    eess.IV cs.CV

    VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

    Authors: Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, Xianping Tao

    Abstract: In the field of medical image segmentation, models based on both CNN and Transformer have been thoroughly investigated. However, CNNs have limited modeling capabilities for long-range dependencies, making it challenging to exploit the semantic information within images fully. On the other hand, the quadratic computational complexity poses a challenge for Transformers. Recently, State Space Models…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 12 pages, 4 figures

  34. arXiv:2403.04945  [pdf, other]

    cs.CL cs.LG eess.SP

    MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation

    Authors: Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Zhenwu Peng, Jie Fu, Rossella Arcucci, Huaxiu Yao, Mi Zhang

    Abstract: Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the…

    Submitted 18 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Under review

  35. arXiv:2403.04374  [pdf, other]

    eess.SY cs.AI

    Model-Free Load Frequency Control of Nonlinear Power Systems Based on Deep Reinforcement Learning

    Authors: Xiaodi Chen, Meng Zhang, Zhengguang Wu, Ligang Wu, Xiaohong Guan

    Abstract: Load frequency control (LFC) is widely employed in power systems to stabilize frequency fluctuation and guarantee power quality. However, most existing LFC methods rely on accurate power system modeling and usually ignore the nonlinear characteristics of the system, limiting controllers' performance. To solve these problems, this paper proposes a model-free LFC method for nonlinear power systems b…

    Submitted 7 March, 2024; originally announced March 2024.

  36. arXiv:2402.16907  [pdf, other]

    eess.IV cs.CV cs.LG

    Diffusion Posterior Proximal Sampling for Image Restoration

    Authors: Hongjie Wu, Linchao He, Mingqin Zhang, Dongdong Chen, Kunming Luo, Mengting Luo, Ji-Zhe Zhou, Hu Chen, Jiancheng Lv

    Abstract: Diffusion models have demonstrated remarkable efficacy in generating high-quality samples. Existing diffusion-based image restoration algorithms exploit pre-trained diffusion models to leverage data priors, yet they still preserve elements inherited from the unconditional generation paradigm. These strategies initiate the denoising process with pure white noise and incorporate random noise at each…

    Submitted 6 August, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: ACM Multimedia 2024 Oral

  37. arXiv:2402.14401  [pdf, other]

    cs.CV cs.LG eess.IV

    Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment

    Authors: Zhaoyang Wang, Bo Hu, Mingyang Zhang, Jie Li, Leida Li, Maoguo Gong, Xinbo Gao

    Abstract: Existing free-energy guided No-Reference Image Quality Assessment (NR-IQA) methods still suffer from finding a balance between learning feature information at the pixel level of the image and capturing high-level feature information and the efficient utilization of the obtained high-level feature information remains a challenge. As a novel class of state-of-the-art (SOTA) generative model, the dif…

    Submitted 22 February, 2024; originally announced February 2024.

  38. arXiv:2402.03465  [pdf, other]

    cs.NI eess.SP

    Stitching the Spectrum: Semantic Spectrum Segmentation with Wideband Signal Stitching

    Authors: Daniel Uvaydov, Milin Zhang, Clifton Paul Robinson, Salvatore D'Oro, Tommaso Melodia, Francesco Restuccia

    Abstract: Spectrum has become an extremely scarce and congested resource. As a consequence, spectrum sensing enables the coexistence of different wireless technologies in shared spectrum bands. Most existing work requires spectrograms to classify signals. Ultimately, this implies that images need to be continuously created from I/Q samples, thus creating unacceptable latency for real-time operations. In add…

    Submitted 7 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  39. arXiv:2402.03414  [pdf, other]

    eess.IV cs.CV cs.LG

    An end-to-end deep learning pipeline to derive blood input with partial volume corrections for automated parametric brain PET mapping

    Authors: Rugved Chavan, Gabriel Hyman, Zoraiz Qureshi, Nivetha Jayakumar, William Terrell, Stuart Berr, David Schiff, Megan Wardius, Nathan Fountain, Thomas Muttikkal, Mark Quigg, Miaomiao Zhang, Bijoy Kundu

    Abstract: Dynamic 2-[18F] fluoro-2-deoxy-D-glucose positron emission tomography (dFDG-PET) for human brain imaging has considerable clinical potential, yet its utilization remains limited. A key challenge in the quantitative analysis of dFDG-PET is characterizing a patient-specific blood input function, traditionally reliant on invasive arterial blood sampling. This research introduces a novel approach empl…

    Submitted 5 February, 2024; originally announced February 2024.

  40. arXiv:2402.02950  [pdf, other]

    cs.CR eess.SP

    Semantic Entropy Can Simultaneously Benefit Transmission Efficiency and Channel Security of Wireless Semantic Communications

    Authors: Yankai Rong, Guoshun Nan, Minwei Zhang, Sihan Chen, Songtao Wang, Xuefei Zhang, Nan Ma, Shixun Gong, Zhaohui Yang, Qimei Cui, Xiaofeng Tao, Tony Q. S. Quek

    Abstract: Recently proliferated deep learning-based semantic communications (DLSC) focus on how transmitted symbols efficiently convey a desired meaning to the destination. However, the sensitivity of neural models and the openness of wireless channels cause the DLSC system to be extremely fragile to various malicious attacks. This inspires us to ask a question: "Can we further exploit the advantages of tra…

    Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 13 pages, 12 figures

  41. arXiv:2401.15636  [pdf, other]

    cs.CV eess.IV

    FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models

    Authors: Feihong He, Gang Li, Mengyuan Zhang, Leilei Yan, Lingyu Si, Fanzhang Li, Li Shen

    Abstract: The rapid development of generative diffusion models has significantly advanced the field of style transfer. However, most current style transfer methods based on diffusion models typically involve a slow iterative optimization process, e.g., model fine-tuning and textual inversion of style concept. In this paper, we introduce FreeStyle, an innovative style transfer method built upon a pre-trained…

    Submitted 18 July, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  42. LightSleepNet: Design of a Personalized Portable Sleep Staging System Based on Single-Channel EEG

    Authors: Yiqiao Liao, Chao Zhang, Milin Zhang, Zhihua Wang, Xiang Xie

    Abstract: This paper proposed LightSleepNet - a light-weight, 1-d Convolutional Neural Network (CNN) based personalized architecture for real-time sleep staging, which can be implemented on various mobile platforms with limited hardware resources. The proposed architecture only requires an input of 30s single-channel EEG signal for the classification. Two residual blocks consisting of group 1-d convolution…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 5 pages, 3 figures, published by IEEE TCAS-II

    Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2021, 69(1): 224-228

  43. arXiv:2401.12264  [pdf, other]

    eess.AS cs.MM cs.SD eess.IV

    CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

    Authors: Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

    Abstract: There has been a long-standing quest for a unified audio-visual-text model to enable various multimodal understanding tasks, which mimics the listening, seeing and reading process of human beings. Humans tend to represent knowledge using two separate systems: one for representing verbal (textual) information and one for representing non-verbal (visual and auditory) information. These two systems…

    Submitted 21 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  44. arXiv:2401.06780  [pdf, other]

    eess.IV cs.AI cs.CV

    HA-HI: Synergising fMRI and DTI through Hierarchical Alignments and Hierarchical Interactions for Mild Cognitive Impairment Diagnosis

    Authors: Xiongri Shen, Zhenxi Song, Linling Li, Min Zhang, Lingyan Liang, Honghai Liu, Demao Deng, Zhiguo Zhang

    Abstract: Early diagnosis of mild cognitive impairment (MCI) and subjective cognitive decline (SCD) utilizing multi-modal magnetic resonance imaging (MRI) is a pivotal area of research. While various regional and connectivity features from functional MRI (fMRI) and diffusion tensor imaging (DTI) have been employed to develop diagnosis models, most studies integrate these features without adequately addressi…

    Submitted 2 January, 2024; originally announced January 2024.

  45. UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

    Authors: Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

    Abstract: Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and have a strong dependency on Pseudo Paired Data and Original Paired Data. But when only pre-training on Pseudo Paired Data, previous models have a negative effect on correction. While fine-tu…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted in ICASSP 2023

  46. arXiv:2401.03538  [pdf, other]

    cs.CL cs.SD eess.AS

    Transfer the linguistic representations from TTS to accent conversion with non-parallel data

    Authors: Xi Chen, Jiakun Pei, Liumeng Xue, Mingyang Zhang

    Abstract: Accent conversion aims to convert the accent of a source speech to a target accent while preserving the speaker's identity. This paper introduces a novel non-autoregressive framework for accent conversion that learns accent-agnostic linguistic representations and employs them to convert the accent in the source speech. Specifically, the proposed system aligns speech representations with lingu…

    Submitted 7 January, 2024; originally announced January 2024.

  47. Multi-Channel Multi-Domain based Knowledge Distillation Algorithm for Sleep Staging with Single-Channel EEG

    Authors: Chao Zhang, Yiqiao Liao, Siqi Han, Milin Zhang, Zhihua Wang, Xiang Xie

    Abstract: This paper proposed a Multi-Channel Multi-Domain (MCMD) based knowledge distillation algorithm for sleep staging using single-channel EEG. Knowledge from both different domains and different channels is learnt simultaneously in the proposed algorithm. A multi-channel pre-training and single-channel fine-tuning scheme is used in the proposed work. The knowledge from different channels in the sour…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 5 pages, 2 figures, published by IEEE TCAS-II

    Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2022, 69(11): 4608-4612

  48. arXiv:2401.03396  [pdf]

    eess.SP

    A Closed-loop Brain-Machine Interface SoC Featuring a 0.2$μ$J/class Multiplexer Based Neural Network

    Authors: Chao Zhang, Yongxiang Guo, Dawid Sheng, Zhixiong Ma, Chao Sun, Yuwei Zhang, Wenxin Zhao, Fenyan Zhang, Tongfei Wang, Xing Sheng, Milin Zhang

    Abstract: This work presents the first fabricated electrophysiology-optogenetic closed-loop bidirectional brain-machine interface (CL-BBMI) system-on-chip (SoC) with electrical neural signal recording, on-chip sleep staging and optogenetic stimulation. The first multiplexer with static assignment based table lookup solution (MUXnet) for multiplier-free NN processor was proposed. A state-of-the-art average a…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 2 pages, 6 figures. Accepted by IEEE Custom Integrated Circuits Conference (CICC) 2024. The codes for the MUXnet (constructing neural networks using multiplexers instead of multipliers) will be open-sourced after the Journal version of this work is accepted

  49. arXiv:2312.13752  [pdf]

    eess.IV cs.AI cs.CV

    Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

    Authors: Yang Nan, Xiaodan Xing, Shiyi Wang, Zeyu Tang, Federico N Felder, Sheng Zhang, Roberta Eufrasia Ledda, Xiaoliu Ding, Ruiqi Yu, Weiping Liu, Feng Shi, Tianyang Sun, Zehong Cao, Minghui Zhang, Yun Gu, Hanxiao Zhang, Jian Gao, Pingyu Wang, Wen Tang, Pengxin Yu, Han Kang, Junqiang Chen, Xing Lu, Boyu Zhang, Michail Mamalakis, et al. (16 additional authors not shown)

    Abstract: Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current publicly available datasets concentrate on lung diseases with moderate morphological variations. The intric…

    Submitted 16 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 19 pages

  50. arXiv:2312.13319  [pdf, other]

    eess.IV cs.CV

    In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging

    Authors: Xin Wang, Lizhi Wang, Xiangtian Ma, Maoqing Zhang, Lin Zhu, Hua Huang

    Abstract: Dual-Camera Compressed Hyperspectral Imaging (DCCHI) offers the capability to reconstruct 3D Hyperspectral Image (HSI) by fusing compressive and Panchromatic (PAN) images, which has shown great potential for snapshot hyperspectral imaging in practice. In this paper, we introduce a novel DCCHI reconstruction network, the Intra-Inter Similarity Exploiting Transformer (In2SET). Our key insight is to m…

    Submitted 8 June, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: CVPR 2024