
Showing 1–50 of 1,660 results for author: Wang, Y

Searching in archive eess.
  1. arXiv:2408.16564  [pdf, other]

    cs.MM cs.SD eess.AS

    Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing

    Authors: Qianhui Liu, Jiadong Wang, Yang Wang, Xin Yang, Gang Pan, Haizhou Li

    Abstract: Humans naturally perform audiovisual speech recognition (AVSR), enhancing the accuracy and robustness by integrating auditory and visual information. Spiking neural networks (SNNs), which mimic the brain's information-processing mechanisms, are well-suited for emulating the human capability of AVSR. Despite their potential, research on SNNs for AVSR is scarce, with most existing audio-visual multi…

    Submitted 29 August, 2024; originally announced August 2024.

  2. arXiv:2408.16197  [pdf, other]

    eess.SY

    Economic Optimal Power Management of Second-Life Battery Energy Storage Systems

    Authors: Amir Farakhor, Di Wu, Pingen Chen, Junmin Wang, Yebin Wang, Huazhen Fang

    Abstract: Second-life battery energy storage systems (SL-BESS) are an economical means of long-duration grid energy storage. They utilize retired battery packs from electric vehicles to store and provide electrical energy at the utility scale. However, they pose critical challenges in achieving optimal utilization and extending their remaining useful life. These complications primarily result from the const…

    Submitted 28 August, 2024; originally announced August 2024.

  3. arXiv:2408.15176  [pdf, other]

    cs.SD cs.CL eess.AS

    Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement

    Authors: Longshen Ou, Jingwei Zhao, Ziyu Wang, Gus Xia, Ye Wang

    Abstract: Large language models have shown significant capabilities across various domains, including symbolic music generation. However, leveraging these pre-trained models for controllable music arrangement tasks, each requiring different forms of musical information as control, remains a novel challenge. In this paper, we propose a unified sequence-to-sequence framework that enables the fine-tuning of a…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Submitted to AAAI 2025

  4. arXiv:2408.14472  [pdf, other]

    cs.RO cs.AI eess.SY

    Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning

    Authors: Xinyang Gu, Yen-Jen Wang, Xiang Zhu, Chengming Shi, Yanjiang Guo, Yichen Liu, Jianyu Chen

    Abstract: Humanoid robots, with their human-like skeletal structure, are especially suited for tasks in human-centric environments. However, this structure is accompanied by additional challenges in locomotion controller design, especially in complex real-world environments. As a result, existing humanoid robots are limited to relatively simple terrains, either with model-based control or model-free reinfor…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Robotics: Science and Systems (RSS), 2024. (Best Paper Award Finalist)

  5. arXiv:2408.13975  [pdf]

    physics.med-ph eess.IV

    Cross-sectional imaging of speed-of-sound distribution using photoacoustic reversal beacons

    Authors: Yang Wang, Danni Wang, Liting Zhong, Yi Zhou, Qing Wang, Wufan Chen, Li Qi

    Abstract: Photoacoustic tomography (PAT) enables non-invasive cross-sectional imaging of biological tissues, but it fails to map the spatial variation of speed-of-sound (SOS) within tissues. Because SOS is intimately linked to the density and elastic modulus of tissues, imaging the SOS distribution serves as a complementary imaging modality to PAT. Moreover, an accurate SOS map can be leveraged to correct for…

    Submitted 25 August, 2024; originally announced August 2024.

  6. arXiv:2408.13893  [pdf, other]

    cs.SD cs.CL eess.AS

    SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Rongjie Huang, Yuanyuan Wang, Haohan Guo, Dading Chong, Songxiang Liu, Xixin Wu, Helen Meng

    Abstract: Scaling text-to-speech (TTS) to large-scale datasets has been demonstrated as an effective method for improving the diversity and naturalness of synthesized speech. At a high level, previous large-scale TTS models can be categorized into either autoregressive (AR) models (e.g., VALL-E) or non-autoregressive (NAR) models (e.g., NaturalSpeech 2/3). Although these works dem…

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Submitted to TASLP

  7. arXiv:2408.13470  [pdf, other]

    eess.SP

    Performance Analysis of Photon-Limited Free-Space Optical Communications with Practical Photon-Counting Receivers

    Authors: Chen Wang, Zhiyong Xu, Jingyuan Wang, Jianhua Li, Weifeng Mou, Huatao Zhu, Jiyong Zhao, Yang Su, Yimin Wang, Ailin Qi

    Abstract: The non-perfect factors of practical photon-counting receivers are recognized as a significant challenge for long-distance photon-limited free-space optical (FSO) communication systems. This paper presents a comprehensive analytical framework for modeling the statistical properties of time-gated single-photon avalanche diode (TG-SPAD) based photon-counting receivers in the presence of dead time, non-ph…

    Submitted 24 August, 2024; originally announced August 2024.

  8. arXiv:2408.13290  [pdf, ps, other]

    eess.IV cs.CV

    Multi-modal Intermediate Feature Interaction AutoEncoder for Overall Survival Prediction of Esophageal Squamous Cell Cancer

    Authors: Chengyu Wu, Yatao Zhang, Yaqi Wang, Qifeng Wang, Shuai Wang

    Abstract: Survival prediction for esophageal squamous cell cancer (ESCC) is crucial for doctors to assess a patient's condition and tailor treatment plans. The application and development of multi-modal deep learning in this field have attracted attention in recent years. However, the prognostically relevant features between cross-modalities have not been further explored in previous studies, which could hi…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by ISBI 2024

  9. arXiv:2408.13040  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks

    Authors: Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-Wen Li, Hung-yi Lee

    Abstract: Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address va…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3730-3744, 2024

  10. arXiv:2408.11787  [pdf, other]

    eess.IV cs.CV

    NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation

    Authors: Zhenye Lou, Qing Xu, Zekun Jiang, Xiangjian He, Zhen Chen, Yi Wang, Chenxin Li, Maggie M. He, Wenting Duan

    Abstract: Domain-generalized nuclei segmentation refers to the generalizability of models to unseen domains based on knowledge learned from source domains and is challenged by various image conditions, cell types, and stain strategies. Recently, the Segment Anything Model (SAM) has achieved great success in universal image segmentation through interactive prompt modes (e.g., point and box). Despite its strengths, th…

    Submitted 24 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: Under Review

  11. arXiv:2408.10800  [pdf, other]

    eess.SP

    A Novel Signal Detection Method for Photon-Counting Communications with Nonlinear Distortion Effects

    Authors: Chen Wang, Zhiyong Xu, Jingyuan Wang, Jianhua Li, Weifeng Mou, Huatao Zhu, Jiyong Zhao, Yang Su, Yimin Wang, Ailin Qi

    Abstract: This paper proposes a method for estimating and detecting optical signals in practical photon-counting receivers. There are two important aspects of non-perfect photon-counting receivers, namely, (i) dead time which results in blocking loss, and (ii) non-photon-number-resolving, which leads to counting loss during the gate-ON interval. These factors introduce nonlinear distortion to the detected p…

    Submitted 20 August, 2024; originally announced August 2024.

  12. arXiv:2408.09554  [pdf, other]

    q-bio.QM cs.CV eess.IV

    Screen Them All: High-Throughput Pan-Cancer Genetic and Phenotypic Biomarker Screening from H&E Whole Slide Images

    Authors: Yi Kan Wang, Ludmila Tydlitatova, Jeremy D. Kunz, Gerard Oakley, Ran A. Godrich, Matthew C. H. Lee, Chad Vanderbilt, Razik Yousfi, Thomas Fuchs, David S. Klimstra, Siqi Liu

    Abstract: Many molecular alterations serve as clinically prognostic or therapy-predictive biomarkers, typically detected using single or multi-gene molecular assays. However, these assays are expensive, tissue-destructive and often take weeks to complete. Using AI on routine H&E WSIs offers a fast and economical approach to screen for multiple molecular biomarkers. We present a high-throughput AI-based syst…

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

  13. arXiv:2408.09534  [pdf, ps, other]

    eess.SY

    Safe Adaptive Control for Uncertain Systems with Complex Input Constraints

    Authors: Yaosheng Deng, Yang Bai, Yujie Wang, Masaki Ogura, Mir Feroskhan

    Abstract: In this paper, we propose a novel adaptive Control Barrier Function (CBF) based controller for nonlinear systems with complex, time-varying input constraints. Conventional CBF approaches often struggle with feasibility issues and stringent assumptions when addressing input constraints. Unlike these methods, our approach converts the input-constraint problem into an output-constraint CBF design. Th…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 8 pages, 2 figures

  14. arXiv:2408.09278  [pdf, other]

    eess.IV cs.CV

    Cross-Species Data Integration for Enhanced Layer Segmentation in Kidney Pathology

    Authors: Junchao Zhu, Mengmeng Yin, Ruining Deng, Yitian Long, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

    Abstract: Accurate delineation of the boundaries between the renal cortex and medulla is crucial for subsequent functional and structural analysis and disease diagnosis. Training high-quality deep-learning models for layer segmentation relies on the availability of large amounts of annotated data. However, due to the privacy of patients' medical data and scarce clinical cases, constructing pathological datasets…

    Submitted 17 August, 2024; originally announced August 2024.

  15. arXiv:2408.08669  [pdf, other]

    cs.SD eess.AS

    HSDreport: Heart Sound Diagnosis with Echocardiography Reports

    Authors: Zihan Zhao, Pingjie Wang, Liudan Zhao, Yuchen Yang, Ya Zhang, Kun Sun, Xin Sun, Xin Zhou, Yu Wang, Yanfeng Wang

    Abstract: Heart sound auscultation holds significant importance in the diagnosis of congenital heart disease. However, existing methods for Heart Sound Diagnosis (HSD) tasks are predominantly limited to a few fixed categories, framing the HSD task as a rigid classification problem that does not fully align with medical practice and offers only limited information to physicians. Besides, such methods do not…

    Submitted 16 August, 2024; originally announced August 2024.

  16. arXiv:2408.07592  [pdf, other]

    eess.SP

    Multi-periodicity dependency Transformer based on spectrum offset for radio frequency fingerprint identification

    Authors: Jing Xiao, Wenrui Ding, Zeqi Shao, Duona Zhang, Yanan Ma, Yufeng Wang, Jian Wang

    Abstract: Radio Frequency Fingerprint Identification (RFFI) has emerged as a pivotal task for reliable device authentication. Despite advancements in RFFI methods, background noise and intentional modulation features result in weak energy and subtle differences in the RFF features. These challenges diminish the capability of RFFI methods in feature representation, complicating the effective identification o…

    Submitted 14 August, 2024; originally announced August 2024.

  17. arXiv:2408.05934  [pdf, other]

    physics.med-ph eess.SP

    A Mathematical Model for Skin Sympathetic Nerve Activity Simulation

    Authors: Runwei Lin, Frank Halfwerk, Dirk Donker, Gozewijn Dirk Laverman, Ying Wang

    Abstract: The autonomic nervous system is important for regulating cardiac function. Modeling autonomic cardiac regulation can contribute to health tracking and disease management. This study proposes a mathematical model that simulates the autonomic cardiac regulation response to the Valsalva maneuver, a commonly used test that provokes the autonomic nervous system. A dataset containing skin sympathetic nerv…

    Submitted 26 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: Correction: units of aSKNA should be μV

  18. arXiv:2408.05719  [pdf]

    cs.RO eess.SP

    MR-ULINS: A Tightly-Coupled UWB-LiDAR-Inertial Estimator with Multi-Epoch Outlier Rejection

    Authors: Tisheng Zhang, Man Yuan, Linfu Wei, Yan Wang, Hailiang Tang, Xiaoji Niu

    Abstract: LiDAR-inertial odometry (LIO) and ultra-wideband (UWB) have been integrated to achieve driftless positioning in global navigation satellite system (GNSS)-denied environments. However, UWB may be affected by systematic range errors (such as clock drift and antenna phase center offset) and non-line-of-sight (NLOS) signals, resulting in reduced robustness. In this study,…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 8 pages, 9 figures

  19. arXiv:2408.05614  [pdf, other]

    cs.AR cs.ET eess.SY

    ICGMM: CXL-enabled Memory Expansion with Intelligent Caching Using Gaussian Mixture Model

    Authors: Hanqiu Chen, Yitu Wang, Luis Vitorio Cargnini, Mohammadreza Soltaniyeh, Dongyang Li, Gongjin Sun, Pradeep Subedi, Andrew Chang, Yiran Chen, Cong Hao

    Abstract: Compute Express Link (CXL) has emerged as a solution to the wide gap between computational speed and data communication rates among the host and multiple devices. It fosters a unified and coherent memory space between the host and CXL storage devices such as solid-state drives (SSDs) for memory expansion, with a corresponding DRAM implemented as the device cache. However, this introduces challenges such as…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: This paper is accepted by DAC 2024

  20. arXiv:2408.05438  [pdf, other]

    cs.RO eess.SY

    Convergence Guarantee of Dynamic Programming for LTL Surrogate Reward

    Authors: Zetong Xuan, Yu Wang

    Abstract: Linear Temporal Logic (LTL) is a formal way of specifying complex objectives for planning problems modeled as Markov Decision Processes (MDPs). The planning problem aims to find the optimal policy that maximizes the satisfaction probability of the LTL objective. One way to solve the planning problem is to use the surrogate reward with two discount factors and dynamic programming, which bypasses th…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: Accepted for the 2024 Conference on Decision and Control (CDC)

  21. arXiv:2408.04188  [pdf, ps, other]

    eess.SP eess.SY

    Trustworthy Semantic-Enabled 6G Communication: A Task-oriented and Privacy-preserving Perspective

    Authors: Shuaishuai Guo, Anbang Zhang, Yanhu Wang, Chenyuan Feng, Tony Q. S. Quek

    Abstract: Trustworthy task-oriented semantic communication (ToSC) emerges as an innovative approach in the 6G landscape, characterized by the transmission of only vital information that is directly pertinent to a specific task. While ToSC offers an efficient mode of communication, it concurrently raises concerns regarding privacy, as sophisticated adversaries might possess the capability to reconstruct the…

    Submitted 7 August, 2024; originally announced August 2024.

  22. arXiv:2408.03651  [pdf, other]

    eess.IV cs.CV

    SAM2-PATH: A better segment anything model for semantic segmentation in digital pathology

    Authors: Mingya Zhang, Liang Wang, Limei Gu, Zhao Li, Yaohui Wang, Tingshen Ling, Xianping Tao

    Abstract: The semantic segmentation task in pathology plays an indispensable role in assisting physicians in determining the condition of tissue lesions. Foundation models, such as the SAM (Segment Anything Model) and SAM2, exhibit exceptional performance in instance segmentation within everyday natural scenes. SAM-PATH has also achieved impressive results in semantic segmentation within the field of pathol…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

  23. arXiv:2408.03194  [pdf, other]

    eess.IV cs.CV

    SGSR: Structure-Guided Multi-Contrast MRI Super-Resolution via Spatio-Frequency Co-Query Attention

    Authors: Shaoming Zheng, Yinsong Wang, Siyi Du, Chen Qin

    Abstract: Magnetic Resonance Imaging (MRI) is a leading diagnostic modality for a wide range of exams, where multiple contrast images are often acquired for characterizing different tissues. However, acquiring high-resolution MRI typically extends scan time, which can introduce motion artifacts. Super-resolution of MRI therefore emerges as a promising approach to mitigate these challenges. Earlier studies h…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: The 15th International Workshop on Machine Learning in Medical Imaging (MLMI 2024)

  24. arXiv:2408.02622  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Language Model Can Listen While Speaking

    Authors: Ziyang Ma, Yakun Song, Chenpeng Du, Jian Cong, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen

    Abstract: Dialogue serves as the most natural manner of human-computer interaction (HCI). Recent advancements in speech language models (SLM) have significantly enhanced speech-based conversational AI. However, these models are limited to turn-based conversation, lacking the ability to interact with humans in real-time spoken scenarios, for example, being interrupted when the generated content is not satisf…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Demo can be found at https://ddlbojack.github.io/LSLM

  25. arXiv:2408.02178  [pdf, other]

    eess.AS cs.SD

    StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion

    Authors: Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang

    Abstract: StreamVoice has recently pushed the boundaries of zero-shot voice conversion (VC) in the streaming domain. It uses a streamable language model (LM) with a context-aware approach to convert semantic features from automatic speech recognition (ASR) into acoustic features with the desired speaker timbre. Despite its innovations, StreamVoice faces challenges due to its dependency on a streaming ASR wi…

    Submitted 4 August, 2024; originally announced August 2024.

  26. arXiv:2408.01808  [pdf, other]

    cs.CR cs.AI cs.SD eess.AS

    ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features

    Authors: Peng Cheng, Yuwei Wang, Peng Huang, Zhongjie Ba, Xiaodong Lin, Feng Lin, Li Lu, Kui Ren

    Abstract: Extensive research has revealed that adversarial examples (AE) pose a significant threat to voice-controllable smart devices. Recent studies have proposed black-box adversarial attacks that require only the final transcription from an automatic speech recognition (ASR) system. However, these attacks typically involve many queries to the ASR, resulting in substantial costs. Moreover, AE-based adver…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Published in the 2024 IEEE Symposium on Security and Privacy (SP)

  27. arXiv:2408.00470  [pdf, other]

    cs.CV cs.AI eess.IV

    Image Super-Resolution with Taylor Expansion Approximation and Large Field Reception

    Authors: Jiancong Feng, Yuan-Gen Wang, Mingjie Li, Fengchuang Xing

    Abstract: Self-similarity techniques are booming in blind super-resolution (SR) due to accurate estimation of the degradation types involved in low-resolution images. However, the high-dimensional matrix multiplication within self-similarity computation incurs prohibitive computational costs. We find that the high-dimensional attention map is derived from the matrix multiplication between Query and…

    Submitted 1 August, 2024; originally announced August 2024.

  28. arXiv:2408.00284  [pdf, other]

    cs.CL cs.SD eess.AS

    Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation

    Authors: Xinhan Di, Zihao Chen, Yunming Liang, Junjie Zheng, Yihua Wang, Chaofan Ding

    Abstract: Large-scale text-to-speech (TTS) models have made significant progress recently. However, they still fall short in the generation of Chinese dialectal speech. To address this, we propose Bailing-TTS, a family of large-scale TTS models capable of generating high-quality Chinese dialectal speech. Bailing-TTS serves as a foundation model for Chinese dialectal speech generation. First, continual semi-su…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 8 pages, 2 figures

  29. arXiv:2407.21600  [pdf, other]

    eess.IV cs.AI cs.CV eess.SP physics.med-ph

    Robust Simultaneous Multislice MRI Reconstruction Using Deep Generative Priors

    Authors: Shoujin Huang, Guanxiong Luo, Yuwan Wang, Kexin Yang, Lingyan Zhang, Jingzhe Liu, Hua Guo, Min Wang, Mengye Lyu

    Abstract: Simultaneous multislice (SMS) imaging is a powerful technique for accelerating magnetic resonance imaging (MRI) acquisitions. However, SMS reconstruction remains challenging due to the complex signal interactions between and within the excited slices. This study presents a robust SMS MRI reconstruction method using deep generative priors. Starting from Gaussian noise, we leverage denoising diffusi…

    Submitted 31 July, 2024; originally announced July 2024.

  30. arXiv:2407.20962  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    Authors: Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Massive multi-modality datasets play a significant role in facilitating the success of large video-language models. However, current video-language datasets primarily provide text descriptions for visual frames, considering audio to be weakly related information. They usually overlook exploring the potential of inherent audio-visual correlation, leading to monotonous annotation within each modalit…

    Submitted 6 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 pages. Dataset report

  31. arXiv:2407.20893  [pdf, other]

    cs.LG cs.AI eess.SP

    MambaCapsule: Towards Transparent Cardiac Disease Diagnosis with Electrocardiography Using Mamba Capsule Network

    Authors: Yinlong Xu, Xiaoqiang Liu, Zitai Kong, Yixuan Wu, Yue Wang, Yingzhou Lu, Honghao Gao, Jian Wu, Hongxia Xu

    Abstract: Cardiac arrhythmia, a condition characterized by irregular heartbeats, often serves as an early indication of various heart ailments. With the advent of deep learning, numerous innovative models have been introduced for diagnosing arrhythmias using Electrocardiogram (ECG) signals. However, recent studies solely focus on the performance of models, neglecting the interpretation of their results. Thi…

    Submitted 30 July, 2024; originally announced July 2024.

  32. arXiv:2407.20878  [pdf]

    eess.IV cs.CV

    S3PET: Semi-supervised Standard-dose PET Image Reconstruction via Dose-aware Token Swap

    Authors: Jiaqi Cui, Pinxian Zeng, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: To acquire high-quality positron emission tomography (PET) images while reducing the radiation tracer dose, numerous efforts have been devoted to reconstructing standard-dose PET (SPET) images from low-dose PET (LPET). However, the success of current fully-supervised approaches relies on abundant paired LPET and SPET images, which are often unavailable in clinical practice. Moreover, these methods often mix…

    Submitted 30 July, 2024; originally announced July 2024.

  33. arXiv:2407.20622  [pdf, other]

    cs.CL cs.SD eess.AS

    Decoding Linguistic Representations of Human Brain

    Authors: Yu Wang, Heyang Liu, Yuhao Wang, Chuan Xuan, Yixuan Hou, Sheng Feng, Hongcheng Liu, Yusheng Liao, Yanfeng Wang

    Abstract: Language, as an information medium created by advanced organisms, has always been a concern of neuroscience regarding how it is represented in the brain. Decoding linguistic representations in the evoked brain has shown groundbreaking achievements, thanks to the rapid improvement of neuroimaging, medical technology, life sciences and artificial intelligence. In this work, we present a taxonomy of…

    Submitted 30 July, 2024; originally announced July 2024.

  34. arXiv:2407.20111  [pdf, other]

    cs.SD eess.AS eess.SP

    Enhancing Anti-spoofing Countermeasures Robustness through Joint Optimization and Transfer Learning

    Authors: Yikang Wang, Xingming Wang, Hiromitsu Nishizaki, Ming Li

    Abstract: Current research in synthesized speech detection primarily focuses on the generalization of detection systems to unknown spoofing methods of noise-free speech. However, anti-spoofing countermeasure (CM) systems often do not perform as well in more challenging scenarios, such as those involving noise and reverberation. To address the problem of enhancing the robustness of CM syste…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 29 pages, 4 figures, journal paper

  35. arXiv:2407.19130  [pdf]

    physics.optics eess.IV

    Panoramic single-pixel imaging with megapixel resolution based on rotational subdivision

    Authors: Huan Cui, Jie Cao, Haoyu Zhang, Chang Zhou, Haifeng Yao, Yingbo Wang, Qun Hao

    Abstract: Single-pixel imaging (SPI) using a single-pixel detector is an unconventional imaging method, which has great application prospects in many fields to realize high-performance imaging. In particular, the recently proposed catadioptric panoramic ghost imaging (CPGI) extends the application potential of SPI to high-performance imaging at a wide field of view (FOV), for which demand has recently grown. However, th…

    Submitted 26 July, 2024; originally announced July 2024.

  36. arXiv:2407.18967  [pdf, other]

    eess.IV eess.SP

    GroupCDL: Interpretable Denoising and Compressed Sensing MRI via Learned Group-Sparsity and Circulant Attention

    Authors: Nikola Janjusevic, Amirhossein Khalilian-Gourtani, Adeen Flinker, Li Feng, Yao Wang

    Abstract: Nonlocal self-similarity within images has become an increasingly popular prior in deep-learning models. Despite their successful image restoration performance, such models remain largely uninterpretable due to their black-box construction. Our previous studies have shown that interpretable construction of a fully convolutional denoiser (CDLNet), with performance on par with state-of-the-art black…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 13 pages, 8 figures. arXiv admin note: substantial text overlap with arXiv:2306.01950

  37. arXiv:2407.18613  [pdf]

    cs.CV eess.IV

    Dilated Strip Attention Network for Image Restoration

    Authors: Fangwei Hao, Jiesheng Wu, Ji Du, Yinjie Wang, Jing Xu

    Abstract: Image restoration is a long-standing task that seeks to recover the latent sharp image from its deteriorated counterpart. Due to the robust capacity of self-attention to capture long-range dependencies, transformer-based methods or some attention-based convolutional neural networks have demonstrated promising results on many image restoration tasks in recent years. However, existing attention modu…

    Submitted 26 July, 2024; originally announced July 2024.

  38. arXiv:2407.18449  [pdf, other]

    eess.IV cs.CV cs.LG

    Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

    Authors: Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, Li Liang, Ronald Cheong Kin Chan, Jiguang Wang, Kwang-Ting Cheng, Hao Chen

    Abstract: Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear.…

    Submitted 3 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Report number: I.2.10

  39. arXiv:2407.18390  [pdf, other]

    eess.IV cs.CV

    Adapting Mouse Pathological Model to Human Glomerular Lesion Segmentation

    Authors: Lining Yu, Mengmeng Yin, Ruining Deng, Quan Liu, Tianyuan Yao, Can Cui, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

    Abstract: Moving from animal models to human applications in preclinical research encompasses a broad spectrum of disciplines in medical science. A fundamental element in the development of new drugs, treatments, diagnostic methods, and in deepening our understanding of disease processes is the accurate measurement of kidney tissues. Past studies have demonstrated the viability of translating glomeruli segm…

    Submitted 25 July, 2024; originally announced July 2024.

  40. arXiv:2407.17882  [pdf, other]

    eess.IV

    Artificial Immunofluorescence in a Flash: Rapid Synthetic Imaging from Brightfield Through Residual Diffusion

    Authors: Xiaodan Xing, Chunling Tang, Siofra Murdoch, Giorgos Papanastasiou, Yunzhe Guo, Xianglu Xiao, Jan Cross-Zamirski, Carola-Bibiane Schönlieb, Kristina Xiao Liang, Zhangming Niu, Evandro Fei Fang, Yinhai Wang, Guang Yang

    Abstract: Immunofluorescent (IF) imaging is crucial for visualizing biomarker expressions, cell morphology and assessing the effects of drug treatments on sub-cellular components. IF imaging requires an extra staining process and often cell fixation; therefore, it may also introduce artefacts and alter endogenous cell morphology. Some IF stains are expensive or not readily available, hence hindering exp…

    Submitted 25 July, 2024; originally announced July 2024.

  41. arXiv:2407.17758  [pdf, other]

    eess.SP

    Speed-enhanced Subdomain Adaptation Regression for Long-term Stable Neural Decoding in Brain-computer Interfaces

    Authors: Jiyu Wei, Dazhong Rong, Xinyun Zhu, Qinming He, Yueming Wang

    Abstract: Brain-computer interfaces (BCIs) offer a means to convert neural signals into control signals, providing a potential restoration of movement for people with paralysis. Despite their promise, BCIs face a significant challenge in maintaining decoding accuracy over time due to neural nonstationarities. However, the decoding accuracy of BCI drops severely across days due to the neural data drift. Whil…

    Submitted 25 July, 2024; originally announced July 2024.

  42. arXiv:2407.16684  [pdf, other]

    eess.IV cs.CV q-bio.NC

    AutoRG-Brain: Grounded Report Generation for Brain MRI

    Authors: Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li

    Abstract: Radiologists are tasked with interpreting a large number of images on a daily basis, with the responsibility of generating corresponding reports. This demanding workload elevates the risk of human error, potentially leading to treatment delays, increased healthcare costs, revenue loss, and operational inefficiencies. To address these challenges, we initiate a series of works on grounded Automatic Re…

    Submitted 29 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  43. arXiv:2407.16634  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical…

    Submitted 23 July, 2024; originally announced July 2024.

  44. arXiv:2407.16165  [pdf, other]

    eess.IV cs.CV cs.LG

    Advanced AI Framework for Enhanced Detection and Assessment of Abdominal Trauma: Integrating 3D Segmentation with 2D CNN and RNN Models

    Authors: Liheng Jiang, Xuechun Yang, Chang Yu, Zhizhong Wu, Yuting Wang

    Abstract: Trauma is a significant cause of mortality and disability, particularly among individuals under forty. Traditional diagnostic methods for traumatic injuries, such as X-rays, CT scans, and MRI, are often time-consuming and dependent on medical expertise, which can delay critical interventions. This study explores the application of artificial intelligence (AI) and machine learning (ML) to improve t…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 6 Pages

  45. arXiv:2407.15395  [pdf, other]

    eess.SP

    FAST-GSC: Fast and Adaptive Semantic Transmission for Generative Semantic Communication

    Authors: Yiru Wang, Wanting Yang, Zehui Xiong, Yuping Zhao, Shiwen Mao, Tony Q. S. Quek, H. Vincent Poor

    Abstract: The rapidly evolving field of generative artificial intelligence technology has introduced innovative approaches for developing semantic communication (SemCom) frameworks, leading to the emergence of a new paradigm: generative SemCom (GSC). However, the complex processes involved in semantic extraction and generative inference may result in considerable latency in resource-constrained scenarios. To…

    Submitted 22 July, 2024; originally announced July 2024.

  46. arXiv:2407.14616  [pdf, other]

    eess.IV cs.CV

    Deep Learning-based 3D Coronary Tree Reconstruction from Two 2D Non-simultaneous X-ray Angiography Projections

    Authors: Yiying Wang, Abhirup Banerjee, Robin P. Choudhury, Vicente Grau

    Abstract: Cardiovascular diseases (CVDs) are the most common cause of death worldwide. Invasive x-ray coronary angiography (ICA) is one of the most important imaging modalities for the diagnosis of CVDs. ICA typically acquires only two 2D projections, which makes the 3D geometry of coronary vessels difficult to interpret, thus requiring 3D coronary tree reconstruction from two projections. State-of-the-art…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 16 pages, 13 figures, 3 tables

  47. arXiv:2407.14153  [pdf, other]

    eess.IV cs.CV

    ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation

    Authors: Qing Xu, Jiaxuan Li, Xiangjian He, Ziyu Liu, Zhen Chen, Wenting Duan, Chenxin Li, Maggie M. He, Fiseha B. Tesema, Wooi P. Cheah, Yi Wang, Rong Qu, Jonathan M. Garibaldi

    Abstract: The universality of deep neural networks across different modalities and their generalization capabilities to unseen domains play an essential role in medical image segmentation. The recent Segment Anything Model (SAM) has demonstrated its potential in both settings. However, the huge computational costs, demand for manual annotations as prompts and conflict-prone decoding process of SAM degrade i…

    Submitted 17 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: Under Review

  48. arXiv:2407.13782  [pdf, other]

    eess.AS cs.AI cs.SD

    Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu

    Abstract: Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcity and mismatch. To this end, this paper explores a series of approaches to integrate domain fine-tuned SSL pre-trained models and their features into…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  49. arXiv:2407.11413  [pdf, other]

    math.OC eess.SY

    Distributed Prescribed-Time Convex Optimization: Cascade Design and Time-Varying Gain Approach

    Authors: Gewei Zuo, Lijun Zhu, Yujuan Wang, Zhiyong Chen

    Abstract: In this paper, we address the distributed prescribed-time convex optimization (DPTCO) problem for a class of nonlinear multi-agent systems (MASs) under an undirected connected graph. A cascade design framework is proposed such that the DPTCO implementation is divided into two parts: distributed optimal trajectory generator design and local reference trajectory tracking controller design. The DPTCO pr…

    Submitted 16 July, 2024; originally announced July 2024.

  50. arXiv:2407.11408  [pdf, other]

    eess.SY

    Prescribed-time Cooperative Output Regulation of Linear Heterogeneous Multi-agent Systems

    Authors: Gewei Zuo, Lijun Zhu, Yujuan Wang, Zhiyong Chen

    Abstract: A finite-time protocol for multi-agent systems (MASs) can guarantee the convergence of every agent in a finite time interval, in contrast to asymptotic convergence, but the settling time depends on the initial condition and design parameters and is inconsistent across the agents. In this paper, we study the prescribed-time cooperative output regulation (PTCOR) problem for a class of linear he…

    Submitted 16 July, 2024; originally announced July 2024.