Showing 1–50 of 869 results for author: Wang, H

Searching in archive eess.

  1. arXiv:2408.12829  [pdf, other]

    cs.LG cs.SD eess.AS

    Uncertainty-Aware Mean Opinion Score Prediction

    Authors: Hui Wang, Shiwan Zhao, Jiaming Zhou, Xiguang Zheng, Haoqin Sun, Xuechen Wang, Yong Qin

    Abstract: Mean Opinion Score (MOS) prediction has made significant progress in specific domains. However, the unstable performance of MOS prediction models across diverse samples presents ongoing challenges in the practical application of these systems. In this paper, we point out that the absence of uncertainty modeling is a significant limitation hindering MOS prediction systems from applying to the real…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by Interspeech 2024, oral

  2. arXiv:2408.12615  [pdf, other]

    eess.IV cs.CV cs.LG

    Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network

    Authors: Ling Lin, Yihang Zhou, Zhanqi Hu, Dian Jiang, Congcong Liu, Shuo Zhou, Yanjie Zhu, Jianxiang Liao, Dong Liang, Hairong Zheng, Haifeng Wang

    Abstract: Tuberous sclerosis complex (TSC) manifests as a multisystem disorder with significant neurological implications. This study addresses the critical need for robust classification models tailored to TSC in pediatric patients, introducing QResNet, a novel deep learning model seamlessly integrating conventional convolutional neural networks with quantum neural networks. The model incorporates a two-lay…

    Submitted 26 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 5 pages, 4 figures, 2 tables, presented at ISBI 2024

  3. arXiv:2408.12102  [pdf, other]

    cs.LG cs.CV cs.SD eess.AS

    Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization

    Authors: Luyao Cheng, Hui Wang, Siqi Zheng, Yafeng Chen, Rongjie Huang, Qinglin Zhang, Qian Chen, Xihao Li

    Abstract: Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogenous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing speaker diarization systems rely exclusively on unimodal acoustic information, making the task particularly challenging due to the innate ambiguities of audio signals…

    Submitted 21 August, 2024; originally announced August 2024.

  4. arXiv:2408.11982  [pdf, other]

    eess.IV cs.CV cs.MM

    AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

    Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu, et al. (11 additional authors not shown)

    Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dat…

    Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  5. arXiv:2408.11837  [pdf, other]

    cs.LG cs.AI cs.HC eess.SP

    MicroXercise: A Micro-Level Comparative and Explainable System for Remote Physical Therapy

    Authors: Hanchen David Wang, Nibraas Khan, Anna Chen, Nilanjan Sarkar, Pamela Wisniewski, Meiyi Ma

    Abstract: Recent global estimates suggest that as many as 2.41 billion individuals have health conditions that would benefit from rehabilitation services. Home-based Physical Therapy (PT) faces significant challenges in providing interactive feedback and meaningful observation for therapists and patients. To fill this gap, we present MicroXercise, which integrates micro-motion analysis with wearable sensors…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE/ACM CHASE 2024

  6. arXiv:2408.11828  [pdf, other]

    eess.SP cs.AI cs.LG

    Online Electric Vehicle Charging Detection Based on Memory-based Transformer using Smart Meter Data

    Authors: Ammar Mansoor Kamoona, Hui Song, Mahdi Jalili, Hao Wang, Reza Razzaghi, Xinghuo Yu

    Abstract: The growing popularity of Electric Vehicles (EVs) poses unique challenges for grid operators and infrastructure, which requires effectively managing these vehicles' integration into the grid. Identification of EVs charging is essential to electricity Distribution Network Operators (DNOs) for better planning and managing the distribution grid. One critical aspect is the ability to accurately identi…

    Submitted 5 August, 2024; originally announced August 2024.

  7. Reinforcement learning-based adaptive speed controllers in mixed autonomy condition

    Authors: Han Wang, Hossein Nick Zinat Matin, Maria Laura Delle Monache

    Abstract: The integration of Automated Vehicles (AVs) into traffic flow holds the potential to significantly improve traffic congestion by enabling AVs to function as actuators within the flow. This paper introduces an adaptive speed controller tailored for scenarios of mixed autonomy, where AVs interact with human-driven vehicles. We model the traffic dynamics using a system of strongly coupled Partial and…

    Submitted 17 August, 2024; originally announced August 2024.

  8. arXiv:2408.08883  [pdf]

    eess.IV

    MR Optimized Reconstruction of Simultaneous Multi-Slice Imaging Using Diffusion Model

    Authors: Ting Zhao, Zhuoxu Cui, Sen Jia, Qingyong Zhu, Congcong Liu, Yihang Zhou, Yanjie Zhu, Dong Liang, Haifeng Wang

    Abstract: Diffusion model has been successfully applied to MRI reconstruction, including single and multi-coil acquisition of MRI data. Simultaneous multi-slice imaging (SMS), as a method for accelerating MR acquisition, can significantly reduce scanning time, but further optimization of reconstruction results is still possible. In order to optimize the reconstruction of SMS, we proposed a method to use dif…

    Submitted 21 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted as ISMRM 2024 Digital Poster 4024

    Journal ref: ISMRM 2024 Digital poster 4024

  9. arXiv:2408.06645  [pdf]

    eess.SY

    Dynamic Pricing of Electric Vehicle Charging Station Alliances Under Information Asymmetry

    Authors: Zeyu Liu, Yun Zhou, Donghan Feng, Shaolun Xu, Yin Yi, Hengjie Li, Haojing Wang

    Abstract: Due to the centralization of charging stations (CSs), CSs are organized as charging station alliances (CSAs) in the commercial competition. Under this situation, this paper studies the profit-oriented dynamic pricing strategy of CSAs. As the practicability basis, a privacy-protected bidirectional real-time information interaction framework is designed, under which the status of EVs is utilized as…

    Submitted 13 August, 2024; originally announced August 2024.

  10. arXiv:2408.06597  [pdf, ps, other]

    eess.SP

    Line Spectral Estimation with Unlimited Sensing

    Authors: Hongwei Wang, Jun Fang, Hongbin Li, Geert Leus

    Abstract: In the paper, we consider the line spectral estimation problem in an unlimited sensing framework (USF), where a modulo analog-to-digital converter (ADC) is employed to fold the input signal back into a bounded interval before quantization. Such an operation is mathematically equivalent to taking the modulo of the input signal with respect to the interval. To overcome the noise sensitivity of highe…

    Submitted 12 August, 2024; originally announced August 2024.

  11. arXiv:2408.02095  [pdf, other]

    cs.IT eess.SP

    Secure Semantic Communications: From Perspective of Physical Layer Security

    Authors: Yongkang Li, Zheng Shi, Han Hu, Yaru Fu, Hong Wang, Hongjiang Lei

    Abstract: Semantic communications have been envisioned as a potential technique that goes beyond Shannon paradigm. Unlike modern communications that provide bit-level security, the eavesdropping of semantic communications poses a significant risk of potentially exposing intention of legitimate user. To address this challenge, a novel deep neural network (DNN) enabled secure semantic communication (DeepSSC)…

    Submitted 4 August, 2024; originally announced August 2024.

  12. arXiv:2408.01956  [pdf, ps, other]

    eess.SP

    Enhancing Spatial Multiplexing and Interference Suppression for Near- and Far-Field Communications with Sparse MIMO

    Authors: Huizhi Wang, Chao Feng, Yong Zeng, Shi Jin, Chau Yuen, Bruno Clerckx, Rui Zhang

    Abstract: Multiple-input multiple-output has been a key technology for wireless systems for decades. For typical MIMO communication systems, antenna array elements are usually separated by half of the carrier wavelength, thus termed as conventional MIMO. In this paper, we investigate the performance of multi-user MIMO communication, with sparse arrays at both the transmitter and receiver side, i.e., the arr…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 13 pages

  13. arXiv:2408.01696  [pdf, other]

    cs.SD cs.AI eess.AS

    Generating High-quality Symbolic Music Using Fine-grained Discriminators

    Authors: Zhedong Zhang, Liang Li, Jiehua Zhang, Zhenghui Hu, Hongkui Wang, Chenggang Yan, Jian Yang, Yuankai Qi

    Abstract: Existing symbolic music generation methods usually utilize discriminator to improve the quality of generated music via global perception of music. However, considering the complexity of information in music, such as rhythm and melody, a single discriminator cannot fully reflect the differences in these two primary dimensions of music. In this work, we propose to decouple the melody and rhythm from…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Accepted by ICPR2024

  14. arXiv:2408.00325  [pdf, other]

    cs.SD eess.AS

    Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition

    Authors: Haoqin Sun, Shiwan Zhao, Xiangyu Kong, Xuechen Wang, Hui Wang, Jiaming Zhou, Yong Qin

    Abstract: Recognizing emotions from speech is a daunting task due to the subtlety and ambiguity of expressions. Traditional speech emotion recognition (SER) systems, which typically rely on a singular, precise emotion label, struggle with this complexity. Therefore, modeling the inherent ambiguity of emotions is an urgent problem. In this paper, we propose an iterative prototype refinement framework (IPR) f…

    Submitted 1 August, 2024; originally announced August 2024.

  15. arXiv:2407.20607  [pdf, other]

    eess.SP

    Efficient Channel Estimation for Millimeter Wave and Terahertz Systems Enabled by Integrated Super-resolution Sensing and Communication

    Authors: Jingran Xu, Huizhi Wang, Yong Zeng, Xiaoli Xu, Qingqing Wu, Fei Yang, Yan Chen, Abbas Jamalipour

    Abstract: Integrated super-resolution sensing and communication (ISSAC) has emerged as a promising technology to achieve extremely high precision sensing for those key parameters, such as the angles of the sensing targets. In this paper, we propose an efficient channel estimation scheme enabled by ISSAC for millimeter wave (mmWave) and TeraHertz (THz) systems with a hybrid analog/digital beamforming archite…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 13 pages, 8 figures

  16. arXiv:2407.19841  [pdf, other]

    eess.SP cs.AR

    RRAM-Based Bio-Inspired Circuits for Mobile Epileptic Correlation Extraction and Seizure Prediction

    Authors: Hao Wang, Lingfeng Zhang, Erjia Xiao, Xin Wang, Zhongrui Wang, Renjing Xu

    Abstract: Non-invasive mobile electroencephalography (EEG) acquisition systems have been utilized for long-term monitoring of seizures, yet they suffer from limited battery life. Resistive random access memory (RRAM) is widely used in computing-in-memory (CIM) systems, which offers an ideal platform for reducing the computational energy consumption of seizure prediction algorithms, potentially solving the en…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  17. Multi-Channel Factor Analysis: Identifiability and Asymptotics

    Authors: Gray Stanton, David Ramírez, Ignacio Santamaria, Louis Scharf, Haonan Wang

    Abstract: Recent work by Ramírez et al. [2] has introduced Multi-Channel Factor Analysis (MFA) as an extension of factor analysis to multi-channel data that allows for latent factors common to all channels as well as factors specific to each channel. This paper validates the MFA covariance model and analyzes the statistical properties of the MFA estimators. In particular, a thorough investigation of model i…

    Submitted 26 July, 2024; originally announced July 2024.

    Journal ref: IEEE Transactions on Signal Processing (2024)

  18. arXiv:2407.17983  [pdf, other]

    eess.SP

    Explain EEG-based End-to-end Deep Learning Models in the Frequency Domain

    Authors: Hanqi Wang, Kun Yang, Jingyu Zhang, Tao Chen, Liang Song

    Abstract: The recent rise of EEG-based end-to-end deep learning models presents a significant challenge in elucidating how these models process raw EEG signals and generate predictions in the frequency domain. This challenge limits the transparency and credibility of EEG-based end-to-end models, hindering their application in security-sensitive areas. To address this issue, we propose a mask perturbation me…

    Submitted 25 July, 2024; originally announced July 2024.

  19. arXiv:2407.14564  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.LG

    APS-USCT: Ultrasound Computed Tomography on Sparse Data via AI-Physic Synergy

    Authors: Yi Sheng, Hanchen Wang, Yipei Liu, Junhuan Yang, Weiwen Jiang, Youzuo Lin, Lei Yang

    Abstract: Ultrasound computed tomography (USCT) is a promising technique that achieves superior medical imaging reconstruction resolution by fully leveraging waveform information, outperforming conventional ultrasound methods. Despite its advantages, high-quality USCT reconstruction relies on extensive data acquisition by a large number of transducers, leading to increased costs, computational demands, exte…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: MICCAI

  20. arXiv:2407.13566  [pdf]

    cs.CY cs.SI eess.SY

    Decentralised Governance for Autonomous Cyber-Physical Systems

    Authors: Kelsie Nabben, Hongyang Wang, Michael Zargham

    Abstract: This paper examines the potential for Cyber-Physical Systems (CPS) to be governed in a decentralised manner, whereby blockchain-based infrastructure facilitates the communication between digital and physical domains through self-governing and self-organising principles. Decentralised governance paradigms that integrate computation in physical domains (such as 'Decentralised Autonomous Organisation…

    Submitted 18 July, 2024; originally announced July 2024.

    Report number: Dawo/2024/20

  21. arXiv:2407.12271  [pdf, other]

    cs.CV eess.IV

    RBAD: A Dataset and Benchmark for Retinal Vessels Branching Angle Detection

    Authors: Hao Wang, Wenhui Zhu, Jiayou Qin, Xin Li, Oana Dumitrascu, Xiwen Chen, Peijie Qiu, Abolfazl Razi

    Abstract: Detecting retinal image analysis, particularly the geometrical features of branching points, plays an essential role in diagnosing eye diseases. However, existing methods used for this purpose often are coarse-level and lack fine-grained analysis for efficient annotation. To mitigate these issues, this paper proposes a novel method for detecting retinal branching angles using a self-configured ima…

    Submitted 16 July, 2024; originally announced July 2024.

  22. arXiv:2407.11084  [pdf, other]

    eess.IV cs.CV

    A Survey of Distance-Based Vessel Trajectory Clustering: Data Pre-processing, Methodologies, Applications, and Experimental Evaluation

    Authors: Maohan Liang, Ryan Wen Liu, Ruobin Gao, Zhe Xiao, Xiaocai Zhang, Hua Wang

    Abstract: Vessel trajectory clustering, a crucial component of the maritime intelligent transportation systems, provides valuable insights for applications such as anomaly detection and trajectory prediction. This paper presents a comprehensive survey of the most prevalent distance-based vessel trajectory clustering methods, which encompass two main steps: trajectory similarity measurement and clustering. I…

    Submitted 19 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

  23. arXiv:2407.10325  [pdf, other]

    eess.IV cs.CV

    Light Field Compression Based on Implicit Neural Representation

    Authors: Henan Wang, Hanxin Zhu, Zhibo Chen

    Abstract: Light field, as a new data representation format in multimedia, has the ability to capture both intensity and direction of light rays. However, the additional angular information also brings a large volume of data. Classical coding methods are not effective to describe the relationship between different views, leading to redundancy left. To address this problem, we propose a novel light field comp…

    Submitted 7 May, 2024; originally announced July 2024.

    Comments: PCS2022

  24. arXiv:2407.07419  [pdf, other]

    eess.SP

    Timing Recovery for Non-Orthogonal Multiple Access with Asynchronous Clock

    Authors: Qingxin Lu, Haide Wang, Wenxuan Mo, Ji Zhou, Weiping Liu, Changyuan Yu

    Abstract: A passive optical network (PON) based on non-orthogonal multiple access (NOMA) meets low latency and high capacity. In the NOMA-PON, the asynchronous clock between the strong and weak optical network units (ONUs) causes the timing error and phase noise on the signal of the weak ONU. The theoretical derivation shows that the timing error and phase noise can be independently compensated. In this Let…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: The Letter has been submitted to the IEEE Photonics Technology Letters

  25. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  26. arXiv:2407.04051  [pdf, other]

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  27. arXiv:2407.03772  [pdf, other]

    eess.IV cs.CV q-bio.QM

    CS3: Cascade SAM for Sperm Segmentation

    Authors: Yi Shi, Xu-Peng Tian, Yun-Kai Wang, Tie-Yi Zhang, Bin Yao, Hui Wang, Yong Shao, Cen-Cen Wang, Rong Zeng, De-Chuan Zhan

    Abstract: Automated sperm morphology analysis plays a crucial role in the assessment of male fertility, yet its efficacy is often compromised by the challenges in accurately segmenting sperm images. Existing segmentation techniques, including the Segment Anything Model (SAM), are notably inadequate in addressing the complex issue of sperm overlap, a frequent occurrence in clinical samples. Our exploratory stu…

    Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Early accepted by MICCAI2024

  28. arXiv:2407.03308  [pdf, other]

    physics.med-ph cs.AI eess.IV

    Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method

    Authors: Sijie Xu, Shenyan Zong, Chang-Sheng Mei, Guofeng Shen, Yueran Zhao, He Wang

    Abstract: Proton resonance frequency (PRF) based MR thermometry is essential for focused ultrasound (FUS) thermal ablation therapies. This work aims to enhance temporal resolution in dynamic MR temperature map reconstruction using an improved deep learning method. The training-optimized methods and five classical neural networks were applied on the 2-fold and 4-fold under-sampling k-space data to reconstruc…

    Submitted 3 July, 2024; originally announced July 2024.

  29. arXiv:2407.01911  [pdf, other]

    cs.CL cs.HC cs.SD eess.AS

    Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model

    Authors: Yu-Kuan Fu, Cheng-Kuang Lee, Hsiu-Hsuan Wang, Hung-yi Lee

    Abstract: Recent efforts in Spoken Dialogue Modeling aim to synthesize spoken dialogue without the need for direct transcription, thereby preserving the wealth of non-textual information inherent in speech. However, this approach faces a challenge when speakers talk simultaneously, requiring stereo dialogue data with speakers recorded on separate channels, a notably scarce resource. To address this, we have…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Submitted to Interspeech 2024

  30. arXiv:2407.01419  [pdf, other]

    eess.IV cs.CV cs.LG

    Neurovascular Segmentation in sOCT with Deep Learning and Synthetic Training Data

    Authors: Etienne Chollet, Yaël Balbastre, Chiara Mauri, Caroline Magnain, Bruce Fischl, Hui Wang

    Abstract: Microvascular anatomy is known to be involved in various neurological disorders. However, understanding these disorders is hindered by the lack of imaging modalities capable of capturing the comprehensive three-dimensional vascular network structure at microscopic resolution. With a lateral resolution of ≤20 µm and ability to reconstruct large tissue blocks up to tens of cubic centimete…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 10 pages, 10 figures

  31. arXiv:2407.01090  [pdf, other]

    eess.IV cs.CV

    Learning 3D Gaussians for Extremely Sparse-View Cone-Beam CT Reconstruction

    Authors: Yiqun Lin, Hualiang Wang, Jixiang Chen, Xiaomeng Li

    Abstract: Cone-Beam Computed Tomography (CBCT) is an indispensable technique in medical imaging, yet the associated radiation exposure raises concerns in clinical practice. To mitigate these risks, sparse-view reconstruction has emerged as an essential research direction, aiming to reduce the radiation dose by utilizing fewer projections for CT reconstruction. Although implicit neural representations have b…

    Submitted 7 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024. Project link: https://github.com/xmed-lab/DIF-Gaussian

  32. arXiv:2407.00678  [pdf, other]

    eess.IV cs.CV

    A Review of Image Processing Methods in Prostate Ultrasound

    Authors: Haiqiao Wang, Hong Wu, Zhuoyuan Wang, Peiyan Yue, Dong Ni, Pheng-Ann Heng, Yi Wang

    Abstract: Prostate cancer (PCa) poses a significant threat to men's health, with early diagnosis being crucial for improving prognosis and reducing mortality rates. Transrectal ultrasound (TRUS) plays a vital role in the diagnosis and image-guided intervention of PCa. To facilitate physicians with more accurate and efficient computer-assisted diagnosis and interventions, many image processing algorithms in T…

    Submitted 30 June, 2024; originally announced July 2024.

  33. arXiv:2407.00414  [pdf, ps, other]

    eess.SY math.OC

    Safe and Stable Filter Design Using a Relaxed Compatibility Control Barrier -- Lyapunov Condition

    Authors: Han Wang, Kostas Margellos, Antonis Papachristodoulou

    Abstract: In this paper, we propose a quadratic programming-based filter for safe and stable controller design, via a Control Barrier Function (CBF) and a Control Lyapunov Function (CLF). Our method guarantees safety and local asymptotic stability without the need for an asymptotically stabilizing control law. Feasibility of the proposed program is ensured under a mild regularity condition, termed relaxed c…

    Submitted 29 June, 2024; originally announced July 2024.

  34. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  35. arXiv:2406.16928  [pdf, other]

    eess.SP cs.LG

    A Multi-Resolution Mutual Learning Network for Multi-Label ECG Classification

    Authors: Wei Huang, Ning Wang, Panpan Feng, Haiyan Wang, Zongmin Wang, Bing Zhou

    Abstract: Electrocardiograms (ECG), which record the electrophysiological activity of the heart, have become a crucial tool for diagnosing these diseases. In recent years, the application of deep learning techniques has significantly improved the performance of ECG signal classification. Multi-resolution feature analysis, which captures and processes information at different time scales, can extract subtle…

    Submitted 12 June, 2024; originally announced June 2024.

  36. arXiv:2406.16314  [pdf, other]

    eess.AS

    DreamVoice: Text-Guided Voice Conversion

    Authors: Jiarui Hai, Karan Thakkar, Helin Wang, Zengyi Qin, Mounya Elhilali

    Abstract: Generative voice technologies are rapidly evolving, offering opportunities for more personalized and inclusive experiences. Traditional one-shot voice conversion (VC) requires a target recording during inference, limiting ease of usage in generating desired voice timbres. Text-guided generation offers an intuitive solution to convert voices to desired "DreamVoices" according to the users' needs. O…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  37. arXiv:2406.13645  [pdf, other]

    eess.IV cs.CV

    Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset

    Authors: Hongqiu Wang, Xiangde Luo, Wu Chen, Qingqing Tang, Mei Xin, Qiong Wang, Lei Zhu

    Abstract: Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging,…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 Early Accept

  38. arXiv:2406.12703  [pdf, other]

    eess.IV cs.CV

    Coarse-Fine Spectral-Aware Deformable Convolution For Hyperspectral Image Reconstruction

    Authors: Jincheng Yang, Lishun Wang, Miao Cao, Huan Wang, Yinping Zhao, Xin Yuan

    Abstract: We study the inverse problem of Coded Aperture Snapshot Spectral Imaging (CASSI), which captures a spatial-spectral data cube using snapshot 2D measurements and uses algorithms to reconstruct 3D hyperspectral images (HSI). However, current methods based on Convolutional Neural Networks (CNNs) struggle to capture long-range dependencies and non-local similarities. The recently popular Transformer-b…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, Accepted by ICIP2024

  39. arXiv:2406.12019  [pdf]

    eess.SY cs.CR cs.ET eess.SP

    Hacking Encrypted Wireless Power: Cyber-Security of Dynamic Charging

    Authors: Hui Wang, Nima Tashakor, Wei Jiang, Wei Liu, C. Q. Jiang, Stefan M. Goetz

    Abstract: Recently, energy encryption for wireless power transfer has been developed for energy safety, which is important in public places to suppress unauthorized energy extraction. Most techniques vary the frequency so that unauthorized receivers cannot extract energy because of non-resonance. However, this strategy is unreliable. To stimulate the progress of energy encryption technology and point out se…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 17 figures

  40. arXiv:2406.11169   

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an…

    Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: We update this paper to an earlier paper

  41. arXiv:2406.10052  [pdf, other]

    cs.SD cs.CL eess.AS

    Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

    Authors: Haoyu Wang, Guoqiang Hu, Guodong Lin, Wei-Qiang Zhang, Jian Li

    Abstract: As a robust and large-scale multilingual speech recognition model, Whisper has demonstrated impressive results in many low-resource and out-of-distribution scenarios. However, its encoder-decoder structure hinders its application to streaming speech recognition. In this paper, we introduce Simul-Whisper, which uses the time alignment embedded in Whisper's cross-attention to guide auto-regressive d…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  42. arXiv:2406.08445  [pdf, other]

    eess.AS cs.LG cs.SD

    SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models

    Authors: Chun Yin, Tai-Shih Chi, Yu Tsao, Hsin-Min Wang

    Abstract: Representations from pre-trained speech foundation models (SFMs) have shown impressive performance in many downstream tasks. However, the potential benefits of incorporating pre-trained SFM representations into speaker voice similarity assessment have not been thoroughly investigated. In this paper, we propose SVSNet+, a model that integrates pre-trained SFM representations to improve performance…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  43. arXiv:2406.08203  [pdf, other]

    eess.AS cs.SD

    LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation

    Authors: Wenhao Guan, Kaidi Wang, Wangjin Zhou, Yang Wang, Feng Deng, Hui Wang, Lin Li, Qingyang Hong, Yong Qin

    Abstract: Recently, the application of diffusion models has facilitated the significant development of speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement. And the effectiveness of the method is accompanied by the extensive number of sampling steps, leading to an extended synthesis time necessary for generating high-quality audio. Previous…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech2024

  44. arXiv:2406.07871  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Flexible Music-Conditioned Dance Generation with Style Description Prompts

    Authors: Hongsong Wang, Yin Zhu, Xin Geng

    Abstract: Dance plays an important role as an artistic form and expression in human culture, yet the creation of dance remains a challenging task. Most dance generation methods primarily rely solely on music, seldom taking into consideration intrinsic attributes such as music style or genre. In this work, we introduce Flexible Dance Generation with Style Description Prompts (DGSDP), a diffusion-based framew…

    Submitted 12 June, 2024; originally announced June 2024.

  45. arXiv:2406.07461  [pdf, other]

    eess.AS

    Noise-robust Speech Separation with Fast Generative Correction

    Authors: Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak

    Abstract: Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhance the output of a discriminative separator. By leveraging a generative corrector based on a diffusion model, we refine the separation process for single-channel mixture speech by removing noises and…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  46. arXiv:2406.06744  [pdf]

    cs.LG cs.CR eess.SY

    A Multi-module Robust Method for Transient Stability Assessment against False Label Injection Cyberattacks

    Authors: Hanxuan Wang, Na Lu, Yinhong Liu, Zhuqing Wang, Zixuan Wang

    Abstract: The success of deep learning in transient stability assessment (TSA) heavily relies on high-quality training data. However, the label information in TSA datasets is vulnerable to contamination through false label injection (FLI) cyberattacks, resulting in degraded performance of deep TSA models. To address this challenge, a Multi-Module Robust TSA method (MMR) is proposed to rectify the supervised…

    Submitted 10 June, 2024; originally announced June 2024.

  47. arXiv:2406.05954  [pdf, other]

    cs.AI cs.LG eess.SY

    Aligning Large Language Models with Representation Editing: A Control Perspective

    Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

    Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabi…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: fix typos

  48. arXiv:2406.05359  [pdf, other]

    eess.AS cs.SD

    Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization

    Authors: Bei Liu, Haoyu Wang, Yanmin Qian

    Abstract: Modern speaker verification (SV) systems typically demand expensive storage and computing resources, thereby hindering their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verification. Firstly, we propose a novel adaptive uniform precision quantization method which enables the dynamic generation of quantization centroids custom…

    Submitted 21 July, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  49. arXiv:2406.04105  [pdf, other]

    cs.LG eess.IV

    From Tissue Plane to Organ World: A Benchmark Dataset for Multimodal Biomedical Image Registration using Deep Co-Attention Networks

    Authors: Yifeng Wang, Weipeng Li, Thomas Pearce, Haohan Wang

    Abstract: Correlating neuropathology with neuroimaging findings provides a multiscale view of pathologic changes in the human organ spanning the meso- to micro-scales, and is an emerging methodology expected to shed light on numerous disease states. To gain the most information from this multimodal, multiscale approach, it is desirable to identify precisely where a histologic tissue section was taken from w…

    Submitted 6 June, 2024; originally announced June 2024.

  50. arXiv:2406.03961  [pdf, ps, other]

    eess.IV cs.CV

    LDM-RSIC: Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression

    Authors: Junhui Li, Jutao Li, Xingsong Hou, Huake Wang, Yutao Zhang, Yujie Dun, Wenke Sun

    Abstract: Deep learning-based image compression algorithms typically focus on designing encoding and decoding networks and improving the accuracy of entropy model estimation to enhance the rate-distortion (RD) performance. However, few algorithms leverage the compression distortion prior from existing compression algorithms to improve RD performance. In this paper, we propose a latent diffusion model-based…

    Submitted 6 June, 2024; originally announced June 2024.