Zum Hauptinhalt springen

Showing 1–50 of 464 results for author: Chen, W

Searching in archive eess. Search in all archives.
.
  1. arXiv:2408.14340  [pdf, other

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Foundation Models for Music: A Survey

    Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg , et al. (18 additional authors not shown)

    Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi… ▽ More

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  2. arXiv:2408.13978  [pdf, other

    eess.IV cs.CV

    Histology Virtual Staining with Mask-Guided Adversarial Transfer Learning for Tertiary Lymphoid Structure Detection

    Authors: Qiuli Wang, Yongxu Liu, Li Ma, Xianqi Wang, Wei Chen, Xiaohong Yao

    Abstract: Histological Tertiary Lymphoid Structures (TLSs) are increasingly recognized for their correlation with the efficacy of immunotherapy in various solid tumors. Traditionally, the identification and characterization of TLSs rely on immunohistochemistry (IHC) staining techniques, utilizing markers such as CD20 for B cells. Despite the specificity of IHC, Hematoxylin-Eosin (H&E) staining offers a more… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 8 pages, 8 figures

  3. arXiv:2408.13975  [pdf

    physics.med-ph eess.IV

    Cross-sectional imaging of speed-of-sound distribution using photoacoustic reversal beacons

    Authors: Yang Wang, Danni Wang, Liting Zhong, Yi Zhou, Qing Wang, Wufan Chen, Li Qi

    Abstract: Photoacoustic tomography (PAT) enables non-invasive cross-sectional imaging of biological tissues, but it fails to map the spatial variation of speed-of-sound (SOS) within tissues. While SOS is intimately linked to density and elastic modulus of tissues, the imaging of SOS distri-bution serves as a complementary imaging modality to PAT. Moreover, an accurate SOS map can be leveraged to correct for… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

  4. arXiv:2408.13483  [pdf, other

    eess.SP cs.IT

    Transmissive RIS Enabled Transceiver Systems:Architecture, Design Issues and Opportunities

    Authors: Zhendong Li, Wen Chen, Qingqing Wu, Ziwei Liu, Chong He, Xudong Bai, Jun Li

    Abstract: Reconfigurable intelligent surface (RIS) is anticipated to augment the performance of beyond fifth-generation (B5G) and sixth-generation (6G) networks by intelligently manipulating the state of its components. Rather than employing reflective RIS for aided communications, this paper proposes an innovative transmissive RIS-enabled transceiver (TRTC) architecture that can accomplish the functions of… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

    Journal ref: IEEE VTM, 2024

  5. arXiv:2408.12069  [pdf, other

    cs.IT eess.SP

    Rotatable Block-Controlled RIS: Bridging the Performance Gap to Element-Controlled Systems

    Authors: Weicong Chen, Xinyi Yang, Chao-Kai Wen, Wankai Tang, Jinghe Wang, Yifei Yuan, Xiao Li, Shi Jin

    Abstract: The passive reconfigurable intelligent surface (RIS) requires numerous elements to achieve adequate array gain, which linearly increases power consumption (PC) with the number of reflection phases. To address this, this letter introduces a rotatable block-controlled RIS (BC-RIS) that preserves spectral efficiency (SE) while reducing power costs. Unlike the element-controlled RIS (EC-RIS), which ne… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  6. arXiv:2408.07665  [pdf, ps, other

    cs.CL eess.AS

    Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models

    Authors: Yi-Cheng Lin, Wei-Chih Chen, Hung-yi Lee

    Abstract: Warning: This paper may contain texts with uncomfortable content. Large Language Models (LLMs) have achieved remarkable performance in various tasks, including those involving multimodal data like speech. However, these models often exhibit biases due to the nature of their training data. Recently, more Speech Large Language Models (SLLMs) have emerged, underscoring the urgent need to address th… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  7. arXiv:2408.02025  [pdf, other

    cs.SD cs.AI eess.AS

    Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association

    Authors: Wuyang Chen, Yanjie Sun, Kele Xu, Yong Dou

    Abstract: The innate correlation between a person's face and voice has recently emerged as a compelling area of study, especially within the context of multilingual environments. This paper introduces our novel solution to the Face-Voice Association in Multilingual Environments (FAME) 2024 challenge, focusing on a contrastive learning-based chaining-cluster method to enhance face-voice association. This tas… ▽ More

    Submitted 19 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

  8. arXiv:2407.19511  [pdf, ps, other

    cs.IT eess.SP

    Suppressing Beam Squint Effect For Near-Field Wideband Communication Through Movable Antennas

    Authors: Yanze Zhu, Qingqing Wu, Yang Liu, Qingjiang Shi, Wen Chen

    Abstract: In this correspondence, we study deploying movable antenna (MA) array in a wideband multiple-input-single-output (MISO) communication system, where near-field (NF) channel model is considered. To alleviate beam squint effect, we propose to maximize the minimum analog beamforming gain across the entire wideband spectrum by appropriately adjusting MAs' positions, which is a highly challenging task.… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures, submitted to IEEE journal

  9. arXiv:2407.18986  [pdf

    eess.SY

    TERIME: An improved RIME algorithm with enhanced exploration and exploitation for robust parameter extraction of photovoltaic models

    Authors: Shi-Shun Chen, Yu-Tong Jiang, Wen-Bin Chen, Xiao-Yang Li

    Abstract: Parameter extraction of photovoltaic (PV) models is crucial for the planning, optimization, and control of PV systems. Although some methods using meta-heuristic algorithms have been proposed to determine these parameters, the robustness of solutions obtained by these methods faces great challenges when the complexity of the PV model increases. The unstable results will affect the reliable operati… ▽ More

    Submitted 1 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  10. arXiv:2407.15139  [pdf, ps, other

    eess.SY

    An Interface Method for Co-simulation of EMT Model and Shifted Frequency EMT Model Based on Rotational Invariance Techniques

    Authors: Shilin Gao, Ying Chen, Zhitong Yu, Wensheng Chen, Yankan Song

    Abstract: The shifted frequency-based electromagnetic transient (SFEMT) simulation has greatly improved the computational efficiency of traditional electromagnetic transient (EMT) simulation for the ac grid. This letter proposes a novel interface for the co-simulation of the SFEMT model and the traditional EMT model. The general form of SFEMT modeling and the principle of analytical signal construction are… ▽ More

    Submitted 27 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  11. arXiv:2407.13123  [pdf, other

    cs.LG cs.DC cs.NI eess.SP

    Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation

    Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

    Abstract: Vehicular edge computing (VEC) is an emerging technology with significant potential in the field of internet of vehicles (IoV), enabling vehicles to perform intensive computational tasks locally or offload them to nearby edge devices. However, the quality of communication links may be severely deteriorated due to obstacles such as buildings, impeding the offloading process. To address this challen… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at https://github.com/qiongwu86/DDPG-RIS-MADDPG-POWER. arXiv admin note: text overlap with arXiv:2406.11318

  12. arXiv:2407.12472  [pdf, other

    eess.SP cs.IT eess.SY

    Energy-Aware UAV-Enabled Target Tracking: Online Optimization with Location Constraints

    Authors: Yifan Jiang, Qingqing Wu, Wen Chen, Hongxun Hui

    Abstract: For unmanned aerial vehicle (UAV) trajectory design, the total propulsion energy consumption and initial-final location constraints are practical factors to consider. However, unlike traditional offline designs, these two constraints are non-trivial to concurrently satisfy in online UAV trajectory designs for real-time target tracking, due to the undetermined information. To address this issue, we… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  13. arXiv:2407.11875  [pdf, ps, other

    eess.SP

    Cramer-Rao Bound Minimization for Movable Antenna-Assisted Multiuser Integrated Sensing and Communications

    Authors: Haoran Qin, Wen Chen, Qingqing Wu, Ziheng Zhang, Zhendong Li, Nan Cheng

    Abstract: This paper investigates a movable antenna (MA)-assisted multiuser integrated sensing and communication (ISAC) system, where the base station (BS) and communication users are all equipped with MA for improving both the sensing and communication performance. We employ the Cramer-Rao bound (CRB) as the performance metric of sensing, thus a joint beamforming design and MAs' position optimizing problem… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  14. arXiv:2407.11079  [pdf, ps, other

    eess.SP cs.IT

    One-Bit MIMO Detection: From Global Maximum-Likelihood Detector to Amplitude Retrieval Approach

    Authors: Mingjie Shao, Wei-Kun Chen, Cheng-Yang Yu, Ya-Feng Liu, Wing-Kin Ma

    Abstract: As communication systems advance towards the future 6G era, the incorporation of large-scale antenna arrays in base stations (BSs) presents challenges such as increased hardware costs and energy consumption. To address these issues, the use of one-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs) has gained significant attentions. This paper focuses on one-bit multiple-in… ▽ More

    Submitted 16 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

  15. arXiv:2407.08961  [pdf

    eess.IV cs.CV

    Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT

    Authors: Jie Zheng, Ru Wen, Haiqin Hu, Lina Wei, Kui Su, Wei Chen, Chen Liu, Jun Wang

    Abstract: Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects'features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to complex anatomical details presented in CT images, and 2) suboptimal knowledge transfer owing to input disparity between upstream and downstream model… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  16. arXiv:2407.06957  [pdf, other

    eess.AS cs.CL cs.CY

    Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models

    Authors: Yi-Cheng Lin, Tzu-Quan Lin, Chih-Kai Yang, Ke-Han Lu, Wei-Chih Chen, Chun-Yi Kuan, Hung-yi Lee

    Abstract: Speech Integrated Large Language Models (SILLMs) combine large language models with speech perception to perform diverse tasks, such as emotion recognition to speaker verification, demonstrating universal audio understanding capability. However, these models may amplify biases present in training data, potentially leading to biased access to information for marginalized groups. This work introduce… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  17. arXiv:2407.06767  [pdf, other

    cs.IT eess.SP

    Enhancing Robustness and Security in ISAC Network Design: Leveraging Transmissive Reconfigurable Intelligent Surface with RSMA

    Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Qiong Wu, Nan Cheng

    Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  18. arXiv:2407.05331  [pdf, ps, other

    eess.SY

    Channel Characterization of IRS-assisted Resonant Beam Communication Systems

    Authors: Wen Fang, Wen Chen, Qingqing Wu, Xusheng Zhu, Qiong Wu, Nan Cheng

    Abstract: To meet the growing demand for data traffic, spectrum-rich optical wireless communication (OWC) has emerged as a key technological driver for the development of 6G. The resonant beam communication (RBC) system, which employs spatially separated laser cavities as the transmitter and receiver, is a high-speed OWC technology capable of self-alignment without tracking. However, its transmission throug… ▽ More

    Submitted 15 August, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

  19. arXiv:2407.05052  [pdf, other

    eess.SP

    A Marginal Distributionally Robust Kalman Filter for Centralized Fusion

    Authors: Weizhi Chen

    Abstract: State estimation is a fundamental problem for multi-sensor information fusion, essential in applications such as target tracking, power systems, and control automation. Previous research mostly ignores the correlation between sensors and assumes independent or known distributions. However, in practice, these distributions are often correlated and difAcult to estimate. This paper proposes a novel m… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  20. arXiv:2407.04888  [pdf, other

    eess.IV cs.CV

    Unraveling Radiomics Complexity: Strategies for Optimal Simplicity in Predictive Modeling

    Authors: Mahdi Ait Lhaj Loutfi, Teodora Boblea Podasca, Alex Zwanenburg, Taman Upadhaya, Jorge Barrios, David R. Raleigh, William C. Chen, Dante P. I. Capaldi, Hong Zheng, Olivier Gevaert, Jing Wu, Alvin C. Silva, Paul J. Zhang, Harrison X. Bai, Jan Seuntjens, Steffen Löck, Patrick O. Richard, Olivier Morin, Caroline Reinhold, Martin Lepage, Martin Vallières

    Abstract: Background: The high dimensionality of radiomic feature sets, the variability in radiomic feature types and potentially high computational requirements all underscore the need for an effective method to identify the smallest set of predictive features for a given clinical problem. Purpose: Develop a methodology and tools to identify and explain the smallest set of predictive radiomic features. Mat… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  21. arXiv:2407.04675  [pdf, other

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor… ▽ More

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  22. arXiv:2407.04336  [pdf, ps, other

    eess.SP cs.AI

    AI-Based Beam-Level and Cell-Level Mobility Management for High Speed Railway Communications

    Authors: Wen Li, Wei Chen, Shiyue Wang, Yuanyuan Zhang, Michail Matthaiou, Bo Ai

    Abstract: High-speed railway (HSR) communications are pivotal for ensuring rail safety, operations, maintenance, and delivering passenger information services. The high speed of trains creates rapidly time-varying wireless channels, increases the signaling overhead, and reduces the system throughput, making it difficult to meet the growing and stringent needs of HSR applications. In this article, we explore… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  23. arXiv:2407.03992  [pdf, other

    eess.IV

    Performance of Medical Image Fusion in High-level Analysis Tasks: A Mutual Enhancement Framework for Unaligned PAT and MRI Image Fusion

    Authors: Yutian Zhong, Jinchuan He, Zhichao Liang, Shuangyang Zhang, Qianjin Feng, Wufan Chen, Li Qi

    Abstract: Photoacoustic tomography (PAT) offers optical contrast, whereas magnetic resonance imaging (MRI) excels in imaging soft tissue and organ anatomy. The fusion of PAT with MRI holds promising application prospects due to their complementary advantages. Existing image fusion have made considerable progress in pre-registered images, yet spatial deformations are difficult to avoid in medical imaging sce… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  24. arXiv:2407.02631  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    Nollywood: Let's Go to the Movies!

    Authors: John E. Ortega, Ibrahim Said Ahmad, William Chen

    Abstract: Nollywood, based on the idea of Bollywood from India, is a series of outstanding movies that originate from Nigeria. Unfortunately, while the movies are in English, they are hard to understand for many native speakers due to the dialect of English that is spoken. In this article, we accomplish two goals: (1) create a phonetic sub-title model that is able to translate Nigerian English speech to Ame… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures, 2 tables

  25. arXiv:2407.01006  [pdf, other

    eess.SP

    Multi-Functional Beamforming Design for Integrated Sensing, Communication, and Computation

    Authors: Yapeng Zhao, Qingqing Wu, Wen Chen, Yong Zeng, Ruiqi Liu, Weidong Mei, Fen Hou, Shaodan Ma

    Abstract: Integrated sensing and communication (ISAC) systems may face a heavy computation burden since the sensory data needs to be further processed. This paper studies a novel system that integrates sensing, communication, and computation, aiming to provide services for different objectives efficiently. This system consists of a multi-antenna multi-functional base station (BS), an edge server, a target,… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  26. arXiv:2407.00837  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    Towards Robust Speech Representation Learning for Thousands of Languages

    Authors: William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe

    Abstract: Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data. However, models are still far from supporting the world's 7000+ languages. We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages, extending the language coverage of SSL models 4-fold. We combine 1 millio… ▽ More

    Submitted 2 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: Updated affiliations; 20 pages

  27. arXiv:2407.00614  [pdf, other

    cs.RO cs.CV eess.IV

    Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

    Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

    Abstract: To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we pr… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX

  28. arXiv:2406.18935  [pdf

    eess.SY

    Generalized Averaging Method for Power Electronics Modeling from DC to above Half the Switching Frequency

    Authors: Hongchang Li, Kangping Wang, Jingyang Fang, Wenjie Chen, Xu Yang

    Abstract: Modeling power electronic converters at frequencies close to or above half the switching frequency has been difficult due to the time-variant and discontinuous switching actions. This paper uses the properties of moving Fourier coefficients to develop the generalized averaging method, breaking though the limit of half the switching frequency. The paper also proposes the generalized average model f… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  29. arXiv:2406.18538  [pdf, other

    cs.CV cs.AI eess.IV

    VideoQA-SC: Adaptive Semantic Communication for Video Question Answering

    Authors: Jiangyuan Guo, Wei Chen, Yuxuan Sun, Jialong Xu, Bo Ai

    Abstract: Although semantic communication (SC) has shown its potential in efficiently transmitting multi-modal data such as text, speeches and images, SC for videos has focused primarily on pixel-level reconstruction. However, these SC systems may be suboptimal for downstream intelligent tasks. Moreover, SC systems without pixel-level video reconstruction present advantages by achieving higher bandwidth eff… ▽ More

    Submitted 17 May, 2024; originally announced June 2024.

  30. arXiv:2406.18310  [pdf, other

    cs.CV cs.LG eess.IV

    Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution

    Authors: Wenting Chen, Jie Liu, Tommy W. S. Chow, Yixuan Yuan

    Abstract: Pathology image are essential for accurately interpreting lesion cells in cytopathology screening, but acquiring high-resolution digital slides requires specialized equipment and long scanning times. Though super-resolution (SR) techniques can alleviate this problem, existing deep learning models recover pathology image in a black-box manner, which can lead to untruthful biological details and mis… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE TRANSACTIONS ON MEDICAL IMAGING (TMI)

  31. arXiv:2406.18026  [pdf

    eess.SY

    USLC: Universal Self-Learning Control via Physical Performance Policy-Optimization Neural Network

    Authors: Yanhui Zhang, Weifang Chen

    Abstract: This study addresses the challenge of achieving real-time Universal Self-Learning Control (USLC) in nonlinear dynamic systems with uncertain models. The proposed control method incorporates a Universal Self-Learning module, which introduces a model-free online executor-evaluator framework to enable controller adaptation in the presence of unknown disturbances. By leveraging a neural network model… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 15 Pages

  32. arXiv:2406.14939  [pdf, other

    cs.IT eess.SP

    RIS-aided MIMO Beamforming: Piece-Wise Near-field Channel Model

    Authors: Weijian Chen, Zai Yang, Zhiqiang Wei, Derrick Wing Kwan Ng, Michail Matthaiou

    Abstract: This paper proposes a joint active and passive beamforming design for reconfigurable intelligent surface (RIS)-aided wireless communication systems, adopting a piece-wise near-field channel model. While a traditional near-field channel model, applied without any approximations, offers higher modeling accuracy than a far-field model, it renders the system design more sensitive to channel estimation… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 28pages

  33. arXiv:2406.13645  [pdf, other

    eess.IV cs.CV

    Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset

    Authors: Hongqiu Wang, Xiangde Luo, Wu Chen, Qingqing Tang, Mei Xin, Qiong Wang, Lei Zhu

    Abstract: Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging,… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 Early Accept

  34. arXiv:2406.12699  [pdf, other

    cs.SD eess.AS eess.SP

    Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition

    Authors: Kuan-Chen Wang, You-Jin Li, Wei-Lun Chen, Yu-Wen Chen, Yi-Ching Wang, Ping-Cheng Yeh, Chao Zhang, Yu Tsao

    Abstract: Noise robustness is critical when applying automatic speech recognition (ASR) in real-world scenarios. One solution involves the used of speech enhancement (SE) models as the front end of ASR. However, neural network-based (NN-based) SE often introduces artifacts into the enhanced signals and harms ASR performance, particularly when SE and ASR are independently trained. Therefore, this study intro… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  35. arXiv:2406.11245  [pdf, other

    cs.LG cs.DC cs.NI eess.SP

    Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks

    Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: Reconfigurable Intelligent Surface (RIS) is a pivotal technology in communication, offering an alternative path that significantly enhances the link quality in wireless communication environments. In this paper, we propose a RIS-assisted internet of vehicles (IoV) network, considering the vehicle-to-everything (V2X) communication method. In addition, in order to improve the timeliness of vehicle-t… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at https://github.com/qiongwu86/RIS-RB-AoI-V2X-DRL.git

  36. arXiv:2406.10083  [pdf, other

    cs.CL cs.SD eess.AS

    On the Evaluation of Speech Foundation Models for Spoken Language Understanding

    Authors: Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe

    Abstract: The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for th… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL Findings 2024

  37. arXiv:2406.09846  [pdf, ps, other

    cs.IT eess.SP

    Multiple Intelligent Reflecting Surfaces Collaborative Wireless Localization System

    Authors: Ziheng Zhang, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Jingfeng Chen, Nan Cheng

    Abstract: This paper studies a multiple intelligent reflecting surfaces (IRSs) collaborative localization system where multiple semi-passive IRSs are deployed in the network to locate one or more targets based on time-of-arrival. It is assumed that each semi-passive IRS is equipped with reflective elements and sensors, which are used to establish the line-of-sight links from the base station (BS) to multipl… ▽ More

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 8 figures

  38. arXiv:2406.09627  [pdf, other

    cs.CV cs.AI eess.IV

    RobustSAM: Segment Anything Robustly on Degraded Images

    Authors: Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhuo Ma, Jian Wang

    Abstract: Segment Anything Model (SAM) has emerged as a transformative approach in image segmentation, acclaimed for its robust zero-shot segmentation capabilities and flexible prompting system. Nonetheless, its performance is challenged by images with degraded quality. Addressing this limitation, we propose the Robust Segment Anything Model (RobustSAM), which enhances SAM's performance on low-quality image… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR2024 (Highlight); Project Page: https://robustsam.github.io/

  39. arXiv:2406.09622  [pdf, other

    cs.CV cs.AI eess.IV

    DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer

    Authors: Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao, Sy-Yen Kuo, Sizhuo Ma, Jian Wang

    Abstract: Generic Face Image Quality Assessment (GFIQA) evaluates the perceptual quality of facial images, which is crucial in improving image restoration algorithms and selecting high-quality face images for downstream tasks. We present a novel transformer-based method for GFIQA, which is aided by two unique mechanisms. First, a Dual-Set Degradation Representation Learning (DSL) mechanism uses facial image… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024, Project Page: https://dsl-fiqa.github.io/

  40. arXiv:2406.09317  [pdf, other

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  41. arXiv:2406.09282  [pdf, other

    cs.CL cs.SD eess.AS

    On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

    Authors: Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, Shinji Watanabe

    Abstract: The Open Whisper-style Speech Model (OWSM) series was introduced to achieve full transparency in building advanced speech-to-text (S2T) foundation models. To this end, OWSM models are trained on 25 public speech datasets, which are heterogeneous in multiple ways. In this study, we advance the OWSM series by introducing OWSM v3.2, which improves on prior models by investigating and addressing the i… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  42. arXiv:2406.08641  [pdf, ps, other

    cs.SD cs.CL eess.AS

    ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

    Authors: Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe

    Abstract: ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB~2.0, which is a ne… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  43. arXiv:2406.08081  [pdf

    eess.SP

    CLDTA: Contrastive Learning based on Diagonal Transformer Autoencoder for Cross-Dataset EEG Emotion Recognition

    Authors: Yuan Liao, Yuhong Zhang, Shenghuan Wang, Xiruo Zhang, Yiling Zhang, Wei Chen, Yuzhe Gu, Liya Huang

    Abstract: Recent advances in non-invasive EEG technology have broadened its application in emotion recognition, yielding a multitude of related datasets. Yet, deep learning models struggle to generalize across these datasets due to variations in acquisition equipment and emotional stimulus materials. To address the pressing need for a universal model that fluidly accommodates diverse EEG dataset formats and… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  44. arXiv:2406.07162  [pdf, other

    cs.SD cs.AI cs.CL cs.MM eess.AS

    EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

    Authors: Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain

    Abstract: Speech emotion recognition (SER) is an important part of human-computer interaction, receiving extensive attention from both industry and academia. However, the current research field of SER has long suffered from the following problems: 1) There are few reasonable and universal splits of the datasets, making comparing different models and methods difficult. 2) No commonly used benchmark covers nu… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024. GitHub Repository: https://github.com/emo-box/EmoBox

  45. arXiv:2406.06998  [pdf, other

    eess.SP

    Movable Antenna Enhanced NOMA Short-Packet Transmission

    Authors: Xinyuan He, Wen Chen, Qingqing Wu, Xusheng Zhu, Nan Cheng

    Abstract: This letter investigates a short-packet downlink transmission system using non-orthogonal multiple access (NOMA) enhanced via movable antenna (MA). We focuses on maximizing the effective throughput for a core user while ensuring reliable communication for an edge user by optimizing the MAs' coordinates and the power and rate allocations from the access point (AP). The optimization challenge is app… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures

  46. arXiv:2406.03157  [pdf, other

    eess.SP cs.LG

    A Combination Model Based on Sequential General Variational Mode Decomposition Method for Time Series Prediction

    Authors: Wei Chen, Yuanyuan Yang, Jianyu Liu

    Abstract: Accurate prediction of financial time series is a key concern for market economy makers and investors. The article selects online store sales and Australian beer sales as representatives of non-stationary, trending, and seasonal financial time series, and constructs a new SGVMD-ARIMA combination model in a non-linear combination way to predict financial time series. The ARIMA model, LSTM model, an… ▽ More

    Submitted 7 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  47. arXiv:2406.03144  [pdf, other

    eess.SP cs.LG

    A Combination Model for Time Series Prediction using LSTM via Extracting Dynamic Features Based on Spatial Smoothing and Sequential General Variational Mode Decomposition

    Authors: Jianyu Liu, Wei Chen, Yong Zhang, Zhenfeng Chen, Bin Wan, Jinwei Hu

    Abstract: In order to solve the problems such as difficult to extract effective features and low accuracy of sales volume prediction caused by complex relationships such as market sales volume in time series prediction, we proposed a time series prediction method of market sales volume based on Sequential General VMD and spatial smoothing Long short-term memory neural network (SS-LSTM) combination model. Fi… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  48. arXiv:2406.01240  [pdf, other

    eess.IV

    Arctic Sea Ice Image Super-Resolution Based on Multi-Scale Convolution and Dual-Gating Mechanism

    Authors: Zhaomin Fang, Wankun Chen, Feng Gao, Yanhai Gan, Junyu Dong, Yang Zhou

    Abstract: Arctic Sea Ice Concentration (SIC) is the ratio of ice-covered area to the total sea area of the Arctic Ocean, which is a key indicator for maritime activities. Nowadays, we often use passive microwave images to display SIC, but it has low spatial resolution, and most of the existing super-resolution methods of Arctic SIC don't take the integration of spatial and channel features into account and… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE IGARSS 2024

  49. arXiv:2406.00899  [pdf, other

    cs.CL cs.SD eess.AS

    YODAS: Youtube-Oriented Dataset for Audio and Speech

    Authors: Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe

    Abstract: In this study, we introduce YODAS (YouTube-Oriented Dataset for Audio and Speech), a large-scale, multilingual dataset comprising currently over 500k hours of speech data in more than 100 languages, sourced from both labeled and unlabeled YouTube speech datasets. The labeled subsets, including manual or automatic subtitles, facilitate supervised model training. Conversely, the unlabeled subsets ar… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: ASRU 2023

  50. arXiv:2405.19685  [pdf

    eess.IV

    Identifying Functional Brain Networks of Spatiotemporal Wide-Field Calcium Imaging Data via a Long Short-Term Memory Autoencoder

    Authors: Xiaohui Zhang, Eric C Landsness, Lindsey M Brier, Wei Chen, Michelle J. Tang, Hanyang Miao, Jin-Moo Lee, Mark A. Anastasio, Joseph P. Culver

    Abstract: Wide-field calcium imaging (WFCI) that records neural calcium dynamics allows for identification of functional brain networks (FBNs) in mice that express genetically encoded calcium indicators. Estimating FBNs from WFCI data is commonly achieved by use of seed-based correlation (SBC) analysis and independent component analysis (ICA). These two methods are conceptually distinct and each possesses l… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.