Showing 1–50 of 330 results for author: Chen, S

Searching in archive eess. Search in all archives.
  1. arXiv:2408.15667  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers

    Authors: Qian Wang, Zhaoyang Bu, Jiaxuan Mao, Wenyu Zhu, Jingya Zhao, Wei Du, Guochao Shi, Min Zhou, Si Chen, Jieming Qu

    Abstract: Recent advancements in deep learning techniques have sparked performance boosts in various real-world applications including disease diagnosis based on multi-modal medical data. Cough sound data-based respiratory disease (e.g., COVID-19 and Chronic Obstructive Pulmonary Disease) diagnosis has also attracted much attention. However, existing works usually utilise traditional machine learning or dee…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2408.14954  [pdf, other]

    cs.NI eess.SP

    Stochastic Geometry Based Modelling and Analysis of Uplink Cooperative Satellite-Aerial-Terrestrial Networks for Nomadic Communications with Weak Satellite Coverage

    Authors: Wen-Yu Dong, Shaoshi Yang, Ping Zhang, Sheng Chen

    Abstract: Cooperative satellite-aerial-terrestrial networks (CSATNs), where unmanned aerial vehicles (UAVs) are utilized as nomadic aerial relays (A), are highly valuable for many important applications, such as post-disaster urban reconstruction. In this scenario, direct communication between terrestrial terminals (T) and satellites (S) is often unavailable due to poor propagation conditions for satellite…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 17 pages, 16 pages, 2 tables, accepted to appear on IEEE Journal on Selected Areas in Communications, Aug. 2024

  3. arXiv:2408.14027  [pdf, other]

    eess.SP

    UAV-Enabled Integrated Sensing and Communication in Maritime Emergency Networks

    Authors: Bohan Li, Jiahao Liu, Yifeng Xiong, Junsheng Mu, Pei Xiao, Sheng Chen

    Abstract: With line-of-sight mode deployment and fast response, unmanned aerial vehicle (UAV), equipped with the cutting-edge integrated sensing and communication (ISAC) technique, is poised to deliver high-quality communication and sensing services in maritime emergency scenarios. In practice, however, the real-time transmission of ISAC signals at the UAV side cannot be realized unless the reliable wireles…

    Submitted 26 August, 2024; originally announced August 2024.

  4. arXiv:2408.12354  [pdf, other]

    eess.AS cs.SD

    LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation

    Authors: Shihao Chen, Yu Gu, Jianwei Cui, Jie Zhang, Rilin Chen, Lirong Dai

    Abstract: Any-to-any singing voice conversion (SVC) aims to transfer a target singer's timbre to other songs using a short voice sample. However many diffusion model based any-to-any SVC methods, which have achieved impressive results, usually suffered from low efficiency caused by a mass of inference steps. In this paper, we propose LCM-SVC, a latent consistency distillation (LCD) based latent diffusion mo…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted to ISCSLP 2024. arXiv admin note: text overlap with arXiv:2406.05325

  5. arXiv:2408.09132  [pdf, other]

    eess.SP

    RIS-based Over-the-air Diffractional Channel Coding

    Authors: Yingzhe Hui, Shuyi Chen, Yifan Qin, Weixiao Meng, Qiushi Zhang, Wei Jin

    Abstract: Reconfigurable Intelligent Surfaces (RIS) are programmable metasurfaces utilizing sub-wavelength meta-atoms and a controller for precise electromagnetic wave manipulation. This work introduces an innovative channel coding scheme, termed RIS-based diffractional channel coding (DCC), which capitalizes on diffraction between two RIS layers for signal-level encoding. Contrary to traditional methods, D…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 17 pages, 6 figures, accepted by IEEE

  6. arXiv:2408.09113  [pdf, other]

    math.OC eess.SY

    Planning of Off-Grid Renewable Power to Ammonia Systems with Heterogeneous Flexibility: A Multistakeholder Equilibrium Perspective

    Authors: Yangjun Zeng, Yiwei Qiu, Jie Zhu, Shi Chen, Tianlei Zang, Buxiang Zhou, Ge He, Xu Ji

    Abstract: Off-grid renewable power to ammonia (ReP2A) systems present a promising pathway toward carbon neutrality in both the energy and chemical industries. However, due to chemical safety requirements, the limited flexibility of ammonia synthesis poses a challenge when attempting to align with the variable hydrogen flow produced from renewable power. This necessitates the optimal sizing of equipment capa…

    Submitted 17 August, 2024; originally announced August 2024.

  7. arXiv:2408.06109  [pdf]

    eess.SP q-bio.QM

    Inferring directed spectral information flow between mixed-frequency time series

    Authors: Qiqi Xian, Zhe Sage Chen

    Abstract: Identifying directed spectral information flow between multivariate time series is important for many applications in finance, climate, geophysics and neuroscience. Spectral Granger causality (SGC) is a prediction-based measure characterizing directed information flow at specific oscillatory frequencies. However, traditional vector autoregressive (VAR) approaches are insufficient to assess SGC whe…

    Submitted 17 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  8. arXiv:2407.18986  [pdf]

    eess.SY

    TERIME: An improved RIME algorithm with enhanced exploration and exploitation for robust parameter extraction of photovoltaic models

    Authors: Shi-Shun Chen, Yu-Tong Jiang, Wen-Bin Chen, Xiao-Yang Li

    Abstract: Parameter extraction of photovoltaic (PV) models is crucial for the planning, optimization, and control of PV systems. Although some methods using meta-heuristic algorithms have been proposed to determine these parameters, the robustness of solutions obtained by these methods faces great challenges when the complexity of the PV model increases. The unstable results will affect the reliable operati…

    Submitted 1 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  9. arXiv:2407.15870  [pdf]

    eess.IV cs.CV cs.LG

    CIC: Circular Image Compression

    Authors: Honggui Li, Sinan Chen, Nahid Md Lokman Hossain, Maria Trocan, Beata Mikovicova, Muhammad Fahimullah, Dimitri Galayko, Mohamad Sawan

    Abstract: Learned image compression (LIC) is currently the cutting-edge method. However, the inherent difference between testing and training images of LIC results in performance degradation to some extent. Especially for out-of-sample, out-of-distribution, or out-of-domain testing images, the performance of LIC dramatically degraded. Classical LIC is a serial image compression (SIC) approach that utilizes…

    Submitted 18 July, 2024; originally announced July 2024.

  10. arXiv:2407.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Autoregressive Speech Synthesis without Vector Quantization

    Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

    Abstract: We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which are originally designed for audio compression and sacrifice fidelity compared to mel-spectrograms. Specifically, (i) instead of cross…

    Submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2407.06508  [pdf, other]

    eess.IV cs.CV

    A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models

    Authors: Gabriele Campanella, Shengjia Chen, Ruchika Verma, Jennifer Zeng, Aryeh Stock, Matt Croken, Brandon Veremis, Abdulkadir Elmas, Kuan-lin Huang, Ricky Kwan, Jane Houldsworth, Adam J. Schoenfeld, Chad Vanderbilt

    Abstract: The use of self-supervised learning (SSL) to train pathology foundation models has increased substantially in the past few years. Notably, several models trained on large quantities of clinical data have been made publicly available in recent months. This will significantly enhance scientific research in computational pathology and help bridge the gap between research and clinical deployment. With…

    Submitted 11 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.07033

  12. arXiv:2407.02675  [pdf, other]

    eess.IV cs.CV

    Depth-Aware Endoscopic Video Inpainting

    Authors: Francis Xiatian Zhang, Shuang Chen, Xianghua Xie, Hubert P. H. Shum

    Abstract: Video inpainting fills in corrupted video content with plausible replacements. While recent advances in endoscopic video inpainting have shown potential for enhancing the quality of endoscopic videos, they mainly repair 2D visual information without effectively preserving crucial 3D spatial details for clinical reference. Depth-aware inpainting methods attempt to preserve these details by incorpor…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  13. arXiv:2407.02318  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023

    Authors: Yurui Huang, Yang Yang, Shou Chen, Xiangyu Wu, Qingguo Chen, Jianfeng Lu

    Abstract: In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information…

    Submitted 1 July, 2024; originally announced July 2024.

  14. arXiv:2406.18021  [pdf, other]

    cs.SD cs.LG eess.AS

    SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR

    Authors: Shuaishuai Ye, Shunfei Chen, Xinhui Hu, Xinkang Xu

    Abstract: In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR), where we design a streaming MoE layer consisting of three language experts, which correspond to Mandarin, English, and blank, respectively, and equipped with a language identification (LID) network with a Connectionist Temporal Cl…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 2 figures

  15. arXiv:2406.14052  [pdf, other]

    eess.IV cs.CV

    Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields

    Authors: Jintong Hu, Siyan Chen, Zhiyi Pan, Sen Zeng, Wenming Yang

    Abstract: Precise segmentation of medical images is fundamental for extracting critical clinical information, which plays a pivotal role in enhancing the accuracy of diagnoses, formulating effective treatment plans, and improving patient outcomes. Although Convolutional Neural Networks (CNNs) and non-local attention methods have achieved notable success in medical image segmentation, they either struggle to…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 13 pages, 5 figures

  16. arXiv:2406.07855  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

    Authors: Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei

    Abstract: With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling based decoding strategies bring astonishing diversity to generation, but also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings h…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 5 figures

  17. arXiv:2406.05370  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

    Authors: Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei

    Abstract: This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in…

    Submitted 17 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Demo posted

  18. arXiv:2406.05325  [pdf, other]

    eess.AS cs.SD

    LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance

    Authors: Shihao Chen, Yu Gu, Jie Zhang, Na Li, Rilin Chen, Liping Chen, Lirong Dai

    Abstract: Any-to-any singing voice conversion (SVC) is an interesting audio editing technique, aiming to convert the singing voice of one singer into that of another, given only a few seconds of singing data. However, during the conversion process, the issue of timbre leakage is inevitable: the converted singing voice still sounds like the original singer's voice. To tackle this, we propose a latent diffusi…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  19. arXiv:2406.03438  [pdf, other]

    cs.IT eess.SP

    CSI-GPT: Integrating Generative Pre-Trained Transformer with Federated-Tuning to Acquire Downlink Massive MIMO Channels

    Authors: Ye Zeng, Li Qiao, Zhen Gao, Tong Qin, Zhonghuai Wu, Sheng Chen, Mohsen Guizani

    Abstract: In massive multiple-input multiple-output (MIMO) systems, how to reliably acquire downlink channel state information (CSI) with low overhead is challenging. In this work, by integrating the generative pre-trained Transformer (GPT) with federated-tuning, we propose a CSI-GPT approach to realize efficient downlink CSI acquisition. Specifically, we first propose a Swin Transformer-based channel acqui…

    Submitted 5 June, 2024; originally announced June 2024.

  20. arXiv:2406.02925  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition

    Authors: Hsuan Su, Hua Farn, Fan-Yun Sun, Shang-Tse Chen, Hung-yi Lee

    Abstract: Synthetic data is widely used in speech recognition due to the availability of text-to-speech models, which facilitate adapting models to previously unseen text domains. However, existing methods suffer in performance when they fine-tune an automatic speech recognition (ASR) model on synthetic data as they suffer from the distributional shift commonly referred to as the synthetic-to-real gap. In t…

    Submitted 15 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  21. arXiv:2405.20595  [pdf, other]

    eess.SP

    Multi-Beam Integrated Sensing and Communication: State-of-the-Art, Challenges and Opportunities

    Authors: Yinxiao Zhuo, Tianqi Mao, Haojin Li, Chen Sun, Zhaocheng Wang, Zhu Han, Sheng Chen

    Abstract: Integrated sensing and communication (ISAC) has been envisioned as a critical enabling technology for the next-generation wireless communication, which can realize location/motion detection of surroundings with communication devices. This additional sensing capability leads to a substantial network quality gain and expansion of the service scenarios. As the system evolves to millimeter wave (mmWav…

    Submitted 30 May, 2024; originally announced May 2024.

  22. arXiv:2405.19889  [pdf, other]

    eess.SP cs.IT cs.LG cs.MM

    Deep Joint Semantic Coding and Beamforming for Near-Space Airship-Borne Massive MIMO Network

    Authors: Minghui Wu, Zhen Gao, Zhaocheng Wang, Dusit Niyato, George K. Karagiannidis, Sheng Chen

    Abstract: Near-space airship-borne communication network is recognized to be an indispensable component of the future integrated ground-air-space network thanks to airships' advantage of long-term residency at stratospheric altitudes, but it urgently needs reliable and efficient Airship-to-X link. To improve the transmission efficiency and capacity, this paper proposes to integrate semantic communication wi…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Major Revision by IEEE JSAC

  23. arXiv:2405.17167  [pdf]

    eess.IV cs.CV

    Partitioned Hankel-based Diffusion Models for Few-shot Low-dose CT Reconstruction

    Authors: Wenhao Zhang, Bin Huang, Shuyue Chen, Xiaoling Xu, Weiwen Wu, Qiegen Liu

    Abstract: Low-dose computed tomography (LDCT) plays a vital role in clinical applications by mitigating radiation risks. Nevertheless, reducing radiation doses significantly degrades image quality. Concurrently, common deep learning methods demand extensive data, posing concerns about privacy, cost, and time constraints. Consequently, we propose a few-shot low-dose CT reconstruction method using Partitioned…

    Submitted 27 May, 2024; originally announced May 2024.

  24. arXiv:2405.16889  [pdf]

    eess.SP

    Extraction of In-Phase and Quadrature Components by Time-Encoding Sampling

    Authors: Y. H. Shao, S. Y. Chen, H. Z. Yang, F. Xi, H. Hong, Z. Liu

    Abstract: Time encoding machine (TEM) is a biologically-inspired scheme to perform signal sampling using timing. In this paper, we study its application to the sampling of bandpass signals. We propose an integrate-and-fire TEM scheme by which the in-phase (I) and quadrature (Q) components are extracted through reconstruction. We design the TEM according to the signal bandwidth and amplitude instead of upper…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 30 pages, 8 figures

  25. arXiv:2405.13661  [pdf, ps, other]

    cs.SD eess.AS

    Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review

    Authors: Hong Zhang, Jie Lin, Shengxuan Chen

    Abstract: Timbre, the sound's unique "color", is fundamental to how we perceive and appreciate music. This review explores the multifaceted world of timbre perception and representation. It begins by tracing the word's origin, offering an intuitive grasp of the concept. Building upon this foundation, the article delves into the complexities of defining and measuring timbre. It then explores the concept and…

    Submitted 22 May, 2024; originally announced May 2024.

  26. Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System

    Authors: Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge to recognize highly-overlapped dinner party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming and robust i-vector extraction with comparisons of our i…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Published in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    Journal ref: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 6665-6669

  27. arXiv:2405.09787  [pdf, other]

    eess.IV cs.CV cs.LG

    Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

    Authors: Dominic LaBella, Ujjwal Baid, Omaditya Khanna, Shan McBurney-Lin, Ryan McLean, Pierre Nedelec, Arif Rashid, Nourel Hoda Tahon, Talissa Altes, Radhika Bhalerao, Yaseen Dhemesh, Devon Godfrey, Fathi Hilal, Scott Floyd, Anastasia Janas, Anahita Fathi Kazerooni, John Kirkpatrick, Collin Kent, Florian Kofler, Kevin Leu, Nazanin Maleki, Bjoern Menze, Maxence Pajot, Zachary J. Reitman, Jeffrey D. Rudie, et al. (96 additional authors not shown)

    Abstract: We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 tables, 10 figures, MICCAI

  28. arXiv:2405.07717  [pdf, other]

    eess.IV

    On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

    Authors: Chenhao Wu, Qingbo Wu, Haoran Wei, Shuai Chen, Lei Wang, King Ngi Ngan, Fanman Meng, Hongliang Li

    Abstract: Despite demonstrating superior rate-distortion (RD) performance, learning-based image compression (LIC) algorithms have been found to be vulnerable to malicious perturbations in recent studies. However, the adversarial attacks considered in existing literature remain divergent from real-world scenarios, both in terms of the attack direction and bitrate. Additionally, existing methods focus solely…

    Submitted 4 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  29. arXiv:2405.05498  [pdf, other]

    cs.SD eess.AS

    The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: Jingguang Tian, Shuaishuai Ye, Shunfei Chen, Yang Xiang, Zhaohui Yin, Xinhui Hu, Xinkang Xu

    Abstract: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on t…

    Submitted 8 May, 2024; originally announced May 2024.

  30. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  31. arXiv:2405.04253  [pdf]

    eess.SP

    Fermat Number Transform Based Chromatic Dispersion Compensation and Adaptive Equalization Algorithm

    Authors: Siyu Chen, Zheli Liu, Weihao Li, Zihe Hu, Mingming Zhang, Sheng Cui, Ming Tang

    Abstract: By introducing the Fermat number transform into chromatic dispersion compensation and adaptive equalization, the computational complexity has been reduced by 68% compared with the conventional implementation. Experimental results validate its transmission performance with only 0.8 dB receiver sensitivity penalty in a 75 km-40 GBaud-PDM-16QAM system.

    Submitted 7 May, 2024; originally announced May 2024.

  32. arXiv:2405.02784  [pdf, other]

    eess.IV cs.CV

    MR-Transformer: Vision Transformer for Total Knee Replacement Prediction Using Magnetic Resonance Imaging

    Authors: Chaojie Zhang, Shengjia Chen, Ozkan Cigdem, Haresh Rengaraj Rajamohan, Kyunghyun Cho, Richard Kijowski, Cem M. Deniz

    Abstract: A transformer-based deep learning model, MR-Transformer, was developed for total knee replacement (TKR) prediction using magnetic resonance imaging (MRI). The model incorporates the ImageNet pre-training and captures three-dimensional (3D) spatial correlation from the MR images. The performance of the proposed model was compared to existing state-of-the-art deep learning models for knee injury dia…

    Submitted 4 May, 2024; originally announced May 2024.

  33. arXiv:2405.00069  [pdf, other]

    eess.IV

    Estimation of Time-to-Total Knee Replacement Surgery

    Authors: Ozkan Cigdem, Shengjia Chen, Chaojie Zhang, Kyunghyun Cho, Richard Kijowski, Cem M. Deniz

    Abstract: A survival analysis model for predicting time-to-total knee replacement (TKR) was developed using features from medical images and clinical measurements. Supervised and self-supervised deep learning approaches were utilized to extract features from radiographs and magnetic resonance images. Extracted features were combined with clinical and image assessments for survival analysis using random surv…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: 11 pages, 3 figures, 4 tables, submitted to a conference

  34. arXiv:2404.17138  [pdf, other]

    eess.SP

    Sub-6GHz Assisted mmWave Hybrid Beamforming with Heterogeneous Graph Neural Network

    Authors: Zhaohui Huang, Zhaocheng Wang, Sheng Chen

    Abstract: In next-generation communications, sub-6GHz and millimeter-wave (mmWave) links typically coexist, with the sub-6GHz link always active and the mmWave link active when high-rate transmission is required. Due to the spatial similarities between sub-6GHz and mmWave channels, sub-6GHz channel information can be utilized to support hybrid beamforming in mmWave communications to reduce overhead costs. W…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: This paper has been submitted to IEEE Transactions on Communications (IEEE TCOM)

  35. arXiv:2404.13603  [pdf, other]

    cs.IT eess.SP

    Beyond MMSE: Rank-1 Subspace Channel Estimator for Massive MIMO Systems

    Authors: Bin Li, Ziping Wei, Shaoshi Yang, Yang Zhang, Jun Zhang, Chenglin Zhao, Sheng Chen

    Abstract: To glean the benefits offered by massive multi-input multi-output (MIMO) systems, channel state information must be accurately acquired. Despite the high accuracy, the computational complexity of classical linear minimum mean squared error (MMSE) estimator becomes prohibitively high in the context of massive MIMO, while the other low-complexity methods degrade the estimation accuracy seriously. In…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 15 pages, 12 figures, accepted to appear on IEEE Transactions on Communications, Apr. 2024

  36. arXiv:2404.04879  [pdf, other]

    cs.RO eess.SY

    Multi-Type Map Construction via Semantics-Aware Autonomous Exploration in Unknown Indoor Environments

    Authors: Jianfang Mao, Yuheng Xie, Si Chen, Zhixiong Nan, Xiao Wang

    Abstract: This paper proposes a novel semantics-aware autonomous exploration model to handle the long-standing issue: the mainstream RRT (Rapid-exploration Random Tree) based exploration models usually make the mobile robot switch frequently between different regions, leading to the excessively-repeated explorations for the same region. Our proposed semantics-aware model encourages a mobile robot to fully e…

    Submitted 7 April, 2024; originally announced April 2024.

  37. arXiv:2404.00656  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    WavLLM: Towards Robust and Adaptive Speech Large Language Model

    Authors: Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei

    Abstract: The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities into LLMs poses significant challenges, particularly with respect to generalizing across varied contexts and executing complex auditory tasks. In th…

    Submitted 14 August, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  38. arXiv:2403.12813  [pdf, other]

    cs.IT eess.SP

    Knowledge and Data Dual-Driven Channel Estimation and Feedback for Ultra-Massive MIMO Systems under Hybrid Field Beam Squint Effect

    Authors: Kuiyu Wang, Zhen Gao, Sheng Chen, Boyu Ning, Gaojie Chen, Yu Su, Zhaocheng Wang, H. Vincent Poor

    Abstract: Acquiring accurate channel state information (CSI) at an access point (AP) is challenging for wideband millimeter wave (mmWave) ultra-massive multiple-input and multiple-output (UMMIMO) systems, due to the high-dimensional channel matrices, hybrid near- and far- field channel feature, beam squint effects, and imperfect hardware constraints, such as low-resolution analog-to-digital converters, and…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 17 pages, 22 figures, 3 tables

  39. arXiv:2403.12521  [pdf]

    eess.SY

    Multi-mode Fault Diagnosis Datasets of Gearbox Under Variable Working Conditions

    Authors: Shijin Chen, Zeyi Liu, Xiao He, Dongliang Zou, Donghua Zhou

    Abstract: The gearbox is a critical component of electromechanical systems. The occurrence of multiple faults can significantly impact system accuracy and service life. The vibration signal of the gearbox is an effective indicator of its operational status and fault information. However, gearboxes in real industrial settings often operate under variable working conditions, such as varying speeds and loads.…

    Submitted 8 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 10 pages, 12 figures

  40. A GNN Approach for Cell-Free Massive MIMO

    Authors: Lou Salaun, Hong Yang, Shashwat Mishra, Chung Shue Chen

    Abstract: Beyond 5G wireless technology Cell-Free Massive MIMO (CFmMIMO) downlink relies on carefully designed precoders and power control to attain uniformly high rate coverage. Many such power control problems can be calculated via second order cone programming (SOCP). In practice, several order of magnitude faster numerical procedure is required because power control has to be rapidly updated to adapt to…

    Submitted 8 February, 2024; originally announced March 2024.

    Journal ref: GLOBECOM 2022 - 2022 IEEE Global Communications Conference, Dec 2022, Rio de Janeiro, Brazil. pp.3053-3058

  41. arXiv:2403.09188  [pdf]

    cs.LG eess.SP

    Design of an basis-projected layer for sparse datasets in deep learning training using gc-ms spectra as a case study

    Authors: Yu Tang Chang, Shih Fang Chen

    Abstract: Deep learning (DL) models encompass millions or even billions of parameters and learn complex patterns from big data. However, not all data are initially stored in a suitable formation to effectively train a DL model, e.g., gas chromatography-mass spectrometry (GC-MS) spectra and DNA sequence. These datasets commonly contain many zero values, and the sparse data formation causes difficulties in op…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 5 pages, 2 figures, 2 tables, conference

    MSC Class: 68-06 ACM Class: I.2.4; J.2

  42. arXiv:2403.08337  [pdf, other]

    eess.SY cs.AI cs.LG

    LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments

    Authors: Maonan Wang, Aoyu Pang, Yuheng Kan, Man-On Pun, Chung Shue Chen, Bo Huang

    Abstract: Traffic congestion in metropolitan areas presents a formidable challenge with far-reaching economic, environmental, and societal ramifications. Therefore, effective congestion management is imperative, with traffic signal control (TSC) systems being pivotal in this endeavor. Conventional TSC systems, designed upon rule-based algorithms or reinforcement learning (RL), frequently exhibit deficiencie…

    Submitted 12 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: 20 pages, 11 figures

  43. arXiv:2403.07308  [pdf, other]

    cs.LG cs.AI eess.SY

    Verification-Aided Learning of Neural Network Barrier Functions with Termination Guarantees

    Authors: Shaoru Chen, Lekan Molu, Mahyar Fazlyab

    Abstract: Barrier functions are a general framework for establishing a safety guarantee for a system. However, there is no general method for finding these functions. To address this shortcoming, recent approaches use self-supervised learning techniques to learn these functions using training data that are periodically generated by a verification procedure, leading to a verification-aided learning framework…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: This is an online extended version of the same paper accepted to American Control Conference 2024

  44. arXiv:2403.04245  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

    Authors: Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

    Abstract: Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed to be sensitive to missing video frames, performing even worse than single-modality models. While applying the dropout technique to the video modality enhances robustness to missing frames, it simultaneously results in a performance loss when dealing with complete data input. In this paper, we investigate this contrasting p…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: the paper is accepted by CVPR2024

  45. arXiv:2402.10055  [pdf]

    eess.IV cs.AI cs.CV

    Robust semi-automatic vessel tracing in the human retinal image by an instance segmentation neural network

    Authors: Siyi Chen, Amir H. Kashani, Ji Yi

    Abstract: The morphology and hierarchy of the vascular systems are essential for perfusion in supporting metabolism. In human retina, one of the most energy-demanding organs, retinal circulation nourishes the entire inner retina by an intricate vasculature emerging and remerging at the optic nerve head (ONH). Thus, tracing the vascular branching from ONH through the vascular tree can illustrate vascular hie…

    Submitted 15 February, 2024; originally announced February 2024.

  46. arXiv:2402.09747  [pdf, other]

    eess.IV cs.CV cs.LG

    Less is more: Ensemble Learning for Retinal Disease Recognition Under Limited Resources

    Authors: Jiahao Wang, Hong Peng, Shengchao Chen, Sufen Ren

    Abstract: Retinal optical coherence tomography (OCT) images provide crucial insights into the health of the posterior ocular segment. Therefore, the advancement of automated image analysis methods is imperative to equip clinicians and researchers with quantitative data, thereby facilitating informed decision-making. The application of deep learning (DL)-based approaches has gained extensive traction for exe…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Ongoing work

  47. arXiv:2402.09434  [pdf, other]

    eess.SP cs.LG

    Disentangling Imperfect: A Wavelet-Infused Multilevel Heterogeneous Network for Human Activity Recognition in Flawed Wearable Sensor Data

    Authors: Mengna Liu, Dong Xiang, Xu Cheng, Xiufeng Liu, Dalin Zhang, Shengyong Chen, Christian S. Jensen

    Abstract: The popularity and diffusion of wearable devices provides new opportunities for sensor-based human activity recognition that leverages deep learning-based algorithms. Although impressive advances have been made, two major challenges remain. First, sensor data is often incomplete or noisy due to sensor placement and other issues as well as data transmission failure, calling for imputation of missin…

    Submitted 26 January, 2024; originally announced February 2024.

    Comments: 14 pages, 7 figures

  48. arXiv:2402.05569  [pdf, other]

    cs.LG cs.AI eess.SP stat.ML

    Simplifying Hypergraph Neural Networks

    Authors: Bohan Tang, Zexi Liu, Keyue Jiang, Siheng Chen, Xiaowen Dong

    Abstract: Hypergraphs are crucial for modeling higher-order interactions in real-world data. Hypergraph neural networks (HNNs) effectively utilise these structures by message passing to generate informative node features for various downstream tasks like node classification. However, the message passing block in existing HNNs typically requires a computationally intensive training process, which limits thei…

    Submitted 22 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  49. arXiv:2402.03048  [pdf, other]

    cs.MA cs.LG eess.SY

    Cooperative Learning with Gaussian Processes for Euler-Lagrange Systems Tracking Control under Switching Topologies

    Authors: Zewen Yang, Songbo Dong, Armin Lederer, Xiaobing Dai, Siyu Chen, Stefan Sosnowski, Georges Hattab, Sandra Hirche

    Abstract: This work presents an innovative learning-based approach to tackle the tracking control problem of Euler-Lagrange multi-agent systems with partially unknown dynamics operating under switching communication topologies. The approach leverages a correlation-aware cooperative algorithm framework built upon Gaussian process regression, which adeptly captures inter-agent correlations for uncertainty pre…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 8 pages

  50. arXiv:2402.02950  [pdf, other]

    cs.CR eess.SP

    Semantic Entropy Can Simultaneously Benefit Transmission Efficiency and Channel Security of Wireless Semantic Communications

    Authors: Yankai Rong, Guoshun Nan, Minwei Zhang, Sihan Chen, Songtao Wang, Xuefei Zhang, Nan Ma, Shixun Gong, Zhaohui Yang, Qimei Cui, Xiaofeng Tao, Tony Q. S. Quek

    Abstract: Recently proliferated deep learning-based semantic communications (DLSC) focus on how transmitted symbols efficiently convey a desired meaning to the destination. However, the sensitivity of neural models and the openness of wireless channels cause the DLSC system to be extremely fragile to various malicious attacks. This inspires us to ask a question: "Can we further exploit the advantages of tra…

    Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 13 pages, 12 figures