Showing 1–50 of 345 results for author: Xu, X

Searching in archive eess.
  1. arXiv:2408.16707  [pdf, other]

    cs.LG eess.SP

    Enhanced forecasting of stock prices based on variational mode decomposition, PatchTST, and adaptive scale-weighted layer

    Authors: Xiaorui Xue, Shaofang Li, Xiaonan Wang

    Abstract: The significant fluctuations in stock index prices in recent years highlight the critical need for accurate forecasting to guide investment and financial strategies. This study introduces a novel composite forecasting framework that integrates variational mode decomposition (VMD), PatchTST, and adaptive scale-weighted layer (ASWL) to address these challenges. Utilizing datasets of four major stock…

    Submitted 29 August, 2024; originally announced August 2024.

  2. arXiv:2408.15218  [pdf, other]

    eess.IV cs.CV

    Histo-Diffusion: A Diffusion Super-Resolution Method for Digital Pathology with Comprehensive Quality Assessment

    Authors: Xuan Xu, Saarthak Kapse, Prateek Prasanna

    Abstract: Digital pathology has advanced significantly over the last decade, with Whole Slide Images (WSIs) encompassing vast amounts of data essential for accurate disease diagnosis. High-resolution WSIs are essential for precise diagnosis but technical limitations in scanning equipment and variability in slide preparation can hinder obtaining these images. Super-resolution techniques can enhance low-resolu…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: We have submitted our paper to Medical Image Analysis and are currently awaiting feedback

  3. A systematic review: Deep learning-based methods for pneumonia region detection

    Authors: Xinmei Xu

    Abstract: Pneumonia disease is one of the leading causes of death among children and adults worldwide. In the last ten years, computer-aided pneumonia detection methods have been developed to improve the efficiency and accuracy of the diagnosis process. Among those methods, the effects of deep learning approaches surpassed that of other traditional machine learning methods. This review paper searched and ex…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 8 pages, 1 figure, published on Applied and Computational Engineering

    ACM Class: I.4.0; I.5.0

    Journal ref: ACE (2023) Vol. 22: 210-217

  4. arXiv:2408.11982  [pdf, other]

    eess.IV cs.CV cs.MM

    AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

    Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu, et al. (11 additional authors not shown)

    Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dat…

    Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  5. arXiv:2408.08121  [pdf]

    eess.SY

    Optimizing Highway Ramp Merge Safety and Efficiency via Spatio-Temporal Cooperative Control and Vehicle-Road Coordination

    Authors: Ting Peng, Xiaoxue Xu, Yuan Li, Jie Wu, Tao Li, Xiang Dong, Yincai Cai, Peng Wu

    Abstract: In view of existing automatic driving, it is difficult to accurately and timely obtain the status and driving intention of other vehicles. The safety risk and urgency of autonomous vehicles in the absence of collision are evaluated. To ensure safety and improve road efficiency, a method of pre-compiling the spatio-temporal trajectory of vehicles is established to eliminate conflicts between vehicl…

    Submitted 15 August, 2024; originally announced August 2024.

  6. arXiv:2408.07171  [pdf, other]

    eess.IV cs.CV

    BVI-UGC: A Video Quality Database for User-Generated Content Transcoding

    Authors: Zihao Qi, Chen Feng, Fan Zhang, Xiaozhong Xu, Shan Liu, David Bull

    Abstract: In recent years, user-generated content (UGC) has become one of the major video types consumed via streaming networks. Numerous research contributions have focused on assessing its visual quality through subjective tests and objective modeling. In most cases, objective assessments are based on a no-reference scenario, where the corresponding reference content is assumed not to be available. Howeve…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 12 pages, 11 figures

  7. arXiv:2408.06164  [pdf, other]

    eess.SP

    Prototyping and Experimental Results for ISAC-based Channel Knowledge Map

    Authors: Chaoyue Zhang, Zhiwen Zhou, Xiaoli Xu, Yong Zeng, Zaichen Zhang, Shi Jin

    Abstract: Channel knowledge map (CKM) is a novel approach for achieving environment-aware communication and sensing. This paper presents an integrated sensing and communication (ISAC)-based CKM prototype system, demonstrating the mutualistic relationship between ISAC and CKM. The system consists of an ISAC base station (BS), a user equipment (UE), and a server. By using a shared orthogonal frequency divisio…

    Submitted 12 August, 2024; originally announced August 2024.

  8. arXiv:2408.04541  [pdf, other]

    eess.SY

    On the Asymptotic Convergence of Subgraph Generated Models

    Authors: Xinchen Xu, Francesca Parise

    Abstract: We study a family of random graph models - termed subgraph generated models (SUGMs) - initially developed by Chandrasekhar and Jackson in which higher-order structures are explicitly included in the network formation process. We use matrix concentration inequalities to show convergence of the adjacency matrix of networks realized from such SUGMs to the expected adjacency matrix as a function of th…

    Submitted 8 August, 2024; originally announced August 2024.

  9. arXiv:2408.00985  [pdf, other]

    cs.LG eess.IV

    Reconstructing Richtmyer-Meshkov instabilities from noisy radiographs using low dimensional features and attention-based neural networks

    Authors: Daniel A. Serino, Marc L. Klasky, Balasubramanya T. Nadiga, Xiaojian Xu, Trevor Wilcox

    Abstract: A trained attention-based transformer network can robustly recover the complex topologies given by the Richtmyer-Meshkov instability from a sequence of hydrodynamic features derived from radiographic images corrupted with blur, scatter, and noise. This approach is demonstrated on ICF-like double shell hydrodynamic simulations. The key component of this network is a transformer encoder that acts o…

    Submitted 1 August, 2024; originally announced August 2024.

  10. arXiv:2407.20607  [pdf, other]

    eess.SP

    Efficient Channel Estimation for Millimeter Wave and Terahertz Systems Enabled by Integrated Super-resolution Sensing and Communication

    Authors: Jingran Xu, Huizhi Wang, Yong Zeng, Xiaoli Xu, Qingqing Wu, Fei Yang, Yan Chen, Abbas Jamalipour

    Abstract: Integrated super-resolution sensing and communication (ISSAC) has emerged as a promising technology to achieve extremely high precision sensing for those key parameters, such as the angles of the sensing targets. In this paper, we propose an efficient channel estimation scheme enabled by ISSAC for millimeter wave (mmWave) and TeraHertz (THz) systems with a hybrid analog/digital beamforming archite…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 13 pages, 8 figures

  11. arXiv:2407.20530  [pdf, other]

    cs.SD eess.AS

    SuperCodec: A Neural Speech Codec with Selective Back-Projection Network

    Authors: Youqiang Zheng, Weiping Tu, Li Xiao, Xinmeng Xu

    Abstract: Neural speech coding is a rapidly developing topic, where state-of-the-art approaches now exhibit compression performance superior to that of conventional methods. Despite significant progress, existing methods still have limitations in preserving and reconstructing fine details for optimal reconstruction, especially at low bitrates. In this study, we introduce SuperCodec, a neural speech codec that ach…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ICASSP 2024

  12. arXiv:2407.19902  [pdf, other]

    cs.RO eess.SY math.OC

    A Differential Dynamic Programming Framework for Inverse Reinforcement Learning

    Authors: Kun Cao, Xinhang Xu, Wanxin Jin, Karl H. Johansson, Lihua Xie

    Abstract: A differential dynamic programming (DDP)-based framework for inverse reinforcement learning (IRL) is introduced to recover the parameters in the cost function, system dynamics, and constraints from demonstrations. Different from existing work, where DDP was used for the inner forward problem with inequality constraints, our proposed framework uses it for efficient computation of the gradient requi…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 20 pages, 15 figures; submitted to IEEE for potential publication

  13. arXiv:2407.19867  [pdf]

    eess.SY

    Design and Testing for Steel Support Axial Force Servo System

    Authors: Sana Ullah, Yonghong Zhou, Maokai Lai, Xiang Dong, Tao Li, Xiaoxue Xu, Yuan Li, Ting Peng

    Abstract: Foundation excavations are deepening, expanding, and approaching structures. Steel supports measure and manage axial force. The study regulates steel support structure power during deep excavation using a novel axial force management system for safety, efficiency, and structural integrity. Closed-loop control changes actuator output to maintain axial force based on force. In deep excavation, the s…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 6 pages, 7 figures, 1 table, 2 graphs, conference paper

  14. arXiv:2407.16591  [pdf, other]

    cs.RO eess.SY

    Real-Time Interactions Between Human Controllers and Remote Devices in Metaverse

    Authors: Kan Chen, Zhen Meng, Xiangmin Xu, Changyang She, Philip G. Zhao

    Abstract: Supporting real-time interactions between human controllers and remote devices remains a challenging goal in the Metaverse due to the stringent requirements on computing workload, communication throughput, and round-trip latency. In this paper, we establish a novel framework for real-time interactions through the virtual models in the Metaverse. Specifically, we jointly predict the motion of the h…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: This paper is accepted with minor revisions by IEEE MetroXRAINE 2024

  15. arXiv:2407.14355  [pdf, other]

    cs.SD eess.AS

    Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

    Authors: Xuenan Xu, Pingyue Zhang, Ming Yan, Ji Zhang, Mengyue Wu

    Abstract: Zero-shot audio classification aims to recognize and classify a sound class that the model has never seen during training. This paper presents a novel approach for zero-shot audio classification using automatically generated sound attribute descriptions. We propose a list of sound attributes and leverage large language model's domain knowledge to generate detailed attribute descriptions for each c…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024

  16. arXiv:2407.14329  [pdf, other]

    cs.SD eess.AS

    Efficient Audio Captioning with Encoder-Level Knowledge Distillation

    Authors: Xuenan Xu, Haohe Liu, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

    Abstract: Significant improvement has been achieved in automated audio captioning (AAC) with recent models. However, these models have become increasingly large as their performance is enhanced. In this work, we propose a knowledge distillation (KD) framework for AAC. Our analysis shows that in the encoder-decoder based AAC models, it is more effective to distill knowledge into the encoder as compared with…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024

  17. arXiv:2407.14140  [pdf, other]

    eess.SP

    A Secure and Efficient Distributed Semantic Communication System for Heterogeneous Internet of Things Devices

    Authors: Weihao Zeng, Xinyu Xu, Qianyun Zhang, Jiting Shi, Zhijin Qin, Zhenyu Guan

    Abstract: Semantic communications have emerged as a promising solution to address the challenge of efficient communication in rapidly evolving and increasingly complex Internet of Things (IoT) networks. However, protecting the security of semantic communication systems within the distributed and heterogeneous IoT networks is a critical issue that needs to be addressed. We develop a secure and efficient distri…

    Submitted 19 July, 2024; originally announced July 2024.

  18. arXiv:2407.13198  [pdf, other]

    cs.SD eess.AS

    DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation

    Authors: Baihan Li, Zeyu Xie, Xuenan Xu, Yiwei Guo, Ming Yan, Ji Zhang, Kai Yu, Mengyue Wu

    Abstract: Audio generation has attracted significant attention. Despite remarkable enhancement in audio quality, existing models overlook diversity evaluation. This is partially due to the lack of a systematic sound class diversity framework and a matching dataset. To address these issues, we propose DiveSound, a novel framework for constructing multimodal datasets with in-class diversified taxonomy, assist…

    Submitted 18 July, 2024; originally announced July 2024.

  19. arXiv:2407.10427  [pdf, other]

    eess.IV cs.CV

    Transformer for Multitemporal Hyperspectral Image Unmixing

    Authors: Hang Li, Qiankun Dong, Xueshuo Xie, Xia Xu, Tao Li, Zhenwei Shi

    Abstract: Multitemporal hyperspectral image unmixing (MTHU) holds significant importance in monitoring and analyzing the dynamic changes of surface. However, compared to single-temporal unmixing, the multitemporal approach demands comprehensive consideration of information across different phases, rendering it a greater challenge. To address this challenge, we propose the Multitemporal Hyperspectral Image U…

    Submitted 15 July, 2024; originally announced July 2024.

  20. arXiv:2407.07554  [pdf, other]

    cs.GR cs.SD eess.AS

    Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation

    Authors: Zikai Huang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Chenxi Zheng, Jing Qin, Shengfeng He

    Abstract: Dance, as an art form, fundamentally hinges on the precise synchronization with musical beats. However, achieving aesthetically pleasing dance sequences from music is challenging, with existing methods often falling short in controllability and beat alignment. To address these shortcomings, this paper introduces Beat-It, a novel framework for beat-specific, key pose-guided dance generation. Unlike…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  21. arXiv:2407.07245  [pdf, other]

    eess.SY cs.NI eess.SP

    Accelerating Mobile Edge Generation (MEG) by Constrained Learning

    Authors: Xiaoxia Xu, Yuanwei Liu, Xidong Mu, Hong Xing, Arumugam Nallanathan

    Abstract: A novel accelerated mobile edge generation (MEG) framework is proposed for generating high-resolution images on mobile devices. Exploiting a large-scale latent diffusion model (LDM) distributed across edge server (ES) and user equipment (UE), cost-efficient artificial intelligence generated content (AIGC) is achieved by transmitting low-dimensional features between ES and UE. To reduce overheads o…

    Submitted 6 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: 30 pages, 7 figures

  22. arXiv:2407.06524  [pdf, other]

    cs.SD cs.MM eess.AS

    Improving Speech Enhancement by Integrating Inter-Channel and Band Features with Dual-branch Conformer

    Authors: Jizhen Li, Xinmeng Xu, Weiping Tu, Yuhong Yang, Rong Zhu

    Abstract: Recent speech enhancement methods based on convolutional neural networks (CNNs) and transformers have been demonstrated to efficaciously capture time-frequency (T-F) information on spectrograms. However, the correlation among the channels of speech features remains unexplored. Theoretically, each channel map of speech features obtained by different convolution kernels contains information with diffe…

    Submitted 13 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  23. arXiv:2407.05873  [pdf, other]

    eess.SP cs.IT

    Receiver Selection and Transmit Beamforming for Multi-static Integrated Sensing and Communications

    Authors: Dan Wang, Yuanming Tian, Chuan Huang, Hao Chen, Xiaodong Xu, Ping Zhang

    Abstract: Next-generation wireless networks are expected to develop a novel paradigm of integrated sensing and communications (ISAC) to enable both the high-accuracy sensing and high-speed communications. However, conventional mono-static ISAC systems, which simultaneously transmit and receive at the same equipment, may suffer from severe self-interference, and thus significantly degrade the system performa…

    Submitted 8 July, 2024; originally announced July 2024.

  24. arXiv:2407.03671  [pdf]

    eess.SY

    Spatio-temporal cooperative control Method of Highway Ramp Merge Based on Vehicle-road Coordination

    Authors: Xiaoxue Xu, Maokai Lai, Haitao Zhang, Xiang Dong, Tao Li, Jie Wu, Yuan Li, Ting Peng

    Abstract: The merging area of highway ramps faces multiple challenges, including traffic congestion, collision risks, speed mismatches, driver behavior uncertainties, limited visibility, and bottleneck effects. However, autonomous vehicles engaging in depth coordination between vehicle and road in merging zones, by pre-planning and uploading travel trajectories, can significantly enhance the safety and effi…

    Submitted 4 July, 2024; originally announced July 2024.

  25. arXiv:2407.02869  [pdf, other]

    cs.SD eess.AS

    PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

    Authors: Zeyu Xie, Xuenan Xu, Zhizheng Wu, Mengyue Wu

    Abstract: Recently, audio generation tasks have attracted considerable research interest. Precise temporal controllability is essential to integrate audio generation with real applications. In this work, we propose a temporal controlled audio generation framework, PicoAudio. PicoAudio integrates temporal information to guide audio generation through tailored model design. It leverages data crawling, segmen…

    Submitted 17 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    MSC Class: 68Txx ACM Class: I.2

  26. arXiv:2407.02857  [pdf, other]

    cs.SD eess.AS

    AudioTime: A Temporally-aligned Audio-text Benchmark Dataset

    Authors: Zeyu Xie, Xuenan Xu, Zhizheng Wu, Mengyue Wu

    Abstract: Recent advancements in audio generation have enabled the creation of high-fidelity audio clips from free-form textual descriptions. However, temporal relationships, a critical feature for audio content, are currently underrepresented in mainstream models, resulting in an imprecise temporal controllability. Specifically, users cannot accurately control the timestamps of sound events using free-form…

    Submitted 3 July, 2024; originally announced July 2024.

    MSC Class: 68Txx ACM Class: I.2

  27. arXiv:2407.02804  [pdf, other]

    eess.SP eess.SY

    Mobile Edge Generation-Enabled Digital Twin: Architecture Design and Research Opportunities

    Authors: Xiaoxia Xu, Ruikang Zhong, Xidong Mu, Yuanwei Liu, Kaibin Huang

    Abstract: A novel paradigm of mobile edge generation (MEG)-enabled digital twin (DT) is proposed, which enables distributed on-device generation at mobile edge networks for real-time DT applications. First, an MEG-DT architecture is put forward to decentralize generative artificial intelligence (GAI) models onto edge servers (ESs) and user equipments (UEs), which has the advantages of low latency, privacy p…

    Submitted 6 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 7 pages, 6 figures

  28. arXiv:2406.19959  [pdf, other]

    cs.SD eess.AS

    RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

    Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

    Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applied in real-world scenarios. To bridge t…

    Submitted 28 June, 2024; originally announced June 2024.

  29. arXiv:2406.18840  [pdf]

    eess.IV

    Shorter SPECT Scans Using Self-supervised Coordinate Learning to Synthesize Skipped Projection Views

    Authors: Zongyu Li, Yixuan Jia, Xiaojian Xu, Jason Hu, Jeffrey A. Fessler, Yuni K. Dewaraja

    Abstract: Purpose: This study addresses the challenge of extended SPECT imaging duration under low-count conditions, as encountered in Lu-177 SPECT imaging, by developing a self-supervised learning approach to synthesize skipped SPECT projection views, thus shortening scan times in clinical settings. Methods: We employed a self-supervised coordinate-based learning technique, adapting the neural radiance fie…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 25 pages, 5568 words

  30. arXiv:2406.18021  [pdf, other]

    cs.SD cs.LG eess.AS

    SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR

    Authors: Shuaishuai Ye, Shunfei Chen, Xinhui Hu, Xinkang Xu

    Abstract: In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR), where we design a streaming MoE layer consisting of three language experts, which correspond to Mandarin, English, and blank, respectively, and equipped with a language identification (LID) network with a Connectionist Temporal Cl…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 2 figures

  31. arXiv:2406.09389  [pdf, other]

    eess.IV cs.CV

    Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

    Authors: Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

    Abstract: Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: https://sagiri0208.github.io

  32. arXiv:2406.09182  [pdf, ps, other]

    eess.SP cs.LG

    Federated Contrastive Learning for Personalized Semantic Communication

    Authors: Yining Wang, Wanli Ni, Wenqiang Yi, Xiaodong Xu, Ping Zhang, Arumugam Nallanathan

    Abstract: In this letter, we design a federated contrastive learning (FedCL) framework aimed at supporting personalized semantic communication. Our FedCL enables collaborative training of local semantic encoders across multiple clients and a global semantic decoder owned by the base station. This framework supports heterogeneous semantic encoders since it does not require client-side model aggregation. Furt…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: IEEE Communications Letters

  33. arXiv:2406.08052  [pdf, other]

    cs.SD eess.AS

    FakeSound: Deepfake General Audio Detection

    Authors: Zeyu Xie, Baihan Li, Xuenan Xu, Zheng Liang, Kai Yu, Mengyue Wu

    Abstract: With the advancement of audio generation, generative models can produce highly realistic audios. However, the proliferation of deepfake general audio can pose negative consequences. Therefore, we propose a new task, deepfake general audio detection, which aims to identify whether audio content is manipulated and to locate deepfake regions. Leveraging an automated manipulation pipeline, a dataset n…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

    MSC Class: 68Txx ACM Class: I.2

  34. arXiv:2406.07256  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

    Authors: Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li

    Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the large…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  35. arXiv:2406.06295  [pdf, other]

    cs.SD eess.AS

    Zero-Shot Audio Captioning Using Soft and Hard Prompts

    Authors: Yiming Zhang, Xuenan Xu, Ruoyi Du, Haohe Liu, Yuan Dong, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma

    Abstract: In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these model…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  36. arXiv:2406.01922  [pdf, ps, other]

    eess.SP cs.IT

    Performance Analysis of Hybrid Cellular and Cell-free MIMO Network

    Authors: Zhuoyin Dai, Jingran Xu, Xiaoli Xu, Ruoguang Li, Yong Zeng

    Abstract: Cell-free wireless communication is envisioned as one of the most promising network architectures, which can achieve stable and uniform communication performance while improving the system energy and spectrum efficiency. The deployment of cell-free networks is envisioned to be a long-term evolutionary process, in which cell-free access points (APs) will be gradually introduced into the communicatio…

    Submitted 3 June, 2024; originally announced June 2024.

  37. arXiv:2405.17167  [pdf]

    eess.IV cs.CV

    Partitioned Hankel-based Diffusion Models for Few-shot Low-dose CT Reconstruction

    Authors: Wenhao Zhang, Bin Huang, Shuyue Chen, Xiaoling Xu, Weiwen Wu, Qiegen Liu

    Abstract: Low-dose computed tomography (LDCT) plays a vital role in clinical applications by mitigating radiation risks. Nevertheless, reducing radiation doses significantly degrades image quality. Concurrently, common deep learning methods demand extensive data, posing concerns about privacy, cost, and time constraints. Consequently, we propose a few-shot low-dose CT reconstruction method using Partitioned…

    Submitted 27 May, 2024; originally announced May 2024.

  38. arXiv:2405.17024  [pdf]

    eess.SP

    Beware of Overestimated Decoding Performance Arising from Temporal Autocorrelations in Electroencephalogram Signals

    Authors: Xiran Xu, Bo Wang, Boda Xiao, Yadong Niu, Yiwen Wang, Xihong Wu, Jing Chen

    Abstract: Researchers have reported high decoding accuracy (>95%) using non-invasive Electroencephalogram (EEG) signals for brain-computer interface (BCI) decoding tasks like image decoding, emotion recognition, auditory spatial attention detection, etc. Since these EEG data were usually collected with well-designed paradigms in labs, the reliability and robustness of the corresponding decoding methods were…

    Submitted 27 May, 2024; originally announced May 2024.

  39. arXiv:2405.09353  [pdf, other]

    eess.IV cs.CV

    Large coordinate kernel attention network for lightweight image super-resolution

    Authors: Fangwei Hao, Jiesheng Wu, Haotian Lu, Ji Du, Jing Xu, Xiaoxuan Xu

    Abstract: The multi-scale receptive field and large kernel attention (LKA) module have been shown to significantly improve performance in the lightweight image super-resolution task. However, existing lightweight super-resolution (SR) methods seldom pay attention to designing efficient building blocks with multi-scale receptive fields for local modeling, and their LKA modules face a quadratic increase in comp…

    Submitted 30 August, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: 13 pages

  40. arXiv:2405.05498  [pdf, other]

    cs.SD eess.AS

    The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: Jingguang Tian, Shuaishuai Ye, Shunfei Chen, Yang Xiang, Zhaohui Yin, Xinhui Hu, Xinkang Xu

    Abstract: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on t…

    Submitted 8 May, 2024; originally announced May 2024.

  41. arXiv:2405.03854  [pdf, other]

    eess.IV math.OC

    Provable Preconditioned Plug-and-Play Approach for Compressed Sensing MRI Reconstruction

    Authors: Tao Hong, Xiaojian Xu, Jason Hu, Jeffrey A. Fessler

    Abstract: Model-based methods play a key role in the reconstruction of compressed sensing (CS) MRI. Finding an effective prior to describe the statistical distribution of the image family of interest is crucial for model-based methods. Plug-and-play (PnP) is a general framework that uses denoising algorithms as the prior or regularizer. Recent work showed that PnP methods with denoisers based on pretrained…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 14 figures, 4 tables

  42. arXiv:2405.00833  [pdf, other]

    eess.SP q-bio.GN

    Modelling the nanopore sequencing process with Helicase HMMs

    Authors: Xuechun Xu, Joakim Jaldén

    Abstract: Recent advancements in nanopore sequencing technology, particularly the R10 nanopore from Oxford Nanopore Technology, have necessitated the development of improved data processing methods to utilize their potential for more than 9-mer resolution fully. The processing of the ion currents predominantly utilizes neural network-based methods known for their high basecalling accuracy but face developme…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 8 pages, 7 figures and 1 table. Journal manuscript

  43. arXiv:2405.00233  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS eess.SP

    SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

    Authors: Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

    Abstract: Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these chal…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Demo and code: https://haoheliu.github.io/SemantiCodec/

  44. arXiv:2404.17806  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

    Authors: Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

    Abstract: Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introd…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Preprint submitted to IEEE MLSP 2024

  45. arXiv:2404.14441  [pdf]

    cs.CV cs.AI cs.LG eess.IV

    Optimizing Contrail Detection: A Deep Learning Approach with EfficientNet-b4 Encoding

    Authors: Qunwei Lin, Qian Leng, Zhicheng Ding, Chao Yan, Xiaonan Xu

    Abstract: In the pursuit of environmental sustainability, the aviation industry faces the challenge of minimizing its ecological footprint. Among the key solutions is contrail avoidance, targeting the linear ice-crystal clouds produced by aircraft exhaust. These contrails exacerbate global warming by trapping atmospheric heat, necessitating precise segmentation and comprehensive analysis of contrail images…

    Submitted 19 April, 2024; originally announced April 2024.

  46. arXiv:2404.13153  [pdf, other]

    eess.IV cs.CV

    Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring

    Authors: Chengxu Liu, Xuan Wang, Xiangyu Xu, Ruhao Tian, Shuai Li, Xueming Qian, Ming-Hsuan Yang

    Abstract: Eliminating image blur produced by various kinds of motion has been a challenging problem. Dominant approaches rely heavily on model capacity to remove blurring by reconstructing residual from blurry observation in feature space. These practices not only prevent the capture of spatially variable motion in the real world but also ignore the tailored handling of various motions in image space. In th…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  47. arXiv:2404.12060  [pdf, other]

    eess.SP

    Environment-aware UAV Communications: CKM Construction and Predictive Beamforming

    Authors: Shiqi Zeng, Xiaoli Xu, Yong Zeng

    Abstract: Predictive millimeter-wave (mmWave) beamforming is a promising technique to enable low-latency and high-rate ground-air communications for cellular-connected unmanned aerial vehicles (UAVs). However, the high vulnerability of mmWave to blockages poses practical challenges to the implementation of such a technology. In this paper, we tackle the challenges by proposing a channel knowledge map (CKM)-…

    Submitted 18 April, 2024; originally announced April 2024.

  48. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  49. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  50. arXiv:2404.10233  [pdf, ps, other]

    eess.SP

    Little Pilot is Needed for Channel Estimation with Integrated Super-Resolution Sensing and Communication

    Authors: Jingran Xu, Huizhi Wang, Yong Zeng, Xiaoli Xu

    Abstract: Integrated super-resolution sensing and communication (ISSAC) is a promising technology to achieve extremely high sensing performance for critical parameters, such as the angles of the wireless channels. In this paper, we propose an ISSAC-based channel estimation method, which requires little or even no pilot, yet still achieves accurate channel state information (CSI) estimation. The key idea is…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 6 pages, 5 figures, accepted by IEEE WCNC 2024 workshops