Showing 1–50 of 122 results for author: Xie, Y

Searching in archive eess.
  1. arXiv:2408.10853  [pdf, other]

    cs.SD cs.AI eess.AS

    Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

    Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye

    Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based a…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10852  [pdf, other]

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10849  [pdf, other]

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  4. arXiv:2408.06922  [pdf, other]

    cs.SD cs.AI eess.AS

    Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge

    Authors: Yuankun Xie, Xiaopeng Wang, Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Haonan Cheng, Long Ye

    Abstract: ASVspoof5, the fifth edition of the ASVspoof series, is one of the largest global audio security challenges. It aims to advance the development of countermeasure (CM) to discriminate bonafide and spoofed speech utterances. In this paper, we focus on addressing the problem of open-domain audio deepfake detection, which corresponds directly to the ASVspoof5 Track1 open condition. At first, we compre…

    Submitted 13 August, 2024; originally announced August 2024.

  5. arXiv:2407.10976  [pdf, other]

    cs.NI cs.LG eess.SP stat.AP

    Learning Cellular Network Connection Quality with Conformal

    Authors: Hanyang Jiang, Elizabeth Belding, Ellen Zegure, Yao Xie

    Abstract: In this paper, we address the problem of uncertainty quantification for cellular network speed. It is a well-known fact that the actual internet speed experienced by a mobile phone can fluctuate significantly, even when remaining in a single location. This high degree of variability underscores that mere point estimation of network speed is insufficient. Rather, it is advantageous to establish a p…

    Submitted 4 June, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.05641

  6. arXiv:2406.17801  [pdf, other]

    cs.SD cs.CL eess.AS

    A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge

    Authors: Xiaopeng Wang, Yi Lu, Xin Qi, Zhiyong Wang, Yuankun Xie, Shuchen Shi, Ruibo Fu

    Abstract: This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities, covering seven Indian languages with both male and female speakers. The system was trained using challenge data and fine-tuned for few-…

    Submitted 22 June, 2024; originally announced June 2024.

  7. arXiv:2406.08112  [pdf, other]

    cs.SD cs.AI eess.AS

    Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

    Authors: Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi

    Abstract: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024. arXiv admin note: substantial text overlap with arXiv:2405.04880

  8. arXiv:2406.03247  [pdf, other]

    cs.SD eess.AS

    Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

    Authors: Xiaopeng Wang, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Yuankun Xie, Yukun Liu, Jianhua Tao, Xuefei Liu, Yongwei Li, Xin Qi, Yi Lu, Shuchen Shi

    Abstract: The generalization of Fake Audio Detection (FAD) is critical due to the emergence of new spoofing techniques. Traditional FAD methods often focus solely on distinguishing between genuine and known spoofed audio. We propose a Genuine-Focused Learning (GFL) framework guided, aiming for highly generalized FAD, called GFL-FAD. This method incorporates a Counterfactual Reasoning Enhanced Representation…

    Submitted 9 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  9. arXiv:2406.03240  [pdf, other]

    cs.SD cs.AI eess.AS

    Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy

    Authors: Yuankun Xie, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Xiaopeng Wang, Haonnan Cheng, Long Ye, Jianhua Tao

    Abstract: With the proliferation of deepfake audio, there is an urgent need to investigate their attribution. Current source tracing methods can effectively distinguish in-distribution (ID) categories. However, the rapid evolution of deepfake algorithms poses a critical challenge in the accurate identification of out-of-distribution (OOD) novel deepfake algorithms. In this paper, we propose Real Emphasis an…

    Submitted 8 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  10. arXiv:2406.03237  [pdf, other]

    cs.SD eess.AS

    Generalized Fake Audio Detection via Deep Stable Learning

    Authors: Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Yuankun Xie, Yukun Liu, Xiaopeng Wang, Xuefei Liu, Yongwei Li, Jianhua Tao, Yi Lu, Xin Qi, Shuchen Shi

    Abstract: Although current fake audio detection approaches have achieved remarkable success on specific datasets, they often fail when evaluated with datasets from different distributions. Previous studies typically address distribution shift by focusing on using extra data or applying extra loss restrictions during training. However, these methods either require a substantial amount of data or complicate t…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  11. arXiv:2405.07547  [pdf, other]

    cs.IT eess.SP

    Channel Coding Toward 6G: Technical Overview and Outlook

    Authors: Mohammad Rowshan, Min Qiu, Yixuan Xie, Xinyi Gu, Jinhong Yuan

    Abstract: Channel coding plays a pivotal role in ensuring reliable communication over wireless channels. With the growing need for ultra-reliable communication in emerging wireless use cases, the significance of channel coding has amplified. Furthermore, minimizing decoding latency is crucial for critical-mission applications, while optimizing energy efficiency is paramount for mobile and the Internet of Th…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 102 pages, 87 figures, IEEE Open Journal of the Communications Society (invited paper)

  12. arXiv:2405.04880  [pdf, other]

    cs.SD cs.AI eess.AS

    The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

    Authors: Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun

    Abstract: With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio currently exhibits widespread, high deception, and type versatility, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on…

    Submitted 15 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  13. arXiv:2404.19096  [pdf, other]

    eess.SY

    Data-Driven Min-Max MPC for Linear Systems: Robustness and Adaptation

    Authors: Yifan Xie, Julian Berberich, Frank Allgöwer

    Abstract: Data-driven controllers design is an important research problem, in particular when data is corrupted by the noise. In this paper, we propose a data-driven min-max model predictive control (MPC) scheme using noisy input-state data for unknown linear time-invariant (LTI) system. The unknown system matrices are characterized by a set-membership representation using the noisy input-state data. Levera…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2309.17307

  14. arXiv:2404.13372  [pdf, other]

    eess.IV cs.CV

    HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression

    Authors: Lei Lu, Yanyue Xie, Wei Jiang, Wei Wang, Xue Lin, Yanzhi Wang

    Abstract: This paper investigates the challenging problem of learned image compression (LIC) with extreme low bitrates. Previous LIC methods based on transmitting quantized continuous features often yield blurry and noisy reconstruction due to the severe quantization loss. While previous LIC methods based on learned codebooks that discretize visual space usually give poor-fidelity reconstruction due to the…

    Submitted 20 April, 2024; originally announced April 2024.

  15. arXiv:2404.04879  [pdf, other]

    cs.RO eess.SY

    Multi-Type Map Construction via Semantics-Aware Autonomous Exploration in Unknown Indoor Environments

    Authors: Jianfang Mao, Yuheng Xie, Si Chen, Zhixiong Nan, Xiao Wang

    Abstract: This paper proposes a novel semantics-aware autonomous exploration model to handle the long-standing issue: the mainstream RRT (Rapid-exploration Random Tree) based exploration models usually make the mobile robot switch frequently between different regions, leading to the excessively-repeated explorations for the same region. Our proposed semantics-aware model encourages a mobile robot to fully e…

    Submitted 7 April, 2024; originally announced April 2024.

  16. arXiv:2404.03329  [pdf]

    cs.LG eess.SP stat.ML

    DeepFunction: Deep Metric Learning-based Imbalanced Classification for Diagnosing Threaded Pipe Connection Defects using Functional Data

    Authors: Yukun Xie, Juan Du, Chen Zhang

    Abstract: In modern manufacturing, most of the product lines are conforming. Few products are nonconforming but with different defect types. The identification of defect types can help further root cause diagnosis of production lines. With the sensing development, signals of process variables can be collected in high resolution, which can be regarded as multichannel functional data. They have abundant infor…

    Submitted 24 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Revised version for submission to IISE Transactions

  17. Guiding the underwater acoustic target recognition with interpretable contrastive learning

    Authors: Yuan Xie, Jiawei Ren, Ji Xu

    Abstract: Recognizing underwater targets from acoustic signals is a challenging task owing to the intricate ocean environments and variable underwater channels. While deep learning-based systems have become the mainstream approach for underwater acoustic target recognition, they have faced criticism for their lack of interpretability and weak generalization performance in practical applications. In this wor…

    Submitted 19 February, 2024; originally announced February 2024.

    Journal ref: OCEANS 2023-Limerick. IEEE, 2023: 1-6

  18. Unraveling Complex Data Diversity in Underwater Acoustic Target Recognition through Convolution-based Mixture of Experts

    Authors: Yuan Xie, Jiawei Ren, Ji Xu

    Abstract: Underwater acoustic target recognition is a difficult task owing to the intricate nature of underwater acoustic signals. The complex underwater environments, unpredictable transmission channels, and dynamic motion states greatly impact the real-world underwater acoustic signals, and may even obscure the intrinsic characteristics related to targets. Consequently, the data distribution of underwater…

    Submitted 30 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Journal ref: Expert Systems with Applications (2024): 123431

  19. arXiv:2402.05582  [pdf]

    eess.IV cs.CV cs.MM

    Joint End-to-End Image Compression and Denoising: Leveraging Contrastive Learning and Multi-Scale Self-ONNs

    Authors: Yuxin Xie, Li Yu, Farhad Pakdaman, Moncef Gabbouj

    Abstract: Noisy images are a challenge to image compression algorithms due to the inherent difficulty of compressing noise. As noise cannot easily be discerned from image details, such as high-frequency signals, its presence leads to extra bits needed for compression. Since the emerging learned image compression paradigm enables end-to-end optimization of codecs, recent efforts were made to integrate denois…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Copyright 2024 IEEE - Submitted to IEEE ICIP 2024

  20. arXiv:2402.02889  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding

    Authors: Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen

    Abstract: The integration of Federated Learning (FL) and Self-supervised Learning (SSL) offers a unique and synergetic combination to exploit the audio data for general-purpose audio understanding, without compromising user data privacy. However, rare efforts have been made to investigate the SSL models in the FL regime for general-purpose audio understanding, especially when the training data is generated…

    Submitted 5 February, 2024; originally announced February 2024.

  21. arXiv:2401.11058  [pdf, ps, other]

    cs.IT eess.SP

    Low Complexity Turbo SIC-MMSE Detection for Orthogonal Time Frequency Space Modulation

    Authors: Qi Li, Jinhong Yuan, Min Qiu, Shuangyang Li, Yixuan Xie

    Abstract: Recently, orthogonal time frequency space (OTFS) modulation has garnered considerable attention due to its robustness against doubly-selective wireless channels. In this paper, we propose a low-complexity iterative successive interference cancellation based minimum mean squared error (SIC-MMSE) detection algorithm for zero-padded OTFS (ZP-OTFS) modulation. In the proposed algorithm, signals are de…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 15 pages, 12 figures, accepted by IEEE Transactions on Communications

  22. arXiv:2310.11551  [pdf, other]

    cs.NI eess.SP

    WaveFlex: A Smart Surface for Private CBRS Wireless Cellular Networks

    Authors: Fan Yi, Kun Woo Cho, Yaxiong Xie, Kyle Jamieson

    Abstract: We present the design and implementation of WaveFlex, the first smart surface that enhances Private LTE/5G networks operating under the shared-license framework in the Citizens Broadband Radio Service frequency band. WaveFlex works in the presence of frequency diversity: multiple nearby base stations operating on different frequencies, as dictated by a Spectrum Access System coordinator. It also h…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 15 pages

  23. arXiv:2310.06291  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Three-Dimensional Medical Image Fusion with Deformable Cross-Attention

    Authors: Lin Liu, Xinxin Fan, Chulong Zhang, Jingjing Dai, Yaoqin Xie, Xiaokun Liang

    Abstract: Multimodal medical image fusion plays an instrumental role in several areas of medical image processing, particularly in disease recognition and tumor detection. Traditional fusion methods tend to process each modality independently before combining the features and reconstructing the fusion image. However, this approach often neglects the fundamental commonalities and disparities between multimod…

    Submitted 10 October, 2023; originally announced October 2023.

  24. arXiv:2309.17307  [pdf, ps, other]

    eess.SY

    Data-Driven Min-Max MPC for Linear Systems

    Authors: Yifan Xie, Julian Berberich, Frank Allgöwer

    Abstract: Designing data-driven controllers in the presence of noise is an important research problem, in particular when guarantees on stability, robustness, and constraint satisfaction are desired. In this paper, we propose a data-driven min-max model predictive control (MPC) scheme to design state-feedback controllers from noisy data for unknown linear time-invariant (LTI) system. The considered min-max…

    Submitted 29 September, 2023; originally announced September 2023.

  25. arXiv:2309.03036  [pdf, other]

    cs.SD cs.AI eess.AS

    An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection

    Authors: Yuankun Xie, Haonan Cheng, Yutian Wang, Long Ye

    Abstract: Partially spoofed audio detection is a challenging task, lying in the need to accurately locate the authenticity of audio at the frame level. To address this issue, we propose a fine-grained partially spoofed audio detection method, namely Temporal Deepfake Location (TDL), which can effectively capture information of both features and locations. Specifically, our approach involves two novel parts:…

    Submitted 21 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  26. arXiv:2309.02232  [pdf, other]

    cs.SD cs.AI eess.AS

    FSD: An Initial Chinese Dataset for Fake Song Detection

    Authors: Yuankun Xie, Jingjing Zhou, Xiaolin Lu, Zhenghao Jiang, Yuxin Yang, Haonan Cheng, Long Ye

    Abstract: Singing voice synthesis and singing voice conversion have significantly advanced, revolutionizing musical experiences. However, the rise of "Deepfake Songs" generated by these technologies raises concerns about authenticity. Unlike Audio DeepFake Detection (ADD), the field of song deepfake detection lacks specialized datasets or methods for song authenticity verification. In this paper, we initial…

    Submitted 6 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  27. arXiv:2308.16742  [pdf, other]

    eess.IV cs.CV

    Unsupervised CT Metal Artifact Reduction by Plugging Diffusion Priors in Dual Domains

    Authors: Xuan Liu, Yaoqin Xie, Songhui Diao, Shan Tan, Xiaokun Liang

    Abstract: During the process of computed tomography (CT), metallic implants often cause disruptive artifacts in the reconstructed images, impeding accurate diagnosis. Several supervised deep learning-based approaches have been proposed for reducing metal artifacts (MAR). However, these methods heavily rely on training with simulated data, as obtaining paired metal artifact CT and clean CT data in clinical s…

    Submitted 5 January, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

  28. arXiv:2308.05305  [pdf, other]

    eess.IV cs.CV cs.LG

    From CNN to Transformer: A Review of Medical Image Segmentation Models

    Authors: Wenjian Yao, Jiajun Bai, Wei Liao, Yuheng Chen, Mengjuan Liu, Yao Xie

    Abstract: Medical image segmentation is an important step in medical image analysis, especially as a crucial prerequisite for efficient disease diagnosis and treatment. The use of deep learning for image segmentation has become a prevalent trend. The widely adopted approach currently is U-Net and its variants. Additionally, with the remarkable success of pre-trained models in natural language processing tas…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 18 pages, 8 figures

  29. Synthetic white balancing for intra-operative hyperspectral imaging

    Authors: Anisha Bahl, Conor C. Horgan, Mirek Janatka, Oscar J. MacCormac, Philip Noonan, Yijing Xie, Jianrong Qiu, Nicola Cavalcanti, Philipp Fürnstahl, Michael Ebner, Mads S. Bergholt, Jonathan Shapey, Tom Vercauteren

    Abstract: Hyperspectral imaging shows promise for surgical applications to non-invasively provide spatially-resolved, spectral information. For calibration purposes, a white reference image of a highly-reflective Lambertian surface should be obtained under the same imaging conditions. Standard white references are not sterilizable, and so are unsuitable for surgical environments. We demonstrate the necessit…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 22 pages, 10 figures

  30. arXiv:2307.07265  [pdf, other]

    cs.SD cs.AI eess.AS

    AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023

    Authors: Kin Wai Lau, Yasar Abbas Ur Rehman, Yuyang Xie, Lan Ma

    Abstract: This report presents the technical details of our submission to the 2023 Epic-Kitchen EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. The task is to learn the mapping from audio samples to their corresponding action labels. To achieve this goal, we propose a simple yet effective single-stream CNN-based architecture called AudioInceptionNeXt that operates on the time-frequency log-mel-sp…

    Submitted 14 July, 2023; originally announced July 2023.

  31. arXiv:2306.13307  [pdf, other]

    eess.AS cs.CL

    Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems

    Authors: Mingyu Cui, Jiawen Kang, Jiajun Deng, Xi Yin, Yutao Xie, Xie Chen, Xunying Liu

    Abstract: Current ASR systems are mainly trained and evaluated at the utterance level. Long range cross utterance context can be incorporated. A key task is to derive a suitable compact representation of the most relevant history contexts. In contrast to previous researches based on either LSTM-RNN encoded histories that attenuate the information from longer range contexts, or frame level concatenation of t…

    Submitted 25 June, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  32. Adaptive ship-radiated noise recognition with learnable fine-grained wavelet transform

    Authors: Yuan Xie, Jiawei Ren, Ji Xu

    Abstract: Analyzing the ocean acoustic environment is a tricky task. Background noise and variable channel transmission environment make it complicated to implement accurate ship-radiated noise recognition. Existing recognition systems are weak in addressing the variable underwater environment, thus leading to disappointing performance in practical application. In order to keep the recognition system robust…

    Submitted 19 February, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

    Journal ref: Ocean Engineering 265 (2022): 112626

  33. arXiv:2305.19621  [pdf, other]

    eess.IV cs.CV physics.med-ph

    XTransCT: Ultra-Fast Volumetric CT Reconstruction using Two Orthogonal X-Ray Projections for Image-guided Radiation Therapy via a Transformer Network

    Authors: Chulong Zhang, Lin Liu, Jingjing Dai, Xuan Liu, Wenfeng He, Yinping Chan, Yaoqin Xie, Feng Chi, Xiaokun Liang

    Abstract: Computed tomography (CT) scans offer a detailed, three-dimensional representation of patients' internal organs. However, conventional CT reconstruction techniques necessitate acquiring hundreds or thousands of x-ray projections through a complete rotational scan of the body, making navigation or positioning during surgery infeasible. In image-guided radiation therapy, a method that reconstructs ul…

    Submitted 23 November, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

  34. arXiv:2305.19612  [pdf, other]

    cs.SD cs.LG eess.AS

    Underwater-Art: Expanding Information Perspectives With Text Templates For Underwater Acoustic Target Recognition

    Authors: Yuan Xie, Jiawei Ren, Ji Xu

    Abstract: Underwater acoustic target recognition is an intractable task due to the complex acoustic source characteristics and sound propagation patterns. Limited by insufficient data and narrow information perspective, recognition models based on deep learning seem far from satisfactory in practical underwater scenarios. Although underwater acoustic signals are severely influenced by distance, channel dept…

    Submitted 19 February, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

    Journal ref: The Journal of the Acoustical Society of America, 2022, 152(5): 2641-2651

  35. Learning Music Sequence Representation from Text Supervision

    Authors: Tianyu Chen, Yuan Xie, Shuai Zhang, Shaohan Huang, Haoyi Zhou, Jianxin Li

    Abstract: Music representation learning is notoriously difficult for its complex human-related concepts contained in the sequence of numerical signals. To excavate better MUsic SEquence Representation from labeled audio, we propose a novel text-supervision pre-training method, namely MUSER. MUSER adopts an audio-spectrum-text tri-modal contrastive learning framework, where the text input could be any form o…

    Submitted 31 May, 2023; originally announced May 2023.

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022: 4583-4587

  36. arXiv:2305.17937  [pdf, other]

    eess.IV cs.CV

    Attention Mechanisms in Medical Image Segmentation: A Survey

    Authors: Yutong Xie, Bing Yang, Qingbiao Guan, Jianpeng Zhang, Qi Wu, Yong Xia

    Abstract: Medical image segmentation plays an important role in computer-aided diagnosis. Attention mechanisms that distinguish important parts from irrelevant parts have been widely used in medical image segmentation tasks. This paper systematically reviews the basic principles of attention mechanisms and their applications in medical image segmentation. First, we review the basic concepts of attention mec…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Submitted to Medical Image Analysis, survey paper, 34 pages, over 300 references

  37. arXiv:2305.15887  [pdf, other]

    eess.IV cs.CV

    Diffusion Probabilistic Priors for Zero-Shot Low-Dose CT Image Denoising

    Authors: Xuan Liu, Yaoqin Xie, Jun Cheng, Songhui Diao, Shan Tan, Xiaokun Liang

    Abstract: Denoising low-dose computed tomography (CT) images is a critical task in medical image computing. Supervised deep learning-based approaches have made significant advancements in this area in recent years. However, these methods typically require pairs of low-dose and normal-dose CT images for training, which are challenging to obtain in clinical settings. Existing unsupervised deep learning-based…

    Submitted 13 July, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

  38. arXiv:2305.04159  [pdf, other]

    eess.AS

    Lookahead When It Matters: Adaptive Non-causal Transformers for Streaming Neural Transducers

    Authors: Grant P. Strimel, Yi Xie, Brian King, Martin Radfar, Ariya Rastrow, Athanasios Mouchtaris

    Abstract: Streaming speech recognition architectures are employed for low-latency, real-time applications. Such architectures are often characterized by their causality. Causal architectures emit tokens at each frame, relying only on current and past signal, while non-causal models are exposed to a window of future frames at each step to increase predictive accuracy. This dichotomy amounts to a trade-off fo…

    Submitted 9 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: Accepted to ICML 2023

  39. arXiv:2304.11907  [pdf, other]

    cs.LG cs.SD eess.AS

    Advancing underwater acoustic target recognition via adaptive data pruning and smoothness-inducing regularization

    Authors: Yuan Xie, Tianyu Chen, Ji Xu

    Abstract: Underwater acoustic recognition for ship-radiated signals has high practical application value due to the ability to recognize non-line-of-sight targets. However, due to the difficulty of data acquisition, the collected signals are scarce in quantity and mainly composed of mechanical periodic noise. According to the experiments, we observe that the repeatability of periodic signals leads to a doub…

    Submitted 24 April, 2023; originally announced April 2023.

  40. arXiv:2304.08384  [pdf, other]

    cs.CV eess.IV

    Unsupervised Image Denoising with Score Function

    Authors: Yutong Xie, Mingze Yuan, Bin Dong, Quanzheng Li

    Abstract: Though achieving excellent performance in some cases, current unsupervised learning methods for single image denoising usually have constraints in applications. In this paper, we propose a new approach which is more general and applicable to complicated noise models. Utilizing the property of score function, the gradient of logarithmic probability, we define a solving system for denoising. Once th…

    Submitted 17 April, 2023; originally announced April 2023.

  41. GPSMirror: Expanding Accurate GPS Positioning to Shadowed and Indoor Regions with Backscatter

    Authors: Huixin Dong, Yirong Xie, Xianan Zhang, Wei Wang, Xinyu Zhang, Jianhua He

    Abstract: Despite the prevalence of GPS services, they still suffer from intermittent positioning with poor accuracy in partially shadowed regions like urban canyons, flyover shadows, and factories' indoor areas. Existing wisdom relies on hardware modifications of GPS receivers or power-hungry infrastructures requiring continuous plug-in power supply which is hard to provide in outdoor regions and some fact…

    Submitted 15 April, 2023; originally announced April 2023.

    Comments: 13 pages, 26 figures, to appear in MobiCom 2023

  42. arXiv:2301.04401  [pdf, other]

    eess.IV cs.CV

    An atrium segmentation network with location guidance and siamese adjustment

    Authors: Yuhan Xie, Zhiyong Zhang, Shaolong Chen, Changzhen Qiu

    Abstract: The segmentation of atrial scan images is of great significance for the three-dimensional reconstruction of the atrium and the surgical positioning. Most of the existing segmentation networks adopt a 2D structure and only take original images as input, ignoring the context information of 3D images and the role of prior information. In this paper, we propose an atrium segmentation network LGSANet w…

    Submitted 11 January, 2023; originally announced January 2023.

    Comments: 17 pages, 9 figures

    ACM Class: I.4.6

  43. arXiv:2212.01099 [pdf, ps, other]

    eess.SY

    Linear Data-Driven Economic MPC with Generalized Terminal Constraint

    Authors: Yifan Xie, Julian Berberich, Frank Allgöwer

    Abstract: In this paper, we propose a data-driven economic model predictive control (EMPC) scheme with generalized terminal constraint to control an unknown linear time-invariant system. Our scheme is based on the Fundamental Lemma to predict future system trajectories using a persistently exciting input-output trajectory. The control objective is to minimize an economic cost objective. By employing a gener…

    Submitted 5 December, 2022; v1 submitted 2 December, 2022; originally announced December 2022.

  44. arXiv:2211.08191 [pdf, other]

    eess.AS cs.LG

    Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder

    Authors: Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

    Abstract: Leveraging the fact that speaker identity and content vary on different time scales, the factorized hierarchical variational autoencoder (FHVAE) uses different latent variables to symbolize these two attributes. Disentanglement of these attributes is carried out by different prior settings of the corresponding latent variables. For the prior of the speaker identity variable, FHVAE assumes it is a Gaussian distribut…

    Submitted 14 June, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: accepted by EUSIPCO 2023

  45. arXiv:2211.06555 [pdf, other]

    eess.SP

    Transforming RIS-Assisted Passive Beamforming from Tedious to Simple: A Relaxation Algorithm for Rician Channel

    Authors: Xuehui Dong, Rujing Xiong, Tiebin Mi, Yuan Xie, Robert Caiming Qiu

    Abstract: This paper investigates the problem of maximizing the signal-to-noise ratio (SNR) in reconfigurable intelligent surface (RIS)-assisted MISO communication systems. The problem will be reformulated as a complex quadratic form problem with unit circle constraints. We proved that the SNR maximizing problem has a closed-form global optimal solution when it is a rank-one problem, whereas the former rese…

    Submitted 21 November, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

  46. arXiv:2210.00844 [pdf]

    eess.SY

    A Dual Realization of Chua's Chaotic Oscillator Using a Current-Controlled Nonlinear Resistor

    Authors: Yihang Chen, Weijie Dong, Yongping Xie

    Abstract: A dual realization of Chua's chaotic oscillator is proposed using current-controlled nonlinear resistors, one linear resistor, one capacitor and two inductors. Two problems are solved. First, unit rescaling is necessary when transforming the standard chaotic equations into circuit equations to ensure that the current units are milliamperes. In addition, the connection and parameters of two current-…

    Submitted 20 September, 2022; originally announced October 2022.

    Comments: 6 pages

  47. arXiv:2208.07655 [pdf, other]

    eess.IV cs.CV cs.LG physics.med-ph

    A Hybrid Deep Feature-Based Deformable Image Registration Method for Pathology Images

    Authors: Chulong Zhang, Yuming Jiang, Na Li, Zhicheng Zhang, Md Tauhidul Islam, Jingjing Dai, Lin Liu, Wenfeng He, Wenjian Qin, Jing Xiong, Yaoqin Xie, Xiaokun Liang

    Abstract: Pathologists need to combine information from differently stained pathology slices for accurate diagnosis. Deformable image registration is a necessary technique for fusing multi-modal pathology slices. This paper proposes a hybrid deep feature-based deformable image registration framework for stained pathology samples. We first extract dense feature points via the detector-based and detector-free…

    Submitted 10 April, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: 22 pages, 12 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  48. arXiv:2208.05163 [pdf, other]

    cs.CV cs.LG eess.IV

    Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

    Authors: Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

    Abstract: Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design methodology. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, thi…

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: Published in FPL 2022

  49. arXiv:2207.12793 [pdf]

    eess.SY

    Modeling mandatory and discretionary lane changes using dynamic interaction networks

    Authors: Yue Zhang, Yajie Zou, Yuanchang Xie, Lei Chen

    Abstract: A quantitative understanding of dynamic lane-changing (LC) interaction patterns is indispensable for improving the decision-making of autonomous vehicles, especially in mixed traffic with human-driven vehicles. This paper develops a novel framework combining the hidden Markov model and graph structure to identify the difference in dynamic interaction networks between mandatory lane changes (MLC) a…

    Submitted 26 July, 2022; originally announced July 2022.

  50. arXiv:2207.10869 [pdf, other]

    eess.IV cs.CV

    Optimizing Image Compression via Joint Learning with Denoising

    Authors: Ka Leong Cheng, Yueqi Xie, Qifeng Chen

    Abstract: High levels of noise usually exist in today's captured images due to the relatively small sensors equipped in the smartphone cameras, where the noise brings extra challenges to lossy image compression algorithms. Without the capacity to tell the difference between image details and noise, general image compression methods allocate additional bits to explicitly store the undesired image noise durin…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022