Zum Hauptinhalt springen

Showing 1–10 of 10 results for author: Jia, D

Searching in archive eess. Search in all archives.
.
  1. arXiv:2407.20585  [pdf, other

    cs.NI eess.SP

    A UAV-Enabled Time-Sensitive Data Collection Scheme for Grassland Monitoring Edge Networks

    Authors: Dongbin Jiao, Zihao Wang, Wen Fan, Weibo Yang, Peng Yang, Zhanhuan Shang, Shi Yan

    Abstract: Grassland monitoring is essential for the sustainable development of grassland resources. Traditional Internet of Things (IoT) devices generate critical ecological data, making data loss unacceptable, but the harsh environment complicates data collection. Unmanned Aerial Vehicle (UAV) and mobile edge computing (MEC) offer efficient data collection solutions, enhancing performance on resource-limit… ▽ More

    Submitted 10 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2407.04416  [pdf, other

    cs.SD cs.MM eess.AS

    Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions

    Authors: Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhengxi Liu, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xubo Liu, Xiyuan Kang, Mark D. Plumbley, Wenwu Wang

    Abstract: Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the simplicity and scarcity of the training data. This work aims to create a large-scale audio dataset with rich captions for improving audio generation models.… ▽ More

    Submitted 14 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages with 1 appendix

  3. arXiv:2406.02430  [pdf, other

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu , et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2404.17806  [pdf, other

    cs.SD cs.CL cs.LG eess.AS

    T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

    Authors: Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

    Abstract: Contrastive language-audio pretraining~(CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introd… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Preprint submitted to IEEE MLSP 2024

  5. arXiv:2404.06674  [pdf, other

    cs.SD cs.AI eess.AS

    VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

    Authors: Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

    Abstract: We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been constrained to specialized models that can only edit these attributes individually and suffer from the following pitfalls: the magnitude of the conversion… ▽ More

    Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  6. arXiv:2308.05862  [pdf, other

    eess.IV cs.AI cs.CV

    Unleashing the Strengths of Unlabeled Data in Pan-cancer Abdominal Organ Quantification: the FLARE22 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Shihao Ma, Adamo Young, Cheng Zhu, Kangkang Meng, Xin Yang, Ziyan Huang, Fan Zhang, Wentao Liu, YuanKe Pan, Shoujin Huang, Jiacheng Wang, Mingze Sun, Weixin Xu, Dengqiang Jia, Jae Won Choi, Natália Alves, Bram de Wilde, Gregor Koehler, Yajun Wu, Manuel Wiesenfarth, Qiongjie Zhu , et al. (4 additional authors not shown)

    Abstract: Quantitative organ assessment is an essential step in automated abdominal disease diagnosis and treatment planning. Artificial intelligence (AI) has shown great potential to automatize this process. However, most existing AI algorithms rely on many expert annotations and lack a comprehensive evaluation of accuracy and efficiency in real-world multinational settings. To overcome these limitations,… ▽ More

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: MICCAI FLARE22: https://flare22.grand-challenge.org/

  7. arXiv:2305.13869  [pdf, other

    physics.acc-ph cs.AI cs.LG eess.SY

    Trend-Based SAC Beam Control Method with Zero-Shot in Superconducting Linear Accelerator

    Authors: Xiaolong Chen, Xin Qi, Chunguang Su, Yuan He, Zhijun Wang, Kunxiang Sun, Chao Jin, Weilong Chen, Shuhui Liu, Xiaoying Zhao, Duanyang Jia, Man Yi

    Abstract: The superconducting linear accelerator is a highly flexiable facility for modern scientific discoveries, necessitating weekly reconfiguration and tuning. Accordingly, minimizing setup time proves essential in affording users with ample experimental time. We propose a trend-based soft actor-critic(TBSAC) beam control method with strong robustness, allowing the agents to be trained in a simulated en… ▽ More

    Submitted 25 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  8. arXiv:2212.05751  [pdf, other

    eess.AS

    Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network

    Authors: Dongya Jia, Qiao Tian, Kainan Peng, Jiaxin Li, Yuanzhe Chen, Mingbo Ma, Yuping Wang, Yuxuan Wang

    Abstract: The goal of accent conversion (AC) is to convert the accent of speech into the target accent while preserving the content and speaker identity. AC enables a variety of applications, such as language learning, speech content creation, and data augmentation. Previous methods rely on reference utterances in the inference phase or are unable to preserve speaker identity. To address these issues, we pr… ▽ More

    Submitted 10 August, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: Accepted by INTERSPEECH 2023

  9. arXiv:2012.08929  [pdf, other

    eess.IV cs.CV cs.LG

    Learning-Based Algorithms for Vessel Tracking: A Review

    Authors: Dengqiang Jia, Xiahai Zhuang

    Abstract: Developing efficient vessel-tracking algorithms is crucial for imaging-based diagnosis and treatment of vascular diseases. Vessel tracking aims to solve recognition problems such as key (seed) point detection, centerline extraction, and vascular segmentation. Extensive image-processing techniques have been developed to overcome the problems of vessel tracking that are mainly attributed to the comp… ▽ More

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: 19 pages, 3 figures, 9 tables, accept by Computerized Medical Imaging and Graphics

  10. arXiv:2001.09653  [pdf

    eess.AS eess.SP

    Audio Codec Enhancement with Generative Adversarial Networks

    Authors: Arijit Biswas, Dai Jia

    Abstract: Audio codecs are typically transform-domain based and efficiently code stationary audio signals, but they struggle with speech and signals containing dense transient events such as applause. Specifically, with these two classes of signals as examples, we demonstrate a technique for restoring audio from coding noise based on generative adversarial networks (GAN). A primary advantage of the proposed… ▽ More

    Submitted 27 January, 2020; originally announced January 2020.

    Comments: Accepted to 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 04-08 May 2020