Showing 1–19 of 19 results for author: Chong, D

Searching in archive cs.
  1. arXiv:2408.15916 [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-modal Adversarial Training for Zero-Shot Voice Cloning

    Authors: John Janiczek, Dading Chong, Dongyang Dai, Arlo Faria, Chao Wang, Tao Wang, Yuzong Liu

    Abstract: A text-to-speech (TTS) model trained to reconstruct speech given text tends towards predictions that are close to the average characteristics of a dataset, failing to model the variations that make human speech sound natural. This problem is magnified for zero-shot voice cloning, a task that requires training data with high variance in speaking styles. We build off of recent works which have used…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted at INTERSPEECH 2024

  2. arXiv:2408.13893 [pdf, other]

    cs.SD cs.CL eess.AS

    SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Rongjie Huang, Yuanyuan Wang, Haohan Guo, Dading Chong, Songxiang Liu, Xixin Wu, Helen Meng

    Abstract: Scaling text-to-speech (TTS) to large-scale datasets has been demonstrated as an effective method for improving the diversity and naturalness of synthesized speech. At a high level, previous large-scale TTS models can be categorized into either Auto-regressive (AR) based (e.g., VALL-E) or Non-auto-regressive (NAR) based models (e.g., NaturalSpeech 2/3). Although these works dem…

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Submitted to TASLP

  3. arXiv:2406.09838 [pdf, other]

    cs.CV cs.AI

    Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps

    Authors: Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Zixuan Yuan, Bing Zhu, Junwei Liang

    Abstract: Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, the…

    Submitted 14 June, 2024; originally announced June 2024.

  4. arXiv:2406.04555 [pdf, other]

    cs.CL cs.AI

    Creating an AI Observer: Generative Semantic Workspaces

    Authors: Pavan Holur, Shreyas Rajesh, David Chong, Vwani Roychowdhury

    Abstract: An experienced human Observer reading a document -- such as a crime report -- creates a succinct plot-like "Working Memory" comprising different actors, their prototypical roles and states at any point, their evolution over time based on their interactions, and even a map of missing Semantic parts anticipating them in the future.…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 37 pages with appendix, 28 figures

  5. arXiv:2405.20215 [pdf, other]

    cs.CL

    TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models

    Authors: Chen Zhang, Chengguang Tang, Dading Chong, Ke Shi, Guohua Tang, Feng Jiang, Haizhou Li

    Abstract: Mainstream approaches to aligning large language models (LLMs) heavily rely on human preference data, particularly when models require periodic updates. The standard process for iterative alignment of LLMs involves collecting new human feedback for each update. However, the data collection process is costly and challenging to scale. To address this issue, we introduce the "TS-Align" framework, whi…

    Submitted 14 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  6. arXiv:2310.17956 [pdf, other]

    cs.CV cs.AI cs.CL

    Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare

    Authors: Junling Liu, Ziming Wang, Qichen Ye, Dading Chong, Peilin Zhou, Yining Hua

    Abstract: Large Language Models (LLMs) have introduced a new era of proficiency in comprehending complex healthcare and biomedical topics. However, there is a noticeable lack of models in languages other than English and models that can interpret multi-modal input, which is crucial for global healthcare accessibility. In response, this study introduces Qilin-Med-VL, the first Chinese large vision-language m…

    Submitted 1 November, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

  7. arXiv:2310.09089 [pdf, other]

    cs.CL

    Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model

    Authors: Qichen Ye, Junling Liu, Dading Chong, Peilin Zhou, Yining Hua, Fenglin Liu, Meng Cao, Ziming Wang, Xuxin Cheng, Zhu Lei, Zhenhua Guo

    Abstract: Integrating large language models (LLMs) into healthcare holds great potential but faces challenges. Pre-training LLMs from scratch for domains like medicine is resource-heavy and often unfeasible. On the other hand, sole reliance on Supervised Fine-tuning (SFT) can result in overconfident predictions and may not tap into domain-specific insights. In response, we present a multi-stage training met…

    Submitted 17 April, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  8. arXiv:2308.12241 [pdf, other]

    cs.IR cs.AI

    LLMRec: Benchmarking Large Language Models on Recommendation Task

    Authors: Junling Liu, Chao Liu, Peilin Zhou, Qichen Ye, Dading Chong, Kang Zhou, Yueqi Xie, Yuwei Cao, Shoujin Wang, Chenyu You, Philip S. Yu

    Abstract: Recently, the fast development of Large Language Models (LLMs) such as ChatGPT has significantly advanced NLP tasks by enhancing the capabilities of conversational models. However, the application of LLMs in the recommendation domain has not been thoroughly investigated. To bridge this gap, we propose LLMRec, an LLM-based recommender system designed for benchmarking LLMs on various recommendation t…

    Submitted 23 August, 2023; originally announced August 2023.

  9. arXiv:2306.03030 [pdf, other]

    cs.CL

    Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset

    Authors: Junling Liu, Peilin Zhou, Yining Hua, Dading Chong, Zhongyu Tian, Andrew Liu, Helin Wang, Chenyu You, Zhenhua Guo, Lei Zhu, Michael Lingzhi Li

    Abstract: Recent advancements in large language models (LLMs) have transformed the field of question answering (QA). However, evaluating LLMs in the medical field is challenging due to the lack of standardized and comprehensive datasets. To address this gap, we introduce CMExam, sourced from the Chinese National Medical Licensing Examination. CMExam consists of 60K+ multiple-choice questions for standardize…

    Submitted 22 October, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted by NeurIPS 2023 Datasets and Benchmarks Track

  10. arXiv:2211.06993 [pdf, other]

    cs.CL

    GreenPLM: Cross-Lingual Transfer of Monolingual Pre-Trained Language Models at Almost No Cost

    Authors: Qingcheng Zeng, Lucas Garay, Peilin Zhou, Dading Chong, Yining Hua, Jiageng Wu, Yikang Pan, Han Zhou, Rob Voigt, Jie Yang

    Abstract: Large pre-trained models have revolutionized natural language processing (NLP) research and applications, but high training costs and limited data resources have prevented their benefits from being shared equally amongst speakers of all the world's languages. To address issues of cross-linguistic access to such models and reduce energy consumption for sustainability during large-scale model traini…

    Submitted 26 May, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

    Comments: Accepted at IJCAI 2023 AI and Social Good Track

  11. arXiv:2209.13773 [pdf, other]

    cs.CL

    METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets

    Authors: Peilin Zhou, Zeqiang Wang, Dading Chong, Zhijiang Guo, Yining Hua, Zichang Su, Zhiyang Teng, Jiageng Wu, Jie Yang

    Abstract: The COVID-19 pandemic continues to bring up various topics discussed or debated on social media. In order to explore the impact of pandemics on people's lives, it is crucial to understand the public's concerns and attitudes towards pandemic-related entities (e.g., drugs, vaccines) on social media. However, models trained on existing named entity recognition (NER) or targeted sentiment analysis (TS…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 10 pages, 6 figures, 6 tables, accepted by NeurIPS 2022 Datasets and Benchmarks track

  12. arXiv:2206.12759 [pdf, other]

    cs.CL cs.SD eess.AS

    Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective

    Authors: Qingcheng Zeng, Dading Chong, Peilin Zhou, Jie Yang

    Abstract: Accented speech recognition and accent classification are relatively under-explored research areas in speech technology. Recently, deep learning-based methods and Transformer-based pretrained models have achieved superb performances in both areas. However, most accent classification tasks focused on classifying different kinds of English accents and little attention was paid to geographically-prox…

    Submitted 28 June, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

    Comments: INTERSPEECH 2022

  13. arXiv:2205.12702 [pdf, other]

    cs.CL

    Detecting Label Errors by using Pre-Trained Language Models

    Authors: Derek Chong, Jenny Hong, Christopher D. Manning

    Abstract: We show that large pre-trained language models are inherently highly capable of identifying label errors in natural language datasets: simply examining out-of-sample data points in descending order of fine-tuned task loss significantly outperforms more complex error-detection mechanisms proposed in previous work. To this end, we contribute a novel method for introducing realistic, human-originat…

    Submitted 15 December, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: 18 pages, 10 figures. Accepted to EMNLP 2022; typesetting of this version slightly differs from conference version

  14. arXiv:2205.11008 [pdf, other]

    cs.CL cs.SD eess.AS

    Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust Intent Detection

    Authors: Peilin Zhou, Dading Chong, Helin Wang, Qingcheng Zeng

    Abstract: The past ten years have witnessed the rapid development of text-based intent detection, whose benchmark performances have already been taken to a remarkable level by deep learning techniques. However, automatic speech recognition (ASR) errors are inevitable in real-world applications due to environmental noise, unique speech patterns, etc., leading to a sharp performance drop in state-of-the-art…

    Submitted 22 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

  15. arXiv:2204.12768 [pdf, other]

    cs.SD eess.AS

    Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training

    Authors: Dading Chong, Helin Wang, Peilin Zhou, Qingcheng Zeng

    Abstract: Transformer-based models attain excellent results and generalize well when trained on sufficient amounts of data. However, constrained by the limited data available in the audio domain, most transformer-based models for audio tasks are finetuned from pre-trained models in other domains (e.g. image), which has a notable gap with the audio domain. Other methods explore the self-supervised learning a…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  16. arXiv:2108.00071 [pdf]

    cs.LG cs.AI stat.ML

    Foundations of data imbalance and solutions for a data democracy

    Authors: Ajay Kulkarni, Deri Chong, Feras A. Batarseh

    Abstract: Dealing with imbalanced data is a prevalent problem while performing classification on the datasets. Many times, this problem contributes to bias while making decisions or implementing policies. Thus, it is vital to understand the factors which cause imbalance in the data (or class imbalance). Such hidden biases and imbalances can lead to data tyranny and a major challenge to a data democracy. In…

    Submitted 30 July, 2021; originally announced August 2021.

    Comments: Published in Data Democracy: At the Nexus of Artificial Intelligence, Software Development, and Knowledge Engineering, 1st Edition (Chapter 5)

    Report number: eBook ISBN: 9780128189399, Paperback ISBN: 9780128183663

  17. arXiv:2011.14041 [pdf]

    cs.RO

    A RGB-D SLAM Algorithm for Indoor Dynamic Scene

    Authors: Deng Su, Dehong Chong

    Abstract: Visual SLAM technology is one of the key technologies enabling robots to explore unknown environments independently. Accurate estimation of camera pose based on visual sensors is the basis of autonomous navigation and positioning. However, most visual SLAM algorithms rely on a static-environment assumption and cannot estimate accurate camera poses in dynamic environments. In order to solve this problem,…

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: In Chinese

  18. arXiv:2007.03781 [pdf, other]

    cs.SD eess.AS

    Acoustic Scene Classification with Spectrogram Processing Strategies

    Authors: Helin Wang, Yuexian Zou, Dading Chong

    Abstract: Recently, convolutional neural networks (CNN) have achieved state-of-the-art performance in the acoustic scene classification (ASC) task. The audio data is often transformed into two-dimensional spectrogram representations, which are then fed to the neural networks. In this paper, we study the problem of efficiently taking advantage of different spectrogram representations through discriminative p…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: Submitted to DCASE 2020 Workshop

  19. arXiv:1912.06808 [pdf, other]

    cs.SD cs.LG eess.AS

    Environmental Sound Classification with Parallel Temporal-spectral Attention

    Authors: Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang

    Abstract: Convolutional neural networks (CNN) are one of the best-performing neural network architectures for environmental sound classification (ESC). Recently, temporal attention mechanisms have been used in CNN to capture the useful information from the relevant time frames for audio classification, especially for weakly labelled data where the onset and offset times of the sound events are not provided.…

    Submitted 20 May, 2020; v1 submitted 14 December, 2019; originally announced December 2019.

    Comments: Submitted to INTERSPEECH 2020