Zum Hauptinhalt springen

Showing 1–16 of 16 results for author: Chuang, Y

Searching in archive eess. Search in all archives.
.
  1. arXiv:2404.17022  [pdf

    cs.SD eess.AS

    Investigating differences in lab-quality and remote recording methods with dynamic acoustic measures

    Authors: Cong Zhang, Kathleen Jepson, Yu-Ying Chuang

    Abstract: Increasingly, phonetic research utilizes data collected from participants who record themselves on readily available devices. Though such recordings are convenient, their suitability for acoustic analysis remains an open question, especially regarding how the individual methods affect acoustic measures over time. We used Quantile Generalized Additive Mixed Models (QGAMMs) to analyze measures of F0… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  2. arXiv:2404.09385  [pdf, other

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,… ▽ More

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The Arxiv version is preferred

  3. arXiv:2401.13463  [pdf, other

    cs.CL cs.IR cs.SD eess.AS

    SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

    Authors: Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee

    Abstract: Spoken Question Answering (SQA) is essential for machines to reply to user's question by finding the answer span within a given spoken passage. SQA has been previously achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of Open-domain SQA (openSQA), in which the machine needs to first retrieve passages that possibly contain the ans… ▽ More

    Submitted 24 August, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  4. arXiv:2311.05477  [pdf, other

    eess.IV cs.CV cs.LG

    Using ResNet to Utilize 4-class T2-FLAIR Slice Classification Based on the Cholinergic Pathways Hyperintensities Scale for Pathological Aging

    Authors: Wei-Chun Kevin Tsai, Yi-Chien Liu, Ming-Chun Yu, Chia-Ju Chou, Sui-Hing Yan, Yang-Teng Fan, Yan-Hsiang Huang, Yen-Ling Chiu, Yi-Fang Chuang, Ran-Zan Wang, Yao-Chia Shih

    Abstract: The Cholinergic Pathways Hyperintensities Scale (CHIPS) is a visual rating scale used to assess the extent of cholinergic white matter hyperintensities in T2-FLAIR images, serving as an indicator of dementia severity. However, the manual selection of four specific slices for rating throughout the entire brain is a time-consuming process. Our goal was to develop a deep learning-based model capable… ▽ More

    Submitted 11 September, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: 8 pages, 2 figures, 2 tables

  5. arXiv:2310.07654  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    Audio-Visual Neural Syntax Acquisition

    Authors: Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

    Abstract: We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without eve… ▽ More

    Submitted 11 October, 2023; originally announced October 2023.

  6. arXiv:2203.04911  [pdf, other

    cs.CL cs.SD eess.AS

    DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering

    Authors: Guan-Ting Lin, Yung-Sung Chuang, Ho-Lam Chung, Shu-wen Yang, Hsuan-Jui Chen, Shuyan Dong, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee

    Abstract: Spoken Question Answering (SQA) is to find the answer from a spoken document given a question, which is crucial for personal assistants when replying to the queries from the users. Existing SQA methods all rely on Automatic Speech Recognition (ASR) transcripts. Not only does ASR need to be trained with massive annotated data that are time and cost-prohibitive to collect for low-resourced languages… ▽ More

    Submitted 21 June, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  7. arXiv:2110.01147  [pdf, other

    cs.SD cs.CL eess.AS

    On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis

    Authors: Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass

    Abstract: Are end-to-end text-to-speech (TTS) models over-parametrized? To what extent can these models be pruned, and what happens to their synthesis capabilities? This work serves as a starting point to explore pruning both spectrogram prediction networks and vocoders. We thoroughly investigate the tradeoffs between sparsity and its subsequent effects on synthetic speech. Additionally, we explored several… ▽ More

    Submitted 27 October, 2021; v1 submitted 3 October, 2021; originally announced October 2021.

  8. arXiv:2106.05933  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

    Authors: Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, James Glass

    Abstract: Self-supervised speech representation learning (speech SSL) has demonstrated the benefit of scale in learning rich representations for Automatic Speech Recognition (ASR) with limited paired data, such as wav2vec 2.0. We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results. However, directly applying widely adopted prunin… ▽ More

    Submitted 26 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  9. arXiv:2106.00497  [pdf, ps, other

    cs.SD cs.AI eess.AS

    Omnizart: A General Toolbox for Automatic Music Transcription

    Authors: Yu-Te Wu, Yin-Jyun Luo, Tsung-Ping Chen, I-Chieh Wei, Jui-Yang Hsu, Yi-Chin Chuang, Li Su

    Abstract: We present and release Omnizart, a new Python library that provides a streamlined solution to automatic music transcription (AMT). Omnizart encompasses modules that construct the life-cycle of deep learning-based AMT, and is designed for ease of use with a compact command-line interface. To the best of our knowledge, Omnizart is the first transcription toolkit which offers models covering a wide c… ▽ More

    Submitted 1 June, 2021; originally announced June 2021.

  10. arXiv:2105.01051  [pdf, ps, other

    cs.CL cs.SD eess.AS

    SUPERB: Speech processing Universal PERformance Benchmark

    Authors: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge… ▽ More

    Submitted 15 October, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: To appear in Interspeech 2021

  11. arXiv:2008.04902  [pdf, other

    cs.CV eess.IV

    Learning to See Through Obstructions with Layered Decomposition

    Authors: Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

    Abstract: We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions, or adherent raindrops, from a short sequence of images captured by a moving camera. Our method leverages motion differences between the background and obstructing elements to recover both layers. Specifically, we alternate between estimating dense optical flow fields of the two la… ▽ More

    Submitted 25 July, 2021; v1 submitted 11 August, 2020; originally announced August 2020.

    Comments: Project page: https://alex04072000.github.io/SOLD/ Code: https://github.com/alex04072000/SOLD Extension of the CVPR 2020 paper: arXiv:2004.01180

  12. arXiv:2004.14252  [pdf

    eess.SP

    Task-Projected Hyperdimensional Computing for Multi-Task Learning

    Authors: Cheng-Yang Chang, Yu-Chuan Chuang, An-Yeu Wu

    Abstract: Brain-inspired Hyperdimensional (HD) computing is an emerging technique for cognitive tasks in the field of low-power design. As a fast-learning and energy-efficient computational paradigm, HD computing has shown great success in many real-world applications. However, an HD model incrementally trained on multiple tasks suffers from the negative impacts of catastrophic forgetting. The model forgets… ▽ More

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: To be published in 16th International Conference on Artificial Intelligence Applications and Innovations

  13. arXiv:2004.09089  [pdf, other

    eess.IV cs.CV

    Deep Exposure Fusion with Deghosting via Homography Estimation and Attention Learning

    Authors: Sheng-Yeh Chen, Yung-Yu Chuang

    Abstract: Modern cameras have limited dynamic ranges and often produce images with saturated or dark regions using a single exposure. Although the problem could be addressed by taking multiple images with different exposures, exposure fusion methods need to deal with ghosting artifacts and detail loss caused by camera motion or moving objects. This paper proposes a deep network for exposure fusion. For redu… ▽ More

    Submitted 20 April, 2020; originally announced April 2020.

    Comments: ICASSP 2020

  14. arXiv:2004.01179  [pdf, other

    eess.IV cs.CV

    Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline

    Authors: Yu-Lun Liu, Wei-Sheng Lai, Yu-Sheng Chen, Yi-Lung Kao, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

    Abstract: Recovering a high dynamic range (HDR) image from a single low dynamic range (LDR) input image is challenging due to missing details in under-/over-exposed regions caused by quantization and saturation of camera sensors. In contrast to existing learning-based methods, our core idea is to incorporate the domain knowledge of the LDR image formation pipeline into our model. We model the HDRto-LDR imag… ▽ More

    Submitted 2 April, 2020; originally announced April 2020.

    Comments: CVPR 2020. Project page: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR Code: https://github.com/alex04072000/SingleHDR

  15. arXiv:1910.11559  [pdf, other

    cs.CL cs.SD eess.AS

    SpeechBERT: An Audio-and-text Jointly Learned Language Model for End-to-end Spoken Question Answering

    Authors: Yung-Sung Chuang, Chi-Liang Liu, Hung-Yi Lee, Lin-shan Lee

    Abstract: While various end-to-end models for spoken language understanding tasks have been explored recently, this paper is probably the first known attempt to challenge the very difficult task of end-to-end spoken question answering (SQA). Learning from the very successful BERT model for various text processing tasks, here we proposed an audio-and-text jointly learned SpeechBERT model. This model outperfo… ▽ More

    Submitted 11 August, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: Interspeech 2020

  16. arXiv:1809.08410  [pdf, other

    q-bio.QM cs.LG eess.SP

    Entropy-Assisted Multi-Modal Emotion Recognition Framework Based on Physiological Signals

    Authors: Kuan Tung, Po-Kang Liu, Yu-Chuan Chuang, Sheng-Hui Wang, An-Yeu Wu

    Abstract: As the result of the growing importance of the Human Computer Interface system, understanding human's emotion states has become a consequential ability for the computer. This paper aims to improve the performance of emotion recognition by conducting the complexity analysis of physiological signals. Based on AMIGOS dataset, we extracted several entropy-domain features such as Refined Composite Mult… ▽ More

    Submitted 22 September, 2018; originally announced September 2018.