Showing 1–4 of 4 results for author: Siddhartha, N

  1. arXiv:2310.00230  [pdf, other]

    cs.CL cs.SD eess.AS

    SLM: Bridge the thin gap between speech and text foundation models

    Authors: Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Yongqiang Wang, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul Rubenstein, Lukas Zilka, Dian Yu, Zhong Meng, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu

    Abstract: We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserve their capabilities, and only trains a simple adapter with just 1% (156M) of the foundation models' parameters. This adaptation not only leads SLM to achiev…

    Submitted 29 September, 2023; originally announced October 2023.

  2. arXiv:2111.08137  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Joint Unsupervised and Supervised Training for Multilingual ASR

    Authors: Junwen Bai, Bo Li, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

    Abstract: Self-supervised training has shown promising gains in pretraining models and facilitating the downstream finetuning for speech recognition, like multilingual ASR. Most existing methods adopt a 2-stage scheme where the self-supervised loss is optimized in the first pretraining stage, and the standard supervised finetuning resumes in the second stage. In this paper, we propose an end-to-end (E2E) Jo…

    Submitted 15 November, 2021; originally announced November 2021.

  3. arXiv:2110.00165  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

    Authors: Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He

    Abstract: Self- and semi-supervised learning methods have been actively investigated to reduce labeled training data or enhance model performance. However, these approaches mostly focus on in-domain performance for public datasets. In this study, we utilize the combination of self- and semi-supervised learning methods to solve the unseen domain adaptation problem in a large-scale production setting for online A…

    Submitted 15 February, 2022; v1 submitted 30 September, 2021; originally announced October 2021.

    Comments: ICASSP 2022 accepted, 5 pages, 2 figures, 5 tables

  4. arXiv:2110.00155  [pdf, other]

    cs.SD cs.LG eess.AS

    Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device

    Authors: Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays

    Abstract: Streaming end-to-end speech recognition models have been widely applied to mobile devices and show significant improvement in efficiency. These models are typically trained on the server using transcribed speech data. However, the server data distribution can be very different from the data distribution on user devices, which could affect the model performance. There are two main challenges for on…

    Submitted 30 September, 2021; originally announced October 2021.

    Comments: 5 pages