Showing 1–3 of 3 results for author: Cabrera, R

  1. arXiv:2306.08133

    eess.AS cs.CL

    Large-scale Language Model Rescoring on Long-form Data

    Authors: Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno, Michael Riley

    Abstract: In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to 8% relative reduction in Word Error Rate (WER) on US English (en-us) and code-switched Indian English (en-in) long-form ASR test sets and a reduction of up to 30% relative on Salient Term Error Rate (STER)…

    Submitted 5 September, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted in ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  2. arXiv:2302.08917

    cs.CL cs.LG

    Massively Multilingual Shallow Fusion with Large Language Models

    Authors: Ke Hu, Tara N. Sainath, Bo Li, Nan Du, Yanping Huang, Andrew M. Dai, Yu Zhang, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman

    Abstract: While large language models (LLM) have made impressive progress in natural language processing, it remains unclear how to utilize them in improving automatic speech recognition (ASR). In this work, we propose to train a single multilingual language model (LM) for shallow fusion in multiple languages. We push the limits of the multilingual LM to cover up to 84 languages by scaling up using a mixtur…

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted to IEEE ICASSP 2023

  3. arXiv:2104.04487

    cs.CL cs.LG cs.SD eess.AS

    Language model fusion for streaming end to end speech recognition

    Authors: Rodrigo Cabrera, Xiaofeng Liu, Mohammadreza Ghodsi, Zebulun Matteson, Eugene Weinstein, Anjuli Kannan

    Abstract: Streaming processing of speech audio is required for many contemporary practical speech recognition tasks. Even with the large corpora of manually transcribed speech data available today, it is impossible for such corpora to cover adequately the long tail of linguistic content that's important for tasks such as open-ended dictation and voice search. We seek to address both the streaming and the ta…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: 5 pages