Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models

B Yusuf, MK Baskar, A Rosenberg, B Ramabhadran

arXiv preprint arXiv:2407.04641, 2024

This paper explores speculative speech recognition (SSR), where we empower conventional automatic speech recognition (ASR) with speculation capabilities, allowing the recognizer to run ahead of audio. We introduce a metric for measuring SSR performance and we propose a model which does SSR by combining an RNN-Transducer-based ASR system with an audio-prefixed language model (LM). The ASR system transcribes ongoing audio and feeds the resulting transcripts, along with an audio-dependent prefix, to the LM, which speculates likely completions for the transcriptions. We experiment with a variety of ASR datasets, on which we show the efficacy of our method and the feasibility of SSR as a method of reducing ASR latency.
