Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models

B Yusuf, MK Baskar, A Rosenberg… - arXiv preprint arXiv …, 2024 - arxiv.org
arXiv preprint arXiv:2407.04641, 2024arxiv.org
This paper explores speculative speech recognition (SSR), where we empower
conventional automatic speech recognition (ASR) with speculation capabilities, allowing the
recognizer to run ahead of audio. We introduce a metric for measuring SSR performance
and we propose a model which does SSR by combining a RNN-Transducer-based ASR
system with an audio-prefixed language model (LM). The ASR system transcribes ongoing
audio and feeds the resulting transcripts, along with an audio-dependent prefix, to the LM …
This paper explores speculative speech recognition (SSR), where we empower conventional automatic speech recognition (ASR) with speculation capabilities, allowing the recognizer to run ahead of audio. We introduce a metric for measuring SSR performance and we propose a model which does SSR by combining a RNN-Transducer-based ASR system with an audio-prefixed language model (LM). The ASR system transcribes ongoing audio and feeds the resulting transcripts, along with an audio-dependent prefix, to the LM, which speculates likely completions for the transcriptions. We experiment with a variety of ASR datasets on which show the efficacy our method and the feasibility of SSR as a method of reducing ASR latency.
arxiv.org