Whisper (speech recognition system)

Whisper's error rate varies across languages, with a higher [[word error rate]] for languages that are under-represented in the training data.<ref>{{Cite web |last=Wiggers |first=Kyle |date=2023-03-01 |title=OpenAI debuts Whisper API for speech-to-text transcription and translation |url=https://techcrunch.com/2023/03/01/openai-debuts-whisper-api-for-text-to-speech-transcription-and-translation/ |url-status=live |archive-url=https://web.archive.org/web/20230718040023/https://techcrunch.com/2023/03/01/openai-debuts-whisper-api-for-text-to-speech-transcription-and-translation/ |archive-date=2023-07-18 |access-date=2023-08-21 |website=TechCrunch |language=en-US}}</ref>
 
The model has been used as the base for a unified model for speech recognition and more general [[sound recognition]].<ref>{{Cite arXiv |arxiv=2307.03183 |first1=Gong |last1=Yuan |first2=Sameer |last2=Khurana |title=Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers |first3=Leonid |last3=Karlinsky |first4=James |last4=Glass}}</ref>
 
== Architecture ==
 
The Whisper architecture is based on an encoder-decoder transformer. Input audio is split into 30-second chunks, each converted into a [[Mel-frequency cepstrum]] representation that is passed to the encoder. The decoder is trained to predict the corresponding text caption, and special tokens direct the model to perform several tasks, such as producing phrase-level timestamps.<ref name="whisperoff">{{Cite web |date=2022-09-21 |title=Introducing Whisper |url=https://openai.com/research/whisper |url-status=live |archive-url=https://web.archive.org/web/20230820005801/https://openai.com/research/whisper |archive-date=2023-08-20 |access-date=2023-08-21 |website=openai.com |language=en-US}}</ref>
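The input pipeline described above can be sketched in Python. This is a minimal illustration, not the released library's API: the chunking helper is hypothetical, while the 16&nbsp;kHz sample rate, the fixed 30-second window, and the special-token names shown are those used by the released Whisper models.

```python
# Sketch of Whisper's fixed-length input windowing (illustrative, not the library API).
SAMPLE_RATE = 16_000                       # Whisper resamples all audio to 16 kHz
CHUNK_SECONDS = 30                         # fixed window length fed to the encoder
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

def chunk_audio(samples):
    """Split raw samples into 30-second chunks, zero-padding the final chunk."""
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        chunk = chunk + [0.0] * (CHUNK_SAMPLES - len(chunk))  # pad to full window
        chunks.append(chunk)
    return chunks

# The decoder is conditioned on special tokens that select the task and language,
# e.g. (token strings as they appear in the released model's vocabulary):
prompt = ["<|startoftranscript|>", "<|en|>", "<|transcribe|>", "<|notimestamps|>"]

# 45 seconds of silence yields two 30-second windows, the second zero-padded.
chunks = chunk_audio([0.0] * (45 * SAMPLE_RATE))
print(len(chunks), len(chunks[0]))  # 2 480000
```

In the actual model, each padded window is then converted to a Mel-scale spectrogram before entering the encoder; the sketch stops at the windowing step.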
 
== See also ==