Whisper's error rate varies by language, with a higher [[word error rate]] for languages not well represented in the training data.<ref>{{Cite web |last=Wiggers |first=Kyle |date=2023-03-01 |title=OpenAI debuts Whisper API for speech-to-text transcription and translation |url=https://techcrunch.com/2023/03/01/openai-debuts-whisper-api-for-text-to-speech-transcription-and-translation/ |url-status=live |archive-url=https://web.archive.org/web/20230718040023/https://techcrunch.com/2023/03/01/openai-debuts-whisper-api-for-text-to-speech-transcription-and-translation/ |archive-date=2023-07-18 |access-date=2023-08-21 |website=TechCrunch |language=en-US}}</ref>
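The word error rate mentioned above is the standard metric for speech recognition: the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the system's output, divided by the number of reference words. A minimal sketch in Python (the function name is illustrative, not from Whisper's codebase):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table: d[i][j] = edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                      # deletion
                d[i][j - 1] + 1,                      # insertion
                d[i - 1][j - 1] + substitution_cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, transcribing "the cat sat" as "the bat sat" is one substitution out of three reference words, a WER of about 0.33.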
The model has been used as the base for
== Architecture ==
The Whisper architecture is based on an encoder-decoder transformer. Input audio is split into 30-second chunks, each converted into a [[Mel-frequency cepstrum]] representation that is passed to the encoder. The decoder is trained to predict the corresponding text caption. Special tokens direct the model to perform several tasks, such as producing phrase-level timestamps.<ref name="whisperoff">{{Cite web |date=2022-09-21 |title=Introducing Whisper |url=https://openai.com/research/whisper |url-status=live |archive-url=https://web.archive.org/web/20230820005801/https://openai.com/research/whisper |archive-date=2023-08-20 |access-date=2023-08-21 |website=openai.com |language=en-US}}</ref>
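Because the encoder always receives a fixed 30-second window, shorter audio must be padded with silence and longer audio split or truncated before the spectrogram is computed. A minimal sketch of that windowing step, assuming Whisper's published 16 kHz input rate (the helper name is illustrative):

```python
import numpy as np

SAMPLE_RATE = 16_000           # Whisper resamples all input audio to 16 kHz
CHUNK_SECONDS = 30             # the encoder sees a fixed 30-second window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per chunk

def pad_or_trim(audio: np.ndarray) -> np.ndarray:
    """Force a mono waveform to exactly 30 s: truncate long input,
    zero-pad (silence) short input."""
    if len(audio) >= N_SAMPLES:
        return audio[:N_SAMPLES]
    return np.pad(audio, (0, N_SAMPLES - len(audio)))
```

The fixed-size output of this step is what gets converted to the spectrogram representation fed to the encoder; longer recordings are processed one 30-second window at a time.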
== See also ==