Showing 1–3 of 3 results for author: Korostik, R

Searching in archive eess.
  1. arXiv:2407.16074  [pdf, ps, other]

    eess.AS

    Schrödinger Bridge for Generative Speech Enhancement

    Authors: Ante Jukić, Roman Korostik, Jagadeesh Balam, Boris Ginsburg

    Abstract: This paper proposes a generative speech enhancement model based on Schrödinger bridge (SB). The proposed model is employing a tractable SB to formulate a data-to-data process between the clean speech distribution and the observed noisy speech distribution. The model is trained with a data prediction loss, aiming to recover the complex-valued clean speech coefficients, and an auxiliary time-domain…

    Submitted 22 July, 2024; originally announced July 2024.

  2. arXiv:2302.14036  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

    Authors: Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg

    Abstract: We propose an end-to-end Automatic Speech Recognition (ASR) system that can be trained on transcribed speech data, text-only data, or a mixture of both. The proposed model uses an integrated auxiliary block for text-based training. This block combines a non-autoregressive multi-speaker text-to-mel-spectrogram generator with a GAN-based enhancer to improve the spectrogram quality. The proposed syst…

    Submitted 16 August, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted to INTERSPEECH 2023

  3. arXiv:2005.07157  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

    Authors: Aleksandr Laptev, Roman Korostik, Aleksey Svischev, Andrei Andrusenko, Ivan Medennikov, Sergey Rybin

    Abstract: Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech synthesis (text-to-speech, or TTS), we build our TTS system on an ASR training database and then extend the data with synthesized speech to train a recognition mo…

    Submitted 30 July, 2020; v1 submitted 14 May, 2020; originally announced May 2020.