Showing 1–4 of 4 results for author: Kosgi, S

Searching in archive cs.
  1. arXiv:2305.11926 [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting

    Authors: Neil Shah, Vishal Tambrahalli, Saiteja Kosgi, Niranjan Pedanekar, Vineet Gandhi

    Abstract: We present MParrotTTS, a unified multilingual, multi-speaker text-to-speech (TTS) synthesis model that can produce high-quality speech. Benefiting from a modularized training paradigm exploiting self-supervised speech representations, MParrotTTS adapts to a new language with minimal supervised data and generalizes to languages not seen while training the self-supervised backbone. Moreover, without…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: 5 pages, 1 figure

  2. arXiv:2303.01261 [pdf, other]

    cs.CL cs.SD eess.AS

    ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations

    Authors: Neil Shah, Saiteja Kosgi, Vishal Tambrahalli, Neha Sahipjohn, Niranjan Pedanekar, Vineet Gandhi

    Abstract: We present ParrotTTS, a modularized text-to-speech synthesis model leveraging disentangled self-supervised speech representations. It can train a multi-speaker variant effectively using transcripts from a single speaker. ParrotTTS adapts to a new language in a low-resource setup and generalizes to languages not seen while training the self-supervised backbone. Moreover, without training on bilingual…

    Submitted 16 December, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  3. Emotional Prosody Control for Speech Generation

    Authors: Sarath Sivaprasad, Saiteja Kosgi, Vineet Gandhi

    Abstract: Machine-generated speech is characterized by its limited or unnatural emotional variation. Current text-to-speech systems generate speech with either a flat emotion, an emotion selected from a predefined set, an average variation learned from prosody sequences in the training data, or an emotion transferred from a source style. We propose a text-to-speech (TTS) system where a user can choose the emotion of generated s…

    Submitted 7 November, 2021; originally announced November 2021.

  4. arXiv:2110.07981 [pdf, other]

    cs.LG cs.AI

    Reappraising Domain Generalization in Neural Networks

    Authors: Sarath Sivaprasad, Akshay Goindani, Vaibhav Garg, Ritam Basu, Saiteja Kosgi, Vineet Gandhi

    Abstract: Given that Neural Networks generalize unreasonably well in the IID setting (with benign overfitting and betterment in performance with more parameters), OOD presents a consistent failure case to better the understanding of how they learn. This paper focuses on Domain Generalization (DG), which is perceived as the front face of OOD generalization. We find that the presence of multiple domains incen…

    Submitted 28 April, 2022; v1 submitted 15 October, 2021; originally announced October 2021.