Showing 1–9 of 9 results for author: Slizovskaia, O

Searching in archive cs.
  1. arXiv:2212.13581

    cs.SD eess.AS

    Voice conversion with limited data and limitless data augmentations

    Authors: Olga Slizovskaia, Jordi Janer, Pritish Chandna, Oscar Mayor

    Abstract: Applying changes to an input speech signal to change the perceived speaker of speech to a target while maintaining the content of the input is a challenging but interesting task known as Voice conversion (VC). Over the last few years, this task has gained significant interest where most systems use data-driven machine learning models. Doing the conversion in a low-latency real-world scenario is ev…

    Submitted 27 December, 2022; originally announced December 2022.

  2. arXiv:2203.04197

    eess.AS cs.AI cs.LG cs.SD

    Locate This, Not That: Class-Conditioned Sound Event DOA Estimation

    Authors: Olga Slizovskaia, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux

    Abstract: Existing systems for sound event localization and detection (SELD) typically operate by estimating a source location for all classes at every time instant. In this paper, we propose an alternative class-conditioned SELD model for situations where we may not be interested in localizing all classes all of the time. This class-conditioned SELD model takes as input the spatial and spectral features fr…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted for publication at ICASSP 2022

  3. arXiv:2006.07931

    eess.AS cs.DB cs.SD

    Solos: A Dataset for Audio-Visual Music Analysis

    Authors: Juan F. Montesinos, Olga Slizovskaia, Gloria Haro

    Abstract: In this paper, we present a new dataset of music performance videos which can be used for training machine learning methods for multiple tasks such as audio-visual blind source separation and localization, cross-modal correspondences, cross-modal generation and, in general, any audio-visual self-supervised task. These videos, gathered from YouTube, consist of solo musical performances of 13 differ…

    Submitted 6 August, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: Rephrased some sentences. Explanation about OpenPose. Minor grammatical errors

  4. Conditioned Source Separation for Music Instrument Performances

    Authors: Olga Slizovskaia, Gloria Haro, Emilia Gómez

    Abstract: In music source separation, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated. This leads to additional challenges in the source separation problem. This paper proposes a source separation method for multiple musical instruments sounding simultaneously and e…

    Submitted 7 July, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: 14 pages, 5 figures, under review

  5. arXiv:2004.02541

    eess.AS cs.CV cs.LG

    Vocoder-Based Speech Synthesis from Silent Videos

    Authors: Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen

    Abstract: Both acoustic and visual information influence human perception of speech. For this reason, the lack of audio in a video sequence determines an extremely low speech intelligibility for untrained lip readers. In this paper, we present a way to synthesise speech from the silent video of a talker using deep learning. The system learns a mapping function from raw video frames to acoustic features and…

    Submitted 15 August, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted to Interspeech 2020

  6. arXiv:1909.11480

    cs.LG stat.ML

    Input complexity and out-of-distribution detection with likelihood-based generative models

    Authors: Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F. Núñez, Jordi Luque

    Abstract: Likelihood-based generative models are a promising resource to detect out-of-distribution (OOD) inputs which could compromise the robustness or reliability of a machine learning system. However, likelihoods derived from such models have been shown to be problematic for detecting certain types of inputs that significantly differ from training data. In this paper, we pose that this problem is due to…

    Submitted 17 January, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

    Comments: Accepted for ICLR2020

  7. arXiv:1907.01813

    cs.SD cs.LG eess.AS

    A Case Study of Deep-Learned Activations via Hand-Crafted Audio Features

    Authors: Olga Slizovskaia, Emilia Gómez, Gloria Haro

    Abstract: The explainability of Convolutional Neural Networks (CNNs) is a particularly challenging task in all areas of application, and it is notably under-researched in music and audio domain. In this paper, we approach explainability by exploiting the knowledge we have on hand-crafted audio features. Our study focuses on a well-defined MIR task, the recognition of musical instruments from user-generated…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: The 2018 Joint Workshop on Machine Learning for Music, The Federated Artificial Intelligence Meeting (FAIM), Joint workshop program of ICML, IJCAI/ECAI, and AAMAS, Stockholm, Sweden, Saturday, July 14th, 2018

  8. arXiv:1811.01850

    cs.SD cs.LG eess.AS

    End-to-End Sound Source Separation Conditioned On Instrument Labels

    Authors: Olga Slizovskaia, Leo Kim, Gloria Haro, Emilia Gomez

    Abstract: Can we perform an end-to-end music source separation with a variable number of sources using a deep learning model? We present an extension of the Wave-U-Net model which allows end-to-end monaural source separation with a non-fixed number of sources. Furthermore, we propose multiplicative conditioning with instrument labels at the bottleneck of the Wave-U-Net and show its effect on the separation…

    Submitted 9 May, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: 5 pages, 2 figures, 2 tables, ICASSP 2019

  9. arXiv:1703.06697

    cs.SD

    Timbre Analysis of Music Audio Signals with Convolutional Neural Networks

    Authors: Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, Xavier Serra

    Abstract: The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms. We first review the trends when designing CNN architectures. Through this literature overview we discuss which are the crucial points to consider for efficiently learning timbre representations using CNNs. From this discussio…

    Submitted 2 June, 2017; v1 submitted 20 March, 2017; originally announced March 2017.