Showing 1–28 of 28 results for author: Tzinis, E

  1. Complete and separate: Conditional separation with missing target source attribute completion

    Authors: Dimitrios Bralios, Efthymios Tzinis, Paris Smaragdis

    Abstract: Recent approaches in source separation leverage semantic information about their input mixtures and constituent sources that, when used in conditional separation models, can achieve impressive performance. Most approaches along these lines have focused on simple descriptions, which are not always useful for varying types of input mixtures. In this work, we present an approach in which a model, given…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

    Journal ref: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

  2. The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement

    Authors: Simon Leglaive, Léonie Borne, Efthymios Tzinis, Mostafa Sadeghi, Matthieu Fraticelli, Scott Wisdom, Manuel Pariente, Daniel Pressnitzer, John R. Hershey

    Abstract: Supervised speech enhancement models are trained using artificially generated mixtures of clean speech and noise signals, which may not match real-world recording conditions at test time. This mismatch can lead to poor performance if the test domain significantly differs from the synthetic training domain. This paper introduces the unsupervised domain adaptation for conversational speech enhanceme…

    Submitted 2 October, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

    Journal ref: The 7th International Workshop on Speech Processing in Everyday Environments (CHiME), Dublin, Ireland, 2023

  3. Latent Iterative Refinement for Modular Source Separation

    Authors: Dimitrios Bralios, Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux

    Abstract: Traditional source separation approaches train deep neural network models end-to-end with all the data available at once by minimizing the empirical risk on the whole training set. On the inference side, after training the model, the user fetches a static computation graph and runs the full model on some specified observed mixture signal to get the estimated source signals. Additionally, many of t…

    Submitted 15 October, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  4. arXiv:2211.05927

    cs.SD cs.LG eess.AS

    Optimal Condition Training for Target Source Separation

    Authors: Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux

    Abstract: Recent research has shown remarkable performance in leveraging multiple extraneous conditional and non-mutually exclusive semantic concepts for sound source separation, allowing the flexibility to extract a given target source based on multiple different queries. In this work, we propose a new optimal condition training (OCT) method for single-channel target source separation, based on greedy para…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  5. arXiv:2207.10141

    cs.SD cs.CV eess.AS

    AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation

    Authors: Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey

    Abstract: We introduce AudioScopeV2, a state-of-the-art universal audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify several limitations of previous work on audio-visual on-screen sound separation, including the coarse resolution of spatio-temporal attention, poor convergence o…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  6. arXiv:2205.07390

    eess.AS cs.LG cs.SD eess.SP

    Learning Representations for New Sound Classes With Continual Self-Supervised Learning

    Authors: Zhepei Wang, Cem Subakan, Xilin Jiang, Junkai Wu, Efthymios Tzinis, Mirco Ravanelli, Paris Smaragdis

    Abstract: In this paper, we work on a sound recognition system that continually incorporates new sound classes. Our main goal is to develop a framework where the model can be updated without relying on labeled data. For this purpose, we propose adopting representation learning, where an encoder is trained using unlabeled data. This learning framework enables the study and implementation of a practically rel…

    Submitted 13 December, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

    Comments: Accepted to IEEE Signal Processing Letters

  7. Heterogeneous Target Speech Separation

    Authors: Efthymios Tzinis, Gordon Wichern, Aswin Subramanian, Paris Smaragdis, Jonathan Le Roux

    Abstract: We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc). Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts and learn cross-domain representations under a variety of concepts u…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

    Journal ref: Interspeech 2022

  8. arXiv:2202.08862

    cs.SD cs.LG eess.AS

    RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

    Authors: Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

    Abstract: We present RemixIT, a simple yet effective self-supervised method for training speech enhancement without the need of a single isolated in-domain speech nor a noise waveform. Our approach overcomes limitations of previous methods which make them dependent on clean in-domain target signals and thus, sensitive to any domain mismatch between train and test samples. RemixIT is based on a continuous se…

    Submitted 3 August, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: To appear in IEEE Journal of Selected Topics in Signal Processing

    Journal ref: J-STSP-SLSAP-00040-2022

  9. Continual self-training with bootstrapped remixing for speech enhancement

    Authors: Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar

    Abstract: We propose RemixIT, a simple and novel self-supervised training method for speech enhancement. The proposed method is based on a continuous self-training scheme that overcomes limitations from previous studies including assumptions for the in-domain noise distribution and having access to clean target signals. Specifically, a separation teacher model is pre-trained on an out-of-domain dataset an…

    Submitted 29 January, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: To appear in Proc. ICASSP 2022, May 22-27, 2022, Singapore

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  10. arXiv:2106.09669

    cs.SD cs.CV cs.LG

    Improving On-Screen Sound Separation for Open-Domain Videos with Audio-Visual Self-Attention

    Authors: Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey

    Abstract: We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify limitations of previous work on audio-visual on-screen sound separation, including the simplicity and coarse resolution of spatio-temporal attention, and poor convergence of the audio s…

    Submitted 14 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  11. arXiv:2105.07596

    cs.SD eess.AS

    Sound Event Detection with Adaptive Frequency Selection

    Authors: Zhepei Wang, Jonah Casebeer, Adam Clemmitt, Efthymios Tzinis, Paris Smaragdis

    Abstract: In this work, we present HIDACT, a novel network architecture for adaptive computation for efficiently recognizing acoustic events. We evaluate the model on a sound event detection task where we train it to adaptively process frequency bands. The model learns to adapt to the input without requesting all frequency sub-bands provided. It can make confident predictions within fewer processing steps,…

    Submitted 29 July, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: Accepted by IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2021

  12. arXiv:2105.04727

    cs.SD cs.CL cs.LG eess.AS

    Separate but Together: Unsupervised Federated Learning for Speech Enhancement from Non-IID Data

    Authors: Efthymios Tzinis, Jonah Casebeer, Zhepei Wang, Paris Smaragdis

    Abstract: We propose FEDENHANCE, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID distributed data across multiple clients. We simulate a real-world scenario where each client only has access to a few noisy recordings from a limited and disjoint number of speakers (hence non-IID). Each client trains their model in isolation using mixture invariant training…

    Submitted 26 September, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: Accepted to WASPAA 21

    Journal ref: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

  13. Unsupervised low-rank representations for speech emotion recognition

    Authors: Georgios Paraskevopoulos, Efthymios Tzinis, Nikolaos Ellinas, Theodoros Giannakopoulos, Alexandros Potamianos

    Abstract: We examine the use of linear and non-linear dimensionality reduction algorithms for extracting low-rank feature representations for speech emotion recognition. Two feature sets are used, one based on low-level descriptors and their aggregations (IS10) and one modeling recurrence dynamics of speech (RQA), as well as their fusion. We report speech emotion recognition (SER) results for learned repres…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: Published at Interspeech 2019 https://www.isca-speech.org/archive/Interspeech_2019/abstracts/2769.html

  14. arXiv:2103.02644

    cs.SD cs.CL cs.LG eess.AS

    Compute and memory efficient universal sound source separation

    Authors: Efthymios Tzinis, Zhepei Wang, Xilin Jiang, Paris Smaragdis

    Abstract: Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem. In this study, we provide a family of efficient neural network architectures for general purpose audio source separation while focusing on multiple computational aspects that hinder the application of neural networks in real-wor…

    Submitted 14 July, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: Accepted to Journal of Signal Processing Systems https://www.springer.com/journal/11265. arXiv admin note: substantial text overlap with arXiv:2007.06833

  15. arXiv:2011.01143

    cs.SD cs.CV eess.AS

    Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

    Authors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

    Abstract: Recent progress in deep learning has enabled many advances in sound separation and visual scene understanding. However, extracting sound sources which are apparent in natural videos remains an open problem. In this work, we present AudioScope, a novel audio-visual sound separation framework that can be trained without supervision to isolate on-screen sound sources from real in-the-wild videos. Pri…

    Submitted 29 May, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: ICLR 2021, 27 pages

  16. Unified Gradient Reweighting for Model Biasing with Applications to Source Separation

    Authors: Efthymios Tzinis, Dimitrios Bralios, Paris Smaragdis

    Abstract: Recent deep learning approaches have shown great improvement in audio source separation tasks. However, the vast majority of such work is focused on improving average separation performance, often neglecting to examine or control the distribution of the results. In this paper, we propose a simple, unified gradient reweighting scheme, with a lightweight modification to bias the learning process of…

    Submitted 25 October, 2020; originally announced October 2020.

    Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  17. arXiv:2007.06833

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Sudo rm -rf: Efficient Networks for Universal Audio Source Separation

    Authors: Efthymios Tzinis, Zhepei Wang, Paris Smaragdis

    Abstract: In this paper, we present an efficient neural network for end-to-end general purpose audio source separation. Specifically, the backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRMRF) as well as their aggregation which is performed through simple one-dimensional convolutions. In this way, we are able to obtain high qual…

    Submitted 14 July, 2020; originally announced July 2020.

    Comments: Accepted to MLSP 2020

    Journal ref: Published in 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP)

  18. arXiv:2006.12701

    eess.AS cs.LG cs.SD

    Unsupervised Sound Separation Using Mixture Invariant Training

    Authors: Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, Kevin Wilson, John R. Hershey

    Abstract: In recent years, rapid progress has been made on the problem of single-channel sound separation using supervised training of deep neural networks. In such supervised approaches, a model is trained to predict the component sources from synthetic mixtures created by adding up isolated ground-truth sources. Reliance on this synthetic training data is problematic because good performance depends upon…

    Submitted 23 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted for spotlight presentation at NeurIPS 2020

  19. arXiv:2005.04132

    eess.AS cs.SD

    Asteroid: the PyTorch-based audio source separation toolkit for researchers

    Authors: Manuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent

    Abstract: This paper describes Asteroid, the PyTorch-based audio source separation toolkit for researchers. Inspired by the most successful neural source separation systems, it provides all neural building blocks required to build such a system. To improve reproducibility, Kaldi-style recipes on common audio source separation datasets are also provided. This paper describes the software architecture of Aste…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020

  20. arXiv:1911.07951

    cs.SD cs.LG eess.AS stat.ML

    Improving Universal Sound Separation Using Sound Classification

    Authors: Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis

    Abstract: Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of source classes, such as speech and music. However, recent work has demonstrated the possibility of "universal sound separation", which aims to separate acoustic s…

    Submitted 18 November, 2019; originally announced November 2019.

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  21. arXiv:1911.00102

    cs.SD eess.AS

    End-to-end Non-Negative Autoencoders for Sound Source Separation

    Authors: Shrikant Venkataramani, Efthymios Tzinis, Paris Smaragdis

    Abstract: Discriminative models for source separation have recently been shown to produce impressive results. However, when operating on sources outside of the training set, these models cannot perform as well and are cumbersome to update. Classical methods like Non-negative Matrix Factorization (NMF) provide modular approaches to source separation that can be easily updated to adapt to new mixture scenari…

    Submitted 31 October, 2019; originally announced November 2019.

  22. arXiv:1910.09804

    cs.LG cs.CL cs.SD eess.AS stat.ML

    Two-Step Sound Source Separation: Training on Learned Latent Targets

    Authors: Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Cem Subakan, Paris Smaragdis

    Abstract: In this paper, we propose a two-step training procedure for source separation via a deep neural network. In the first step we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal. For the second step, we train a separation module that operates on the previously learned space. In order to do so, we also make use of a scale-invari…

    Submitted 23 October, 2019; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  23. arXiv:1906.00654

    cs.LG cs.SD eess.AS stat.ML

    Continual Learning of New Sound Classes using Generative Replay

    Authors: Zhepei Wang, Cem Subakan, Efthymios Tzinis, Paris Smaragdis, Laurent Charlin

    Abstract: Continual learning consists in incrementally training a model on a sequence of datasets and testing on the union of all datasets. In this paper, we examine continual learning for the problem of sound classification, in which we wish to refine already trained models to learn new sound classes. In practice one does not want to maintain all past training data and retrain from scratch, but naively upd…

    Submitted 3 June, 2019; originally announced June 2019.

  24. arXiv:1905.00151

    cs.SD eess.AS

    A Style Transfer Approach to Source Separation

    Authors: Shrikant Venkataramani, Efthymios Tzinis, Paris Smaragdis

    Abstract: Training neural networks for source separation involves presenting a mixture recording at the input of the network and updating network parameters in order to produce an output that resembles the clean source. Consequently, supervised source separation depends on the availability of paired mixture-clean training examples. In this paper, we interpret source separation as a style transfer problem. W…

    Submitted 9 May, 2019; v1 submitted 30 April, 2019; originally announced May 2019.

  25. arXiv:1902.01482

    cs.LG stat.ML

    Bootstrapped Coordinate Search for Multidimensional Scaling

    Authors: Efthymios Tzinis

    Abstract: In this work, a unified framework for gradient-free Multidimensional Scaling (MDS) based on Coordinate Search (CS) is proposed. This family of algorithms is an instance of General Pattern Search (GPS) methods which avoid the explicit computation of derivatives but instead evaluate the objective function while searching on coordinate steps of the embedding space. The backbone element of CSMDS frame…

    Submitted 4 February, 2019; originally announced February 2019.

  26. arXiv:1811.04133

    cs.SD cs.LG eess.AS stat.ML

    Integrating Recurrence Dynamics for Speech Emotion Recognition

    Authors: Efthymios Tzinis, Georgios Paraskevopoulos, Christos Baziotis, Alexandros Potamianos

    Abstract: We investigate the performance of features that can capture nonlinear recurrence dynamics embedded in the speech signal for the task of Speech Emotion Recognition (SER). Reconstruction of the phase space of each speech frame and the computation of its respective Recurrence Plot (RP) reveals complex structures which can be measured by performing Recurrence Quantification Analysis (RQA). These measu…

    Submitted 9 November, 2018; originally announced November 2018.

    Journal ref: Proc. Interspeech 2018, pp. 927-931

  27. arXiv:1811.01531

    cs.LG cs.SD eess.AS stat.ML

    Unsupervised Deep Clustering for Source Separation: Direct Learning from Mixtures using Spatial Information

    Authors: Efthymios Tzinis, Shrikant Venkataramani, Paris Smaragdis

    Abstract: We present a monophonic source separation system that is trained by only observing mixtures with no ground truth separation information. We use a deep clustering approach which trains on multi-channel mixtures and learns to project spectrogram bins to source clusters that correlate with various spatial features. We show that using such a training process we can obtain separation performance that i…

    Submitted 9 November, 2018; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: Submitted to ICASSP 2019 (v1: November 5th 2018)

    Journal ref: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  28. arXiv:1806.00416

    cs.LG stat.ML

    Pattern Search Multidimensional Scaling

    Authors: Georgios Paraskevopoulos, Efthymios Tzinis, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Alexandros Potamianos

    Abstract: We present a novel view of nonlinear manifold learning using derivative-free optimization techniques. Specifically, we propose an extension of the classical multi-dimensional scaling (MDS) method, where instead of performing gradient descent, we sample and evaluate possible "moves" in a sphere of fixed radius for each point in the embedded space. A fixed-point convergence guarantee can be shown by…

    Submitted 30 October, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: 36 pages, Under review for JMLR
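
    Illustration for entry 28: the abstract describes replacing gradient descent in MDS with sampled "moves" of fixed radius for each embedded point. The following is a minimal sketch of that general idea only, not the authors' implementation. The truncated abstract does not state the sampling scheme, acceptance rule, or radius schedule, so the choices here (moves along each coordinate axis in both directions, greedy acceptance of any move that lowers raw stress, halving the radius after a sweep with no improvement) are assumptions, and the names `stress` and `pattern_search_mds` are hypothetical.

```python
import math


def stress(points, target):
    """Raw MDS stress: squared error between embedded and target distances."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += (math.dist(points[i], points[j]) - target[i][j]) ** 2
    return total


def pattern_search_mds(target, init, radius=1.0, min_radius=1e-4, max_sweeps=10_000):
    """Derivative-free MDS sketch (hypothetical helper, not the authors' code)."""
    points = [list(p) for p in init]  # work on a copy of the starting layout
    best = stress(points, target)
    for _ in range(max_sweeps):
        if radius <= min_radius:
            break
        improved = False
        for point in points:
            for axis in range(len(point)):
                for step in (radius, -radius):
                    old = point[axis]
                    point[axis] = old + step  # try one move of length `radius`
                    candidate = stress(points, target)
                    if candidate < best:
                        best = candidate      # keep the improving move
                        improved = True
                    else:
                        point[axis] = old     # undo the move
        if not improved:
            radius /= 2.0                     # assumed schedule: halve the radius
    return points, best


# Usage: embed three points whose target distances form an equilateral triangle.
target = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
init = [[0.0, 0.0], [0.5, 0.0], [0.2, 0.3]]
embedding, final_stress = pattern_search_mds(target, init)
```

    No derivative of the stress is ever computed; each candidate move is judged only by re-evaluating the objective, which is the property the abstract emphasizes.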