
Showing 1–8 of 8 results for author: Petermann, D

  1. arXiv:2401.03567  [pdf, other]

    eess.AS cs.SD

    Hyperbolic Distance-Based Speech Separation

    Authors: Darius Petermann, Minje Kim

    Abstract: In this work, we explore the task of hierarchical distance-based speech separation defined on a hyperbolic manifold. Based on the recent advent of audio-related tasks performed in non-Euclidean spaces, we propose to make use of the Poincaré ball to effectively unveil the inherent hierarchical structure found in complex speaker mixtures. We design two sets of experiments in which the distance-based…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: To be published at ICASSP2024, 14th of April 2024, Seoul, South Korea. Copyright (c) 2023 IEEE. 5 pages, 2 figures, 3 tables

  2. arXiv:2303.08005  [pdf, other]

    eess.AS cs.SD

    Native Multi-Band Audio Coding within Hyper-Autoencoded Reconstruction Propagation Networks

    Authors: Darius Petermann, Inseon Jang, Minje Kim

    Abstract: Spectral sub-bands do not portray the same perceptual relevance. In audio coding, it is therefore desirable to have independent control over each of the constituent bands so that bitrate assignment and signal reconstruction can be achieved efficiently. In this work, we present a novel neural audio coding network that natively supports a multi-band coding paradigm. Our model extends the idea of com…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023. For resources and examples, see https://saige.sice.indiana.edu/research-projects/HARP-Net/

  3. arXiv:2212.07327  [pdf, other]

    eess.AS cs.SD

    Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

    Authors: Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux

    Abstract: Emulating the human ability to solve the cocktail party problem, i.e., focus on a source of interest in a complex acoustic scene, is a long-standing goal of audio source separation research. Much of this research investigates separating speech from noise, speech from speech, musical instruments from each other, or sound events from each other. In this paper, we focus on the cocktail fork problem,…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Submitted to IEEE TASLP (In review), 13 pages, 6 figures

  4. arXiv:2212.05008  [pdf, other]

    eess.AS cs.SD

    Hyperbolic Audio Source Separation

    Authors: Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux

    Abstract: We introduce a framework for audio source separation using embeddings on a hyperbolic manifold that compactly represent the hierarchical relationship between sound sources and time-frequency features. Inspired by recent successes modeling hierarchical relationships in text and images with hyperbolic embeddings, our algorithm obtains a hyperbolic embedding for each time-frequency bin of a mixture s…

    Submitted 9 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023, Demo page: https://darius522.github.io/hyperbolic-audio-sep/

  5. arXiv:2202.07523  [pdf, other]

    eess.AS cs.SD

    SpaIn-Net: Spatially-Informed Stereophonic Music Source Separation

    Authors: Darius Petermann, Minje Kim

    Abstract: With the recent advancements of data-driven approaches using deep neural networks, music source separation has been formulated as an instrument-specific supervised problem. While existing deep learning models implicitly absorb the spatial information conveyed by the multi-channel input signals, we argue that a more explicit and active use of spatial information could not only improve the separatio…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: To Appear in Proc. ICASSP2022

  6. arXiv:2110.09958  [pdf, other]

    eess.AS cs.SD

    The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks

    Authors: Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux

    Abstract: The cocktail party problem aims at isolating any source of interest within a complex acoustic scene, and has long inspired audio source separation research. Recent efforts have mainly focused on separating speech from noise, speech from speech, musical instruments from each other, or sound events from each other. However, separating an audio mixture (e.g., movie soundtrack) into the three broad ca…

    Submitted 23 March, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP2022. For resources and examples, see https://cocktail-fork.github.io

  7. arXiv:2107.10843  [pdf, other]

    eess.AS cs.AI cs.SD

    HARP-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable Neural Audio Coding

    Authors: Darius Petermann, Seungkwon Beack, Minje Kim

    Abstract: An autoencoder-based codec employs quantization to turn its bottleneck layer activation into bitstrings, a process that hinders information flow between the encoder and decoder parts. To circumvent this issue, we employ additional skip connections between the corresponding pair of encoder-decoder layers. The assumption is that, in a mirrored autoencoder topology, a decoder layer reconstructs the i…

    Submitted 23 July, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Accepted to the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021, Mohonk Mountain House, New Paltz, NY

  8. arXiv:2008.07645  [pdf, other]

    eess.AS cs.LG cs.SD

    Deep Learning Based Source Separation Applied To Choir Ensembles

    Authors: Darius Petermann, Pritish Chandna, Helena Cuesta, Jordi Bonada, Emilia Gomez

    Abstract: Choral singing is a widely practiced form of ensemble singing wherein a group of people sing simultaneously in polyphonic harmony. The most commonly practiced setting for choir ensembles consists of four parts: Soprano, Alto, Tenor and Bass (SATB), each with its own range of fundamental frequencies (F0s). The task of source separation for this choral setting entails separating the SATB mixture i…

    Submitted 17 August, 2020; originally announced August 2020.

    Comments: To appear at the 21st International Society for Music Information Retrieval Conference, Montréal, Canada, 2020, audio examples available at: "https://darius522.github.io/satb-source-separation-results/"