
Showing 1–17 of 17 results for author: Gamper, H

Searching in archive cs.
  1. arXiv:2407.18062

    cs.SD eess.AS

    Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

    Authors: Soham Deshmukh, Shuo Han, Hazim Bukhari, Benjamin Elizalde, Hannes Gamper, Rita Singh, Bhiksha Raj

    Abstract: Recent literature uses language to build foundation models for audio. These Audio-Language Models (ALMs) are trained on a vast number of audio-text pairs and show remarkable performance in tasks including Text-to-Audio Retrieval, Captioning, and Question Answering. However, their ability to engage in more complex open-ended tasks, like Interactive Question-Answering, requires proficiency in logica…

    Submitted 25 July, 2024; originally announced July 2024.

  2. arXiv:2405.19497

    eess.AS cs.LG cs.SD

    Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data

    Authors: Eloi Moliner, Sebastian Braun, Hannes Gamper

    Abstract: Audio domain transfer is the process of modifying audio signals to match characteristics of a different domain, while retaining the original content. This paper investigates the potential of Gaussian Flow Bridges, an emerging approach in generative modeling, for this problem. The presented framework addresses the transport problem across different distributions of audio signals through the impleme…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Submitted to IWAENC 2024

  3. arXiv:2402.00282

    eess.AS cs.SD

    PAM: Prompting Audio-Language Models for Audio Quality Assessment

    Authors: Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

    Abstract: While audio quality is a key performance metric for various audio processing tasks, including generative modeling, its objective measurement remains a challenge. Audio-Language Models (ALMs) are pre-trained on audio-text pairs that may contain information about audio quality, the presence of artifacts, or noise. Given an audio input and a text prompt related to quality, an ALM can be used to calcu…

    Submitted 31 January, 2024; originally announced February 2024.

  4. arXiv:2312.05412

    cs.LG cs.CV cs.MM cs.SD eess.AS

    CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling

    Authors: Ruihan Yang, Hannes Gamper, Sebastian Braun

    Abstract: We introduce a multi-modal diffusion model tailored for the bi-directional conditional generation of video and audio. Recognizing the importance of accurate alignment between video and audio events in multi-modal generation tasks, we propose a joint contrastive training loss to enhance the synchronization between visual and auditory occurrences. Our research methodology involves conducting compreh…

    Submitted 8 December, 2023; originally announced December 2023.

  5. arXiv:2309.12553

    eess.AS cs.SD

    ICASSP 2023 Acoustic Echo Cancellation Challenge

    Authors: Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Evgenii Indenbom, Nicolae-Catalin Ristea, Jegor Gužvin, Hannes Gamper, Sebastian Braun, Robert Aichner

    Abstract: The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate research in acoustic echo cancellation (AEC), which is an important area of speech enhancement and is still a top issue in audio communication. This is the fourth AEC challenge and it is enhanced by adding a second track for personalized acoustic echo cancellation, reducing the algorithmic + buffering latency to 20ms, as…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2202.13290, arXiv:2009.04972

  6. arXiv:2303.11510

    cs.SD eess.AS

    ICASSP 2023 Deep Noise Suppression Challenge

    Authors: Harishchandra Dubey, Ashkan Aazami, Vishak Gopal, Babak Naderi, Sebastian Braun, Ross Cutler, Alex Ju, Mehdi Zohourian, Min Tang, Hannes Gamper, Mehrsa Golestaneh, Robert Aichner

    Abstract: Deep Speech Enhancement Challenge is the 5th edition of deep noise suppression (DNS) challenges organized at ICASSP 2023 Signal Processing Grand Challenges. DNS challenges were organized during 2019-2023 to stimulate research in deep speech enhancement (DSE). Previous DNS challenges were organized at INTERSPEECH 2020, ICASSP 2021, INTERSPEECH 2021, and ICASSP 2022. From prior editions, we learnt t…

    Submitted 8 May, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: 6 pages, 1 figure. arXiv admin note: text overlap with arXiv:2202.13288

  7. arXiv:2212.01911

    cs.SD cs.AI eess.AS

    Speech MOS multi-task learning and rater bias correction

    Authors: Haleh Akrami, Hannes Gamper

    Abstract: Perceptual speech quality is an important performance metric for teleconferencing applications. The mean opinion score (MOS) is standardized for the perceptual evaluation of speech quality and is obtained by asking listeners to rate the quality of a speech sample. Recently, there has been increasing research interest in developing models for estimating MOS blindly. Here we propose a multi-task fra…

    Submitted 4 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  8. arXiv:2204.06616

    cs.SD eess.AS

    Predicting score distribution to improve non-intrusive speech quality estimation

    Authors: Abu Zaher Md Faridee, Hannes Gamper

    Abstract: Deep noise suppressors (DNS) have become an attractive solution to remove background noise, reverberation, and distortions from speech and are widely used in telephony/voice applications. They are also occasionally prone to introducing artifacts and lowering the perceptual quality of the speech. Subjective listening tests that use multiple human judges to derive a mean opinion score (MOS) are a po…

    Submitted 13 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  9. arXiv:2202.13290

    eess.AS cs.SD

    ICASSP 2022 Acoustic Echo Cancellation Challenge

    Authors: Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Hannes Gamper, Sebastian Braun, Karsten Sørensen, Robert Aichner

    Abstract: The ICASSP 2022 Acoustic Echo Cancellation Challenge is intended to stimulate research in acoustic echo cancellation (AEC), which is an important area of speech enhancement and still a top issue in audio communication. This is the third AEC challenge and it is enhanced by including mobile scenarios, adding speech recognition rate in the challenge goal metrics, and making the default sample rate 48…

    Submitted 26 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.04972

  10. arXiv:2202.13288

    eess.AS cs.SD

    ICASSP 2022 Deep Noise Suppression Challenge

    Authors: Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner

    Abstract: The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. This is the 4th DNS challenge, with the previous editions held at INTERSPEECH 2020, ICASSP 2021, and INTERSPEECH 2021. We open-source datasets and test sets for researchers to train their deep noise suppression models, as well as a subjective e…

    Submitted 26 February, 2022; originally announced February 2022.

  11. arXiv:2111.11606

    eess.AS cs.SD

    Effect of noise suppression losses on speech distortion and ASR performance

    Authors: Sebastian Braun, Hannes Gamper

    Abstract: Deep learning based speech enhancement has made rapid development towards improving quality, while models are becoming more compact and usable for real-time on-the-edge inference. However, the speech quality scales directly with the model size, and small models are often still unable to achieve sufficient quality. Furthermore, the introduced speech distortion and artifacts greatly harm speech qual…

    Submitted 22 November, 2021; originally announced November 2021.

    Comments: submitted to ICASSP 2022

  12. arXiv:2101.09249

    eess.AS cs.SD

    Towards efficient models for real-time deep noise suppression

    Authors: Sebastian Braun, Hannes Gamper, Chandan K. A. Reddy, Ivan Tashev

    Abstract: With recent research advancements, deep learning models are becoming attractive and powerful choices for speech enhancement in real-time applications. While state-of-the-art models can achieve outstanding results in terms of speech quality and background noise reduction, the main challenge is to obtain compact enough models, which are resource efficient during inference time. An important but ofte…

    Submitted 19 May, 2021; v1 submitted 22 January, 2021; originally announced January 2021.

  13. arXiv:2101.01902

    cs.SD cs.LG eess.AS

    Interspeech 2021 Deep Noise Suppression Challenge

    Authors: Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

    Abstract: The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH and ICASSP 2020. We open-sourced training and test datasets for the wideband scenario. We also open-sourced a subjective evaluation framework based on ITU-T standard P.808, wh…

    Submitted 4 April, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.06122

  14. arXiv:2009.04972

    eess.AS cs.SD

    ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets, Testing Framework, and Results

    Authors: Kusha Sridhar, Ross Cutler, Ando Saabas, Tanel Parnamaa, Markus Loide, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan

    Abstract: The ICASSP 2021 Acoustic Echo Cancellation Challenge is intended to stimulate research in the area of acoustic echo cancellation (AEC), which is an important part of speech enhancement and still a top issue in audio communication and conferencing systems. Many recent AEC studies report good performance on synthetic datasets where the train and test samples come from the same underlying distributio…

    Submitted 30 October, 2020; v1 submitted 10 September, 2020; originally announced September 2020.

  15. arXiv:1911.01802

    eess.AS cs.LG cs.SD eess.IV eess.SP

    Fast acoustic scattering using convolutional neural networks

    Authors: Ziqi Fan, Vibhav Vineet, Hannes Gamper, Nikunj Raghuvanshi

    Abstract: Diffracted scattering and occlusion are important acoustic effects in interactive auralization and noise control applications, typically requiring expensive numerical simulation. We propose training a convolutional neural network to map from a convex scatterer's cross-section to a 2D slice of the resulting spatial loudness distribution. We show that employing a full-resolution residual network for…

    Submitted 15 February, 2020; v1 submitted 30 October, 2019; originally announced November 2019.

    Comments: Accepted by ICASSP 2020

  16. arXiv:1911.00566

    eess.AS cs.SD

    Predicting word error rate for reverberant speech

    Authors: Hannes Gamper, Dimitra Emmanouilidou, Sebastian Braun, Ivan J. Tashev

    Abstract: Reverberation negatively impacts the performance of automatic speech recognition (ASR). Prior work on quantifying the effect of reverberation has shown that clarity (C50), a parameter that can be estimated from the acoustic impulse response, is correlated with ASR performance. In this paper we propose predicting ASR performance in terms of the word error rate (WER) directly from acoustic parameter…

    Submitted 14 February, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

    Comments: Presented at IEEE 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

  17. arXiv:1903.06908

    eess.AS cs.SD

    Non-intrusive speech quality assessment using neural networks

    Authors: Anderson R. Avila, Hannes Gamper, Chandan Reddy, Ross Cutler, Ivan Tashev, Johannes Gehrke

    Abstract: Estimating the perceived quality of an audio signal is critical for many multimedia and audio processing systems. Providers strive to offer optimal and reliable services in order to increase the user quality of experience (QoE). In this work, we present an investigation of the applicability of neural networks for non-intrusive audio quality assessment. We propose three neural network-based approac…

    Submitted 16 March, 2019; originally announced March 2019.

    Comments: Accepted at ICASSP 2019