Showing 1–2 of 2 results for author: Cadoux, C

Search v0.5.6 released 2020-02-24

arXiv:2006.02774 [pdf, other]

cs.SD eess.AS

A study on more realistic room simulation for far-field keyword spotting

Authors: Eric Bezzam, Robin Scheibler, Cyril Cadoux, Thibault Gisselbrecht

Abstract: We investigate the impact of more realistic room simulation for training far-field keyword spotting systems without fine-tuning on in-domain data. To this end, we study the impact of incorporating the following factors in the room impulse response (RIR) generation: air absorption, surface- and frequency-dependent coefficients of real materials, and stochastic ray tracing. Through an ablation study, a wake word task is used to measure the impact of these factors in comparison with a ground-truth set of measured RIRs. On a hold-out set of re-recordings under clean and noisy far-field conditions, we demonstrate up to $35.8\%$ relative improvement over the commonly-used (single absorption coefficient) image source method. Source code is made available in the Pyroomacoustics package, allowing others to incorporate these techniques in their work.

Submitted 18 November, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

Comments: 7 pages, 4 figures, accepted at APSIPA 2020, room impulse response generation code can be found at https://github.com/ebezzam/room-simulation

arXiv:1911.02091 [pdf, other]

eess.AS cs.SD

Closing the Training/Inference Gap for Deep Attractor Networks

Authors: Cyril Cadoux, Stefan Uhlich, Marc Ferras, Yuki Mitsufuji

Abstract: This paper improves the deep attractor network (DANet) approach by closing its gap between training and inference. During training, DANet relies on attractors, which are computed from the ground truth separations. As this information is not available at inference time, the attractors have to be estimated, which is typically done by k-means. This results in two mismatches: The first mismatch stems from using classical k-means with Euclidean norm, whereas masks are computed during training using the dot product similarity. By using spherical k-means instead, we show that we can already improve the performance of DANet. Furthermore, we show that we can fully incorporate k-means clustering into the DANet training. This yields the benefit of having no training/inference gap and consequently results in a scale-invariant signal-to-distortion ratio (SI-SDR) improvement of 1.1 dB on the Wall Street Journal corpus (WSJ0).

Submitted 5 November, 2019; originally announced November 2019.
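
Note: the abstract above contrasts classical k-means (Euclidean norm) with spherical k-means, which assigns points by dot-product similarity. The following is a minimal NumPy sketch of generic spherical k-means, not the authors' implementation; the function name `spherical_kmeans` and its parameters are hypothetical, and it does not cover the paper's integration of clustering into DANet training.

```python
import numpy as np


def spherical_kmeans(embeddings, num_clusters, num_iters=20, seed=0):
    """Cluster the rows of `embeddings` by dot-product similarity.

    Hypothetical helper for illustration only (assumed names and defaults).
    Returns (labels, centroids), where the centroids have unit norm.
    """
    rng = np.random.default_rng(seed)
    # Project every embedding onto the unit sphere so only direction matters.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    # Initialise the centroids from distinct, randomly chosen points.
    init = rng.choice(len(unit), size=num_clusters, replace=False)
    centroids = unit[init]
    for _ in range(num_iters):
        # Assign each point to the centroid with the largest dot product.
        labels = np.argmax(unit @ centroids.T, axis=1)
        for k in range(num_clusters):
            members = unit[labels == k]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            # Update: sum the members, then renormalise to unit length.
            total = members.sum(axis=0)
            centroids[k] = total / max(np.linalg.norm(total), 1e-12)
    labels = np.argmax(unit @ centroids.T, axis=1)
    return labels, centroids


# Toy usage: two directions in 2-D, with deliberately different magnitudes.
angles = np.array([0.0, 0.2, 1.4, 1.6])
radii = np.array([1.0, 5.0, 0.5, 3.0])
points = radii[:, None] * np.stack([np.cos(angles), np.sin(angles)], axis=1)
labels, centroids = spherical_kmeans(points, num_clusters=2)
```

Because every point is normalised before clustering, the grouping depends only on direction, which matches a similarity measure based on the dot product rather than on Euclidean distance.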
