Enhancing and Adversarial: Improve ASR with Speaker Labels

Zhou, Wei; Wu, Haotian; Xu, Jingjing; Zeineldeen, Mohammad; Lüscher, Christoph; Schlüter, Ralf; Ney, Hermann

doi:10.1109/ICASSP49357.2023.10096722

arXiv:2211.06369 (eess)

[Submitted on 11 Nov 2022 (v1), last revised 24 Feb 2023 (this version, v2)]

Abstract: ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, which are two opposite objectives with the aim to increase/decrease domain variance towards domain-aware/agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort. Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training. We also explore their combination for further improvement, achieving the same performance as i-vectors plus adversarial training. Our best speaker-based MTL achieves 7% relative improvement on the Switchboard Hub5'00 set. We also investigate the effect of such speaker-based MTL w.r.t. cleaner dataset and weaker ASR NN.
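
To make the two opposite objectives concrete, the following is a minimal pure-Python sketch of how a speaker-classification gradient could be combined with the ASR gradient at a shared encoder output. It shows a standard gradient reversal layer with a fixed scale, not the adaptive variant proposed in the paper, whose details are not given in the abstract. The names `GradientReversal`, `encoder_gradient`, and `lam`, and the choice to weight the enhancing gradient by the same `lam`, are assumptions made for illustration only.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass.

    Assumption: a fixed scale `lam`, as in a standard gradient reversal layer.
    The paper's adaptive variant is not reproduced here.
    """

    def __init__(self, lam):
        self.lam = lam

    def forward(self, x):
        # Features pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # The sign flip pushes the encoder to remove speaker information.
        return [-self.lam * g for g in grad_output]


def encoder_gradient(grad_asr, grad_speaker, lam, mode):
    """Combine per-task gradients w.r.t. a shared encoder output (hypothetical helper).

    mode="enhancing":   the speaker gradient is added, scaled by lam.
    mode="adversarial": the speaker gradient goes through gradient reversal.
    """
    if mode == "adversarial":
        speaker_term = GradientReversal(lam).backward(grad_speaker)
    elif mode == "enhancing":
        speaker_term = [lam * g for g in grad_speaker]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [a + s for a, s in zip(grad_asr, speaker_term)]


# Toy gradients, chosen only to show the sign difference between the two modes.
adv = encoder_gradient([1.0, 2.0], [0.5, -0.5], lam=0.5, mode="adversarial")
enh = encoder_gradient([1.0, 2.0], [0.5, -0.5], lam=0.5, mode="enhancing")
```

In a real system an autodiff framework would compute these gradients; the sketch only isolates the sign convention that separates speaker enhancing from speaker adversarial training.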

Comments:	accepted at ICASSP 2023
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2211.06369 [eess.AS]
	(or arXiv:2211.06369v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2211.06369
Related DOI:	https://doi.org/10.1109/ICASSP49357.2023.10096722

Submission history

From: Wei Zhou
[v1] Fri, 11 Nov 2022 17:40:08 UTC (243 KB)
[v2] Fri, 24 Feb 2023 09:21:39 UTC (245 KB)