NAS-VAD: Neural Architecture Search for Voice Activity Detection

Rho, Daniel; Park, Jinhyeok; Ko, Jong Hwan

doi:10.21437/Interspeech.2022-975

arXiv:2201.09032 (cs)

[Submitted on 22 Jan 2022 (v1), last revised 29 Mar 2022 (this version, v2)]

Abstract: Various neural network-based approaches have been proposed for more robust and accurate voice activity detection (VAD). Manual design of such neural architectures is an error-prone and time-consuming process, which prompted the development of neural architecture search (NAS), which automatically designs and optimizes network architectures. While NAS has been successfully applied to improve performance in a variety of tasks, it has not yet been exploited in the VAD domain. In this paper, we present the first work that applies NAS approaches to the VAD task. To effectively search architectures for the VAD task, we propose a modified macro structure and a new search space with a much broader range of operations that includes attention operations. The results show that the network structures found by the proposed NAS framework outperform previous manually designed state-of-the-art VAD models on various noise-added and real-world recorded datasets. We also show that the architectures searched on a particular dataset achieve improved generalization performance on unseen audio datasets. Our code and models are available at this https URL.

Comments:	Submitted to Interspeech 2022
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2201.09032 [cs.SD]
	(or arXiv:2201.09032v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2201.09032
Related DOI:	https://doi.org/10.21437/Interspeech.2022-975

Submission history

From: Daniel Rho
[v1] Sat, 22 Jan 2022 12:06:41 UTC (332 KB)
[v2] Tue, 29 Mar 2022 08:16:03 UTC (185 KB)
