Showing 1–3 of 3 results for author: Scarpati, A S

Searching in archive cs.
  1. arXiv:2311.03419 [pdf, other]

    eess.AS cs.LG cs.SD

    Personalizing Keyword Spotting with Speaker Information

    Authors: Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno

    Abstract: Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups. To address this challenge, we propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM), a recent method for learning from multiple sources of information. We explore both Text-Dependent and Text-Independent speaker…

    Submitted 6 November, 2023; originally announced November 2023.

  2. arXiv:2302.12961 [pdf, other]

    cs.CL cs.LG

    Locale Encoding For Scalable Multilingual Keyword Spotting Models

    Authors: Pai Zhu, Hyun Jin Park, Alex Park, Angelo Scorza Scarpati, Ignacio Lopez Moreno

    Abstract: A Multilingual Keyword Spotting (KWS) system detects spoken keywords over multiple locales. Conventional monolingual KWS approaches do not scale well to multilingual scenarios because of high development/maintenance costs and lack of resource sharing. To overcome this limit, we propose two locale-conditioned universal models with locale feature concatenation and feature-wise linear modulation (FiLM). We…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: Accepted for ICASSP 2023

  3. arXiv:2211.06478 [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting

    Authors: Beltrán Labrador, Guanlong Zhao, Ignacio López Moreno, Angelo Scorza Scarpati, Liam Fowl, Quan Wang

    Abstract: In this paper, we present a novel approach to adapt a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task. We achieve this by replacing the keyword in the text transcription with a special token <kw> and training the system to detect the <kw> token in an audio stream. At inference time, we create a decision function inspired by conventional KWS approaches, to…

    Submitted 11 November, 2022; originally announced November 2022.