Showing 1–8 of 8 results for author: Kilgour, K

Searching in archive cs.

  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2204.05738  [pdf, other]

    eess.AS cs.SD

    Text-Driven Separation of Arbitrary Sounds

    Authors: Kevin Kilgour, Beat Gfeller, Qingqing Huang, Aren Jansen, Scott Wisdom, Marco Tagliasacchi

    Abstract: We propose a method of separating a desired sound source from a single-channel mixture, based on either a textual description or a short audio sample of the target source. This is achieved by combining two distinct models. The first model, SoundWords, is trained to jointly embed both an audio clip and its textual description to the same embedding in a shared representation. The second model, Sound…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  3. arXiv:2106.02443  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Teaching keyword spotters to spot new keywords with limited examples

    Authors: Abhijeet Awasthi, Kevin Kilgour, Hassan Rom

    Abstract: Learning to recognize new keywords with just a few examples is essential for personalizing keyword spotting (KWS) models to a user's choice of keywords. However, modern KWS models are typically trained on large datasets and restricted to a small vocabulary of keywords, limiting their transferability to a broad range of unseen keywords. Towards easily customizable KWS models, we present KeySEM (Key…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: In INTERSPEECH 2021

  4. arXiv:2003.09891  [pdf, other]

    eess.AS cs.CL cs.SD

    Low Latency ASR for Simultaneous Speech Translation

    Authors: Thai Son Nguyen, Jan Niehues, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Muller, Matthias Sperber, Sebastian Stueker, Alex Waibel

    Abstract: User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We therefore have worked on several techniques for reducing the latency for both components, the automatic speech recognition and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we…

    Submitted 22 March, 2020; originally announced March 2020.

  5. arXiv:2002.01322  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Training Keyword Spotters with Limited and Synthesized Speech Data

    Authors: James Lin, Kevin Kilgour, Dominik Roblek, Matthew Sharifi

    Abstract: With the rise of low power speech-enabled devices, there is a growing demand to quickly produce models for recognizing arbitrary sets of keywords. As with many machine learning tasks, one of the most challenging parts in the model creation process is obtaining a sufficient amount of training data. In this paper, we explore the effectiveness of synthesized speech data in training small, spoken term…

    Submitted 31 January, 2020; originally announced February 2020.

  6. arXiv:1812.08466  [pdf, other]

    eess.AS cs.SD

    Fréchet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms

    Authors: Kevin Kilgour, Mauricio Zuluaga, Dominik Roblek, Matthew Sharifi

    Abstract: We propose the Fréchet Audio Distance (FAD), a novel, reference-free evaluation metric for music enhancement algorithms. We demonstrate how typical evaluation metrics for speech enhancement and blind source separation can fail to accurately measure the perceived effect of a wide variety of distortions. As an alternative, we propose adapting the Fréchet Inception Distance (FID) metric used to evalu…

    Submitted 17 January, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

  7. arXiv:1811.00006  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition

    Authors: David B. Ramsay, Kevin Kilgour, Dominik Roblek, Matthew Sharifi

    Abstract: Low power digital signal processors (DSPs) typically have a very limited amount of memory in which to cache data. In this paper we develop efficient bottleneck feature (BNF) extractors that can be run on a DSP, and retrain a baseline large-vocabulary continuous speech recognition (LVCSR) system to use these BNFs with only a minimal loss of accuracy. The small BNFs allow the DSP chip to cache more…

    Submitted 31 October, 2018; originally announced November 2018.

    Comments: Submitted to ICASSP 2019

  8. arXiv:1711.10958  [pdf, other]

    cs.SD cs.AI eess.AS

    Now Playing: Continuous low-power music recognition

    Authors: Blaise Agüera y Arcas, Beat Gfeller, Ruiqi Guo, Kevin Kilgour, Sanjiv Kumar, James Lyon, Julian Odell, Marvin Ritter, Dominik Roblek, Matthew Sharifi, Mihajlo Velimirović

    Abstract: Existing music recognition applications require a connection to a server that performs the actual recognition. In this paper we present a low-power music recognizer that runs entirely on a mobile device and automatically recognizes music without user interaction. To reduce battery consumption, a small music detector runs continuously on the mobile device's DSP chip and wakes up the main applicatio…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: Authors are listed in alphabetical order by last name