
Showing 1–5 of 5 results for author: Mallidi, S H

  1. arXiv:2103.08393  [pdf, other]

    eess.AS cs.LG cs.SD

    Wav2vec-C: A Self-supervised Model for Speech Representation Learning

    Authors: Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

    Abstract: Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encoding using a contrastive loss in a way similar to Wav2vec 2.0. However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to th…

    Submitted 23 June, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

    Comments: To appear in Interspeech 2021

  2. arXiv:2007.09245  [pdf, other]

    eess.AS cs.SD

    Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection

    Authors: Xiaosu Tong, Che-Wei Huang, Sri Harish Mallidi, Shaun Joseph, Sonal Pareek, Chander Chandak, Ariya Rastrow, Roland Maas

    Abstract: In this paper, we propose a streaming model to distinguish voice queries intended for a smart-home device from background speech. The proposed model consists of multiple CNN layers with residual connections, followed by a stacked LSTM architecture. The streaming capability is achieved by using unidirectional LSTM layers and a causal mean aggregation layer to form the final utterance-level predicti…

    Submitted 17 July, 2020; originally announced July 2020.

  3. arXiv:1906.08041  [pdf, other]

    eess.AS cs.CL cs.SD

    Multi-Stream End-to-End Speech Recognition

    Authors: Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky

    Abstract: Attention-based methods and Connectionist Temporal Classification (CTC) network have been promising research directions for end-to-end (E2E) Automatic Speech Recognition (ASR). The joint CTC/Attention model has achieved great success by utilizing both architectures during multi-task training and joint decoding. In this work, we present a multi-stream framework based on joint CTC/Attention E2E ASR…

    Submitted 18 October, 2019; v1 submitted 17 June, 2019; originally announced June 2019.

    Comments: Submitted to IEEE TASLP (in review). arXiv admin note: substantial text overlap with arXiv:1811.04897, arXiv:1811.04903

  4. arXiv:1810.03459  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling

    Authors: Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori

    Abstract: Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new direction in speech research. The approach benefits by performing model training without using lexicon and alignments. However, this poses a new problem of requiring more data compared to conventional DNN-HMM systems. In this work, we attempt to use data from 10 BABEL languages to build a multi-lingual seq2seq model a…

    Submitted 4 October, 2018; originally announced October 2018.

  5. arXiv:1808.02504  [pdf, other]

    cs.CL eess.AS

    Device-directed Utterance Detection

    Authors: Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister

    Abstract: In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants. Applications include rejection of false wake-ups or unintended interactions as well as enabling wake-word free follow-up queries. Consider the example interaction: "Computer, play music", "Computer, reduce the volume". In this interaction,…

    Submitted 7 August, 2018; originally announced August 2018.

    Comments: Interspeech 2018 (accepted)