Showing 1–5 of 5 results for author: Bekal, D

  1. arXiv:2405.08295  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single column, 13 pages

  2. arXiv:2211.13280  [pdf, other]

    cs.CL cs.SD eess.AS

    Device Directedness with Contextual Cues for Spoken Dialog Systems

    Authors: Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

    Abstract: In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins. Following the success of pre-trained models, we use low-level speech representations from a self-supervised representation learning model for our downstream classification task. Further, we propose a novel technique to infu…

    Submitted 23 November, 2022; originally announced November 2022.

  3. arXiv:2109.05092  [pdf, other]

    eess.AS cs.SD

    Remember the context! ASR slot error correction through memorization

    Authors: Dhanush Bekal, Ashish Shenoy, Monica Sunkara, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Accurate recognition of slot values such as domain-specific words or named entities by automatic speech recognition (ASR) systems forms the core of goal-oriented dialogue systems. Although it is a critical step with direct impact on downstream tasks such as language understanding, many domain-agnostic ASR systems tend to perform poorly on domain-specific or long-tail words. They are often supp…

    Submitted 17 September, 2021; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: 8 pages, 3 figures, 4 tables, Accepted to ASRU 2021

  4. arXiv:2008.00702  [pdf, other]

    eess.AS cs.CL

    Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech

    Authors: Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff

    Abstract: In this work, we explore a multimodal semi-supervised learning approach for punctuation prediction by learning representations from large amounts of unlabelled audio and text data. Conventional approaches in speech processing typically use forced alignment to encode per-frame acoustic features into word-level features and perform multimodal fusion of the resulting acoustic and lexical representatio…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted for Interspeech 2020

  5. arXiv:1904.02342  [pdf, other]

    cs.CL

    Text Generation from Knowledge Graphs with Graph Transformers

    Authors: Rik Koncel-Kedziorski, Dhanush Bekal, Yi Luan, Mirella Lapata, Hannaneh Hajishirzi

    Abstract: Generating texts which express complex ideas spanning multiple sentences requires a structured representation of their content (document plan), but these representations are prohibitively expensive to manually produce. In this work, we address the problem of generating coherent multi-sentence texts from the output of an information extraction system, and in particular a knowledge graph. Graphical…

    Submitted 24 March, 2022; v1 submitted 4 April, 2019; originally announced April 2019.

    Comments: Accepted as a long paper in NAACL 2019