
Showing 1–12 of 12 results for author: Sunkara, M

Searching in archive cs.
  1. arXiv:2406.05588 [pdf, other]

    cs.CL cs.AI cs.LG

    CERET: Cost-Effective Extrinsic Refinement for Text Generation

    Authors: Jason Cai, Hang Su, Monica Sunkara, Igor Shalyminov, Saab Mansour

    Abstract: Large Language Models (LLMs) are powerful models for generation tasks, but they may not generate good quality outputs in their first attempt. Apart from model fine-tuning, existing approaches to improve prediction accuracy and quality typically involve LLM self-improvement / self-reflection that incorporate feedback from models themselves. Despite their effectiveness, these methods are hindered by…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: The source code and data samples are released at https://github.com/amazon-science/CERET-LLM-refine

  2. arXiv:2405.08295 [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single column, 13 pages

  3. A-DisETrac Advanced Analytic Dashboard for Distributed Eye Tracking

    Authors: Yasasi Abeysinghe, Bhanuka Mahanama, Gavindya Jayawardena, Yasith Jayawardana, Mohan Sunkara, Andrew T. Duchowski, Vikas Ashok, Sampath Jayarathna

    Abstract: Understanding how individuals focus and perform visual searches during collaborative tasks can help improve user engagement. Eye tracking measures provide informative cues for such understanding. This article presents A-DisETrac, an advanced analytic dashboard for distributed eye tracking. It uses off-the-shelf eye trackers to monitor multiple users in parallel, compute both traditional and advanc…

    Submitted 11 April, 2024; originally announced April 2024.

    Journal ref: International Journal of Multimedia Data Engineering and Management (IJMDEM) 15.1 (2024) 1-20

  4. arXiv:2305.07677 [pdf, other]

    cs.SD cs.CL cs.LG

    Masked Audio Text Encoders are Effective Multi-Modal Rescorers

    Authors: Jinglun Cai, Monica Sunkara, Xilai Li, Anshu Bhatia, Xiao Pan, Sravan Bodapati

    Abstract: Masked Language Models (MLMs) have proven to be effective for second-pass rescoring in Automatic Speech Recognition (ASR) systems. In this work, we propose Masked Audio Text Encoder (MATE), a multi-modal masked language model rescorer which incorporates acoustic representations into the input space of MLM. We adopt contrastive learning for effectively aligning the modalities by learning shared rep…

    Submitted 24 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  5. arXiv:2305.03837 [pdf, other]

    eess.AS cs.LG cs.SD

    Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation

    Authors: Nilaksh Das, Monica Sunkara, Sravan Bodapati, Jinglun Cai, Devang Kulshreshtha, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end ASR models trained on large amounts of data tend to be implicitly biased towards language semantics of the training data. Internal language model estimation (ILME) has been proposed to mitigate this bias for autoregressive models such as attention-based encoder-decoder and RNN-T. Typically, ILME is performed by modularizing the acoustic and language components of the model architecture,…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to ICASSP 2023

  6. arXiv:2211.07828 [pdf, other]

    cs.CL

    Adaptation Approaches for Nearest Neighbor Language Models

    Authors: Rishabh Bhardwaj, George Polovets, Monica Sunkara

    Abstract: Semi-parametric Nearest Neighbor Language Models ($k$NN-LMs) have produced impressive gains over purely parametric LMs, by leveraging large-scale neighborhood retrieval over external memory datastores. However, there has been little investigation into adapting such models for new domains. This work attempts to fill that gap and suggests the following approaches for adapting $k$NN-LMs -- 1) adaptin…

    Submitted 12 June, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: 10 pages, 4 figures

  7. arXiv:2210.09510 [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting

    Authors: Saket Dingliwal, Monica Sunkara, Sravan Bodapati, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because of their conditional independence assumption that prevents output tokens from previ…

    Submitted 13 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: To appear in SLT 2022

  8. arXiv:2109.05092 [pdf, other]

    eess.AS cs.SD

    Remember the context! ASR slot error correction through memorization

    Authors: Dhanush Bekal, Ashish Shenoy, Monica Sunkara, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Accurate recognition of slot values such as domain-specific words or named entities by automatic speech recognition (ASR) systems forms the core of the Goal-oriented Dialogue Systems. Although it is a critical step with direct impact on downstream tasks such as language understanding, many domain-agnostic ASR systems tend to perform poorly on domain-specific or long-tail words. They are often supp…

    Submitted 17 September, 2021; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: 8 pages, 3 figures, 4 tables, Accepted to ASRU 2021

  9. Adapting Long Context NLM for ASR Rescoring in Conversational Agents

    Authors: Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki, Katrin Kirchhoff

    Abstract: Neural Language Models (NLM), when trained and evaluated with context spanning multiple utterances, have been shown to consistently outperform both conventional n-gram language models and NLMs that use limited context. In this paper, we investigate various techniques to incorporate turn-based context history into both recurrent (LSTM) and Transformer-XL based NLMs. For recurrent-based NLMs, we exp…

    Submitted 4 June, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

    Comments: Accepted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2103.10325

  10. arXiv:2102.06380 [pdf, ps, other]

    cs.CL eess.AS

    Neural Inverse Text Normalization

    Authors: Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff

    Abstract: While there have been several contributions exploring state-of-the-art techniques for text normalization, the problem of inverse text normalization (ITN) remains relatively unexplored. The best known approaches leverage finite state transducer (FST) based models which rely on manually curated rules and are hence not scalable. We propose an efficient and robust neural solution for ITN leveraging tr…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: 5 pages, accepted to ICASSP 2021

  11. arXiv:2008.00702 [pdf, other]

    eess.AS cs.CL

    Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech

    Authors: Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff

    Abstract: In this work, we explore a multimodal semi-supervised learning approach for punctuation prediction by learning representations from large amounts of unlabelled audio and text data. Conventional approaches in speech processing typically use forced alignment to encode per-frame acoustic features into word-level features and perform multimodal fusion of the resulting acoustic and lexical representatio…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted for Interspeech 2020

  12. arXiv:2007.02025 [pdf, other]

    cs.CL cs.SD eess.AS

    Robust Prediction of Punctuation and Truecasing for Medical ASR

    Authors: Monica Sunkara, Srikanth Ronanki, Kalpit Dixit, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Automatic speech recognition (ASR) systems in the medical domain that focus on transcribing clinical dictations and doctor-patient conversations often pose many challenges due to the complexity of the domain. ASR output typically undergoes automatic punctuation to enable users to speak naturally, without having to vocalise awkward and explicit punctuation commands, such as "period", "add comma" or…

    Submitted 11 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

    Comments: Accepted for ACL NLPMC workshop 2020
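Entry 6 concerns semi-parametric kNN-LMs, which retrieve neighbors from an external datastore. As background only, the sketch below shows the interpolation commonly used in kNN-LMs, p(y|x) = λ·p_kNN(y|x) + (1 − λ)·p_LM(y|x), where p_kNN is a softmax over negative neighbor distances, aggregated by each neighbor's target token. This is a minimal sketch under stated assumptions: the function names, the (distance, token_id) neighbor format, and the fixed λ are hypothetical choices, and it does not reproduce the domain-adaptation approaches proposed in that paper.

```python
import math
from typing import List, Tuple

# Hypothetical neighbor format: (distance to query, target token id) from a datastore lookup.
Neighbor = Tuple[float, int]


def knn_distribution(neighbors: List[Neighbor], vocab_size: int) -> List[float]:
    """Turn retrieved neighbors into a distribution over the vocabulary.

    Each neighbor gets weight exp(-distance); weights are normalized and summed
    per target token, so tokens that were never retrieved receive probability 0.
    """
    if not neighbors:
        raise ValueError("need at least one neighbor")
    # Shift by the smallest distance for numerical stability (softmax is shift-invariant).
    d_min = min(d for d, _ in neighbors)
    weights = [math.exp(-(d - d_min)) for d, _ in neighbors]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for w, (_, token) in zip(weights, neighbors):
        probs[token] += w / total
    return probs


def knn_lm_interpolate(p_lm: List[float], p_knn: List[float], lam: float) -> List[float]:
    """p(y|x) = lam * p_kNN(y|x) + (1 - lam) * p_LM(y|x), element-wise over the vocabulary."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(p_lm) != len(p_knn):
        raise ValueError("distributions must share the same vocabulary")
    return [lam * k + (1.0 - lam) * m for k, m in zip(p_knn, p_lm)]


if __name__ == "__main__":
    # Toy example: a vocabulary of 3 tokens and three retrieved neighbors.
    neighbors = [(1.0, 2), (1.0, 2), (2.0, 0)]
    p_knn = knn_distribution(neighbors, vocab_size=3)
    p_lm = [0.5, 0.3, 0.2]  # parametric LM's next-token distribution (made-up numbers)
    p = knn_lm_interpolate(p_lm, p_knn, lam=0.25)
    print([round(x, 4) for x in p])
```

A larger λ shifts probability mass toward tokens whose stored contexts lie close to the query; with λ = 0 the result reduces to the parametric LM alone.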