Showing 1–34 of 34 results for author: Raju, A

Searching in archive cs.
  1. arXiv:2406.17151  [pdf, other]

    cs.RO

    Socially Acceptable Bipedal Robot Navigation via Social Zonotope Network Model Predictive Control

    Authors: Abdulaziz Shamsah, Krishanu Agarwal, Nigam Katta, Abirath Raju, Shreyas Kousik, Ye Zhao

    Abstract: This study addresses the challenge of social bipedal navigation in a dynamic, human-crowded environment, a research area largely underexplored in legged robot navigation. We present a zonotope-based framework that couples prediction and motion planning for a bipedal ego-agent to account for bidirectional influence with the surrounding pedestrians. This framework incorporates a Social Zonotope Netw…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 19 pages, 19 figures. arXiv admin note: text overlap with arXiv:2403.16485, arXiv:2310.09969

  2. arXiv:2406.02477  [pdf, other]

    eess.IV cs.CV cs.LG

    Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion

    Authors: Colin Hansen, Simas Glinskis, Ashwin Raju, Micha Kornreich, JinHyeong Park, Jayashri Pawar, Richard Herzog, Li Zhang, Benjamin Odry

    Abstract: Data driven models for automated diagnosis in radiology suffer from insufficient and imbalanced datasets due to low representation of pathology in a population and the cost of expert annotations. Datasets can be bolstered through data augmentation. However, even when utilizing a full suite of transformations during model training, typical data augmentations do not address variations in human anato…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2401.14717  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

    Authors: Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

    Abstract: We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms the baseline models with single modality. We also develop a novel multi-task instruction fine-tuning…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: To appear in IEEE ICASSP 2024

  4. Two-pass Endpoint Detection for Speech Recognition

    Authors: Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow

    Abstract: Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands. The endpoint detector has to trade off between accuracy and latency, since waiting longer reduces the cases of users being cut off early. We propose a novel two-pass solution for endpointing, where the utterance endpoint detected from a first pass endpointer is verified b…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: ASRU 2023

  5. arXiv:2306.12015  [pdf, other]

    eess.AS cs.SD

    Federated Self-Learning with Weak Supervision for Speech Recognition

    Authors: Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo

    Abstract: Automatic speech recognition (ASR) models with low-footprint are increasingly being deployed on edge devices for conversational agents, which enhances privacy. We study the problem of federated continual incremental learning for recurrent neural network-transducer (RNN-T) ASR models in the privacy-enhancing scheme of learning on-device, without access to ground truth human transcripts or machine t…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: Proceedings of ICASSP 2023

  6. arXiv:2306.11706  [pdf, other]

    cs.RO cs.LG

    RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

    Authors: Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, et al. (14 additional authors not shown)

    Abstract: The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned de…

    Submitted 22 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Transactions on Machine Learning Research (12/2023)

  7. A Survey on Cross-Architectural IoT Malware Threat Hunting

    Authors: Anandharaju Durai Raju, Ibrahim Abualhaol, Ronnie Salvador Giagone, Yang Zhou, Shengqiang Huang

    Abstract: In recent years, the increase in non-Windows malware threats has turned the focus of the cybersecurity community. Research works on hunting Windows PE-based malware are maturing, whereas the developments on Linux malware threat hunting are relatively scarce. With the advent of the Internet of Things (IoT) era, smart devices that are getting integrated into human life have become a hackers highway…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: https://ieeexplore.ieee.org/abstract/document/9462110

    Journal ref: IEEE Access 2021

  8. arXiv:2306.07118  [pdf]

    cs.CR cs.LG

    On building machine learning pipelines for Android malware detection: a procedural survey of practices, challenges and opportunities

    Authors: Masoud Mehrabi Koushki, Ibrahim AbuAlhaol, Anandharaju Durai Raju, Yang Zhou, Ronnie Salvador Giagone, Huang Shengqiang

    Abstract: As the smartphone market leader, Android has been a prominent target for malware attacks. The number of malicious applications (apps) identified for it has increased continually over the past decade, creating an immense challenge for all parties involved. For market holders and researchers, in particular, the large number of samples has made manual malware detection unfeasible, leading to an influ…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: file:///C:/Users/ibrahim_abualhaol/Downloads/s42400-022-00119-8.pdf

    Journal ref: SpringerOpen Cybersecurity 2022

  9. arXiv:2303.15132  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Cross-utterance ASR Rescoring with Graph-based Label Propagation

    Authors: Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran

    Abstract: We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity. In contrast to conventional neural language model (LM) based ASR rescoring/reranking models, our approach focuses on acoustic information and conducts the rescoring collaboratively among utterances, instead of individually. Experiments on the VCTK da…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: To appear in IEEE ICASSP 2023

    Journal ref: Proc. IEEE ICASSP, June 2023

  10. Adaptive Endpointing with Deep Contextual Multi-armed Bandits

    Authors: Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh

    Abstract: Current endpointing (EP) solutions learn in a supervised framework, which does not allow the model to incorporate feedback and improve in an online setting. Also, it is a common practice to utilize costly grid-search to find the best configuration for an endpointing model. In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal…

    Submitted 23 March, 2023; originally announced March 2023.

    Journal ref: Proc. IEEE ICASSP, June 2023

  11. arXiv:2303.07280  [pdf, other]

    cs.CV cs.AI cs.LG

    Vision-Language Models as Success Detectors

    Authors: Yuqing Du, Ksenia Konyushkova, Misha Denil, Akhil Raju, Jessica Landon, Felix Hill, Nando de Freitas, Serkan Cabi

    Abstract: Detecting successful behaviour is crucial for training intelligent agents. As such, generalisable reward models are a prerequisite for agents that can learn to generalise their behaviour. In this work we focus on developing robust success detectors that leverage large, pretrained vision-language models (Flamingo, Alayrac et al. (2022)) and human reward annotations. Concretely, we treat success det…

    Submitted 13 March, 2023; originally announced March 2023.

  12. Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities

    Authors: Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke

    Abstract: As for other forms of AI, speech recognition has recently been examined with respect to performance disparities across different user cohorts. One approach to achieve fairness in speech recognition is to (1) identify speaker cohorts that suffer from subpar performance and (2) apply fairness mitigation measures targeting the cohorts discovered. In this paper, we report on initial findings with both…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: Proc. Interspeech 2022

    Journal ref: Proc. Interspeech, Sept. 2022, pp. 1268-1272

  13. ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale

    Authors: Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure

    Abstract: Incremental learning is one paradigm to enable model building and updating at scale with streaming data. For end-to-end automatic speech recognition (ASR) tasks, the absence of human annotated labels along with the need for privacy preserving policies for model building makes it a daunting challenge. Motivated by these challenges, in this paper we use a cloud based framework for production systems…

    Submitted 22 July, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: 9 pages

  14. arXiv:2112.06743  [pdf, other]

    cs.CL cs.AI

    Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding

    Authors: Kai Wei, Thanh Tran, Feng-Ju Chang, Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Jing Liu, Anirudh Raju, Ross McGowan, Nathan Susanj, Ariya Rastrow, Grant P. Strimel

    Abstract: Recent years have seen significant advances in end-to-end (E2E) spoken language understanding (SLU) systems, which directly predict intents and slots from spoken audio. While dialogue history has been exploited to improve conventional text-based natural language understanding systems, current E2E SLU approaches have not yet incorporated such critical contextual signals in multi-turn and task-orien…

    Submitted 13 December, 2021; originally announced December 2021.

    Journal ref: ASRU2021

  15. arXiv:2110.06192  [pdf, other]

    cs.RO cs.LG

    Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes

    Authors: Alex X. Lee, Coline Devin, Yuxiang Zhou, Thomas Lampe, Konstantinos Bousmalis, Jost Tobias Springenberg, Arunkumar Byravan, Abbas Abdolmaleki, Nimrod Gileadi, David Khosid, Claudio Fantacci, Jose Enrique Chen, Akhil Raju, Rae Jeong, Michael Neunert, Antoine Laurens, Stefano Saliceti, Federico Casarini, Martin Riedmiller, Raia Hadsell, Francesco Nori

    Abstract: We study the problem of robotic stacking with objects of complex geometry. We propose a challenging and diverse set of such objects that was carefully designed to require strategies beyond a simple "pick-and-place" solution. Our method is a reinforcement learning (RL) approach combined with vision-based interactive policy distillation and simulation-to-reality transfer. Our learned policies can ef…

    Submitted 3 November, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: CoRL 2021. Video: https://dpmd.ai/robotics-stacking-YT . Blog: https://dpmd.ai/robotics-stacking . Code: https://github.com/deepmind/rgb_stacking

  16. arXiv:2106.15919  [pdf, other]

    cs.CL cs.SD eess.AS

    On joint training with interfaces for spoken language understanding

    Authors: Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow

    Abstract: Spoken language understanding (SLU) systems extract both text transcripts and semantics associated with intents and slots from input speech utterances. SLU systems usually consist of (1) an automatic speech recognition (ASR) module, (2) an interface module that exposes relevant outputs from ASR, and (3) a natural language understanding (NLU) module. Interfaces in SLU systems carry information on t…

    Submitted 25 July, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: Proc. Interspeech 2022

  17. Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End

    Authors: Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo

    Abstract: Comprehending the overall intent of an utterance helps a listener recognize the individual words spoken. Inspired by this fact, we perform a novel study of the impact of explicitly incorporating intent representations as additional information to improve a recurrent neural network-transducer (RNN-T) based automatic speech recognition (ASR) system. An audio-to-intent (A2I) model encodes the intent…

    Submitted 16 June, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: To appear in Interspeech 2021

    Journal ref: Proc. Interspeech, Sept. 2021, pp. 3455-3459

  18. arXiv:2104.02847  [pdf, other]

    cs.CV cs.AI

    Deep Implicit Statistical Shape Models for 3D Medical Image Delineation

    Authors: Ashwin Raju, Shun Miao, Dakai Jin, Le Lu, Junzhou Huang, Adam P. Harrison

    Abstract: 3D delineation of anatomical structures is a cardinal goal in medical imaging analysis. Prior to deep learning, statistical shape models that imposed anatomical constraints and produced high quality surfaces were a core technology. Today fully-convolutiona…

    Submitted 4 January, 2022; v1 submitted 6 April, 2021; originally announced April 2021.

  19. arXiv:2102.06750  [pdf, other]

    cs.CL eess.AS

    Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

    Authors: Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke

    Abstract: Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems. SLU models, which either directly extract semantics from audio or are composed of pipelined automatic speech recognition (ASR) and natural language understanding (NLU) models, are typically trained via differentia…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: Proc. IEEE ICASSP 2021

  20. arXiv:2101.01015  [pdf, other]

    cs.CR cs.LG

    Echelon: Two-Tier Malware Detection for Raw Executables to Reduce False Alarms

    Authors: Anandharaju Durai Raju, Ke Wang

    Abstract: Existing malware detection approaches suffer from a simplistic trade-off between false positive rate (FPR) and true positive rate (TPR) due to a single tier classification approach, where the two measures adversely affect one another. The practical implication for malware detection is that FPR must be kept at an acceptably low level while TPR remains high. To this end, we propose a two-tiered lear…

    Submitted 4 January, 2021; originally announced January 2021.

    Comments: 12 pages, 8 figures, 5 tables

  21. arXiv:2011.11715  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE cs.SD eess.AS

    Multi-task Language Modeling for Improving Speech Recognition of Rare Words

    Authors: Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko

    Abstract: End-to-end automatic speech recognition (ASR) systems are increasingly popular due to their relative architectural simplicity and competitive performance. However, even though the average accuracy of these systems may be high, the performance on rare content words often lags behind hybrid ASR systems. To address this problem, second-pass rescoring is often applied leveraging upon language modeling…

    Submitted 11 September, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: Accepted to IEEE Automatic Speech Recognition and Understanding (ASRU) 2021

  22. arXiv:2009.02455  [pdf, other]

    cs.CV cs.AI cs.LG

    User-Guided Domain Adaptation for Rapid Annotation from User Interactions: A Study on Pathological Liver Segmentation

    Authors: Ashwin Raju, Zhanghexuan Ji, Chi Tung Cheng, Jinzheng Cai, Junzhou Huang, Jing Xiao, Le Lu, ChienHung Liao, Adam P. Harrison

    Abstract: Mask-based annotation of medical images, especially for 3D data, is a bottleneck in developing reliable machine learning models. Using minimal-labor user interactions (UIs) to guide the annotation is promising, but challenges remain on best harmonizing the mask prediction with the UIs. To address this, we propose the user-guided domain adaptation (UGDA) framework, which uses prediction-based adver…

    Submitted 5 September, 2020; originally announced September 2020.

  23. arXiv:2009.01004  [pdf, other]

    cs.CL cs.LG stat.ML

    FAT ALBERT: Finding Answers in Large Texts using Semantic Similarity Attention Layer based on BERT

    Authors: Omar Mossad, Amgad Ahmed, Anandharaju Raju, Hari Karthikeyan, Zayed Ahmed

    Abstract: Machine based text comprehension has always been a significant research field in natural language processing. Once a full understanding of the text context and semantics is achieved, a deep learning model can be trained to solve a large subset of tasks, e.g. text summarization, classification and question answering. In this paper we focus on the question answering problem, specifically the multipl…

    Submitted 22 August, 2020; originally announced September 2020.

    Comments: source code available: https://github.com/omossad/fat-albert

  24. Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces

    Authors: Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui, Ariya Rastrow

    Abstract: We consider the problem of spoken language understanding (SLU) of extracting natural language intents and associated slot arguments or named entities from speech that is primarily directed at voice assistants. Such a system subsumes both automatic speech recognition (ASR) as well as natural language understanding (NLU). An end-to-end joint SLU model can be built to a required specification opening…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: Proceedings of INTERSPEECH

    ACM Class: I.2.7

    Journal ref: Proc. Interspeech 2020, 876-880 (2020)

  25. arXiv:2006.15691  [pdf, other]

    cs.CV

    Harvesting, Detecting, and Characterizing Liver Lesions from Large-scale Multi-phase CT Data via Deep Dynamic Texture Learning

    Authors: Yuankai Huo, Jinzheng Cai, Chi-Tung Cheng, Ashwin Raju, Ke Yan, Bennett A. Landman, Jing Xiao, Le Lu, Chien-Hung Liao, Adam P. Harrison

    Abstract: Non-invasive radiological-based lesion characterization and identification, e.g., to differentiate cancer subtypes, has long been a major aim to enhance oncological diagnosis and treatment procedures. Here we study a specific population of human subjects, with the hope of reducing the need for invasive surgical biopsies of liver cancer patients, which can cause many harmful side-effects. To this e…

    Submitted 30 August, 2020; v1 submitted 28 June, 2020; originally announced June 2020.

  26. arXiv:2005.13201  [pdf, other]

    eess.IV cs.CV

    Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation

    Authors: Ashwin Raju, Chi-Tung Cheng, Yuankai Huo, Jinzheng Cai, Junzhou Huang, Jing Xiao, Le Lu, ChienHung Liao, Adam P Harrison

    Abstract: In medical imaging, organ/pathology segmentation models trained on current publicly available and fully-annotated datasets usually do not well-represent the heterogeneous modalities, phases, pathologies, and clinical scenarios encountered in real environments. On the other hand, there are tremendous amounts of unlabelled patient imaging scans stored by many modern clinical centers. In this work, w…

    Submitted 19 July, 2021; v1 submitted 27 May, 2020; originally announced May 2020.

    Comments: 23 pages, 8 figures

  27. arXiv:2005.12209  [pdf, other]

    eess.IV cs.CV

    JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-Modal Image Alignment of Large-scale Pathological CT Scans

    Authors: Fengze Liu, Jinzheng Cai, Yuankai Huo, Chi-Tung Cheng, Ashwin Raju, Dakai Jin, Jing Xiao, Alan Yuille, Le Lu, ChienHung Liao, Adam P Harrison

    Abstract: Multi-modal image registration is a challenging problem that is also an important clinical task for many real applications and scenarios. As a first step in analysis, deformable registration among different image modalities is often required in order to provide complementary visual information. During registration, semantic information is key to match homologous points and pixels. Nevertheless, ma…

    Submitted 17 July, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

    Comments: accepted to ECCV 2020

  28. arXiv:1907.01677  [pdf, ps, other]

    cs.CL cs.LG

    Scalable Multi Corpora Neural Language Models for ASR

    Authors: Anirudh Raju, Denis Filimonov, Gautam Tiwari, Guitang Lan, Ariya Rastrow

    Abstract: Neural language models (NLM) have been shown to outperform conventional n-gram language models by a substantial margin in Automatic Speech Recognition (ASR) and other tasks. There are, however, a number of challenges that need to be addressed for an NLM to be used in a practical large-scale ASR system. In this paper, we present solutions to some of the challenges, including training NLM from heter…

    Submitted 2 July, 2019; originally announced July 2019.

    Comments: Interspeech 2019 (accepted: oral)

    ACM Class: I.2.7

  29. arXiv:1901.02348  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning

    Authors: Ladislav Mošner, Minhua Wu, Anirudh Raju, Sree Hari Krishnan Parthasarathi, Kenichi Kumatani, Shiva Sundaram, Roland Maas, Björn Hoffmeister

    Abstract: For real-world speech recognition applications, noise robustness is still a challenge. In this work, we adopt the teacher-student (T/S) learning technique using a parallel clean and noisy corpus for improving automatic speech recognition (ASR) performance under multimedia noise. On top of that, we apply a logits selection method which only preserves the k highest values to prevent wrong emphasis o…

    Submitted 15 March, 2019; v1 submitted 5 January, 2019; originally announced January 2019.

    Comments: To Appear in ICASSP 2019

  30. arXiv:1808.00563  [pdf, other]

    cs.CL cs.LG stat.ML

    Data Augmentation for Robust Keyword Spotting under Playback Interference

    Authors: Anirudh Raju, Sankaran Panchapagesan, Xing Liu, Arindam Mandal, Nikko Strom

    Abstract: Accurate on-device keyword spotting (KWS) with low false accept and false reject rate is crucial to customer experience for far-field voice control of conversational agents. It is particularly challenging to maintain low false reject rate in real world conditions where there is (a) ambient noise from external sources such as TV, household appliances, or other speech that is not directed at the dev…

    Submitted 1 August, 2018; originally announced August 2018.

  31. Contextual Language Model Adaptation for Conversational Agents

    Authors: Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh, Ariya Rastrow

    Abstract: Statistical language models (LM) play a key role in Automatic Speech Recognition (ASR) systems used by conversational agents. These ASR systems should provide a high accuracy under a variety of speaking styles, domains, vocabulary and argots. In this paper, we present a DNN-based method to adapt the LM to each user-agent interaction based on generalized contextual information, by predicting an opt…

    Submitted 31 July, 2018; v1 submitted 26 June, 2018; originally announced June 2018.

    Comments: Interspeech 2018 (accepted)

    ACM Class: I.2.7

    Journal ref: Proc. Interspeech 2018, 3333-3337

  32. arXiv:1801.03625  [pdf, ps, other]

    cs.CL cs.AI cs.CY cs.HC cs.MA

    On Evaluating and Comparing Open Domain Dialog Systems

    Authors: Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel, Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki Metallinou, Rahul Goel, Shaohua Yang, Anirudh Raju

    Abstract: Conversational agents are exploding in popularity. However, much work remains in the area of non goal-oriented conversations, despite significant growth in research interest over recent years. To advance the state of the art in conversational AI, Amazon launched the Alexa Prize, a 2.5-million dollar university competition where sixteen selected university teams built conversational agents to deliv…

    Submitted 26 December, 2018; v1 submitted 10 January, 2018; originally announced January 2018.

    Comments: 10 pages, 5 tables. NIPS 2017 Conversational AI workshop. http://alborz-geramifard.com/workshops/nips17-Conversational-AI/Main.html

    MSC Class: 97R40 ACM Class: I.2.7

    Journal ref: NIPS.Workshop.ConversationalAI 2017-12-08 http://alborz-geramifard.com/workshops/nips17-Conversational-AI/Main.html accessed 2018-01-01

  33. arXiv:1801.03622  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC cs.MA

    Topic-based Evaluation for Conversational Bots

    Authors: Fenfei Guo, Angeliki Metallinou, Chandra Khatri, Anirudh Raju, Anu Venkatesh, Ashwin Ram

    Abstract: Dialog evaluation is a challenging problem, especially for non task-oriented dialogs where conversational success is not well-defined. We propose to evaluate dialog quality using topic-based metrics that describe the ability of a conversational bot to sustain coherent and engaging conversations on a topic, and the diversity of topics that a bot can handle. To detect conversation topics per utteran…

    Submitted 10 January, 2018; originally announced January 2018.

    Comments: 10 Pages, 2 figures, 9 tables. NIPS 2017 Conversational AI workshop paper. http://alborz-geramifard.com/workshops/nips17-Conversational-AI/Main.html

    MSC Class: 97R40 ACM Class: I.2.7

    Journal ref: NIPS.Workshop.ConversationalAI 2017-12-08

  34. arXiv:1705.02411  [pdf, other]

    cs.CL cs.LG stat.ML

    Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

    Authors: Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni

    Abstract: We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance.…

    Submitted 5 May, 2017; originally announced May 2017.

    Journal ref: Spoken Language Technology Workshop (SLT), 2016 IEEE (pp. 474-480). IEEE