
Showing 1–21 of 21 results for author: Prasanna, S

Searching in archive cs.
  1. arXiv:2408.02297

    cs.RO cs.CV

    Perception Matters: Enhancing Embodied AI with Uncertainty-Aware Semantic Segmentation

    Authors: Sai Prasanna, Daniel Honerkamp, Kshitij Sirohi, Tim Welschehold, Wolfram Burgard, Abhinav Valada

    Abstract: Embodied AI has made significant progress acting in unexplored environments. However, tasks such as object search have largely focused on efficient policy learning. In this work, we identify several gaps in current search methods: They largely focus on dated perception models, neglect temporal aggregation, and transfer from ground truth directly to noisy perception at test time, without accounting…

    Submitted 5 August, 2024; originally announced August 2024.

  2. arXiv:2407.20879

    cs.AI q-bio.QM

    A Scalable Tool For Analyzing Genomic Variants Of Humans Using Knowledge Graphs and Machine Learning

    Authors: Shivika Prasanna, Ajay Kumar, Deepthi Rao, Eduardo Simoes, Praveen Rao

    Abstract: The integration of knowledge graphs and graph machine learning (GML) in genomic data analysis offers several opportunities for understanding complex genetic relationships, especially at the RNA level. We present a comprehensive approach for leveraging these technologies to analyze genomic variants, specifically in the context of RNA sequencing (RNA-seq) data from COVID-19 patient samples. The prop…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2312.04423

  3. arXiv:2406.09494

    eess.AS cs.LG

    The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments

    Authors: Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S. R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy

    Abstract: The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multilingual conversational speech dataset. In the DISPLACE 2024 challenge, we also introduced the task of automatic speech recognition (ASR) on this datas…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, Interspeech 2024

  4. arXiv:2403.10967

    cs.LG cs.AI

    Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization

    Authors: Sai Prasanna, Karim Farid, Raghu Rajan, André Biedenkapp

    Abstract: Zero-shot generalization (ZSG) to unseen dynamics is a major challenge for creating generally capable embodied agents. To address the broader challenge, we start with the simpler setting of contextual reinforcement learning (cRL), assuming observability of the context values that parameterize the variation in the system's dynamics, such as the mass or dimensions of a robot, without making further…

    Submitted 3 August, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: In Reinforcement Learning Conference, 2024. 33 pages

  5. arXiv:2312.04423

    cs.AI cs.DB q-bio.QM

    Scalable Knowledge Graph Construction and Inference on Human Genome Variants

    Authors: Shivika Prasanna, Deepthi Rao, Eduardo Simoes, Praveen Rao

    Abstract: Real-world knowledge can be represented as a graph consisting of entities and relationships between the entities. The need for efficient and scalable solutions arises when dealing with vast genomic data, like RNA-sequencing. Knowledge graphs offer a powerful approach for various tasks in such large-scale genomic data, such as analysis and inference. In this work, variant-level information extracte…

    Submitted 7 December, 2023; originally announced December 2023.

  6. arXiv:2308.10470

    eess.AS cs.CL cs.SD

    Implicit Self-supervised Language Representation for Spoken Language Diarization

    Authors: Jagabandhu Mishra, S. R. Mahadeva Prasanna

    Abstract: In a code-switched (CS) scenario, the use of spoken language diarization (LD) as a pre-processing system is essential. Further, the use of implicit frameworks is preferable over the explicit framework, as it can be easily adapted to deal with low/zero resource languages. Inspired by speaker diarization (SD) literature, three frameworks based on (1) fixed segmentation, (2) change point-based segmen…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Planning to Submit in IEEE-JSTSP

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024

  7. arXiv:2306.12913

    eess.AS cs.CL cs.SD

    Implicit spoken language diarization

    Authors: Jagabandhu Mishra, Amartya Chowdhury, S. R. Mahadeva Prasanna

    Abstract: Spoken language diarization (LD) and related tasks are mostly explored using the phonotactic approach. Phonotactic approaches mostly use an explicit way of language modeling, hence requiring intermediate phoneme modeling and transcribed data. Alternatively, the ability of deep learning approaches to model temporal dynamics may help for the implicit modeling of language information through deep embedd…

    Submitted 22 June, 2023; originally announced June 2023.

  8. arXiv:2302.13209

    eess.AS cs.SD

    I-MSV 2022: Indic-Multilingual and Multi-sensor Speaker Verification Challenge

    Authors: Jagabandhu Mishra, Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna

    Abstract: Speaker Verification (SV) is a task to verify the claimed identity of the claimant using his/her voice sample. Though there exists an ample amount of research in SV technologies, the development concerning a multilingual conversation is limited. In a country like India, almost all the speakers are polyglot in nature. Consequently, the development of a Multilingual SV (MSV) system on the data colle…

    Submitted 25 February, 2023; originally announced February 2023.

  9. Spoken language change detection inspired by speaker change detection

    Authors: Jagabandhu Mishra, S. R. Mahadeva Prasanna

    Abstract: Spoken language change detection (LCD) refers to identifying the language transitions in a code-switched utterance. Similarly, identifying the speaker transitions in a multispeaker utterance is known as speaker change detection (SCD). Since the two tasks are similar, the architecture/framework developed for the SCD task may be suitable for the LCD task. Hence, the aim of the present work is to d…

    Submitted 10 February, 2023; originally announced February 2023.

  10. arXiv:2203.02680   

    eess.AS cs.SD eess.SP

    Language vs Speaker Change: A Comparative Study

    Authors: Jagabandhu Mishra, S. R. Mahadeva Prasanna

    Abstract: Spoken language change detection (LCD) refers to detecting language switching points in a multilingual speech signal. Speaker change detection (SCD) refers to locating the speaker change points in a multispeaker speech signal. The objective of this work is to understand the challenges in the LCD task by comparing it with the SCD task. A human subjective study for change detection is performed for LCD and SC…

    Submitted 6 October, 2023; v1 submitted 5 March, 2022; originally announced March 2022.

    Comments: The work is substantially modified. The new version of the same will be submitted soon

  11. arXiv:2110.00797

    eess.AS cs.SD

    Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition

    Authors: Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna

    Abstract: The automatic recognition of pathological speech, particularly from children with any articulatory impairment, is a challenging task due to various reasons. The lack of available domain specific data is one such obstacle that hinders its usage for different speech-based applications targeting pathological speakers. In line with the challenge, in this work, we investigate a few data augmentation te…

    Submitted 2 October, 2021; originally announced October 2021.

  12. arXiv:2110.00794

    cs.SD eess.AS q-bio.QM

    Processing Phoneme Specific Segments for Cleft Lip and Palate Speech Enhancement

    Authors: Protima Nomo Sudro, Rohit Sinha, S. R. Mahadeva Prasanna

    Abstract: The intelligibility of cleft lip and palate (CLP) speech is distorted due to the deformation in the speakers' articulatory system. For addressing the same, a few previous works perform phoneme specific modification in CLP speech. In CLP speech, both the articulation error and the nasalization distort the intelligibility of a word. Consequently, modification of a specific phoneme may not always yield in enha…

    Submitted 2 October, 2021; originally announced October 2021.

  13. arXiv:2109.04138

    cs.CR cs.CV

    Multilingual Audio-Visual Smartphone Dataset And Evaluation

    Authors: Hareesh Mandalapu, Aravinda Reddy P N, Raghavendra Ramachandra, K Sreenivasa Rao, Pabitra Mitra, S R Mahadeva Prasanna, Christoph Busch

    Abstract: Smartphones have been employed with biometric-based verification systems to provide security in highly sensitive applications. Audio-visual biometrics are getting popular due to their usability, and they are challenging to spoof because of their multimodal nature. In this work, we present an audio-visual smartphone dataset captured in five different recent smartphones. This new dataset cont…

    Submitted 15 November, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

  14. Sonority Measurement Using System, Source, and Suprasegmental Information

    Authors: Bidisha Sharma, S. R. Mahadeva Prasanna

    Abstract: Sonorant sounds are characterized by regions with prominent formant structure, high energy and high degree of periodicity. In this work, the vocal-tract system, excitation source and suprasegmental features derived from the speech signal are analyzed to measure the sonority information present in each of them. Vocal-tract system information is extracted from the Hilbert envelope of numerator of gr…

    Submitted 1 July, 2021; originally announced July 2021.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing ( Volume: 25, Issue: 3, March 2017)

  15. Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey

    Authors: Hareesh Mandalapu, P N Aravinda Reddy, Raghavendra Ramachandra, K Sreenivasa Rao, Pabitra Mitra, S R Mahadeva Prasanna, Christoph Busch

    Abstract: Biometric recognition is a trending technology that uses unique characteristics data to identify or verify/authenticate security applications. Amidst the classically used biometrics, voice and face attributes are the most propitious for prevalent applications in day-to-day life because they are easy to obtain through restrained and user-friendly procedures. The pervasiveness of low-cost audio and…

    Submitted 12 March, 2021; v1 submitted 24 January, 2021; originally announced January 2021.

    Journal ref: in IEEE Access, vol. 9, pp. 37431-37455, 2021

  16. arXiv:2101.05806

    cs.CV

    Exploration of Visual Features and their weighted-additive fusion for Video Captioning

    Authors: Praveen S V, Akhilesh Bharadwaj, Harsh Raj, Janhavi Dadhania, Ganesh Samarth C. A, Nikhil Pareek, S R M Prasanna

    Abstract: Video captioning is a popular task that challenges models to describe events in videos using natural language. In this work, we investigate the ability of various visual feature representations derived from state-of-the-art convolutional neural networks to capture high-level semantic context. We introduce the Weighted Additive Fusion Transformer with Memory Augmented Encoders (WAFTM), a captioning…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: 6 pages

  17. arXiv:2005.00561

    cs.CL cs.LG

    When BERT Plays the Lottery, All Tickets Are Winning

    Authors: Sai Prasanna, Anna Rogers, Anna Rumshisky

    Abstract: Large Transformer-based models were shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis, using both structured and magnitude pruning. For fine-tuned BERT, we show that (a) it is possible to find subnetworks achieving performance that is comparable with that of the full model, and (b) similar…

    Submitted 24 October, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: EMNLP 2020 camera-ready

  18. arXiv:1909.12734

    cs.LG cs.CV stat.ML

    Maximal adversarial perturbations for obfuscation: Hiding certain attributes while preserving rest

    Authors: Indu Ilanchezian, Praneeth Vepakomma, Abhishek Singh, Otkrist Gupta, G. N. Srinivasa Prasanna, Ramesh Raskar

    Abstract: In this paper we investigate the usage of adversarial perturbations for the purpose of privacy from human perception and model (machine) based detection. We employ adversarial perturbations for obfuscating certain variables in raw data while preserving the rest. Current adversarial perturbation methods are used for data poisoning with minimal perturbations of the raw data such that the machine lea…

    Submitted 27 September, 2019; originally announced September 2019.

  19. arXiv:1902.10623

    cs.CL

    Zoho at SemEval-2019 Task 9: Semi-supervised Domain Adaptation using Tri-training for Suggestion Mining

    Authors: Sai Prasanna, Sri Ananda Seelan

    Abstract: This paper describes our submission for the SemEval-2019 Suggestion Mining task. A simple Convolutional Neural Network (CNN) classifier with contextual word representations from a pre-trained language model was used for sentence classification. The model is trained using tri-training, a semi-supervised bootstrapping mechanism for labelling unseen data. Tri-training proved to be an effective techni…

    Submitted 6 April, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Comments: NAACL 2019

  20. arXiv:1811.01222

    eess.AS cs.SD

    Time-Frequency Audio Features for Speech-Music Classification

    Authors: Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha

    Abstract: Distinct striation patterns are observed in the spectrograms of speech and music. This motivated us to propose three novel time-frequency features for speech-music classification. These features are extracted in two stages. First, a preset number of prominent spectral peak locations are identified from the spectra of each frame. These important peak locations obtained from each frame are used to f…

    Submitted 3 November, 2018; originally announced November 2018.

    Comments: 4 pages, 16 figures

  21. arXiv:1407.2390

    cs.CV

    Online Stroke and Akshara Recognition GUI in Assamese Language Using Hidden Markov Model

    Authors: SRM Prasanna, Rituparna Devi, Deepjoy Das, Subhankar Ghosh, Krishna Naik

    Abstract: The work describes the development of an Online Assamese Stroke & Akshara Recognizer based on a set of language rules. In the handwriting literature, strokes are composed of two-coordinate traces between pen-down and pen-up labels. The Assamese aksharas are combinations of a number of strokes, the maximum number of strokes taken to make a combination being eight. Based on these combinations eight languag…

    Submitted 9 July, 2014; originally announced July 2014.

    Comments: 6 pages, 9 figures, International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014