
Showing 1–13 of 13 results for author: Prasanna, S R M

  1. arXiv:2406.09494  [pdf, other]

    eess.AS cs.LG

    The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments

    Authors: Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S. R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy

    Abstract: The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multilingual conversational speech dataset. In the DISPLACE 2024 challenge, we also introduced the task of automatic speech recognition (ASR) on this datas…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, Interspeech 2024

  2. arXiv:2308.10470  [pdf, other]

    eess.AS cs.CL cs.SD

    Implicit Self-supervised Language Representation for Spoken Language Diarization

    Authors: Jagabandhu Mishra, S. R. Mahadeva Prasanna

    Abstract: In a code-switched (CS) scenario, the use of spoken language diarization (LD) as a pre-processing system is essential. Further, the use of implicit frameworks is preferable over the explicit framework, as it can be easily adapted to deal with low/zero resource languages. Inspired by speaker diarization (SD) literature, three frameworks based on (1) fixed segmentation, (2) change point-based segmen…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Planning to Submit in IEEE-JSTSP

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024

  3. arXiv:2306.12913  [pdf, other]

    eess.AS cs.CL cs.SD

    Implicit spoken language diarization

    Authors: Jagabandhu Mishra, Amartya Chowdhury, S. R. Mahadeva Prasanna

    Abstract: Spoken language diarization (LD) and related tasks are mostly explored using the phonotactic approach. Phonotactic approaches mostly use an explicit way of language modeling, hence requiring intermediate phoneme modeling and transcribed data. Alternatively, the ability of deep learning approaches to model temporal dynamics may help for the implicit modeling of language information through deep embedd…

    Submitted 22 June, 2023; originally announced June 2023.

  4. arXiv:2302.13209  [pdf, other]

    eess.AS cs.SD

    I-MSV 2022: Indic-Multilingual and Multi-sensor Speaker Verification Challenge

    Authors: Jagabandhu Mishra, Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna

    Abstract: Speaker Verification (SV) is a task to verify the claimed identity of the claimant using his/her voice sample. Though there exists an ample amount of research in SV technologies, the development concerning a multilingual conversation is limited. In a country like India, almost all the speakers are polyglot in nature. Consequently, the development of a Multilingual SV (MSV) system on the data colle…

    Submitted 25 February, 2023; originally announced February 2023.

  5. Spoken language change detection inspired by speaker change detection

    Authors: Jagabandhu Mishra, S. R. Mahadeva Prasanna

    Abstract: Spoken language change detection (LCD) refers to identifying the language transitions in a code-switched utterance. Similarly, identifying the speaker transitions in a multispeaker utterance is known as speaker change detection (SCD). Since the two tasks are similar, the architecture/framework developed for the SCD task may be suitable for the LCD task. Hence, the aim of the present work is to d…

    Submitted 10 February, 2023; originally announced February 2023.

  6. arXiv:2203.02680   

    eess.AS cs.SD eess.SP

    Language vs Speaker Change: A Comparative Study

    Authors: Jagabandhu Mishra, S. R. Mahadeva Prasanna

    Abstract: Spoken language change detection (LCD) refers to detecting language switching points in a multilingual speech signal. Speaker change detection (SCD) refers to locating the speaker change points in a multispeaker speech signal. The objective of this work is to understand the challenges in the LCD task by comparing it with the SCD task. A human subjective study for change detection is performed for LCD and SC…

    Submitted 6 October, 2023; v1 submitted 5 March, 2022; originally announced March 2022.

    Comments: The work is substantially modified. The new version of the same will be submitted soon

  7. arXiv:2110.00797  [pdf, other]

    eess.AS cs.SD

    Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition

    Authors: Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna

    Abstract: The automatic recognition of pathological speech, particularly from children with any articulatory impairment, is a challenging task due to various reasons. The lack of available domain specific data is one such obstacle that hinders its usage for different speech-based applications targeting pathological speakers. In line with the challenge, in this work, we investigate a few data augmentation te…

    Submitted 2 October, 2021; originally announced October 2021.

  8. arXiv:2110.00794  [pdf, other]

    cs.SD eess.AS q-bio.QM

    Processing Phoneme Specific Segments for Cleft Lip and Palate Speech Enhancement

    Authors: Protima Nomo Sudro, Rohit Sinha, S. R. Mahadeva Prasanna

    Abstract: The cleft lip and palate (CLP) speech intelligibility is distorted due to the deformation in their articulatory system. For addressing the same, a few previous works perform phoneme specific modification in CLP speech. In CLP speech, both the articulation error and the nasalization distort the intelligibility of a word. Consequently, modification of a specific phoneme may not always yield in enha…

    Submitted 2 October, 2021; originally announced October 2021.

  9. arXiv:2109.04138  [pdf, other]

    cs.CR cs.CV

    Multilingual Audio-Visual Smartphone Dataset And Evaluation

    Authors: Hareesh Mandalapu, Aravinda Reddy P N, Raghavendra Ramachandra, K Sreenivasa Rao, Pabitra Mitra, S R Mahadeva Prasanna, Christoph Busch

    Abstract: Smartphones have been employed with biometric-based verification systems to provide security in highly sensitive applications. Audio-visual biometrics are getting popular due to their usability, and they are also challenging to spoof because of their multimodal nature. In this work, we present an audio-visual smartphone dataset captured in five different recent smartphones. This new dataset cont…

    Submitted 15 November, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

  10. Sonority Measurement Using System, Source, and Suprasegmental Information

    Authors: Bidisha Sharma, S. R. Mahadeva Prasanna

    Abstract: Sonorant sounds are characterized by regions with prominent formant structure, high energy and high degree of periodicity. In this work, the vocal-tract system, excitation source and suprasegmental features derived from the speech signal are analyzed to measure the sonority information present in each of them. Vocal-tract system information is extracted from the Hilbert envelope of numerator of gr…

    Submitted 1 July, 2021; originally announced July 2021.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing (Volume: 25, Issue: 3, March 2017)

  11. Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey

    Authors: Hareesh Mandalapu, P N Aravinda Reddy, Raghavendra Ramachandra, K Sreenivasa Rao, Pabitra Mitra, S R Mahadeva Prasanna, Christoph Busch

    Abstract: Biometric recognition is a trending technology that uses unique characteristics data to identify or verify/authenticate security applications. Amidst the classically used biometrics, voice and face attributes are the most propitious for prevalent applications in day-to-day life because they are easy to obtain through restrained and user-friendly procedures. The pervasiveness of low-cost audio and…

    Submitted 12 March, 2021; v1 submitted 24 January, 2021; originally announced January 2021.

    Journal ref: IEEE Access, vol. 9, pp. 37431-37455, 2021

  12. arXiv:2101.05806  [pdf, other]

    cs.CV

    Exploration of Visual Features and their weighted-additive fusion for Video Captioning

    Authors: Praveen S V, Akhilesh Bharadwaj, Harsh Raj, Janhavi Dadhania, Ganesh Samarth C. A, Nikhil Pareek, S R M Prasanna

    Abstract: Video captioning is a popular task that challenges models to describe events in videos using natural language. In this work, we investigate the ability of various visual feature representations derived from state-of-the-art convolutional neural networks to capture high-level semantic context. We introduce the Weighted Additive Fusion Transformer with Memory Augmented Encoders (WAFTM), a captioning…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: 6 pages

  13. arXiv:1811.01222  [pdf, ps, other]

    eess.AS cs.SD

    Time-Frequency Audio Features for Speech-Music Classification

    Authors: Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha

    Abstract: Distinct striation patterns are observed in the spectrograms of speech and music. This motivated us to propose three novel time-frequency features for speech-music classification. These features are extracted in two stages. First, a preset number of prominent spectral peak locations are identified from the spectra of each frame. These important peak locations obtained from each frame are used to f…

    Submitted 3 November, 2018; originally announced November 2018.

    Comments: 4 pages, 16 figures