
Showing 1–3 of 3 results for author: Shaik, M M

  1. arXiv:2312.07338  [pdf, other]

    cs.CL cs.SD eess.AS

    Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification

    Authors: Mohammed Maqsood Shaik, Dietrich Klakow, Badr M. Abdullah

    Abstract: Pre-trained Transformer-based speech models have shown striking performance when fine-tuned on various downstream tasks such as automatic speech recognition and spoken language identification (SLID). However, the problem of domain mismatch remains a challenge in this area, where the domain of the pre-training data might differ from that of the downstream labeled data used for fine-tuning. In multi…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Submitted to ICASSP 2024

  2. arXiv:2306.02405  [pdf, other]

    cs.CL

    An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech

    Authors: Badr M. Abdullah, Mohammed Maqsood Shaik, Bernd Möbius, Dietrich Klakow

    Abstract: Self-supervised representation learning for speech often involves a quantization step that transforms the acoustic input into discrete units. However, it remains unclear how to characterize the relationship between these discrete units and abstract phonetic categories such as phonemes. In this paper, we develop an information-theoretic framework whereby we represent each phonetic category as a dis…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted in Interspeech 2023

  3. arXiv:2306.01656  [pdf, other]

    cs.CV cs.HC

    Backchannel Detection and Agreement Estimation from Video with Transformer Networks

    Authors: Ahmed Amer, Chirag Bhuvaneshwara, Gowtham K. Addluri, Mohammed M. Shaik, Vedant Bonde, Philipp Müller

    Abstract: Listeners use short interjections, so-called backchannels, to signify attention or express agreement. The automatic analysis of this behavior is of key importance for human conversation analysis and interactive conversational agents. Current state-of-the-art approaches for backchannel analysis from visual behavior make use of two types of features: features based on body pose and features based on…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted at IEEE IJCNN'23