Showing 1–11 of 11 results for author: Zeineldeen, M

  1. arXiv:2402.15594

    cs.CL cs.SD eess.AS

    Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model Improves End-to-End ASR

    Authors: Jintao Jiang, Yingbo Gao, Mohammad Zeineldeen, Zoltan Tuske

    Abstract: In this paper, alternating weak triphone/BPE alignment supervision is proposed to improve end-to-end model training. Towards this end, triphone and BPE alignments are extracted using a pre-existing hybrid ASR system. Then, regularization effect is obtained by cross-entropy based intermediate auxiliary losses computed on such alignments at a mid-layer representation of the encoder for triphone alig…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 5 pages, 1 figure, 3 tables

  2. arXiv:2309.08436

    eess.AS cs.SD stat.ML

    Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

    Authors: Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks. A special end-of-chunk (EOC) symbol advances from one chunk to the next chunk, effectively replacing the conventional end-of-sequence symbol. This modification, while minor, situates our model as equivalent to a transduc…

    Submitted 17 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  3. arXiv:2303.05958

    cs.CL cs.SD eess.AS stat.ML

    Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss

    Authors: Mohammad Zeineldeen, Kartik Audhkhasi, Murali Karthick Baskar, Bhuvana Ramabhadran

    Abstract: This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft dis…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted at ICASSP 2023

  4. arXiv:2301.04571

    cs.CL eess.AS stat.ML

    Analyzing And Improving Neural Speaker Embeddings for ASR

    Authors: Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

    Abstract: Neural speaker embeddings encode the speaker's speech characteristics through a DNN model and are prevalent for speaker verification tasks. However, few studies have investigated the usage of neural speaker embeddings for an ASR system. In this work, we present our efforts w.r.t. integrating neural speaker embeddings into a conformer-based hybrid HMM ASR system. For ASR, our improved embedding extr…

    Submitted 20 September, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: Accepted at ITG Speech Communications 2023

  5. arXiv:2211.06369

    eess.AS cs.CL cs.LG cs.SD

    Enhancing and Adversarial: Improve ASR with Speaker Labels

    Authors: Wei Zhou, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney

    Abstract: ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, which are two opposite objectives with the aim to increase/decrease domain variance towards domain-aware/agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient…

    Submitted 24 February, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: accepted at ICASSP 2023

  6. arXiv:2210.13397

    cs.CL cs.SD eess.AS

    Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

    Authors: Christoph Lüscher, Mohammad Zeineldeen, Zijian Yang, Tina Raissi, Peter Vieting, Khai Le-Duc, Weiyue Wang, Ralf Schlüter, Hermann Ney

    Abstract: Language barriers present a great challenge in our increasingly connected and global world. Especially within the medical domain, e.g. hospital or emergency room, communication difficulties and delays may lead to malpractice and non-optimal patient care. In the HYKIST project, we consider patient-physician communication, more specifically between a German-speaking physician and an Arabic- or Vietn…

    Submitted 22 September, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: ASR System Paper for HYKIST project

  7. arXiv:2206.12955

    cs.CL eess.AS stat.ML

    Improving the Training Recipe for a Robust Conformer-based Hybrid Model

    Authors: Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney

    Abstract: Speaker adaptation is important to build robust automatic speech recognition (ASR) systems. In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the m…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: Accepted at INTERSPEECH 2022

  8. arXiv:2111.03442

    cs.CL eess.AS stat.ML

    Conformer-based Hybrid ASR System for Switchboard Dataset

    Authors: Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney

    Abstract: The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets. To our best knowledge, the impact of using conformer acoustic model for hybrid ASR is not investigated. In this paper, we present and evaluate a competitive conformer-based hybrid model training recipe.…

    Submitted 19 February, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted at ICASSP 2022

  9. arXiv:2110.09324

    cs.CL cs.SD eess.AS

    Automatic Learning of Subword Dependent Model Scales

    Authors: Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

    Abstract: To improve the performance of state-of-the-art automatic speech recognition systems it is common practice to include external knowledge sources such as language models or prior corrections. This is usually done via log-linear model combination using separate scaling parameters for each model. Typically these parameters are manually optimized on some held-out data. In this work we propose to opti…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: submitted to ICASSP 2022

  10. arXiv:2104.05544

    cs.CL cs.SD eess.AS stat.ML

    Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models

    Authors: Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. The integration with an external LM trained on much more unpaired text usually leads to better performance. A Bayesian interpretation as in the hybrid autoregressive transducer (HAT) suggests dividing by the prior of the discriminative acoustic model, which corresponds to…

    Submitted 17 June, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: accepted to Interspeech 2021

  11. arXiv:2005.09336

    eess.AS cs.CL cs.LG cs.NE

    A systematic comparison of grapheme-based vs. phoneme-based label units for encoder-decoder-attention models

    Authors: Mohammad Zeineldeen, Albert Zeyer, Wei Zhou, Thomas Ng, Ralf Schlüter, Hermann Ney

    Abstract: Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on e.g. byte-pair encoding (BPE). The mapping from pronunciation to spelling is learned completely from data. In contrast to this, classical approaches to ASR employ secondary knowledge sources in the form of phone…

    Submitted 15 April, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: 5 pages, 6 tables