Showing 1–19 of 19 results for author: Zeinali, H

Searching in archive cs.
  1. arXiv:2407.17383  [pdf]

    cs.CL cs.AI cs.LG

    A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance

    Authors: Amirreza Naziri, Hossein Zeinali

    Abstract: Writing, as an omnipresent form of human communication, permeates nearly every aspect of contemporary life. Consequently, inaccuracies or errors in written communication can lead to profound consequences, ranging from financial losses to potentially life-threatening situations. Spelling mistakes, among the most prevalent writing errors, are frequently encountered due to various factors. This resea…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 12 pages, 9 figures, 5 tables

  2. arXiv:2307.10157  [pdf, other]

    cs.CV

    Leveraging Visemes for Better Visual Speech Representation and Lip Reading

    Authors: Javad Peymanfard, Vahid Saeedi, Mohammad Reza Mohammadi, Hossein Zeinali, Nasser Mozayani

    Abstract: Lip reading is a challenging task that has many potential applications in speech recognition, human-computer interaction, and security systems. However, existing lip reading systems often suffer from low accuracy due to the limitations of video features. In this paper, we propose a novel approach that leverages visemes, which are groups of phonetically similar lip shapes, to extract more discrimin…

    Submitted 19 July, 2023; originally announced July 2023.

  3. Word-level Persian Lipreading Dataset

    Authors: Javad Peymanfard, Ali Lashini, Samin Heydarian, Hossein Zeinali, Nasser Mozayani

    Abstract: Lip-reading has made impressive progress in recent years, driven by advances in deep learning. Nonetheless, the prerequisite for such advances is a suitable dataset. This paper provides a new in-the-wild dataset for Persian word-level lipreading containing 244,000 videos from approximately 1,800 speakers. We evaluated the state-of-the-art method in this field and used a novel approach for word-level l…

    Submitted 8 April, 2023; originally announced April 2023.

    Journal ref: In 2022 12th International Conference on Computer and Knowledge Engineering (ICCKE) (pp. 225-230). IEEE

  4. arXiv:2304.03585  [pdf]

    cs.CL

    ArmanTTS single-speaker Persian dataset

    Authors: Mohammd Hasan Shamgholi, Vahid Saeedi, Javad Peymanfard, Leila Alhabib, Hossein Zeinali

    Abstract: TTS, or text-to-speech, is a complicated process that can be accomplished through appropriate modeling using deep learning methods. In order to implement deep learning models, a suitable dataset is required. Since there is a scarce amount of work done in this field for the Persian language, this paper will introduce the single speaker dataset: ArmanTTS. We compared the characteristics of this data…

    Submitted 7 April, 2023; originally announced April 2023.

  5. arXiv:2301.10180  [pdf, other]

    cs.CL cs.SD eess.AS

    A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset

    Authors: Javad Peymanfard, Samin Heydarian, Ali Lashini, Hossein Zeinali, Mohammad Reza Mohammadi, Nasser Mozayani

    Abstract: In recent years, significant progress has been made in automatic lip reading. But these methods require large-scale datasets that do not exist for many low-resource languages. In this paper, we have presented a new multipurpose audio-visual dataset for Persian. This dataset consists of almost 220 hours of videos with 1760 corresponding speakers. In addition to lip reading, the dataset is suitable…

    Submitted 21 January, 2023; originally announced January 2023.

  6. arXiv:2207.11808  [pdf, other]

    cs.CL cs.AI

    ArmanEmo: A Persian Dataset for Text-based Emotion Detection

    Authors: Hossein Mirzaee, Javad Peymanfard, Hamid Habibzadeh Moshtaghin, Hossein Zeinali

    Abstract: With the recent proliferation of open textual data on social media platforms, Emotion Detection (ED) from Text has received more attention over the past years. It has many applications, especially for businesses and online service providers, where emotion detection techniques can help them make informed commercial decisions by analyzing customers/users' feelings towards their products and services…

    Submitted 24 July, 2022; originally announced July 2022.

  7. arXiv:2104.04784  [pdf, other]

    cs.CV

    Lip reading using external viseme decoding

    Authors: Javad Peymanfard, Mohammad Reza Mohammadi, Hossein Zeinali, Nasser Mozayani

    Abstract: Lip-reading is the operation of recognizing speech from lip movements. This is a difficult task because the movements of the lips when pronouncing the words are similar for some of them. Viseme is used to describe lip movements during a conversation. This paper aims to show how to use external text data (for viseme-to-character mapping) by dividing video-to-character into two stages, namely conver…

    Submitted 7 November, 2021; v1 submitted 10 April, 2021; originally announced April 2021.

  8. arXiv:1912.06311  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Short-duration Speaker Verification (SdSV) Challenge 2021: the Challenge Evaluation Plan

    Authors: Hossein Zeinali, Kong Aik Lee, Jahangir Alam, Lukas Burget

    Abstract: This document describes the Short-duration Speaker Verification (SdSV) Challenge 2021. The main goal of the challenge is to evaluate new technologies for text-dependent (TD) and text-independent (TI) speaker verification (SV) in a short duration scenario. The proposed challenge evaluates SdSV with varying degree of phonetic overlap between the enrollment and test utterances (cross-lingual). It is…

    Submitted 24 March, 2021; v1 submitted 12 December, 2019; originally announced December 2019.

  9. arXiv:1912.03627  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    A Multi Purpose and Large Scale Speech Corpus in Persian and English for Speaker and Speech Recognition: the DeepMine Database

    Authors: Hossein Zeinali, Lukáš Burget, Jan "Honza" Černocký

    Abstract: DeepMine is a speech database in Persian and English designed to build and evaluate text-dependent, text-prompted, and text-independent speaker verification, as well as Persian speech recognition systems. It contains more than 1850 speakers and 540 thousand recordings overall; more than 480 hours of speech are transcribed. It is the first public large-scale speaker verification database in Persian…

    Submitted 8 December, 2019; originally announced December 2019.

  10. arXiv:1910.12592  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    BUT System Description to VoxCeleb Speaker Recognition Challenge 2019

    Authors: Hossein Zeinali, Shuai Wang, Anna Silnova, Pavel Matějka, Oldřich Plchot

    Abstract: In this report, we describe the submission of Brno University of Technology (BUT) team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2019. We also provide a brief analysis of different systems on VoxCeleb-1 test sets. Submitted systems for both Fixed and Open conditions are a fusion of 4 Convolutional Neural Network (CNN) topologies. The first and second networks have ResNet34 topology an…

    Submitted 16 October, 2019; originally announced October 2019.

  11. arXiv:1907.12908  [pdf, ps, other]

    cs.CV cs.AI cs.CR

    Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge

    Authors: Hossein Zeinali, Themos Stafylakis, Georgia Athanasopoulou, Johan Rohdin, Ioannis Gkinis, Lukáš Burget, Jan "Honza" Černocký

    Abstract: In this paper, we present the system description of the joint efforts of Brno University of Technology (BUT) and Omilia -- Conversational Intelligence for the ASVSpoof2019 Spoofing and Countermeasures Challenge. The primary submission for Physical access (PA) is a fusion of two VGG networks, trained on single and two-channels features. For Logical access (LA), our primary system is a fusion of VGG…

    Submitted 13 July, 2019; originally announced July 2019.

  12. arXiv:1907.07127  [pdf, ps, other]

    eess.AS cs.SD

    Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge

    Authors: Hossein Zeinali, Lukáš Burget, Jan "Honza" Černocký

    Abstract: In this report, the Brno University of Technology (BUT) team submissions for Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2019 challenge are described. Also, the analysis of different methods is provided. The proposed approach is a fusion of three different Convolutional Neural Network (CNN) topologies. The first one is a VGG like two-dimensional CNNs. The second one is again a two-dim…

    Submitted 13 July, 2019; originally announced July 2019.

    Comments: arXiv admin note: text overlap with arXiv:1810.04273

  13. arXiv:1907.06112  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    BUT VOiCES 2019 System Description

    Authors: Hossein Zeinali, Pavel Matějka, Ladislav Mošner, Oldřich Plchot, Anna Silnova, Ondřej Novotný, Ján Profant, Ondřej Glembek, Lukáš Burget

    Abstract: This is a description of our effort in VOiCES 2019 Speaker Recognition challenge. All systems in the fixed condition are based on the x-vector paradigm with different features and DNN topologies. The single best system reaches 1.2% EER and a fusion of 3 systems yields 1.0% EER, which is 15% relative improvement. The open condition allowed us to use external data which we did for the PLDA adaptatio…

    Submitted 13 July, 2019; originally announced July 2019.

  14. arXiv:1907.06111  [pdf, other]

    eess.AS cs.CL cs.SD

    Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors

    Authors: Nooshin Maghsoodi, Hossein Sameti, Hossein Zeinali, Themos Stafylakis

    Abstract: In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address the problem of text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to perform frame alignment to HMM states and to extract Baum-Welch statistics. By making use of the natural partition of input features into digits, we train digit-…

    Submitted 13 July, 2019; originally announced July 2019.

  15. arXiv:1811.02331  [pdf, other]

    eess.AS cs.SD

    Speaker verification using end-to-end adversarial language adaptation

    Authors: Johan Rohdin, Themos Stafylakis, Anna Silnova, Hossein Zeinali, Lukas Burget, Oldrich Plchot

    Abstract: In this paper we investigate the use of adversarial domain adaptation for addressing the problem of language mismatch between speaker recognition corpora. In the context of speaker verification, adversarial domain adaptation methods aim at minimizing certain divergences between the distribution that the utterance-level features follow (i.e. speaker embeddings) when drawn from source and target dom…

    Submitted 6 November, 2018; originally announced November 2018.

  16. arXiv:1811.02066  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    How to Improve Your Speaker Embeddings Extractor in Generic Toolkits

    Authors: Hossein Zeinali, Lukas Burget, Johan Rohdin, Themos Stafylakis, Jan Cernocky

    Abstract: Recently, speaker embeddings extracted with deep neural networks became the state-of-the-art method for speaker verification. In this paper we aim to facilitate its implementation on a more generic toolkit than Kaldi, which we anticipate to enable further improvements on the method. We examine several tricks in training, such as the effects of normalizing input features and pooled statistics, diff…

    Submitted 5 November, 2018; originally announced November 2018.

  17. arXiv:1810.04273  [pdf, ps, other]

    eess.AS cs.SD

    Convolutional Neural Networks and x-vector Embedding for DCASE2018 Acoustic Scene Classification Challenge

    Authors: Hossein Zeinali, Lukas Burget, Jan Cernocky

    Abstract: In this paper, the Brno University of Technology (BUT) team submissions for Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2018 challenge are described. Also, the analysis of different methods on the leaderboard set is provided. The proposed approach is a fusion of two different Convolutional Neural Network (CNN) topologies. The first one is the common two-dimensional CNNs which is mainl…

    Submitted 1 October, 2018; originally announced October 2018.

    Journal ref: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018)

  18. arXiv:1809.11068  [pdf, other]

    cs.SD cs.CL eess.AS

    Spoken Pass-Phrase Verification in the i-vector Space

    Authors: Hossein Zeinali, Lukas Burget, Hossein Sameti, Jan Cernocky

    Abstract: The task of spoken pass-phrase verification is to decide whether a test utterance contains the same phrase as given enrollment utterances. Beside other applications, pass-phrase verification can complement an independent speaker verification subsystem in text-dependent speaker verification. It can also be used for liveness detection by verifying that the user is able to correctly respond to a rand…

    Submitted 28 September, 2018; originally announced September 2018.

    Journal ref: Proc. Odyssey 2018 The Speaker and Language Recognition Workshop

  19. arXiv:1706.05077  [pdf, ps, other]

    cs.SD

    SUT System Description for NIST SRE 2016

    Authors: Hossein Zeinali, Hossein Sameti, Nooshin Maghsoodi

    Abstract: This paper describes the submission to fixed condition of NIST SRE 2016 by Sharif University of Technology (SUT) team. We provide a full description of the systems that were included in our submission. We start with an overview of the datasets that were used for training and development. It is followed by describing front-ends which contain different VAD and feature types. UBM and i-vector extract…

    Submitted 8 June, 2017; originally announced June 2017.

    Comments: Presented at the NIST SRE 2016 Evaluation Workshop