Showing 1–18 of 18 results for author: Sameti, H

  1. arXiv:2407.12426  [pdf, other]

    cs.CL cs.AI

    Sharif-STR at SemEval-2024 Task 1: Transformer as a Regression Model for Fine-Grained Scoring of Textual Semantic Relations

    Authors: Seyedeh Fatemeh Ebrahimi, Karim Akhavan Azari, Amirmasoud Iravani, Hadi Alizadeh, Zeinab Sadat Taghavi, Hossein Sameti

    Abstract: Semantic Textual Relatedness holds significant relevance in Natural Language Processing, finding applications across various domains. Traditionally, approaches to STR have relied on knowledge-based and statistical methods. However, with the emergence of Large Language Models, there has been a paradigm shift, ushering in new methodologies. In this paper, we delve into the investigation of sentence-…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 10 pages, 9 figures, 4 tables

    Journal ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

  2. arXiv:2407.11774  [pdf, other]

    cs.CL cs.AI

    Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text

    Authors: Seyedeh Fatemeh Ebrahimi, Karim Akhavan Azari, Amirmasoud Iravani, Arian Qazvini, Pouya Sadeghi, Zeinab Sadat Taghavi, Hossein Sameti

    Abstract: Detecting Machine-Generated Text (MGT) has emerged as a significant area of study within Natural Language Processing. While language models generate text, they often leave discernible traces, which can be scrutinized using either traditional feature-based methods or more advanced neural language models. In this research, we explore the effectiveness of fine-tuning a RoBERTa-base transformer, a pow…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures, 2 tables. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

  3. arXiv:2404.04845  [pdf, other]

    cs.CL cs.AI

    SLPL SHROOM at SemEval2024 Task 06: A comprehensive study on models ability to detect hallucination

    Authors: Pouya Fallah, Soroush Gooran, Mohammad Jafarinasab, Pouya Sadeghi, Reza Farnia, Amirreza Tarabkhah, Zainab Sadat Taghavi, Hossein Sameti

    Abstract: Language models, particularly generative models, are susceptible to hallucinations, generating outputs that contradict factual knowledge or the source text. This study explores methods for detecting hallucinations in three SemEval-2024 Task 6 tasks: Machine Translation, Definition Modeling, and Paraphrase Generation. We evaluate two methods: semantic similarity between the generated text and factu…

    Submitted 9 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

  4. arXiv:2312.04362  [pdf, other]

    cs.CL cs.AI

    PCoQA: Persian Conversational Question Answering Dataset

    Authors: Hamed Hematian Hemati, Atousa Toghyani, Atena Souri, Sayed Hesam Alavian, Hossein Sameti, Hamid Beigy

    Abstract: Humans seek information on a specific topic by holding a conversation made up of a series of questions and answers. In the pursuit of conversational question answering research, we introduce PCoQA, the first Persian Conversational Question Answering dataset, a resource comprising information-seeking dialogs encompassing a total of 9,026 contex…

    Submitted 7 December, 2023; originally announced December 2023.

  5. arXiv:2308.10354  [pdf, other]

    cs.AI cs.CL

    Imaginations of WALL-E: Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems

    Authors: Zeinab Sadat Taghavi, Soroush Gooran, Seyed Arshan Dalili, Hamidreza Amirzadeh, Mohammad Jalal Nematbakhsh, Hossein Sameti

    Abstract: In this paper, we introduce a novel Artificial Intelligence (AI) system inspired by the philosophical and psychoanalytical concept of imagination as a "Re-construction of Experiences". Our AI system is equipped with an imagination-inspired module that bridges the gap between textual inputs and other modalities, enriching the derived information based on previously learned experiences. A unique fe…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 18 pages

  6. arXiv:2307.11584  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion

    Authors: Zeinab Sadat Taghavi, Ali Satvaty, Hossein Sameti

    Abstract: Speech Emotion Recognition (SER) is a challenging task. In this paper, we introduce a modality conversion concept aimed at enhancing emotion recognition performance on the MELD dataset. We assess our approach through two experiments: first, a method named Modality-Conversion that employs automatic speech recognition (ASR) systems, followed by a text classifier; second, we assume perfect ASR output…

    Submitted 21 July, 2023; originally announced July 2023.

  7. arXiv:2304.11220  [pdf, other]

    cs.CL

    Learn What NOT to Learn: Towards Generative Safety in Chatbots

    Authors: Leila Khalatbari, Yejin Bang, Dan Su, Willy Chung, Saeed Ghadimi, Hossein Sameti, Pascale Fung

    Abstract: Conversational models that are generative and open-domain are particularly susceptible to generating unsafe content since they are trained on web-based social data. Prior approaches to mitigating this issue have drawbacks, such as disrupting the flow of conversation, limited generalization to unseen toxic input contexts, and sacrificing the quality of the dialogue for the sake of safety. In this p…

    Submitted 25 April, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: 9 pages, 3 tables, 3 figures

  8. arXiv:2208.13486  [pdf, other]

    cs.CL

    naab: A ready-to-use plug-and-play corpus for Farsi

    Authors: Sadra Sabouri, Elnaz Rahmati, Soroush Gooran, Hossein Sameti

    Abstract: Huge corpora of textual data are known to be crucial for training deep models such as transformer-based ones. This need is more pressing in lower-resource languages such as Farsi. We propose naab, the biggest cleaned and ready-to-use open-source textual corpus in Farsi. It contains about 130GB of data, 250 million paragraphs, and 15 billion words. The project name is derived from the…

    Submitted 29 August, 2022; originally announced August 2022.

    Comments: 6 pages, 2 figures

  9. arXiv:2208.01371  [pdf]

    cs.CL

    Multi-Module G2P Converter for Persian Focusing on Relations between Words

    Authors: Mahdi Rezaei, Negar Nayeri, Saeed Farzi, Hossein Sameti

    Abstract: In this paper, we investigate the application of end-to-end and multi-module frameworks for G2P conversion for the Persian language. The results demonstrate that our proposed multi-module G2P system outperforms our end-to-end systems in terms of accuracy and speed. The system consists of a pronunciation dictionary as our look-up table, along with separate models to handle homographs, OOVs and ezaf…

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: 10 pages, 4 figures

    ACM Class: I.2.7

  10. Persian Keyphrase Generation Using Sequence-to-Sequence Models

    Authors: Ehsan Doostmohammadi, Mohammad Hadi Bokaei, Hossein Sameti

    Abstract: Keyphrases are a very short summary of an input text and provide the main subjects discussed in the text. Keyphrase extraction is a useful upstream task and can be used in various natural language processing problems, such as text summarization and information retrieval. However, not all the keyphrases are explicitly mentioned in the body of the text. In real-world examples the…

    Submitted 25 September, 2020; originally announced September 2020.

  11. PerKey: A Persian News Corpus for Keyphrase Extraction and Generation

    Authors: Ehsan Doostmohammadi, Mohammad Hadi Bokaei, Hossein Sameti

    Abstract: Keyphrases provide an extremely dense summary of a text. Such information can be used in many Natural Language Processing tasks, such as information retrieval and text summarization. Since previous studies on Persian keyword or keyphrase extraction have not published their data, the field suffers from the lack of a human-extracted keyphrase dataset. In this paper, we introduce PerKey, a corpus of…

    Submitted 25 September, 2020; originally announced September 2020.

  12. Ghmerti at SemEval-2019 Task 6: A Deep Word- and Character-based Approach to Offensive Language Identification

    Authors: Ehsan Doostmohammadi, Hossein Sameti, Ali Saffar

    Abstract: This paper presents the models submitted by the Ghmerti team for subtasks A and B of the OffensEval shared task at SemEval 2019. OffensEval addresses the problem of identifying and categorizing offensive language in social media in three subtasks: whether or not content is offensive (subtask A), whether it is targeted (subtask B), and whether the target is an individual, a group, or another entity (subtask C). The pro…

    Submitted 22 September, 2020; originally announced September 2020.

  13. arXiv:2001.03897  [pdf, other]

    cs.CL cs.LG

    Stochastic Natural Language Generation Using Dependency Information

    Authors: Elham Seifossadat, Hossein Sameti

    Abstract: This article presents a stochastic corpus-based model for generating natural language text. Our model first encodes dependency relations from training data through a feature set, then concatenates these features to produce a new dependency tree for a given meaning representation, and finally generates a natural language utterance from the produced dependency tree. We test our model on nine domains…

    Submitted 12 January, 2020; originally announced January 2020.

  14. arXiv:1910.13345  [pdf, other]

    eess.AS cs.SD

    Replay Spoofing Countermeasure Using Autoencoder and Siamese Network on ASVspoof 2019 Challenge

    Authors: Mohammad Adiban, Hossein Sameti, Saeedreza Shehnepoor

    Abstract: Automatic Speaker Verification (ASV) is the process of identifying a person based on the voice presented to a system. Different synthetic approaches allow spoofing to deceive ASV systems (ASVs), whether using techniques to imitate a voice or reconstruct the features. Attackers try to defeat the ASVs using four general techniques: impersonation, speech synthesis, voice conversion, and replay. The…

    Submitted 29 October, 2019; originally announced October 2019.

  15. arXiv:1907.06111  [pdf, other]

    eess.AS cs.CL cs.SD

    Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors

    Authors: Nooshin Maghsoodi, Hossein Sameti, Hossein Zeinali, Themos Stafylakis

    Abstract: In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address the problem of text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to perform frame alignment to HMM states and to extract Baum-Welch statistics. By making use of the natural partition of input features into digits, we train digit-…

    Submitted 13 July, 2019; originally announced July 2019.

  16. arXiv:1809.11068  [pdf, other]

    cs.SD cs.CL eess.AS

    Spoken Pass-Phrase Verification in the i-vector Space

    Authors: Hossein Zeinali, Lukas Burget, Hossein Sameti, Jan Cernocky

    Abstract: The task of spoken pass-phrase verification is to decide whether a test utterance contains the same phrase as given enrollment utterances. Besides other applications, pass-phrase verification can complement an independent speaker verification subsystem in text-dependent speaker verification. It can also be used for liveness detection by verifying that the user is able to correctly respond to a rand…

    Submitted 28 September, 2018; originally announced September 2018.

    Journal ref: Proc. Odyssey 2018 The Speaker and Language Recognition Workshop

  17. arXiv:1706.05077  [pdf, ps, other]

    cs.SD

    SUT System Description for NIST SRE 2016

    Authors: Hossein Zeinali, Hossein Sameti, Nooshin Maghsoodi

    Abstract: This paper describes the submission to the fixed condition of NIST SRE 2016 by the Sharif University of Technology (SUT) team. We provide a full description of the systems that were included in our submission. We start with an overview of the datasets that were used for training and development. This is followed by a description of the front-ends, which contain different VAD and feature types. UBM and i-vector extract…

    Submitted 8 June, 2017; originally announced June 2017.

    Comments: Presented at the NIST SRE 2016 Evaluation Workshop

  18. A Novel Method For Speech Segmentation Based On Speakers' Characteristics

    Authors: Behrouz Abdolali, Hossein Sameti

    Abstract: Speech Segmentation is the process of change point detection for partitioning an input audio stream into regions, each of which corresponds to only one audio source or one speaker. One application of this system is in Speaker Diarization systems. There are several methods for speaker segmentation; however, most of the Speaker Diarization Systems use BIC-based Segmentation methods. The main goal of thi…

    Submitted 8 May, 2012; originally announced May 2012.

    Comments: 14 pages, 8 figures

    MSC Class: 92C55

    ACM Class: C.3.4

    Journal ref: B. Abdolali, H. Sameti "A Novel Method for Speech Segmentation based on Speakers' Specifications", Signal & Image Processing: An International Journal (SIPIJ) Vol.3, No.2, pp. 65-78, April 2012