Showing 1–21 of 21 results for author: Orasan, C

  1. arXiv:2405.11668  [pdf, other]

    cs.CL

    Cyber Risks of Machine Translation Critical Errors : Arabic Mental Health Tweets as a Case Study

    Authors: Hadeel Saadany, Ashraf Tantawy, Constantin Orasan

    Abstract: With the advent of Neural Machine Translation (NMT) systems, the MT output has reached unprecedented accuracy levels which resulted in the ubiquity of MT tools on almost all online platforms with multilingual content. However, NMT systems, like other state-of-the-art AI generative systems, are prone to errors that are deemed machine hallucinations. The problem with NMT hallucinations is that they…

    Submitted 19 May, 2024; originally announced May 2024.

  2. arXiv:2402.04023  [pdf]

    cs.CL

    Google Translate Error Analysis for Mental Healthcare Information: Evaluating Accuracy, Comprehensibility, and Implications for Multilingual Healthcare Communication

    Authors: Jaleh Delfani, Constantin Orasan, Hadeel Saadany, Ozlem Temizoz, Eleanor Taylor-Stilgoe, Diptesh Kanojia, Sabine Braun, Barbara Schouten

    Abstract: This study explores the use of Google Translate (GT) for translating mental healthcare (MHealth) information and evaluates its accuracy, comprehensibility, and implications for multilingual healthcare communication through analysing GT output in the MHealth domain from English to Persian, Arabic, Turkish, Romanian, and Spanish. Two datasets comprising MHealth information from the UK National Healt…

    Submitted 6 February, 2024; originally announced February 2024.

  3. arXiv:2312.00525  [pdf, other]

    cs.CL cs.AI

    SurreyAI 2023 Submission for the Quality Estimation Shared Task

    Authors: Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Tharindu Ranasinghe

    Abstract: Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is no reference available. This paper describes the approach adopted by the SurreyAI team for addressing the Sentence-Level Direct Assessment shared task in WMT23. The proposed approach builds upon the TransQuest framework, exploring various autoencoder pre-trained lan…

    Submitted 1 December, 2023; originally announced December 2023.

  4. arXiv:2306.11900  [pdf, other]

    cs.CL

    Evaluation of Chinese-English Machine Translation of Emotion-Loaded Microblog Texts: A Human Annotated Dataset for the Quality Assessment of Emotion Translation

    Authors: Shenbin Qian, Constantin Orasan, Felix do Carmo, Qiuliang Li, Diptesh Kanojia

    Abstract: In this paper, we focus on how current Machine Translation (MT) tools perform on the translation of emotion-loaded texts by evaluating outputs from Google Translate according to a framework proposed in this paper. We propose this evaluation framework based on the Multidimensional Quality Metrics (MQM) and perform a detailed error analysis of the MT outputs. From our analysis, we observe that about…

    Submitted 20 June, 2023; originally announced June 2023.

  5. arXiv:2211.17094  [pdf, other]

    eess.AS cs.CL cs.SD

    Better Transcription of UK Supreme Court Hearings

    Authors: Hadeel Saadany, Catherine Breslin, Constantin Orăsan, Sophie Walker

    Abstract: Transcription of legal proceedings is very important to enable access to justice. However, speech transcription is an expensive and slow process. In this paper we describe part of a combined research and industrial project for building an automated transcription tool designed specifically for the Justice sector in the UK. We explain the challenges involved in transcribing court room hearings and t…

    Submitted 22 December, 2022; v1 submitted 29 November, 2022; originally announced November 2022.

  6. arXiv:2210.11899  [pdf, other]

    cs.CL

    A Semi-supervised Approach for a Better Translation of Sentiment in Dialectical Arabic UGT

    Authors: Hadeel Saadany, Constantin Orasan, Emad Mohamed, Ashraf Tantawy

    Abstract: In the online world, Machine Translation (MT) systems are extensively used to translate User-Generated Text (UGT) such as reviews, tweets, and social media posts, where the main message is often the author's positive or negative attitude towards the topic of the text. However, MT systems still lack accuracy in some low-resource languages and sometimes make critical translation errors that complete…

    Submitted 8 June, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: WANLP2022 at EMNLP 2022

    Journal ref: Association for Computational Linguistics 2022

  7. arXiv:2205.00806  [pdf, other]

    cs.IR

    Biographical: A Semi-Supervised Relation Extraction Dataset

    Authors: Alistair Plum, Tharindu Ranasinghe, Spencer Jones, Constantin Orasan, Ruslan Mitkov

    Abstract: Extracting biographical information from online documents is a popular research topic among the information extraction (IE) community. Various natural language processing (NLP) techniques such as text classification, text summarisation and relation extraction are commonly used to achieve this. Among these techniques, RE is the most common since it can be directly used to build biographical knowled…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: Accepted to ACM SIGIR 2022

  8. arXiv:2204.12061  [pdf, other]

    cs.CL

    PLOD: An Abbreviation Detection Dataset for Scientific Documents

    Authors: Leonardo Zilio, Hadeel Saadany, Prashant Sharma, Diptesh Kanojia, Constantin Orăsan

    Abstract: The detection and extraction of abbreviations from unstructured texts can help to improve the performance of Natural Language Processing tasks, such as machine translation and information retrieval. However, in terms of publicly available datasets, there is not enough data for training deep-neural-networks-based models to the point of generalising well over data. This paper presents PLOD, a large-…

    Submitted 28 April, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: Accepted at LREC 2022, 8 pages

  9. arXiv:2201.03026  [pdf, other]

    cs.CL

    An Ensemble Approach to Acronym Extraction using Transformers

    Authors: Prashant Sharma, Hadeel Saadany, Leonardo Zilio, Diptesh Kanojia, Constantin Orăsan

    Abstract: Acronyms are abbreviated units of a phrase constructed by using initial components of the phrase in a text. Automatic extraction of acronyms from a text can help various Natural Language Processing tasks like machine translation, information retrieval, and text summarisation. This paper discusses an ensemble approach for the task of Acronym Extraction, which utilises two different methods to extra…

    Submitted 9 January, 2022; originally announced January 2022.

    Comments: Published at SDU@AAAI-22

  10. Sentiment-Aware Measure (SAM) for Evaluating Sentiment Transfer by Machine Translation Systems

    Authors: Hadeel Saadany, Constantin Orasan, Emad Mohamed, Ashraf Tantawy

    Abstract: In translating text where sentiment is the main message, human translators give particular attention to sentiment-carrying words. The reason is that an incorrect translation of such words would miss the fundamental aspect of the source text, i.e. the author's sentiment. In the online world, MT systems are extensively used to translate User-Generated Content (UGC) such as reviews, tweets, and socia…

    Submitted 5 October, 2021; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: Accepted for RANLP (Recent Advances in Natural Language Processing) 2021

    Journal ref: RANLP (RECENT ADVANCES IN NATURAL LANGUAGE PROCESSING) (2021)

  11. BLEU, METEOR, BERTScore: Evaluation of Metrics Performance in Assessing Critical Translation Errors in Sentiment-oriented Text

    Authors: Hadeel Saadany, Constantin Orasan

    Abstract: Social media companies as well as authorities make extensive use of artificial intelligence (AI) tools to monitor postings of hate speech, celebrations of violence or profanity. Since AI software requires massive volumes of data to train computers, Machine Translation (MT) of the online content is commonly used to process posts written in several languages and hence augment the data needed for tra…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: Accepted for TRITON (TRanslation and Interpreting Technology ONline) 2021

    Journal ref: TRITON (2021) 48-56

  12. arXiv:2109.10859  [pdf, other]

    cs.CL cs.AI

    Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

    Authors: Diptesh Kanojia, Marina Fomicheva, Tharindu Ranasinghe, Frédéric Blain, Constantin Orăsan, Lucia Specia

    Abstract: Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they are known to produce fluent translation outputs that can contain important meaning errors, thus undermining their reliability in practice. Quality Estimation (QE) is the task of automatically assessing the performance of MT systems at test time. Thus, in order to be…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: Accepted to WMT 2021 Conference co-located with EMNLP 2021. 14 pages with a 4 page appendix

  13. arXiv:2106.10719  [pdf, other]

    cs.CL

    Challenges in Translation of Emotions in Multilingual User-Generated Content: Twitter as a Case Study

    Authors: Hadeel Saadany, Constantin Orasan, Rocio Caro Quintana, Felix do Carmo, Leonardo Zilio

    Abstract: Although emotions are universal concepts, transferring the different shades of emotion from one language to another may not always be straightforward for human translators, let alone for machine translation systems. Moreover, the cognitive states are established by verbal explanations of experience which is shaped by both the verbal and cultural contexts. There are a number of verbal contexts wher…

    Submitted 20 June, 2021; originally announced June 2021.

    Journal ref: Linguistik International 2020

  14. arXiv:2106.00143  [pdf, other]

    cs.CL cs.AI cs.LG

    An Exploratory Analysis of Multilingual Word-Level Quality Estimation with Cross-Lingual Transformers

    Authors: Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

    Abstract: Most studies on word-level Quality Estimation (QE) of machine translation focus on language-specific models. The obvious disadvantages of these approaches are the need for labelled data for each language pair and the high cost required to maintain several language-specific models. To overcome these problems, we explore different approaches to multilingual, word-level QE. We show that these QE mode…

    Submitted 31 May, 2021; originally announced June 2021.

    Comments: Accepted to appear at the ACL-IJCNLP 2021 Main conference

  15. arXiv:2011.01536  [pdf, other]

    cs.CL cs.AI cs.LG

    TransQuest: Translation Quality Estimation with Cross-lingual Transformers

    Authors: Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

    Abstract: Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures. However, the majority of these methods work only on the language pair they are trained on and need retraining for new language pairs. This process can prove difficult from a technical point of view and is usually computationally expensive. In this pap…

    Submitted 4 November, 2020; v1 submitted 1 November, 2020; originally announced November 2020.

    Comments: Accepted to COLING 2020. arXiv admin note: text overlap with arXiv:2010.05318

  16. arXiv:2011.00452  [pdf, other]

    cs.CL

    Fake or Real? A Study of Arabic Satirical Fake News

    Authors: Hadeel Saadany, Emad Mohamed, Constantin Orasan

    Abstract: One very common type of fake news is satire which comes in a form of a news website or an online platform that parodies reputable real news agencies to create a sarcastic version of reality. This type of fake news is often disseminated by individuals on their online platforms as it has a much stronger effect in delivering criticism than through a straightforward message. However, when the satirica…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: 11 pages

    Journal ref: Proceedings of the 3rd International Workshop on Rumours and Deception in Social Media (RDSM) 2020

  17. arXiv:2010.13814  [pdf, other]

    cs.CL

    Is it Great or Terrible? Preserving Sentiment in Neural Machine Translation of Arabic Reviews

    Authors: Hadeel Saadany, Constantin Orasan

    Abstract: Since the advent of Neural Machine Translation (NMT) approaches there has been a tremendous improvement in the quality of automatic translation. However, NMT output still lacks accuracy in some low-resource languages and sometimes makes major errors that need extensive post-editing. This is particularly noticeable with texts that do not follow common lexico-grammatical standards, such as user gene…

    Submitted 26 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the Fifth Arabic Natural Language Processing Workshop WANLP 2020

  18. arXiv:2010.06281  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    RGCL at SemEval-2020 Task 6: Neural Approaches to Definition Extraction

    Authors: Tharindu Ranasinghe, Alistair Plum, Constantin Orasan, Ruslan Mitkov

    Abstract: This paper presents the RGCL team submission to SemEval 2020 Task 6: DeftEval, subtasks 1 and 2. The system classifies definitions at the sentence and token levels. It utilises state-of-the-art neural network architectures, which have some task-specific adaptations, including an automatically extended training set. Overall, the approach achieves acceptable evaluation scores, while maintaining flex…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: Accepted to SemEval-2020 (International Workshop on Semantic Evaluation) at COLING 2020

  19. arXiv:2010.05318  [pdf, other]

    cs.CL

    TransQuest at WMT2020: Sentence-Level Direct Assessment

    Authors: Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

    Abstract: This paper presents the team TransQuest's participation in Sentence-Level Direct Assessment shared task in WMT 2020. We introduce a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. The proposed methods achieve state-of-the-art results surpassing the results obtained by OpenKiwi, the baseline used in the shared task…

    Submitted 11 October, 2020; originally announced October 2020.

    Comments: Accepted to WMT 2020

  20. arXiv:2004.12894  [pdf, ps, other]

    cs.CL

    Intelligent Translation Memory Matching and Retrieval with Sentence Encoders

    Authors: Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

    Abstract: Matching and retrieving previously translated segments from a Translation Memory is the key functionality in Translation Memories systems. However this matching and retrieving process is still limited to algorithms based on edit distance which we have identified as a major drawback in Translation Memories systems. In this paper we introduce sentence encoders to improve the matching and retrieving…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: Accepted to EAMT 2020

  21. NP Animacy Identification for Anaphora Resolution

    Authors: R. J. Evans, C. Orasan

    Abstract: In anaphora resolution for English, animacy identification can play an integral role in the application of agreement restrictions between pronouns and candidates, and as a result, can improve the accuracy of anaphora resolution systems. In this paper, two methods for animacy identification are proposed and evaluated using intrinsic and extrinsic measures. The first method is a rule-based one which…

    Submitted 10 October, 2011; originally announced October 2011.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 29, pages 79-103, 2007