Showing 1–9 of 9 results for author: Scherrer, Y

Searching in archive cs.
  1. arXiv:2406.14167  [pdf, other]

    cs.CL

    Definition generation for lexical semantic change detection

    Authors: Mariia Fedorova, Andrey Kutuzov, Yves Scherrer

    Abstract: We use contextualized word definitions generated by large language models as semantic representations in the task of diachronic lexical semantic change detection (LSCD). In short, generated definitions are used as 'senses', and the change score of a target word is retrieved by comparing their distributions in two time periods under comparison. On the material of five datasets and three languages,…

    Submitted 31 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  2. arXiv:2404.18510  [pdf, other]

    cs.CL

    Explainability of machine learning approaches in forensic linguistics: a case study in geolinguistic authorship profiling

    Authors: Dana Roemling, Yves Scherrer, Aleksandra Miletic

    Abstract: Forensic authorship profiling uses linguistic markers to infer characteristics about an author of a text. This task is paralleled in dialect classification, where a prediction is made about the linguistic variety of a text based on the text itself. While there have been significant advances in recent years in variety classification, forensic linguistics rarely relies on these approaches due to the…

    Submitted 1 July, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  3. arXiv:2305.20080  [pdf, other]

    cs.CL

    Findings of the VarDial Evaluation Campaign 2023

    Authors: Noëmi Aepli, Çağrı Çöltekin, Rob Van Der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri

    Abstract: This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with EACL 2023. Three separate shared tasks were included this year: Slot and intent detection for low-resource language varieties (SID4LR),…

    Submitted 31 May, 2023; originally announced May 2023.

    Journal ref: In Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), pages 251–261, Dubrovnik, Croatia. Association for Computational Linguistics

  4. arXiv:2212.01936  [pdf, other]

    cs.CL

    Democratizing Neural Machine Translation with OPUS-MT

    Authors: Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja

    Abstract: This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows. We discuss our on-going mission of increasing language coverage and translation quality, and also describe on-going work on the development of modular translation models and speed-opt…

    Submitted 4 July, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

  5. arXiv:2002.10260  [pdf, other]

    cs.CL

    Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation

    Authors: Alessandro Raganato, Yves Scherrer, Jörg Tiedemann

    Abstract: Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different parts of the input. However, recent works have shown that most attention heads learn simple, and often redundant, positional patterns. In this paper, we propos…

    Submitted 5 October, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: Accepted to Findings of EMNLP 2020

  6. arXiv:1906.04040  [pdf, other]

    cs.CL

    The University of Helsinki submissions to the WMT19 news translation task

    Authors: Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann

    Abstract: In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English. This year, we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained both senten…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: To appear in WMT19

  7. arXiv:1808.06826  [pdf, other]

    cs.CL

    Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks

    Authors: Jörg Tiedemann, Yves Scherrer

    Abstract: In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones. We test this hypothesis by measuring the perplexity of such models when applied to paraphrases of the source language. The intuition is that an encoder produces better representations if a decoder is capable of recognizing synonymous sentences in the s…

    Submitted 3 May, 2019; v1 submitted 21 August, 2018; originally announced August 2018.

  8. arXiv:1708.05943  [pdf, other]

    cs.CL

    Neural Machine Translation with Extended Context

    Authors: Jörg Tiedemann, Yves Scherrer

    Abstract: We investigate the use of extended context in attention-based neural machine translation. We base our experiments on translated movie subtitles and discuss the effect of increasing the segments beyond single translation units. We study the use of extended source language context as well as bilingual context extensions. The models learn to distinguish between information from different segments and…

    Submitted 20 August, 2017; originally announced August 2017.

    Comments: Proceedings of the Third Workshop on Discourse in Machine Translation (DiscoMT 2017) at EMNLP 2017, Copenhagen, Denmark

  9. arXiv:1708.05942  [pdf, other]

    cs.CL

    The Helsinki Neural Machine Translation System

    Authors: Robert Östling, Yves Scherrer, Jörg Tiedemann, Gongbo Tang, Tommi Nieminen

    Abstract: We introduce the Helsinki Neural Machine Translation system (HNMT) and how it is applied in the news translation task at WMT 2017, where it ranked first in both the human and automatic evaluations for English–Finnish. We discuss the success of English–Finnish translations and the overall advantage of NMT over a strong SMT baseline. We also discuss our submissions for English–Latvian, English–C…

    Submitted 20 August, 2017; originally announced August 2017.

    Comments: Proceedings of the Second Conference on Machine Translation (WMT 2017) at EMNLP 2017, Copenhagen, Denmark