
Showing 1–14 of 14 results for author: Dementieva, D

Searching in archive cs.
  1. arXiv:2408.10724

    cs.CL

    Crafting Tomorrow's Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian

    Authors: Cem Üyük, Danica Rovó, Shaghayegh Kolli, Rabia Varol, Georg Groh, Daryna Dementieva

    Abstract: In the era dominated by information overload and its facilitation with Large Language Models (LLMs), the prevalence of misinformation poses a significant threat to public discourse and societal well-being. A critical concern at present involves the identification of machine-generated news. In this work, we take a significant step by introducing a benchmark dataset designed for neural news detectio…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2406.19543

    cs.CL cs.SI

    Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management

    Authors: Seid Muhie Yimam, Daryna Dementieva, Tim Fischer, Daniil Moskovskiy, Naquee Rizwan, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann, Animesh Mukherjee

    Abstract: Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of abusive speech. In this work, we propose a more comprehensive approach called Demarcat…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2404.17841

    cs.CL

    Toxicity Classification in Ukrainian

    Authors: Daryna Dementieva, Valeriia Khylenko, Nikolay Babakov, Georg Groh

    Abstract: Toxicity detection remains a relevant task, especially in the context of developing safe and fair LMs. Nevertheless, labeled binary toxicity classification corpora are not available for all languages, which is understandable given the resource-intensive nature of the annotation process. Ukrainian, in particular, is among the languages lacking such resources. To our knowledge, there h…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted to WOAH, NAACL, 2024. arXiv admin note: text overlap with arXiv:2404.02043

  4. arXiv:2404.02043

    cs.CL cs.AI

    Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches

    Authors: Daryna Dementieva, Valeriia Khylenko, Georg Groh

    Abstract: Despite the extensive amount of labeled datasets in the NLP text classification field, the persistent imbalance in data availability across various languages remains evident. Ukrainian, in particular, stands as a language that can still benefit from the continued refinement of cross-lingual methodologies. To our knowledge, there is a tremendous lack of Ukrainian corpora for typical text classi…

    Submitted 2 April, 2024; originally announced April 2024.

  5. arXiv:2404.02037

    cs.CL cs.AI

    MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages

    Authors: Daryna Dementieva, Nikolay Babakov, Alexander Panchenko

    Abstract: Text detoxification is a textual style transfer (TST) task where a text is paraphrased from a toxic surface form, e.g. featuring rude words, to the neutral register. Recently, text detoxification methods have found applications in various tasks such as detoxification of Large Language Models (LLMs) (Leong et al., 2023; He et al., 2024; Tang et al., 2023) and toxic speech combating in social networ…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL2024

  6. arXiv:2311.13937

    cs.CL

    Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

    Authors: Daryna Dementieva, Daniil Moskovskiy, David Dale, Alexander Panchenko

    Abstract: Text detoxification is the task of transferring the style of text from toxic to neutral. While there are approaches yielding promising results in the monolingual setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer for this task remains a challenging open problem (Moskovskiy et al., 2022). In this work, we present a large-scale study of strategies for cross-lingual text detox…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: AACL 2023, main conference, long paper

  7. arXiv:2305.08636

    cs.CL cs.AI

    AdamR at SemEval-2023 Task 10: Solving the Class Imbalance Problem in Sexism Detection with Ensemble Learning

    Authors: Adam Rydelek, Daryna Dementieva, Georg Groh

    Abstract: The Explainable Detection of Online Sexism task presents the problem of explainable sexism detection through fine-grained categorisation of sexist cases with three subtasks. Our team experimented with different ways to combat class imbalance throughout the tasks using data augmentation and loss alteration techniques. We tackled the challenge by utilising ensembles of Transformer models trained on…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: One of the top solutions at the SemEval-2023 task "The Explainable Detection of Online Sexism"

  8. arXiv:2305.08625

    cs.CL cs.AI

    Adam-Smith at SemEval-2023 Task 4: Discovering Human Values in Arguments with Ensembles of Transformer-based Models

    Authors: Daniel Schroter, Daryna Dementieva, Georg Groh

    Abstract: This paper presents the best-performing approach alias "Adam Smith" for the SemEval-2023 Task 4: "Identification of Human Values behind Arguments". The goal of the task was to create systems that automatically identify the values within textual arguments. We train transformer-based models until they reach their loss minimum or f1-score maximum. Ensembling the models by selecting one global decisio…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: The winner of SemEval-2023 Task 4: "Identification of Human Values behind Arguments"

  9. arXiv:2303.03124

    cs.CL cs.AI

    IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models

    Authors: Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian Kummeth, Kirill Gringauz, Yutong Zhou, Georg Groh

    Abstract: Interpretability and human oversight are fundamental pillars of deploying complex NLP models into real-world applications. However, applying explainability and human-in-the-loop methods requires technical proficiency. Despite existing toolkits for model understanding and analysis, options to integrate human feedback are still limited. We propose IFAN, a framework for real-time explanation-based in…

    Submitted 2 October, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted to AACL 2023 Demonstration systems Track

  10. arXiv:2211.14279

    cs.CL cs.IR

    Multiverse: Multilingual Evidence for Fake News Detection

    Authors: Daryna Dementieva, Mikhail Kuimov, Alexander Panchenko

    Abstract: Misleading information spreads on the Internet at an incredible speed, which can lead to irreparable consequences in some cases. It is becoming essential to develop fake news detection technologies. While substantial work has been done in this direction, one of the limitations of the current approaches is that these models are focused only on one language and do not use multilingual information. I…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: 24 pages, 10 figures, extended version of ACL SRW 2021 paper

  11. arXiv:2206.02252

    cs.CL

    Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models

    Authors: Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko

    Abstract: Detoxification is the task of generating text in a polite style while preserving the meaning and fluency of the original toxic text. Existing detoxification methods are designed to work in one exact language. This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models in this setting. Unlike previous works, we aim to make large language models abl…

    Submitted 5 June, 2022; originally announced June 2022.

  12. arXiv:2204.08975

    cs.CL

    Detecting Text Formality: A Study of Text Classification Approaches

    Authors: Daryna Dementieva, Nikolay Babakov, Alexander Panchenko

    Abstract: Formality is one of the important characteristics of text documents. The automatic detection of the formality level of a text is potentially beneficial for various natural language processing tasks. Previously, two large-scale datasets featuring formality annotation were introduced for multiple languages -- GYAFC and X-FORMAL. However, they were primarily used for the training of style transfer models…

    Submitted 8 September, 2023; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Published at RANLP2023

  13. arXiv:2109.08914

    cs.CL cs.LG

    Text Detoxification using Large Pre-trained Neural Models

    Authors: David Dale, Anton Voronov, Daryna Dementieva, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

    Abstract: We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to keep the text content and remove toxicity. Our second…

    Submitted 3 November, 2021; v1 submitted 18 September, 2021; originally announced September 2021.

    Comments: Accepted to the EMNLP 2021 conference

  14. arXiv:2105.09052

    cs.CL cs.LG

    Methods for Detoxification of Texts for the Russian Language

    Authors: Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko

    Abstract: We introduce the first study of automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used, for instance, for processing toxic content in social media. While much work has been done for the English language in this field, the task has not yet been solved for the Russian language. We test two types of models - unsupervised approach based on…

    Submitted 19 May, 2021; originally announced May 2021.