Showing 1–17 of 17 results for author: Cieliebak, M

Searching in archive cs.

  1. arXiv:2406.03235

    cs.CL cs.AI

    Error-preserving Automatic Speech Recognition of Young English Learners' Language

    Authors: Janick Michot, Manuela Hürlimann, Jan Deriu, Luzia Sauer, Katsiaryna Mlynchyk, Mark Cieliebak

    Abstract: One of the central skills that language learners need to practice is speaking the language. Currently, students in school do not get enough speaking opportunities and lack conversational practice. Recent advances in speech technology and natural language processing allow for the creation of novel tools to practice their speaking skills. In this work, we tackle the first component of such a pipelin…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 Main Conference

  2. arXiv:2406.01131

    cs.AI

    Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation

    Authors: Pius von Däniken, Jan Deriu, Don Tuggener, Mark Cieliebak

    Abstract: Generative AI systems have become ubiquitous for all kinds of modalities, which makes the issue of the evaluation of such models more pressing. One popular approach is preference ratings, where the generated outputs of different systems are shown to evaluators who choose their preferences. In recent years the field shifted towards the development of automated (trained) metrics to assess generated…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL Main Conference

  3. arXiv:2310.09088

    cs.CL cs.AI

    Dialect Transfer for Swiss German Speech Translation

    Authors: Claudio Paonessa, Yanick Schraner, Jan Deriu, Manuela Hürlimann, Manfred Vogel, Mark Cieliebak

    Abstract: This paper investigates the challenges in building Swiss German speech translation systems, specifically focusing on the impact of dialect diversity and differences between Swiss German and Standard German. Swiss German is a spoken language with no formal writing system; it comprises many diverse dialects and is a low-resource language with only around 5 million speakers. The study is guided by tw…

    Submitted 13 October, 2023; originally announced October 2023.

  4. arXiv:2306.03866

    cs.CL cs.AI

    Correction of Errors in Preference Ratings from Automated Metrics for Text Generation

    Authors: Jan Deriu, Pius von Däniken, Don Tuggener, Mark Cieliebak

    Abstract: A major challenge in the field of Text Generation is evaluation: Human evaluations are cost-intensive, and automated metrics often display considerable disagreement with human judgments. In this paper, we propose a statistical model of Text Generation evaluation that accounts for the error-proneness of automated metrics when used to generate preference rankings between system outputs. We show that…

    Submitted 6 June, 2023; originally announced June 2023.

  5. arXiv:2305.18855

    cs.CL cs.AI

    STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions

    Authors: Michel Plüss, Jan Deriu, Yanick Schraner, Claudio Paonessa, Julia Hartmann, Larissa Schmidt, Christian Scheller, Manuela Hürlimann, Tanja Samardžić, Manfred Vogel, Mark Cieliebak

    Abstract: We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech, annotated with Standard German text at the sentence level. The data is collected using a web app in which the speakers are shown Standard German sentences, which they translate to Swiss German and record. We make the corpus publicly available. It contains 343 hours of speech from all dialect regions and is th…

    Submitted 30 May, 2023; originally announced May 2023.

  6. arXiv:2305.01633

    cs.CL

    Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

    Authors: Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, et al. (17 additional authors not shown)

    Abstract: We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, a…

    Submitted 7 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 5 pages plus appendix, 4 tables, 1 figure. To appear at "Workshop on Insights from Negative Results in NLP" (co-located with EACL2023). Updated author list and acknowledgements

    MSC Class: 68; ACM Class: I.2.7

  7. arXiv:2210.13025

    cs.CL cs.AI

    On the Effectiveness of Automated Metrics for Text Generation Systems

    Authors: Pius von Däniken, Jan Deriu, Don Tuggener, Mark Cieliebak

    Abstract: A major challenge in the field of Text Generation is evaluation because we lack a sound theory that can be leveraged to extract guidelines for evaluation campaigns. In this work, we propose a first step towards such a theory that incorporates different sources of uncertainty, such as imperfect automated metrics and insufficiently sized test sets. The theory has practical applications, such as dete…

    Submitted 24 October, 2022; originally announced October 2022.

  8. arXiv:2205.09501

    cs.CL cs.AI

    SDS-200: A Swiss German Speech to Standard German Text Corpus

    Authors: Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak, Manfred Vogel

    Abstract: We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations, annotated with dialect, age, and gender information of the speakers. The dataset allows for training speech translation, dialect recognition, and speech synthesis systems, among others. The data was collected using a web recording tool that is open to the public. Each participant was given a text…

    Submitted 19 May, 2022; originally announced May 2022.

  9. arXiv:2202.13887

    cs.AI cs.CL

    Probing the Robustness of Trained Metrics for Conversational Dialogue Systems

    Authors: Jan Deriu, Don Tuggener, Pius von Däniken, Mark Cieliebak

    Abstract: This paper introduces an adversarial method to stress-test trained metrics to evaluate conversational dialogue systems. The method leverages Reinforcement Learning to find response strategies that elicit optimal scores from the trained metrics. We apply our method to test recently proposed trained metrics. We find that they all are susceptible to giving high scores to responses generated by relati…

    Submitted 28 February, 2022; originally announced February 2022.

  10. arXiv:2010.02140

    cs.AI cs.CL

    Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems

    Authors: Jan Deriu, Don Tuggener, Pius von Däniken, Jon Ander Campos, Alvaro Rodrigo, Thiziri Belkacem, Aitor Soroa, Eneko Agirre, Mark Cieliebak

    Abstract: The lack of time-efficient and reliable evaluation methods hampers the development of conversational dialogue systems (chatbots). Evaluations requiring humans to converse with chatbots are time and cost-intensive, put high cognitive demands on the human judges, and yield low-quality results. In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replac…

    Submitted 5 October, 2020; originally announced October 2020.

  11. arXiv:2005.01328

    cs.CL

    DoQA -- Accessing Domain-Specific FAQs via Conversational QA

    Authors: Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

    Abstract: The goal of this work is to build conversational Question Answering (QA) interfaces for the large body of domain-specific information available in FAQ sites. We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs. The dialogues are collected from three Stack Exchange sites using the Wizard of Oz method with crowdsourcing. Compared to previous work, DoQA comprises well-defined informat…

    Submitted 18 May, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: Accepted at ACL 2020. 13 pages, 4 figures

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020

  12. arXiv:2004.07633

    cs.AI cs.CL cs.LG

    A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation

    Authors: Jan Deriu, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, Mark Cieliebak

    Abstract: In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furt…

    Submitted 25 June, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020

  13. arXiv:1909.12066

    cs.AI cs.CL cs.LG

    Towards a Metric for Automated Conversational Dialogue System Evaluation and Improvement

    Authors: Jan Deriu, Mark Cieliebak

    Abstract: We present "AutoJudge", an automated evaluation method for conversational dialogue systems. The method works by first generating dialogues based on self-talk, i.e. a dialogue system talking to itself. Then, it uses human ratings on these dialogues to train an automated judgement model. Our experiments show that AutoJudge correlates well with the human ratings and can be used to automatically evalua…

    Submitted 25 June, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: 8 pages, to be published at the INLG 2019 conference

    Journal ref: Proceedings of the 12th International Conference on Natural Language Generation. 2019

  14. arXiv:1906.06550

    cs.CL

    Towards Integration of Statistical Hypothesis Tests into Deep Neural Networks

    Authors: Ahmad Aghaebrahimian, Mark Cieliebak

    Abstract: We report our ongoing work about a new deep architecture working in tandem with a statistical test procedure for jointly training texts and their label descriptions for multi-label and multi-class classification tasks. A statistical hypothesis testing method is used to extract the most informative words for each given class. These words are used as a class description for more label-aware text cla…

    Submitted 15 June, 2019; originally announced June 2019.

    Comments: Accepted to ACL 2019

  15. arXiv:1906.06465

    cs.CL cs.LG cs.SI stat.ML

    Correlating Twitter Language with Community-Level Health Outcomes

    Authors: Arno Schneuwly, Ralf Grubenmann, Séverine Rion Logean, Mark Cieliebak, Martin Jaggi

    Abstract: We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need for additional labelled data. It allows predicting community-level medical outcomes from language, and thereby potentially tran…

    Submitted 24 June, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: ACL SMM4H Workshop (Social Media Mining for Health Applications)

    ACM Class: J.3; J.4; I.2.7

  16. arXiv:1905.04071

    cs.CL cs.AI cs.HC cs.LG

    Survey on Evaluation Methods for Dialogue Systems

    Authors: Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, Mark Cieliebak

    Abstract: In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part during the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost and time intensive. Thus, much work has been put into finding methods, which allow to reduce the involvement of huma…

    Submitted 26 June, 2020; v1 submitted 10 May, 2019; originally announced May 2019.

    Journal ref: Artificial Intelligence Review, June 2020

  17. arXiv:1703.02504

    cs.CL cs.IR cs.LG

    Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification

    Authors: Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi

    Abstract: This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require establishing a correspondence to English for which powerful classifiers are already available. In contrast, our method does not requir…

    Submitted 7 March, 2017; originally announced March 2017.

    Comments: Appearing at WWW 2017 - 26th International World Wide Web Conference

    ACM Class: I.2.7