Showing 1–7 of 7 results for author: de Marneffe, M

Searching in archive cs.
  1. arXiv:2403.01931  [pdf, other]

    cs.CL

    VariErr NLI: Separating Annotation Error from Human Label Variation

    Authors: Leon Weber-Genzel, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank

    Abstract: Human label variation arises when annotators assign different labels to the same item for valid reasons, while annotation errors occur when labels are assigned for invalid reasons. These two issues are prevalent in NLP benchmarks, yet existing research has studied them in isolation. To the best of our knowledge, there exists no prior work that focuses on teasing apart error from signal, especially…

    Submitted 6 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 14 pages, accepted at ACL 2024 main

  2. arXiv:2310.13850  [pdf, other]

    cs.CL

    Ecologically Valid Explanations for Label Variation in NLI

    Authors: Nan-Jiang Jiang, Chenhao Tan, Marie-Catherine de Marneffe

    Abstract: Human label variation, or annotation disagreement, exists in many natural language processing (NLP) tasks, including natural language inference (NLI). To gain direct evidence of how NLI label variation arises, we build LiveNLI, an English dataset of 1,415 ecologically valid explanations (annotators explain the NLI labels they chose) for 122 MNLI items (at least 10 explanations per item). The LiveN…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Findings at EMNLP 2023. Overlap with previous version arXiv:2304.12443

  3. arXiv:2304.12443  [pdf, other]

    cs.CL

    Understanding and Predicting Human Label Variation in Natural Language Inference through Explanation

    Authors: Nan-Jiang Jiang, Chenhao Tan, Marie-Catherine de Marneffe

    Abstract: Human label variation (Plank 2022), or annotation disagreement, exists in many natural language processing (NLP) tasks. To be robust and trusted, NLP models need to identify such variation and be able to explain it. To this end, we created the first ecologically valid explanation dataset with diverse reasoning, LiveNLI. LiveNLI contains annotators' highlights and free-text explanations for the lab…

    Submitted 24 April, 2023; originally announced April 2023.

  4. arXiv:2209.03392  [pdf, other]

    cs.CL

    Investigating Reasons for Disagreement in Natural Language Inference

    Authors: Nan-Jiang Jiang, Marie-Catherine de Marneffe

    Abstract: We investigate how disagreement in natural language inference (NLI) annotation arises. We developed a taxonomy of disagreement sources with 10 categories spanning 3 high-level classes. We found that some disagreements are due to uncertainty in the sentence meaning, others to annotator biases and task artifacts, leading to different interpretations of the label distribution. We explore two modeling…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: accepted at TACL, pre-MIT Press publication version

  5. arXiv:2107.00807  [pdf, other]

    cs.CL

    He Thinks He Knows Better than the Doctors: BERT for Event Factuality Fails on Pragmatics

    Authors: Nanjiang Jiang, Marie-Catherine de Marneffe

    Abstract: We investigate how well BERT performs on predicting factuality in several existing English datasets, encompassing various linguistic constructions. Although BERT obtains a strong performance on most datasets, it does so by exploiting common surface patterns that correlate with certain factuality labels, and it fails on instances where pragmatic reasoning is necessary. Contrary to what the high per…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: to be published in TACL, pre-MIT Press publication version

  6. arXiv:2004.10643  [pdf, other]

    cs.CL

    Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection

    Authors: Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, Daniel Zeman

    Abstract: Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on…

    Submitted 22 April, 2020; originally announced April 2020.

    Comments: LREC 2020

  7. arXiv:1707.07212  [pdf, other]

    cs.CL

    "i have a feeling trump will win..................": Forecasting Winners and Losers from User Predictions on Twitter

    Authors: Sandesh Swamy, Alan Ritter, Marie-Catherine de Marneffe

    Abstract: Social media users often make explicit predictions about upcoming events. Such statements vary in the degree of certainty the author expresses toward the outcome: "Leonardo DiCaprio will win Best Actor" vs. "Leonardo DiCaprio may win" or "No way Leonardo wins!". Can popular beliefs on social media predict who will win? To answer this question, we build a corpus of tweets annotated for veridicality…

    Submitted 31 August, 2017; v1 submitted 22 July, 2017; originally announced July 2017.

    Comments: Accepted at EMNLP 2017 (long paper)