Showing 1–9 of 9 results for author: Liu, Y J

Searching in archive cs.
  1. arXiv:2403.13560  [pdf, other]

    cs.CL

    eRST: A Signaled Graph Theory of Discourse Relations and Organization

    Authors: Amir Zeldes, Tatsuya Aoyama, Yang Janet Liu, Siyao Peng, Debopam Das, Luke Gessler

    Abstract: In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, non-projective and concurrent relations, as well as implicit and explicit signals which give explainable rationales to our analyse…

    Submitted 28 August, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  2. arXiv:2309.04940  [pdf, other]

    cs.CL

    What's Hard in English RST Parsing? Predictive Models for Error Analysis

    Authors: Yang Janet Liu, Tatsuya Aoyama, Amir Zeldes

    Abstract: Despite recent advances in Natural Language Processing (NLP), hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging, and our understanding of the reasons for this is as yet limited. In this paper, we examine and model some of the factors associated with parsing difficulties in previous work: the existence of implicit discourse relations, challenges in…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: SIGDIAL 2023 camera-ready; 12 pages

  3. arXiv:2306.11256  [pdf, other]

    cs.CL

    GUMSum: Multi-Genre Data and Evaluation for English Abstractive Summarization

    Authors: Yang Janet Liu, Amir Zeldes

    Abstract: Automatic summarization with pre-trained language models has led to impressively fluent results, but is prone to 'hallucinations', low performance on non-news genres, and outputs which are not exactly summaries. Targeting ACL 2023's 'Reality Check' theme, we present GUMSum, a small but carefully crafted dataset of English summaries in 12 written and spoken genres for evaluation of abstractive summ…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: Accepted to the Findings of ACL 2023; camera-ready version

  4. arXiv:2306.01966  [pdf, other]

    cs.CL

    GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

    Authors: Tatsuya Aoyama, Shabnam Behzad, Luke Gessler, Lauren Levine, Jessica Lin, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes

    Abstract: We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and consisting of 8 unusual text types for out-of-domain evaluation: dictionary entries, esports commentaries, legal documents, medical notes, poetry, mathematical proofs, syllabuses, and threat letters. GENTLE is manually annotated for a variety of popular NLP tasks, including syntactic dependency parsing, entity re…

    Submitted 21 September, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Camera-ready for LAW-XVII collocated with ACL 2023

  5. arXiv:2302.06488  [pdf, other]

    cs.CL

    Why Can't Discourse Parsing Generalize? A Thorough Investigation of the Impact of Data Diversity

    Authors: Yang Janet Liu, Amir Zeldes

    Abstract: Recent advances in discourse parsing performance create the impression that, as in other NLP tasks, performance for high-resource languages such as English is finally becoming reliable. In this paper we demonstrate that this is not the case, and thoroughly investigate the impact of data diversity on RST parsing stability. We show that state-of-the-art architectures trained on the standard English…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: Accepted at EACL 2023 (main, long); camera-ready version

  6. arXiv:2212.06037  [pdf]

    cs.CL

    Chinese Discourse Annotation Reference Manual

    Authors: Siyao Peng, Yang Janet Liu, Amir Zeldes

    Abstract: This document provides extensive guidelines and examples for Rhetorical Structure Theory (RST) annotation in Mandarin Chinese. The guideline is divided into three sections. We first introduce preprocessing steps to prepare data for RST annotation. Secondly, we discuss syntactic criteria to segment texts into Elementary Discourse Units (EDUs). Lastly, we provide examples to define and distinguish d…

    Submitted 11 October, 2022; originally announced December 2022.

  7. arXiv:2210.10449  [pdf, other]

    cs.CL

    GCDT: A Chinese RST Treebank for Multigenre and Multilingual Discourse Parsing

    Authors: Siyao Peng, Yang Janet Liu, Amir Zeldes

    Abstract: A lack of large-scale human-annotated data has hampered the hierarchical discourse parsing of Chinese. In this paper, we present GCDT, the largest hierarchical discourse treebank for Mandarin Chinese in the framework of Rhetorical Structure Theory (RST). GCDT covers over 60K tokens across five genres of freely available text, using the same relation inventory as contemporary RST treebanks for Engl…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted at AACL 2022

  8. arXiv:2109.09777  [pdf, other]

    cs.CL

    DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection

    Authors: Luke Gessler, Shabnam Behzad, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes

    Abstract: This paper describes our submission to the DISRPT2021 Shared Task on Discourse Unit Segmentation, Connective Detection, and Relation Classification. Our system, called DisCoDisCo, is a Transformer-based neural classifier which enhances contextualized word embeddings (CWEs) with hand-crafted features, relying on tokenwise sequence tagging for discourse segmentation and connective detection, and a f…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: System submission for the CODI-DISRPT 2021 Shared Task on Discourse Processing across Formalisms. 1st place in all subtasks

  9. arXiv:2108.01075  [pdf, other]

    cs.CV

    Boundary Knowledge Translation based Reference Semantic Segmentation

    Authors: Lechao Cheng, Zunlei Feng, Xinchao Wang, Ya Jie Liu, Jie Lei, Mingli Song

    Abstract: Given a reference object of an unknown type in an image, human observers can effortlessly find the objects of the same category in another image and precisely tell their visual boundaries. Such visual cognition capability of humans seems absent from the current research spectrum of computer vision. Existing segmentation networks, for example, rely on a humongous amount of labeled data, which is la…

    Submitted 1 August, 2021; originally announced August 2021.

    Comments: Accepted by IJCAI 2021. arXiv admin note: text overlap with arXiv:2108.00379