
Showing 1–50 of 64 results for author: Cheung, J C

  1. arXiv:2406.12018  [pdf, other]

    cs.CL

    CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling

    Authors: Yu Bai, Xiyuan Zou, Heyan Huang, Sanxing Chen, Marc-Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung

    Abstract: Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed evicted) without affecting the perplexity performance in generating long sequences. However, we show that these methods, despite preserving perplexit…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Work in progress

  2. arXiv:2406.08723  [pdf, other]

    cs.CL

    ECBD: Evidence-Centered Benchmark Design for NLP

    Authors: Yu Lu Liu, Su Lin Blodgett, Jackie Chi Kit Cheung, Q. Vera Liao, Alexandra Olteanu, Ziang Xiao

    Abstract: Benchmarking is seen as critical to assessing progress in NLP. However, creating a benchmark involves many design decisions (e.g., which datasets to include, which metrics to use) that often rely on tacit, untested assumptions about what the benchmark is intended to measure or is actually measuring. There is currently no principled way of analyzing these decisions and how they impact the validity…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2406.07640  [pdf, other]

    cs.LG cs.AI

    When is an Embedding Model More Promising than Another?

    Authors: Maxime Darrin, Philippe Formont, Ismail Ben Ayed, Jackie CK Cheung, Pablo Piantanida

    Abstract: Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately la…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2406.07359  [pdf, other]

    cs.CL

    GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews

    Authors: Maxime Darrin, Ines Arous, Pablo Piantanida, Jackie CK Cheung

    Abstract: Scientific peer review is essential for the quality of academic publications. However, the increasing number of paper submissions to conferences has strained the reviewing process. This surge poses a burden on area chairs who have to carefully read an ever-growing volume of reviews and discern each reviewer's main arguments as part of their decision process. In this paper, we introduce GLIMPSE, a sum…

    Submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2404.00727  [pdf, other]

    cs.CL

    A Controlled Reevaluation of Coreference Resolution Models

    Authors: Ian Porada, Xiyuan Zou, Jackie Chi Kit Cheung

    Abstract: All state-of-the-art coreference resolution (CR) models involve finetuning a pretrained language model. Whether the superior performance of one CR model over another is due to the choice of language model or other factors, such as the task-specific architecture, is difficult or impossible to determine due to lack of a standardized experimental setup. To resolve this ambiguity, we systematically ev…

    Submitted 22 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: LREC-COLING 2024

  6. arXiv:2403.18167  [pdf, other]

    cs.CL cs.AI

    Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations

    Authors: Lei Yu, Meng Cao, Jackie Chi Kit Cheung, Yue Dong

    Abstract: State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. To explore the mechanistic causes of these hallucinations, we create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations. We discover two general and distinct mechanistic causes of ha…

    Submitted 17 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  7. arXiv:2403.13213  [pdf, other]

    cs.LG cs.CL cs.CY

    From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards

    Authors: Khaoula Chehbouni, Megha Roshan, Emmanuel Ma, Futian Andrew Wei, Afaf Taik, Jackie CK Cheung, Golnoosh Farnadi

    Abstract: Recent progress in large language models (LLMs) has led to their widespread adoption in various domains. However, these advancements have also introduced additional safety risks and raised concerns regarding their detrimental impact on already marginalized populations. Despite growing mitigation efforts to develop safety safeguards, such as supervised safety-oriented fine-tuning and leveraging saf…

    Submitted 5 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures. Accepted to Findings of the Association for Computational Linguistics: ACL 2024

  8. arXiv:2402.19457  [pdf, other]

    cs.CL cs.AI

    $\texttt{COSMIC}$: Mutual Information for Task-Agnostic Summarization Evaluation

    Authors: Maxime Darrin, Philippe Formont, Jackie Chi Kit Cheung, Pablo Piantanida

    Abstract: Assessing the quality of summarizers poses significant challenges. In response, we propose a novel task-oriented evaluation approach that assesses summarizers based on their capacity to produce summaries that are useful for downstream tasks, while preserving task outcomes. We theoretically establish a direct relationship between the resulting error probability of these tasks and the mutual informa…

    Submitted 1 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  9. arXiv:2401.11323  [pdf, other]

    cs.CL

    Identifying and Analyzing Task-Encoding Tokens in Large Language Models

    Authors: Yu Bai, Heyan Huang, Cesare Spinoso-Di Piano, Marc-Antoine Rondeau, Sanxing Chen, Yang Gao, Jackie Chi Kit Cheung

    Abstract: In-context learning (ICL) has become an effective solution for few-shot learning in natural language processing. However, our understanding of ICL's working mechanisms is limited, specifically regarding how models learn to perform tasks from ICL demonstrations. For example, unexpectedly large changes in performance can arise from small changes in the prompt, leaving prompt design a largely empiric…

    Submitted 16 February, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Comments: Work in progress

  10. arXiv:2401.05914  [pdf, other]

    cs.CL cs.AI

    How Teachers Can Use Large Language Models and Bloom's Taxonomy to Create Educational Quizzes

    Authors: Sabina Elkins, Ekaterina Kochmar, Jackie C. K. Cheung, Iulian Serban

    Abstract: Question generation (QG) is a natural language processing task with an abundance of potential benefits and use cases in the educational domain. In order for this potential to be realized, QG systems must be designed and validated with pedagogical needs in mind. However, little research has assessed or designed QG approaches with the input from real teachers or students. This paper applies a large…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 8 pages, 8 figures. Accepted to the main track of the EAAI-24: The 14th Symposium on Educational Advances in Artificial Intelligence

  11. arXiv:2312.01858  [pdf, other]

    cs.CL

    Evaluating Dependencies in Fact Editing for Language Models: Specificity and Implication Awareness

    Authors: Zichao Li, Ines Arous, Siva Reddy, Jackie C. K. Cheung

    Abstract: The potential of using a large language model (LLM) as a knowledge base (KB) has sparked significant interest. To manage the knowledge acquired by LLMs, we need to ensure that the editing of learned facts respects internal logical constraints, which are known as dependency of knowledge. Existing work on editing LLMs has partially addressed the issue of dependency, when the editing of a fact should…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Findings of EMNLP2023

  12. arXiv:2311.11103  [pdf, other]

    cs.CL

    Responsible AI Considerations in Text Summarization Research: A Review of Current Practices

    Authors: Yu Lu Liu, Meng Cao, Su Lin Blodgett, Jackie Chi Kit Cheung, Alexandra Olteanu, Adam Trischler

    Abstract: AI and NLP publication venues have increasingly encouraged researchers to reflect on possible ethical considerations, adverse impacts, and other responsible AI issues their work might engender. However, for specific NLP tasks our understanding of how prevalent such issues are, or when and why these issues are likely to arise, remains limited. Focusing on text summarization -- a common NLP task lar…

    Submitted 18 November, 2023; originally announced November 2023.

  13. arXiv:2311.04921  [pdf, other]

    cs.CL cs.AI

    Successor Features for Efficient Multisubject Controlled Text Generation

    Authors: Meng Cao, Mehdi Fatemi, Jackie Chi Kit Cheung, Samira Shabanian

    Abstract: While large language models (LLMs) have achieved impressive performance in generating fluent and realistic text, controlling the generated text so that it exhibits properties such as safety, factuality, and non-toxicity remains challenging. Existing decoding-based methods are static in terms of the dimension of control; if the target subject is changed,…

    Submitted 2 November, 2023; originally announced November 2023.

  14. arXiv:2310.01717  [pdf, other]

    cs.CL cs.AI cs.LG

    Ensemble Distillation for Unsupervised Constituency Parsing

    Authors: Behzad Shayegh, Yanshuai Cao, Xiaodan Zhu, Jackie C. K. Cheung, Lili Mou

    Abstract: We investigate the unsupervised constituency parsing task, which organizes words and phrases of a sentence into a hierarchical structure without using linguistically annotated data. We observe that existing unsupervised parsers capture differing aspects of parsing structures, which can be leveraged to enhance unsupervised parsing performance. To this end, we propose a notion of "tree averaging," b…

    Submitted 25 April, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted by International Conference on Learning Representations (ICLR) 2024

  15. arXiv:2305.05858  [pdf, other]

    cs.CL

    Vārta: A Large-Scale Headline-Generation Dataset for Indic Languages

    Authors: Rahul Aralikatte, Ziling Cheng, Sumanth Doddapaneni, Jackie Chi Kit Cheung

    Abstract: We present Vārta, a large-scale multilingual dataset for headline generation in Indic languages. This dataset includes 41.8 million news articles in 14 different Indic languages (and English), which come from a variety of high-quality sources. To the best of our knowledge, this is the largest collection of curated articles for Indic languages currently available. We use the data collected in a ser…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: Findings of ACL 2023

  16. arXiv:2304.06638  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    How Useful are Educational Questions Generated by Large Language Models?

    Authors: Sabina Elkins, Ekaterina Kochmar, Jackie C. K. Cheung, Iulian Serban

    Abstract: Controllable text generation (CTG) by large language models has a huge potential to transform education for teachers and students alike. Specifically, high quality and diverse question generation can dramatically reduce the load on teachers and improve the quality of their educational content. Recent work in this domain has made progress with generation, but fails to show that real teachers judge…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted to AIED Late Breaking Results 2023 - to be published in their proceedings

  17. arXiv:2303.09092  [pdf, other]

    cs.CL

    Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective

    Authors: Ian Porada, Alexandra Olteanu, Kaheer Suleman, Adam Trischler, Jackie Chi Kit Cheung

    Abstract: It is increasingly common to evaluate the same coreference resolution (CR) model on multiple datasets. Do these multi-dataset evaluations allow us to draw meaningful conclusions about model generalization? Or, do they rather reflect the idiosyncrasies of a particular experimental setup (e.g., the specific datasets used)? To study this, we view evaluation through the lens of measurement modeling, a…

    Submitted 18 June, 2024; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: ACL Findings 2024

  18. arXiv:2302.14003  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Systematic Rectification of Language Models via Dead-end Analysis

    Authors: Meng Cao, Mehdi Fatemi, Jackie Chi Kit Cheung, Samira Shabanian

    Abstract: With adversarial or otherwise normal prompts, existing large language models (LLM) can be pushed to generate toxic discourses. One way to reduce the risk of LLMs generating undesired discourses is to alter the training of the LLM. This can be very restrictive due to demanding computation requirements. Other methods rely on rule-based or prompt-based token elimination, which are limited as they dis…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: The Eleventh International Conference on Learning Representations, ICLR'23

    Journal ref: ICLR 2023

  19. arXiv:2302.09852  [pdf, other]

    cs.CL cs.AI

    Unsupervised Layer-wise Score Aggregation for Textual OOD Detection

    Authors: Maxime Darrin, Guillaume Staerman, Eduardo Dadalto Câmara Gomes, Jackie CK Cheung, Pablo Piantanida, Pierre Colombo

    Abstract: Out-of-distribution (OOD) detection is a rapidly growing field due to new robustness and security requirements driven by an increased number of AI-based systems. Existing OOD textual detectors often rely on an anomaly score (e.g., Mahalanobis distance) computed on the embedding output of the last layer of the encoder. In this work, we observe that OOD detection performance varies greatly depending…

    Submitted 21 February, 2024; v1 submitted 20 February, 2023; originally announced February 2023.

  20. arXiv:2302.08531  [pdf, other]

    cs.CL

    Learning with Rejection for Abstractive Text Summarization

    Authors: Meng Cao, Yue Dong, Jingyi He, Jackie Chi Kit Cheung

    Abstract: State-of-the-art abstractive summarization systems frequently hallucinate content that is not supported by the source document, mainly due to noise in the training dataset. Existing methods opt to drop the noisy samples or tokens from the training set entirely, reducing the effective training set size and creating an artificial propensity to copy words from the source. In this work, we propose a t…

    Submitted 16 February, 2023; originally announced February 2023.

  21. arXiv:2302.06784  [pdf, other]

    cs.CL

    The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation

    Authors: Kushal Arora, Timothy J. O'Donnell, Doina Precup, Jason Weston, Jackie C. K. Cheung

    Abstract: State-of-the-art language generation models can degenerate when applied to open-ended generation problems such as text completion, story generation, or dialog modeling. This degeneration usually shows up in the form of incoherence, lack of vocabulary diversity, and self-repetition or copying from the context. In this paper, we postulate that "human-like" generations usually lie in a narrow and n…

    Submitted 13 February, 2023; originally announced February 2023.

  22. arXiv:2212.08192  [pdf, other]

    cs.CL cs.LG

    The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources in Natural Language Understanding Systems

    Authors: Akshatha Arodi, Martin Pömsl, Kaheer Suleman, Adam Trischler, Alexandra Olteanu, Jackie Chi Kit Cheung

    Abstract: Many state-of-the-art natural language understanding (NLU) models are based on pretrained neural language models. These models often make inferences using information from multiple sources. An important class of such inferences are those that require both background knowledge, presumably contained in a model's pretrained parameters, and instance-specific information that is supplied at inference t…

    Submitted 22 May, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted at ACL 2023. Code available at https://github.com/mpoemsl/kitmus

  23. arXiv:2206.14145  [pdf, other]

    cs.CL cs.AI

    Question Personalization in an Intelligent Tutoring System

    Authors: Sabina Elkins, Robert Belfer, Ekaterina Kochmar, Iulian Serban, Jackie C. K. Cheung

    Abstract: This paper investigates personalization in the field of intelligent tutoring systems (ITS). We hypothesize that personalization in the way questions are asked improves student learning outcomes. Previous work on dialogue-based ITS personalization has yet to address question phrasing. We show that generating versions of the questions suitable for students at different levels of subject proficiency…

    Submitted 25 May, 2022; originally announced June 2022.

    Comments: To be published in AIED Late Breaking Results 2022

  24. arXiv:2205.12394  [pdf, other]

    cs.CL

    MaskEval: Weighted MLM-Based Evaluation for Text Summarization and Simplification

    Authors: Yu Lu Liu, Rachel Bawden, Thomas Scialom, Benoît Sagot, Jackie Chi Kit Cheung

    Abstract: In text summarization and simplification, system outputs must be evaluated along multiple dimensions such as relevance, factual consistency, fluency, and grammaticality, and a wide range of possible outputs could be of high quality. These properties make the development of an adaptable, reference-less evaluation metric both necessary and challenging. We introduce MaskEval, a reference-less metric…

    Submitted 13 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

  25. Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment

    Authors: Zichao Li, Prakhar Sharma, Xing Han Lu, Jackie C. K. Cheung, Siva Reddy

    Abstract: Most research on question answering focuses on the pre-deployment stage; i.e., building an accurate model for deployment. In this paper, we ask the question: Can we improve QA systems further post-deployment based on user interactions? We focus on two kinds of improvements: 1) improving the QA system's performance itself, and 2) providing the model with the ability to explain the correctnes…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: ACL 2022 Findings

    Journal ref: Findings of the Association for Computational Linguistics: ACL (2022) 926-937

  26. arXiv:2204.01171  [pdf, other]

    cs.CL cs.AI cs.LG

    Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation

    Authors: Kushal Arora, Layla El Asri, Hareesh Bahuleyan, Jackie Chi Kit Cheung

    Abstract: Current language generation models suffer from issues such as repetition, incoherence, and hallucinations. An often-repeated hypothesis is that this brittleness of generation models is caused by the training and the generation procedure mismatch, also referred to as exposure bias. In this paper, we verify this hypothesis by analyzing exposure bias from an imitation learning perspective. We show th…

    Submitted 9 January, 2023; v1 submitted 3 April, 2022; originally announced April 2022.

    Comments: Accepted in Findings of ACL 2022. v2: Equation 7 updated, typo fixes

  27. arXiv:2112.08583  [pdf, other]

    cs.CL

    Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge

    Authors: Ian Porada, Alessandro Sordoni, Jackie Chi Kit Cheung

    Abstract: Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question. To answer this question, we selectively inject verbalized knowledge into the minibatches of a BERT mod…

    Submitted 15 December, 2021; originally announced December 2021.

  28. arXiv:2109.09784  [pdf, other]

    cs.CL

    Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization

    Authors: Meng Cao, Yue Dong, Jackie Chi Kit Cheung

    Abstract: State-of-the-art abstractive summarization systems often generate hallucinations; i.e., content that is not directly inferable from the source text. Despite being assumed incorrect, we find that much hallucinated content is factual, namely consistent with world knowledge. These factual hallucinations can be beneficial in a summary by providing useful background information. In this work, we…

    Submitted 6 December, 2021; v1 submitted 30 August, 2021; originally announced September 2021.

  29. arXiv:2104.10247  [pdf, other]

    cs.CL

    Modeling Event Plausibility with Consistent Conceptual Abstraction

    Authors: Ian Porada, Kaheer Suleman, Adam Trischler, Jackie Chi Kit Cheung

    Abstract: Understanding natural language requires common sense, one aspect of which is the ability to discern the plausibility of events. While distributional models -- most recently pre-trained, Transformer language models -- have demonstrated improvements in modeling event plausibility, their performance still falls short of humans'. In this work, we show that Transformer-based plausibility models are mar…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: NAACL-HLT 2021

  30. arXiv:2104.08664  [pdf, other]

    cs.CL

    Characterizing Idioms: Conventionality and Contingency

    Authors: Michaela Socolof, Jackie Chi Kit Cheung, Michael Wagner, Timothy J. O'Donnell

    Abstract: Idioms are unlike most phrases in two important ways. First, the words in an idiom have non-canonical meanings. Second, the non-canonical meanings of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define tw…

    Submitted 14 September, 2022; v1 submitted 17 April, 2021; originally announced April 2021.

  31. arXiv:2104.08530  [pdf, other]

    cs.CL

    The Topic Confusion Task: A Novel Scenario for Authorship Attribution

    Authors: Malik H. Altakrori, Jackie Chi Kit Cheung, Benjamin C. M. Fung

    Abstract: Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether new, unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by a failure to…

    Submitted 9 September, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: 15 pages (9 + ref./appin.), 6 figures, Accepted to Findings of EMNLP 2021

  32. arXiv:2104.08419  [pdf, other]

    cs.AI

    TIE: A Framework for Embedding-based Incremental Temporal Knowledge Graph Completion

    Authors: Jiapeng Wu, Yishi Xu, Yingxue Zhang, Chen Ma, Mark Coates, Jackie Chi Kit Cheung

    Abstract: Reasoning in a temporal knowledge graph (TKG) is a critical task for information retrieval and semantic search. It is particularly challenging when the TKG is updated frequently. The model has to adapt to changes in the TKG for efficient training and inference while preserving its performance on historical knowledge. Recent work approaches TKG completion (TKGC) by augmenting the encoder-decoder fr…

    Submitted 8 May, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: SIGIR 2021 long paper. 13 pages, 4 figures

  33. arXiv:2103.07785  [pdf, other]

    cs.CL

    Deep Discourse Analysis for Generating Personalized Feedback in Intelligent Tutor Systems

    Authors: Matt Grenander, Robert Belfer, Ekaterina Kochmar, Iulian V. Serban, François St-Hilaire, Jackie C. K. Cheung

    Abstract: We explore creating automated, personalized feedback in an intelligent tutoring system (ITS). Our goal is to pinpoint correct and incorrect concepts in student answers in order to achieve better student learning gains. Although automatic methods for providing personalized feedback exist, they do not explicitly inform students about which concepts in their answers are correct or incorrect. Our appr…

    Submitted 13 March, 2021; originally announced March 2021.

    Comments: Accepted at EAAI 2021

  34. arXiv:2101.00371  [pdf, other]

    cs.CL

    On-the-Fly Attention Modulation for Neural Generation

    Authors: Yue Dong, Chandra Bhagavatula, Ximing Lu, Jena D. Hwang, Antoine Bosselut, Jackie Chi Kit Cheung, Yejin Choi

    Abstract: Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from degeneration: the generated text is repetitive, generic, self-contradictory, and often lacks commonsense. Our analyses on sentence-level attention patterns in LMs reveal that neural degeneration may be associated with insufficient learning of task-specific characteristics by the atte…

    Submitted 13 October, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: 10 pages, 3 figures

  35. arXiv:2012.15355  [pdf, other]

    cs.CL cs.LG

    Optimizing Deeper Transformers on Small Datasets

    Authors: Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie Chi Kit Cheung, Simon J. D. Prince, Yanshuai Cao

    Abstract: It is a common belief that training deep transformers from scratch requires large datasets. Consequently, for small datasets, people usually use shallow and simple additional layers on top of pre-trained models during fine-tuning. This work shows that this does not always need to be the case: with proper initialization and optimization, the benefits of very deep transformers can carry over to chal…

    Submitted 31 May, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: Accepted at ACL 2021 main conference

  36. arXiv:2011.07013  [pdf, other]

    cs.CL cs.AI

    Deconstructing word embedding algorithms

    Authors: Kian Kenyon-Dean, Edward Newell, Jackie Chi Kit Cheung

    Abstract: Word embeddings are reliable feature representations of words used to obtain high quality results for various NLP applications. Uncontextualized word embeddings are used in many NLP tasks today, especially in resource-limited settings where high memory capacity and GPUs are not available. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-kn…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020, 6 pages. arXiv admin note: substantial text overlap with arXiv:1911.13280

    MSC Class: 68T50

  37. arXiv:2011.04767  [pdf, other]

    cs.CL cs.AI cs.LG

    An Analysis of Dataset Overlap on Winograd-Style Tasks

    Authors: Ali Emami, Adam Trischler, Kaheer Suleman, Jackie Chi Kit Cheung

    Abstract: The Winograd Schema Challenge (WSC) and variants inspired by it have become important benchmarks for common-sense reasoning (CSR). Model performance on the WSC has quickly progressed from chance-level to near-human using neural language models trained on massive corpora. In this paper, we analyze the effects of varying degrees of overlap between these training corpora and the test instances in WSC…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: 11 pages with references, accepted at COLING 2020

    Journal ref: Coling2020

  38. arXiv:2011.02944  [pdf, other]

    cs.CL

    Learning Efficient Task-Specific Meta-Embeddings with Word Prisms

    Authors: Jingyi He, KC Tsiolis, Kian Kenyon-Dean, Jackie Chi Kit Cheung

    Abstract: Word embeddings are trained to predict word cooccurrence statistics, which leads them to possess different lexical properties (syntactic, semantic, etc.) depending on the notion of context defined at training time. These properties manifest when querying the embedding space for the most similar vectors, and when used at the input layer of deep neural networks trained to solve downstream NLP proble…

    Submitted 5 November, 2020; originally announced November 2020.

  39. arXiv:2010.08712  [pdf, ps, other]

    cs.CL cs.AI

    Factual Error Correction for Abstractive Summarization Models

    Authors: Meng Cao, Yue Dong, Jiapeng Wu, Jackie Chi Kit Cheung

    Abstract: Neural abstractive summarization systems have achieved promising progress, thanks to the availability of large-scale datasets and models pre-trained with self-supervised methods. However, ensuring the factual consistency of the generated summaries for abstractive summarization systems is a challenge. We propose a post-editing corrector module to address this issue by identifying and correcting fac…

    Submitted 1 April, 2021; v1 submitted 17 October, 2020; originally announced October 2020.

  40. arXiv:2010.03526  [pdf, other]

    cs.LG cs.AI cs.CL

    TeMP: Temporal Message Passing for Temporal Knowledge Graph Completion

    Authors: Jiapeng Wu, Meng Cao, Jackie Chi Kit Cheung, William L. Hamilton

    Abstract: Inferring missing facts in temporal knowledge graphs (TKGs) is a fundamental and challenging task. Previous works have approached this problem by augmenting methods for static knowledge graphs to leverage time-dependent representations. However, these methods do not explicitly leverage multi-hop structural information and temporal facts from recent time steps to enhance their predictions. Addition…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: 17 pages, 9 figures. EMNLP 2020 Long Paper

  41. arXiv:2010.02443  [pdf, other]

    cs.CL

    Multi-Fact Correction in Abstractive Text Summarization

    Authors: Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, Jingjing Liu

    Abstract: Pre-trained neural abstractive summarization systems have dominated extractive strategies on news summarization performance, at least in terms of ROUGE. However, system-generated abstractive summaries often face the pitfall of factual inconsistency: generating incorrect facts with respect to the source text. To address this challenge, we propose Span-Fact, a suite of two factual correction models…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: 12 pages, accepted at EMNLP2020

  42. arXiv:2005.00513  [pdf, other]

    cs.CL

    Discourse-Aware Unsupervised Summarization of Long Scientific Documents

    Authors: Yue Dong, Andrei Mircea, Jackie C. K. Cheung

    Abstract: We propose an unsupervised graph-based ranking model for extractive summarization of long scientific documents. Our method assumes a two-level hierarchical graph representation of the source document, and exploits asymmetrical positional cues to determine sentence importance. Results on the PubMed and arXiv datasets show that our approach outperforms strong unsupervised baselines by wide margins i…

    Submitted 13 January, 2021; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: 9 pages, 3 figures, EACL 2021

  43. arXiv:1911.13280  [pdf, other]

    cs.CL cs.LG

    Deconstructing and reconstructing word embedding algorithms

    Authors: Edward Newell, Kian Kenyon-Dean, Jackie Chi Kit Cheung

    Abstract: Uncontextualized word embeddings are reliable feature representations of words used to obtain high quality results for various NLP applications. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others, into a common form, unveiling some of the necessary and…

    Submitted 29 November, 2019; originally announced November 2019.

    Comments: 15 pages

  44. Can a Gorilla Ride a Camel? Learning Semantic Plausibility from Text

    Authors: Ian Porada, Kaheer Suleman, Jackie Chi Kit Cheung

    Abstract: Modeling semantic plausibility requires commonsense knowledge about the world and has been used as a testbed for exploring various knowledge representations. Previous work has focused specifically on modeling physical plausibility and shown that distributional methods fail when tested in a supervised setting. At the same time, distributional models, namely large pretrained language models, have le…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: Accepted at COIN@EMNLP 2019

    Journal ref: Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing. (2019) 123-129

  45. arXiv:1911.03976  [pdf, other]

    cs.LG cs.CL stat.ML

    On Posterior Collapse and Encoder Feature Dispersion in Sequence VAEs

    Authors: Teng Long, Yanshuai Cao, Jackie Chi Kit Cheung

    Abstract: Variational autoencoders (VAEs) hold great potential for modelling text, as they could in theory separate high-level semantic and syntactic properties from local regularities of natural language. Practically, however, VAEs with autoregressive decoders often suffer from posterior collapse, a phenomenon where the model learns to ignore the latent variables, causing the sequence VAE to degenerate int…

    Submitted 10 November, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

  46. arXiv:1909.04028  [pdf, other]

    cs.CL cs.LG

    Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses

    Authors: Matt Grenander, Yue Dong, Jackie Chi Kit Cheung, Annie Louis

    Abstract: Sentence position is a strong feature for news summarization, since the lead often (but not always) summarizes the key points of the article. In this paper, we show that recent neural systems excessively exploit this trend, which although powerful for many inputs, is also detrimental when summarizing documents where important content should be extracted from later parts of the article. We propose…

    Submitted 8 September, 2019; originally announced September 2019.

    Comments: 5 pages, accepted at EMNLP 2019

  47. arXiv:1909.01528  [pdf, ps, other]

    cs.CL cs.LG

    Referring Expression Generation Using Entity Profiles

    Authors: Meng Cao, Jackie Chi Kit Cheung

    Abstract: Referring Expression Generation (REG) is the task of generating contextually appropriate references to entities. A limitation of existing REG systems is that they rely on entity-specific supervised training, which means that they cannot handle entities not seen during training. In this study, we address this in two ways. First, we propose task setups in which we specifically test a REG system's ab…

    Submitted 3 September, 2019; originally announced September 2019.

  48. arXiv:1906.08104  [pdf, other]

    cs.CL

    EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing

    Authors: Yue Dong, Zichao Li, Mehdi Rezagholizadeh, Jackie Chi Kit Cheung

    Abstract: We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach. Most current neural sentence simplification systems are variants of sequence-to-sequence models adopted from machine translation. These methods learn to simplify sentences as a byproduct of the fact that they are trained on complex-simple sen…

    Submitted 19 June, 2019; originally announced June 2019.

    Comments: 9 pages, 1 figure, accepted at ACL2019

  49. arXiv:1905.11975  [pdf, other]

    cs.CL cs.LG

    On Variational Learning of Controllable Representations for Text without Supervision

    Authors: Peng Xu, Jackie Chi Kit Cheung, Yanshuai Cao

    Abstract: The variational autoencoder (VAE) can learn the manifold of natural images on certain datasets, as evidenced by meaningful interpolating or extrapolating in the continuous latent space. However, on discrete data such as text, it is unclear if unsupervised learning can discover similar latent space that allows controllable manipulation. In this work, we find that sequence VAEs trained on text fail…

    Submitted 7 August, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: ICML 2020 Camera Ready. Previous title: Unsupervised Controllable Text Generation with Global Variation Discovery and Disentanglement

  50. arXiv:1905.11912  [pdf, other]

    cs.CL

    A Cross-Domain Transferable Neural Coherence Model

    Authors: Peng Xu, Hamidreza Saghir, Jin Sung Kang, Teng Long, Avishek Joey Bose, Yanshuai Cao, Jackie Chi Kit Cheung

    Abstract: Coherence is an important aspect of text quality and is crucial for ensuring its readability. One important limitation of existing coherence models is that training on one domain does not easily generalize to unseen categories of text. Previous work advocates for generative models for cross-domain generalization, because for discriminative models, the space of incoherent sentence orderings to disc…

    Submitted 9 July, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: Accepted at ACL 2019