Showing 1–19 of 19 results for author: Suleman, K

Searching in archive cs.
  1. arXiv:2303.09092  [pdf, other]

    cs.CL

    Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective

    Authors: Ian Porada, Alexandra Olteanu, Kaheer Suleman, Adam Trischler, Jackie Chi Kit Cheung

    Abstract: It is increasingly common to evaluate the same coreference resolution (CR) model on multiple datasets. Do these multi-dataset evaluations allow us to draw meaningful conclusions about model generalization? Or, do they rather reflect the idiosyncrasies of a particular experimental setup (e.g., the specific datasets used)? To study this, we view evaluation through the lens of measurement modeling, a…

    Submitted 18 June, 2024; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: ACL Findings 2024

  2. arXiv:2212.08192  [pdf, other]

    cs.CL cs.LG

    The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources in Natural Language Understanding Systems

    Authors: Akshatha Arodi, Martin Pömsl, Kaheer Suleman, Adam Trischler, Alexandra Olteanu, Jackie Chi Kit Cheung

    Abstract: Many state-of-the-art natural language understanding (NLU) models are based on pretrained neural language models. These models often make inferences using information from multiple sources. An important class of such inferences are those that require both background knowledge, presumably contained in a model's pretrained parameters, and instance-specific information that is supplied at inference t…

    Submitted 22 May, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted at ACL 2023. Code available at https://github.com/mpoemsl/kitmus

  3. arXiv:2205.06828  [pdf, other]

    cs.CL cs.AI

    Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications

    Authors: Kaitlyn Zhou, Su Lin Blodgett, Adam Trischler, Hal Daumé III, Kaheer Suleman, Alexandra Olteanu

    Abstract: There are many ways to express similar things in text, which makes evaluating natural language generation (NLG) systems difficult. Compounding this difficulty is the need to assess varying quality criteria depending on the deployment setting. While the landscape of NLG evaluation has been well-mapped, practitioners' goals, assumptions, and constraints -- which inform decisions about what, when, an…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: Camera Ready for NAACL 2022 (Main Conference)

  4. arXiv:2110.00768  [pdf, other]

    cs.CL

    TopiOCQA: Open-domain Conversational Question Answering with Topic Switching

    Authors: Vaibhav Adlakha, Shehzaad Dhuliawala, Kaheer Suleman, Harm de Vries, Siva Reddy

    Abstract: In a conversational question answering scenario, a questioner seeks to extract information about a topic through a series of interdependent questions and answers. As the conversation progresses, they may switch to related topics, a phenomenon commonly observed in information-seeking search sessions. However, current datasets for conversational question answering are limiting in two ways: 1) they d…

    Submitted 20 February, 2022; v1 submitted 2 October, 2021; originally announced October 2021.

    Comments: accepted at TACL

  5. arXiv:2104.10247  [pdf, other]

    cs.CL

    Modeling Event Plausibility with Consistent Conceptual Abstraction

    Authors: Ian Porada, Kaheer Suleman, Adam Trischler, Jackie Chi Kit Cheung

    Abstract: Understanding natural language requires common sense, one aspect of which is the ability to discern the plausibility of events. While distributional models -- most recently pre-trained, Transformer language models -- have demonstrated improvements in modeling event plausibility, their performance still falls short of humans'. In this work, we show that Transformer-based plausibility models are mar…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: NAACL-HLT 2021

  6. arXiv:2011.04767  [pdf, other]

    cs.CL cs.AI cs.LG

    An Analysis of Dataset Overlap on Winograd-Style Tasks

    Authors: Ali Emami, Adam Trischler, Kaheer Suleman, Jackie Chi Kit Cheung

    Abstract: The Winograd Schema Challenge (WSC) and variants inspired by it have become important benchmarks for common-sense reasoning (CSR). Model performance on the WSC has quickly progressed from chance-level to near-human using neural language models trained on massive corpora. In this paper, we analyze the effects of varying degrees of overlap between these training corpora and the test instances in WSC…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: 11 pages with references, accepted at COLING 2020

    Journal ref: Coling2020

  7. Can a Gorilla Ride a Camel? Learning Semantic Plausibility from Text

    Authors: Ian Porada, Kaheer Suleman, Jackie Chi Kit Cheung

    Abstract: Modeling semantic plausibility requires commonsense knowledge about the world and has been used as a testbed for exploring various knowledge representations. Previous work has focused specifically on modeling physical plausibility and shown that distributional methods fail when tested in a supervised setting. At the same time, distributional models, namely large pretrained language models, have le…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: Accepted at COIN@EMNLP 2019

    Journal ref: Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing. (2019) 123-129

  8. arXiv:1909.03716  [pdf, ps, other]

    cs.CL

    Improving Neural Question Generation using World Knowledge

    Authors: Deepak Gupta, Kaheer Suleman, Mahmoud Adada, Andrew McNamara, Justin Harris

    Abstract: In this paper, we propose a method for incorporating world knowledge (linked entities and fine-grained entity types) into a neural question generation model. This world knowledge helps to encode additional information related to the entities present in the passage required to generate human-like questions. We evaluate our models on both SQuAD and MS MARCO to demonstrate the usefulness of the world…

    Submitted 10 September, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

  9. arXiv:1908.04660  [pdf, other]

    cs.CL

    Playing log(N)-Questions over Sentences

    Authors: Peter Potash, Kaheer Suleman

    Abstract: We propose a two-agent game wherein a questioner must be able to conjure discerning questions between sentences, incorporate responses from an answerer, and keep track of a hypothesis state. The questioner must be able to understand the information required to make its final guess, while also being able to reason over the game's text environment based on the answerer's responses. We experiment wit…

    Submitted 13 August, 2019; originally announced August 2019.

    Comments: 5 pages

  10. arXiv:1811.01778  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG

    Authors: Paul Trichelair, Ali Emami, Adam Trischler, Kaheer Suleman, Jackie Chi Kit Cheung

    Abstract: Recent studies have significantly improved the state-of-the-art on common-sense reasoning (CSR) benchmarks like the Winograd Schema Challenge (WSC) and SWAG. The question we ask in this paper is whether improved performance on these benchmarks represents genuine progress towards common-sense-enabled systems. We make case studies of both benchmarks and design protocols that clarify and qualify the…

    Submitted 10 September, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: 7 pages

  11. arXiv:1811.01747  [pdf, ps, other]

    cs.CL cs.IR cs.LG stat.ML

    The Knowref Coreference Corpus: Removing Gender and Number Cues for Difficult Pronominal Anaphora Resolution

    Authors: Ali Emami, Paul Trichelair, Adam Trischler, Kaheer Suleman, Hannes Schulz, Jackie Chi Kit Cheung

    Abstract: We introduce a new benchmark for coreference resolution and NLI, Knowref, that targets common-sense understanding and world knowledge. Previous coreference resolution tasks can largely be solved by exploiting the number and gender of the antecedents, or have been handcrafted and do not reflect the diversity of naturally occurring text. We present a corpus of over 8,000 annotated text passages with…

    Submitted 13 June, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: 9 pages (excluding references), accepted for ACL 2019

  12. arXiv:1810.01375  [pdf, ps, other]

    cs.CL

    A Knowledge Hunting Framework for Common Sense Reasoning

    Authors: Ali Emami, Noelia De La Cruz, Adam Trischler, Kaheer Suleman, Jackie Chi Kit Cheung

    Abstract: We introduce an automatic system that achieves state-of-the-art results on the Winograd Schema Challenge (WSC), a common sense reasoning task that requires diverse, complex forms of inference and knowledge. Our method uses a knowledge hunting module to gather text from the web, which serves as evidence for candidate problem resolutions. Given an input problem, our system generates relevant queries…

    Submitted 2 October, 2018; originally announced October 2018.

    Comments: 10 pages, accepted at EMNLP 2018

  13. arXiv:1704.00057  [pdf, other]

    cs.CL

    Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems

    Authors: Layla El Asri, Hannes Schulz, Shikhar Sharma, Jeremie Zumer, Justin Harris, Emery Fine, Rahul Mehrotra, Kaheer Suleman

    Abstract: This paper presents the Frames dataset (Frames is available at http://datasets.maluuba.com/Frames), a corpus of 1369 human-human dialogues with an average of 15 turns per dialogue. We developed this dataset to study the role of memory in goal-oriented dialogue systems. Based on Frames, we introduce a task called frame tracking, which extends state tracking to a setting where several states are tra…

    Submitted 13 April, 2017; v1 submitted 31 March, 2017; originally announced April 2017.

  14. arXiv:1611.09830  [pdf, other]

    cs.CL cs.AI

    NewsQA: A Machine Comprehension Dataset

    Authors: Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman

    Abstract: We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reas…

    Submitted 7 February, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

  15. arXiv:1607.00070  [pdf, other]

    cs.CL

    A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems

    Authors: Layla El Asri, Jing He, Kaheer Suleman

    Abstract: User simulation is essential for generating enough data to train a statistical spoken dialogue system. Previous models for user simulation suffer from several drawbacks, such as the inability to take dialogue history into account, the need of rigid structure to ensure coherent user behaviour, heavy dependence on a specific domain, the inability to output several user intentions during one dialogue…

    Submitted 30 June, 2016; originally announced July 2016.

    Comments: Accepted for publication at Interspeech 2016

  16. arXiv:1606.03632  [pdf, other]

    cs.CL

    Natural Language Generation in Dialogue using Lexicalized and Delexicalized Data

    Authors: Shikhar Sharma, Jing He, Kaheer Suleman, Hannes Schulz, Philip Bachman

    Abstract: Natural language generation plays a critical role in spoken dialogue systems. We present a new approach to natural language generation for task-oriented dialogue using recurrent neural networks in an encoder-decoder framework. In contrast to previous work, our model uses both lexicalized and delexicalized components i.e. slot-value pairs for dialogue acts, with slots and corresponding values align…

    Submitted 21 April, 2017; v1 submitted 11 June, 2016; originally announced June 2016.

  17. arXiv:1606.03152  [pdf, other]

    cs.CL cs.AI

    Policy Networks with Two-Stage Training for Dialogue Systems

    Authors: Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman

    Abstract: In this paper, we propose to use deep policy networks which are trained with an advantage actor-critic method for statistically optimised dialogue systems. First, we show that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian Processes methods. Summary state and action spaces lead to good performance but require pre-engineering effort, RL knowledge, and dom…

    Submitted 12 September, 2016; v1 submitted 9 June, 2016; originally announced June 2016.

    Comments: SIGDial 2016 (Submitted: May 2016; Accepted: Jun 30, 2016)

    Journal ref: Proceedings of the SIGDIAL 2016 Conference, pages 101--110, Los Angeles, USA, 13-15 September 2016. Association for Computational Linguistics

  18. arXiv:1606.02270  [pdf, other]

    cs.CL

    Natural Language Comprehension with the EpiReader

    Authors: Adam Trischler, Zheng Ye, Xingdi Yuan, Kaheer Suleman

    Abstract: We present the EpiReader, a novel model for machine comprehension of text. Machine comprehension of unstructured, real-world text is a major research goal for natural language processing. Current tests of machine comprehension pose questions whose answers can be inferred from some supporting text, and evaluate a model's response to the questions. The EpiReader is an end-to-end neural model compris…

    Submitted 10 June, 2016; v1 submitted 7 June, 2016; originally announced June 2016.

    Comments: 8 pages plus references. Submitted to EMNLP 2016

  19. arXiv:1603.08884  [pdf, other]

    cs.CL

    A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data

    Authors: Adam Trischler, Zheng Ye, Xingdi Yuan, Jing He, Phillip Bachman, Kaheer Suleman

    Abstract: Understanding unstructured text is a major goal within natural language processing. Comprehension tests pose questions based on short text passages to evaluate such understanding. In this work, we investigate machine comprehension on the challenging MCTest benchmark. Partly because of its limited size, prior work on MCTest has focused mainly on engineering better features. We tackle th…

    Submitted 29 March, 2016; originally announced March 2016.

    Comments: 9 pages, submitted to ACL

    MSC Class: I.2.7