
Showing 1–18 of 18 results for author: Cattan, A

Searching in archive cs.
  1. arXiv:2406.16086

    cs.CL

    SEAM: A Stochastic Benchmark for Multi-Document Tasks

    Authors: Gili Lior, Avi Caciularu, Arie Cattan, Shahar Levy, Ori Shapira, Gabriel Stanovsky

    Abstract: Various tasks, such as summarization, multi-hop question answering, or coreference resolution, are naturally phrased over collections of real-world documents. Such tasks present a unique set of challenges, revolving around the lack of coherent narrative structure across documents, which often leads to contradiction, omission, or repetition of information. Despite their real-world application and c…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.13632

    cs.CL

    Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations

    Authors: Arie Cattan, Alon Jacovi, Alex Fabrikant, Jonathan Herzig, Roee Aharoni, Hannah Rashkin, Dror Marcus, Avinatan Hassidim, Yossi Matias, Idan Szpektor, Avi Caciularu

    Abstract: Despite recent advancements in Large Language Models (LLMs), their performance on tasks involving long contexts remains sub-optimal. In-Context Learning (ICL) with few-shot examples may be an appealing solution to enhance LLM performance in this scenario; however, naively adding ICL examples with long context introduces challenges, including substantial token overhead added for each few-shot examp…

    Submitted 23 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2403.17104

    cs.CL

    Attribute First, then Generate: Locally-attributable Grounded Text Generation

    Authors: Aviv Slobodkin, Eran Hirsch, Arie Cattan, Tal Schuster, Ido Dagan

    Abstract: Recent efforts to address hallucinations in Large Language Models (LLMs) have focused on attributed text generation, which supplements generated texts with citations of supporting sources for post-generation fact-checking and corrections. Yet, these citations often point to entire documents or paragraphs, burdening users with extensive verification work. In this paper, we introduce a locally-attri…

    Submitted 4 July, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: ACL 2024

  4. arXiv:2311.11301

    cs.CL

    CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies

    Authors: Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan

    Abstract: Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, annotating event and subevent relations, etc. To enable efficient annotation of such hierarchical structures, we release CHAMP, an open source tool that allows incrementally constructing both cl…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  5. arXiv:2306.03853

    cs.CL

    From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization

    Authors: Arie Cattan, Lilach Eden, Yoav Kantor, Roy Bar-Haim

    Abstract: Key Point Analysis (KPA) has been recently proposed for deriving fine-grained insights from collections of textual comments. KPA extracts the main points in the data as a list of concise sentences or phrases, termed key points, and quantifies their prevalence. While key points are more expressive than word clouds and key phrases, making sense of a long, flat list of key points, which often express…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: ACL 2023

  6. arXiv:2302.08464

    cs.CL cs.AI cs.LG

    Evaluating and Improving the Coreference Capabilities of Machine Translation Models

    Authors: Asaf Yehudai, Arie Cattan, Omri Abend, Gabriel Stanovsky

    Abstract: Machine translation (MT) requires a wide range of linguistic capabilities, which current end-to-end models are expected to learn implicitly by observing aligned sentences in bilingual corpora. In this work, we ask: How well do MT models learn coreference resolution from implicit signal? To answer this question, we develop an evaluation methodology that derives coreference clusters from MT o…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: EACL paper

  7. arXiv:2210.12688

    cs.CL

    How "Multi" is Multi-Document Summarization?

    Authors: Ruben Wolhandler, Arie Cattan, Ori Ernst, Ido Dagan

    Abstract: The task of multi-document summarization (MDS) aims at models that, given multiple documents as input, are able to generate a summary that combines dispersed information, originally spread across these documents. Accordingly, it is expected that both reference summaries in MDS datasets, as well as system summaries, would indeed be based on such dispersed information. In this paper, we argue for qua…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  8. arXiv:2209.04280

    cs.CL

    F-coref: Fast, Accurate and Easy to Use Coreference Resolution

    Authors: Shon Otmazgin, Arie Cattan, Yoav Goldberg

    Abstract: We introduce fastcoref, a Python package for fast, accurate, and easy-to-use English coreference resolution. The package is pip-installable, and allows two modes: an accurate mode based on the LingMess architecture, providing state-of-the-art coreference accuracy, and a substantially faster model, F-coref, which is the focus of this work. F-coref can process 2.8K OntoNotes documents in 25 se…

    Submitted 25 October, 2022; v1 submitted 9 September, 2022; originally announced September 2022.

    Comments: AACL 2022

  9. arXiv:2205.12644

    cs.CL

    LingMess: Linguistically Informed Multi Expert Scorers for Coreference Resolution

    Authors: Shon Otmazgin, Arie Cattan, Yoav Goldberg

    Abstract: While coreference resolution typically involves various linguistic challenges, recent models are based on a single pairwise scorer for all types of pairs. We present LingMess, a new coreference model that defines different categories of coreference cases and optimizes multiple pairwise scorers, where each scorer learns a specific set of linguistic challenges. Our model substantially improves pairwi…

    Submitted 10 February, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EACL 2023

  10. arXiv:2109.11621

    cs.CL

    iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration

    Authors: Eran Hirsch, Alon Eirew, Ori Shapira, Avi Caciularu, Arie Cattan, Ori Ernst, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Ido Dagan

    Abstract: We introduce iFacetSum, a web application for exploring topical document sets. iFacetSum integrates interactive summarization together with faceted search, by providing a novel faceted navigation scheme that yields abstractive summaries for the user's selections. This approach offers both a comprehensive overview as well as concise details regarding subtopics of choice. Fine-grained facets are aut…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: Proceedings of EMNLP 2021, System Demonstrations. 7 pages and an appendix

  11. arXiv:2106.04192

    cs.CL

    Realistic Evaluation Principles for Cross-document Coreference Resolution

    Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

    Abstract: We point out that common evaluation practices for cross-document coreference resolution have been unrealistically permissive in their assumed settings, yielding inflated results. We propose addressing this issue via two evaluation methodology principles. First, as in other tasks, models should be evaluated on predicted mentions rather than on gold mentions. Doing this raises a subtle issue regardi…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: *SEM 2021

  12. arXiv:2106.01210

    cs.CL

    Cross-document Coreference Resolution over Predicted Mentions

    Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

    Abstract: Coreference resolution has been mostly investigated within a single document scope, showing impressive progress in recent years based on end-to-end models. However, the more challenging task of cross-document (CD) coreference resolution has remained relatively under-explored, with the few recent models applied only to gold mentions. Here, we introduce the first end-to-end model for CD coreference reso…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: Findings of ACL 2021

  13. arXiv:2104.08809

    cs.CL cs.IR cs.LG

    SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts

    Authors: Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope

    Abstract: Determining coreference of concept mentions across multiple documents is a fundamental task in natural language understanding. Previous work on cross-document coreference resolution (CDCR) typically considers mentions of events in the news, which seldom involve abstract technical concepts that are prevalent in science and technology. These complex concepts take diverse or ambiguous forms and have…

    Submitted 1 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: Accepted to AKBC 2021. Data and code available at https://scico.apps.allenai.org/

  14. arXiv:2104.05022

    cs.CL

    WEC: Deriving a Large-scale Cross-document Event Coreference dataset from Wikipedia

    Authors: Alon Eirew, Arie Cattan, Ido Dagan

    Abstract: Cross-document event coreference resolution is a foundational task for NLP applications involving multi-text processing. However, existing corpora for this task are scarce and relatively small, while annotating only modest-size clusters of documents belonging to the same topic. To complement these resources and enhance future research, we present Wikipedia Event Coreference (WEC), an efficient met…

    Submitted 30 April, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  15. arXiv:2101.12637

    cs.CL

    CD2CR: Co-reference Resolution Across Documents and Domains

    Authors: James Ravenscroft, Arie Cattan, Amanda Clare, Ido Dagan, Maria Liakata

    Abstract: Cross-document co-reference resolution (CDCR) is the task of identifying and linking mentions to entities and concepts across many text documents. Current state-of-the-art models for this task assume that all documents are of the same type (e.g., news articles) or fall under the same theme. However, it is also desirable to perform CDCR across different domains (type or theme). A particular use case…

    Submitted 29 January, 2021; originally announced January 2021.

    Comments: 9 pages, 5 figures, accepted at EACL 2021

    ACM Class: I.2.7

  16. arXiv:2101.00406

    cs.CL

    CDLM: Cross-Document Language Modeling

    Authors: Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan

    Abstract: We introduce a new pretraining approach geared for multi-document language modeling, incorporating two key ideas into the masked language modeling self-supervised objective. First, instead of considering documents in isolation, we pretrain over sets of multiple related documents, encouraging the model to learn cross-document relationships. Second, we improve over recent long-range transformers by…

    Submitted 2 September, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: EMNLP 2021, findings

  17. arXiv:2010.02588

    cs.CL

    CoRefi: A Crowd Sourcing Suite for Coreference Annotation

    Authors: Aaron Bornstein, Arie Cattan, Ido Dagan

    Abstract: Coreference annotation is an important yet expensive and time-consuming task, which often involves expert annotators trained on complex decision guidelines. To enable cheaper and more efficient annotation, we present CoRefi, a web-based coreference annotation suite, oriented for crowdsourcing. Beyond the core coreference annotation tool, CoRefi provides guided onboarding for the task as well as…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020 system demonstration paper

  18. arXiv:2009.11032

    cs.CL

    Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling

    Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

    Abstract: Recent evaluation protocols for Cross-document (CD) coreference resolution have often been inconsistent or lenient, leading to incomparable results across works and overestimation of performance. To facilitate proper future research on this task, our primary contribution is proposing a pragmatic evaluation methodology which assumes access to only raw text -- rather than assuming gold mentions, dis…

    Submitted 23 October, 2020; v1 submitted 23 September, 2020; originally announced September 2020.