Showing 1–16 of 16 results for author: Spangher, A

Searching in archive cs.
  1. arXiv:2407.13248

    cs.CL

    Are Large Language Models Capable of Generating Human-Level Narratives?

    Authors: Yufei Tian, Tenghao Huang, Miri Liu, Derek Jiang, Alexander Spangher, Muhao Chen, Jonathan May, Nanyun Peng

    Abstract: This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression. We introduce a novel computational framework to analyze narratives through three discourse-level aspects: i) story arcs, ii) turning points, and iii) affective dimensions, including arousal and valence. By leveraging expert and automatic annotations, we uncover significant discre…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2311.09734

    cs.CL

    Tracking the Newsworthiness of Public Documents

    Authors: Alexander Spangher, Emilio Ferrara, Ben Welsh, Nanyun Peng, Serdar Tumgoren, Jonathan May

    Abstract: Journalists must find stories in huge amounts of textual data (e.g. leaks, bills, press releases) as part of their jobs: determining when and why text becomes news can help us understand coverage patterns and help us build assistive tools. Yet, this is challenging because very few labelled links exist, language use between corpora is very different, and text may be covered for a variety of reasons…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 9 pages, 7 pages appendix

  3. arXiv:2306.17806

    cs.CL cs.CV cs.LG

    Stay on topic with Classifier-Free Guidance

    Authors: Guillaume Sanchez, Honglu Fan, Alexander Spangher, Elad Levi, Pawan Sasanka Ammanamanchi, Stella Biderman

    Abstract: Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt-adherence in generations. In this work, we demonstrate that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks: Q&A, reasoning, c…

    Submitted 30 June, 2023; originally announced June 2023.

  4. arXiv:2305.14904

    cs.CL cs.AI cs.CY

    Identifying Informational Sources in News Articles

    Authors: Alexander Spangher, Nanyun Peng, Jonathan May, Emilio Ferrara

    Abstract: News articles are driven by the informational sources journalists use in reporting. Modeling when, how and why sources get used together in stories can help us better understand the information we consume and even help journalists with the task of producing it. In this work, we take steps toward this goal by constructing the largest and widest-ranging annotated dataset, to date, of informational s…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: 13 pages

  5. arXiv:2301.02299

    cs.CL cs.AI cs.LG

    Sequentially Controlled Text Generation

    Authors: Alexander Spangher, Xinyu Hua, Yao Ming, Nanyun Peng

    Abstract: While GPT-2 generates sentences that are remarkably human-like, longer documents can ramble and do not follow human-like writing structure. We study the problem of imposing structure on long-range text. We propose a novel controlled text generation task, sequentially controlled text generation, and identify a dataset, NewsDiscourse as a starting point for this task. We develop a sequential control…

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: 19 pages. 10 pages main body, 3 pages references, 6 pages appendix

    Journal ref: Findings of the 2022 Conference on Empirical Methods in Natural Language Processing

  6. arXiv:2206.07115

    cs.CL

    If it Bleeds, it Leads: A Computational Approach to Covering Crime in Los Angeles

    Authors: Alexander Spangher, Divya Choudhary

    Abstract: Developing and improving computational approaches to covering news can increase journalistic output and improve the way stories are covered. In this work we approach the problem of covering crime stories in Los Angeles. We present a machine-in-the-loop system that covers individual crimes by (1) learning the prototypical coverage archetypes from classical news articles on crime to learn their stru…

    Submitted 14 June, 2022; originally announced June 2022.

  7. arXiv:2206.07106

    cs.CL

    NewsEdits: A News Article Revision Dataset and a Document-Level Reasoning Challenge

    Authors: Alexander Spangher, Xiang Ren, Jonathan May, Nanyun Peng

    Abstract: News article revision histories provide clues to narrative and factual evolution in news articles. To facilitate analysis of this evolution, we present the first publicly available dataset of news revision histories, NewsEdits. Our dataset is large-scale and multilingual; it contains 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources based in…

    Submitted 14 June, 2022; originally announced June 2022.

    Journal ref: 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics

  8. arXiv:2205.12420

    cs.CL

    Learning Action Conditions from Instructional Manuals for Instruction Understanding

    Authors: Te-Lin Wu, Caiqi Zhang, Qingyuan Hu, Alex Spangher, Nanyun Peng

    Abstract: The ability to infer pre- and postconditions of an action is vital for comprehending complex instructions, and is essential for applications such as autonomous instruction-guided agents and assistive AI that supports humans to perform physical tasks. In this work, we propose a task dubbed action condition inference, and collecting a high-quality, human annotated dataset of preconditions and postco…

    Submitted 2 July, 2024; v1 submitted 24 May, 2022; originally announced May 2022.

  9. arXiv:2110.08486

    cs.CL cs.CV

    Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals

    Authors: Te-Lin Wu, Alex Spangher, Pegah Alipoormolabashi, Marjorie Freedman, Ralph Weischedel, Nanyun Peng

    Abstract: The ability to sequence unordered events is an essential skill to comprehend and reason about real world task procedures, which often requires thorough understanding of temporal common sense and multimodal information, as these procedures are often communicated through a combination of texts and images. Such capability is essential for applications such as sequential task planning and multi-source…

    Submitted 20 February, 2024; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: In Proceedings of the Conference of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022

  10. arXiv:2104.10263

    cs.CL cs.DL cs.HC

    StateCensusLaws.org: A Web Application for Consuming and Annotating Legal Discourse Learning

    Authors: Alexander Spangher, Jonathan May

    Abstract: In this work, we create a web application to highlight the output of NLP models trained to parse and label discourse segments in law text. Our system is built primarily with journalists and legal interpreters in mind, and we focus on state-level law that uses U.S. Census population numbers to allocate resources and organize government. Our system exposes a corpus we collect of 6,000 state-level…

    Submitted 30 June, 2022; v1 submitted 20 April, 2021; originally announced April 2021.

  11. arXiv:2104.09656

    cs.CL

    "Don't quote me on that": Finding Mixtures of Sources in News Articles

    Authors: Alexander Spangher, Nanyun Peng, Jonathan May, Emilio Ferrara

    Abstract: Journalists publish statements provided by people, or sources to contextualize current events, help voters make informed decisions, and hold powerful individuals accountable. In this work, we construct an ontological labeling system for sources based on each source's affiliation and role. We build a probabilistic model to infer these attributes for named sources and to d…

    Submitted 19 April, 2021; originally announced April 2021.

  12. arXiv:2104.09653

    cs.CL cs.IR

    Modeling "Newsworthiness" for Lead-Generation Across Corpora

    Authors: Alexander Spangher, Nanyun Peng, Jonathan May, Emilio Ferrara

    Abstract: Journalists obtain "leads", or story ideas, by reading large corpora of government records: court cases, proposed bills, etc. However, only a small percentage of such records are interesting documents. We propose a model of "newsworthiness" aimed at surfacing interesting documents. We train models on automatically labeled corpora -- published newspaper articles -- to predict whether each article w…

    Submitted 19 April, 2021; originally announced April 2021.

  13. arXiv:2104.09647

    cs.CL cs.DL

    NewsEdits: A Dataset of Revision Histories for News Articles (Technical Report: Data Processing)

    Authors: Alexander Spangher, Jonathan May

    Abstract: News article revision histories have the potential to give us novel insights across varied fields of linguistics and social sciences. In this work, we present, to our knowledge, the first publicly available dataset of news article revision histories, or NewsEdits. Our dataset is multilingual; it contains 1,278,804 articles with 4,609,430 versions from over 22 English- and French-language newspap…

    Submitted 30 June, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: 11 pages

  14. arXiv:2101.00389

    cs.CL

    Multitask Learning for Class-Imbalanced Discourse Classification

    Authors: Alexander Spangher, Jonathan May, Sz-rung Shiang, Lingjia Deng

    Abstract: Small class-imbalanced datasets, common in many high-level semantic tasks like discourse analysis, present a particular challenge to current deep-learning architectures. In this work, we perform an extensive analysis on sentence-level classification approaches for the News Discourse dataset, one of the largest high-level semantic discourse datasets recently published. We show that a multitask appr…

    Submitted 2 January, 2021; originally announced January 2021.

    Comments: 17 pages, 11 figures

  15. arXiv:1810.10033

    cs.SI

    Analysis of Strategy and Spread of Russia-sponsored Content in the US in 2017

    Authors: Alexander Spangher, Gireeja Ranade, Besmira Nushi, Adam Fourney, Eric Horvitz

    Abstract: The Russia-based Internet Research Agency (IRA) carried out a broad information campaign in the U.S. before and after the 2016 presidential election. The organization created an expansive set of internet properties: web domains, Facebook pages, and Twitter bots, which received traffic via purchased Facebook ads, tweets, and search engines indexing their domains. We investigate the scope of IRA act…

    Submitted 23 October, 2018; originally announced October 2018.

  16. Actionable Recourse in Linear Classification

    Authors: Berk Ustun, Alexander Spangher, Yang Liu

    Abstract: Machine learning models are increasingly used to automate decisions that affect humans - deciding who should receive a loan, a job interview, or a social service. In such applications, a person should have the ability to change the decision of a model. When a person is denied a loan by a credit score, for example, they should be able to alter its input variables in a way that guarantees approval.…

    Submitted 8 November, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

    Comments: Extended version. ACM Conference on Fairness, Accountability and Transparency [FAT2019]