Showing 1–30 of 30 results for author: Hase, P

  1. arXiv:2407.14414  [pdf, other]

    cs.AI cs.CL cs.LG

    System-1.x: Learning to Balance Fast and Slow Planning with Language Models

    Authors: Swarnadeep Saha, Archiki Prasad, Justin Chih-Yao Chen, Peter Hase, Elias Stengel-Eskin, Mohit Bansal

    Abstract: Language models can be used to solve long-horizon planning problems in two distinct modes: a fast 'System-1' mode, directly generating plans without any explicit search or backtracking, and a slow 'System-2' mode, planning step-by-step by explicitly searching over possible actions. While System-2 is typically more effective, it is also more computationally expensive, making it infeasible for long…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 29 pages (10 tables)

  2. arXiv:2406.19354  [pdf, other]

    cs.CL cs.AI

    Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

    Authors: Peter Hase, Thomas Hofweber, Xiang Zhou, Elias Stengel-Eskin, Mohit Bansal

    Abstract: The model editing problem concerns how language models should learn new facts about the world over time. While empirical research on model editing has drawn widespread attention, the conceptual foundations of model editing remain shaky -- perhaps unsurprisingly, since model editing is essentially belief revision, a storied problem in philosophy that has eluded succinct solutions for decades. Model…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 23 pages, 4 figures

  3. arXiv:2406.03442  [pdf, ps, other]

    cs.CL cs.AI

    Are language models rational? The case of coherence norms and belief revision

    Authors: Thomas Hofweber, Peter Hase, Elias Stengel-Eskin, Mohit Bansal

    Abstract: Do norms of rationality apply to machine learning models, in particular language models? In this paper we investigate this question by focusing on a special subset of rational norms: coherence norms. We consider both logical coherence norms as well as coherence norms tied to the strength of belief. To make sense of the latter, we introduce the Minimal Assent Connection (MAC) and propose a new acco…

    Submitted 10 August, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: added discussion and cross reference of new empirical work by the authors, updated references, fixed typos

  4. arXiv:2405.21028  [pdf, other]

    cs.CL cs.AI

    LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models

    Authors: Elias Stengel-Eskin, Peter Hase, Mohit Bansal

    Abstract: When answering questions, LLMs can convey not only an answer, but a level of confidence about the answer being correct. This includes explicit confidence markers (e.g. giving a numeric score) as well as implicit markers, like an authoritative tone or elaborating with additional knowledge. For LLMs to be trustworthy knowledge sources, the confidence they convey should match their actual expertise;…

    Submitted 3 July, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: 18 pages. Code: https://github.com/esteng/pragmatic_calibration

  5. arXiv:2404.09932  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Foundational Challenges in Assuring Alignment and Safety of Large Language Models

    Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, et al. (13 additional authors not shown)

    Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.

    Submitted 15 April, 2024; originally announced April 2024.

  6. arXiv:2402.08787  [pdf, other]

    cs.LG cs.CL

    Rethinking Machine Unlearning for Large Language Models

    Authors: Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Yuguang Yao, Chris Yuhao Liu, Xiaojun Xu, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu

    Abstract: We explore machine unlearning (MU) in the domain of large language models (LLMs), referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities, while maintaining the integrity of essential knowledge generation and not affecting causally unrelated information. We envision LLM unlearning bec…

    Submitted 14 July, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  7. arXiv:2401.06751  [pdf, other]

    cs.CL cs.AI cs.LG

    The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

    Authors: Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe

    Abstract: How can we train models to perform well on hard test data when hard training data is by definition difficult to label correctly? This question has been termed the scalable oversight problem and has drawn increasing attention as language models have continually improved. In this paper, we present the surprising conclusion that current pretrained language models often generalize relatively well from…

    Submitted 5 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: ACL 2024. 23 pages, 20 figures

  8. arXiv:2309.17410  [pdf, other]

    cs.CL cs.AI cs.LG

    Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks

    Authors: Vaidehi Patil, Peter Hase, Mohit Bansal

    Abstract: Pretrained language models sometimes possess knowledge that we do not wish them to, including memorized personal information and knowledge that could be used to harm people. They can also output toxic or harmful text. To mitigate these safety and informational issues, we propose an attack-and-defense framework for studying the task of deleting sensitive information directly from model weights. We…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: Equal contribution from first two authors. 19 pages, 5 figures. Our code is available at: https://github.com/Vaidehi99/InfoDeletionAttacks

  9. arXiv:2308.03671  [pdf, other]

    cs.DL cs.AI

    SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples

    Authors: Michael Färber, David Lamprecht, Johan Krause, Linn Aung, Peter Haase

    Abstract: We present SemOpenAlex, an extensive RDF knowledge graph that contains over 26 billion triples about scientific publications and their associated entities, such as authors, institutions, journals, and concepts. SemOpenAlex is licensed under CC0, providing free and open access to the data. We offer the data through multiple channels, including RDF dump files, a SPARQL endpoint, and as a data source…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: accepted at ISWC'23

  10. arXiv:2307.15217  [pdf, other]

    cs.AI cs.CL cs.LG

    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

    Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

    Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and rel…

    Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  11. arXiv:2306.09299  [pdf, other]

    cs.CL cs.AI cs.LG

    Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization

    Authors: Swarnadeep Saha, Peter Hase, Mohit Bansal

    Abstract: A hallmark property of explainable AI models is the ability to teach other agents, communicating knowledge of how to perform a task. While Large Language Models perform complex reasoning by generating explanations for their predictions, it is unclear whether they also make good teachers for weaker agents. To address this, we consider a student-teacher framework between two LLM agents and study if,…

    Submitted 14 November, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (23 pages, 12 figures). Our code is available at https://github.com/swarnaHub/ExplanationIntervention

  12. arXiv:2306.05963  [pdf, other]

    cs.CV cs.AI cs.LG

    Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects

    Authors: Zhuofan Ying, Peter Hase, Mohit Bansal

    Abstract: Biological vision systems make adaptive use of context to recognize objects in new settings with novel contexts as well as occluded or blurry objects in familiar settings. In this paper, we investigate how vision models adaptively use context for out-of-distribution (OOD) generalization and leverage our analysis results to improve model OOD generalization. First, we formulate two distinct OOD sett…

    Submitted 27 October, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023. 23 pages, 13 figures. Our code is available at https://github.com/zfying/AdaptiveContext

  13. arXiv:2301.04213  [pdf, other]

    cs.LG cs.AI cs.CL

    Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

    Authors: Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun

    Abstract: Language models learn a great quantity of factual information during pretraining, and recent work localizes this information to specific model weights like mid-layer MLP weights. In this paper, we find that we can change how a fact is stored in a model by editing weights that are in a different location than where existing methods suggest that the fact is stored. This is surprising because we woul…

    Submitted 16 October, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

    Comments: NeurIPS 2023 (Spotlight). 26 pages, 22 figures

  14. arXiv:2211.07517  [pdf, other]

    cs.CL cs.AI

    Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations

    Authors: Swarnadeep Saha, Peter Hase, Nazneen Rajani, Mohit Bansal

    Abstract: Recent work on explainable NLP has shown that few-shot prompting can enable large pretrained language models (LLMs) to generate grammatical and factual natural language explanations for data labels. In this work, we study the connection between explainability and sample hardness by investigating the following research question - "Are LLMs and humans equally good at explaining data labels for both…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022 (11 pages)

  15. arXiv:2209.10492  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees

    Authors: Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal

    Abstract: Current abstractive summarization models either suffer from a lack of clear interpretability or provide incomplete rationales by only highlighting parts of the source document. To this end, we propose the Summarization Program (SP), an interpretable modular framework consisting of an (ordered) list of binary trees, each encoding the step-by-step generative process of an abstractive summary sentenc…

    Submitted 1 February, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: ICLR 2023

  16. arXiv:2206.11212  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

    Authors: Zhuofan Ying, Peter Hase, Mohit Bansal

    Abstract: Many past works aim to improve visual reasoning in models by supervising feature importance (estimated by model explanation techniques) with human annotations such as highlights of important image regions. However, recent work has shown that performance gains from feature importance (FI) supervision for Visual Question Answering (VQA) tasks persist even with random supervision, suggesting that the…

    Submitted 25 October, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022 (first two authors contributed equally)

  17. arXiv:2204.04424  [pdf, other]

    cs.LG cs.AI cs.CV cs.DC

    Adaptive Differential Filters for Fast and Communication-Efficient Federated Learning

    Authors: Daniel Becking, Heiner Kirchhoffer, Gerhard Tech, Paul Haase, Karsten Müller, Heiko Schwarz, Wojciech Samek

    Abstract: Federated learning (FL) scenarios inherently generate a large communication overhead by frequently transmitting neural network updates between clients and server. To minimize the communication cost, introducing sparsity in conjunction with differential updates is a commonly used technique. However, sparse model updates can slow down convergence speed or unintentionally skip certain update aspects,…

    Submitted 9 April, 2022; originally announced April 2022.

    Comments: CVPR 2022 FedVision Workshop (CVPRW), 12 pages, 5 figures, 2 tables, supplementary material

  18. arXiv:2203.07281  [pdf, other]

    cs.CL cs.AI cs.LG

    GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models

    Authors: Archiki Prasad, Peter Hase, Xiang Zhou, Mohit Bansal

    Abstract: Providing natural language instructions in prompts is a useful new paradigm for improving task performance of large language models in a zero-shot setting. Recent work has aimed to improve such prompts via manual rewriting or gradient-based tuning. However, manual rewriting is time-consuming and requires subjective interpretation, while gradient-based tuning can be extremely computationally demand…

    Submitted 26 April, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: EACL 2023 (20 pages)

  19. arXiv:2111.13654  [pdf, other]

    cs.CL cs.AI cs.LG

    Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

    Authors: Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer

    Abstract: Do language models have beliefs about the world? Dennett (1995) famously argues that even thermostats have beliefs, on the view that a belief is simply an informational state decoupled from any motivational state. In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on meth…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: 19 pages

  20. arXiv:2111.01235  [pdf, other]

    cs.LG cs.AI cs.CL

    Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions

    Authors: Prateek Yadav, Peter Hase, Mohit Bansal

    Abstract: People affected by machine learning model decisions may benefit greatly from access to recourses, i.e. suggestions about what features they could change to receive a more favorable decision from the model. Current approaches try to optimize for the cost incurred by users when adopting a recourse, but they assume that all users share the same cost function. This is an unrealistic assumption because…

    Submitted 21 February, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: 27 pages

  21. arXiv:2106.00786  [pdf, other]

    cs.LG cs.AI cs.CL

    The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations

    Authors: Peter Hase, Harry Xie, Mohit Bansal

    Abstract: Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time. For example, in the standard Sufficiency metric, only the top-k most important tokens are kept. In this paper, we study several under-explored dimensions of FI explanations, providing con…

    Submitted 27 October, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 (25 pages)

  22. arXiv:2102.02201  [pdf, other]

    cs.CL cs.AI cs.LG

    When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

    Authors: Peter Hase, Mohit Bansal

    Abstract: Many methods now exist for conditioning model outputs on task instructions, retrieved documents, and user-provided explanations and feedback. Rather than relying solely on examples of task inputs and outputs, these approaches use valuable additional data for improving model correctness and aligning learned models with human priors. Meanwhile, a growing body of evidence suggests that some language…

    Submitted 10 February, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: 25 pages, 20 figures

  23. arXiv:2012.15781  [pdf, other]

    cs.LG cs.AI cs.CL

    FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging

    Authors: Han Guo, Nazneen Fatema Rajani, Peter Hase, Mohit Bansal, Caiming Xiong

    Abstract: Influence functions approximate the "influences" of training data-points for test predictions and have a wide variety of applications. Despite the popularity, their computational cost does not scale well with model and training data size. We present FastIF, a set of simple modifications to influence functions that significantly improves their run-time. We use k-Nearest Neighbors (kNN) to narrow th…

    Submitted 9 September, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: 18 pages

  24. arXiv:2010.04119  [pdf, other]

    cs.CL cs.AI cs.LG

    Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?

    Authors: Peter Hase, Shiyue Zhang, Harry Xie, Mohit Bansal

    Abstract: Data collection for natural language (NL) understanding tasks has increasingly included human explanations alongside data points, allowing past works to introduce models that both perform a task and generate NL explanations for their outputs. Yet to date, model-generated explanations have been evaluated on the basis of surface-level similarities to human explanations, both through automatic metric…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020 Findings (17 pages)

  25. arXiv:2005.01831  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?

    Authors: Peter Hase, Mohit Bansal

    Abstract: Algorithmic approaches to interpreting machine learning models have proliferated in recent years. We carry out human subject tests that are the first of their kind to isolate the effect of algorithmic explanations on a key aspect of model interpretability, simulatability, while avoiding important confounding experimental factors. A model is simulatable when a person can predict its behavior on new…

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: ACL 2020 (13 pages)

  26. DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks

    Authors: Simon Wiedemann, Heiner Kirchoffer, Stefan Matlage, Paul Haase, Arturo Marban, Talmaj Marinc, David Neumann, Tung Nguyen, Ahmed Osman, Detlev Marpe, Heiko Schwarz, Thomas Wiegand, Wojciech Samek

    Abstract: The field of video compression has developed some of the most sophisticated and efficient compression algorithms known in the literature, enabling very high compressibility for little loss of information. Whilst some of these techniques are domain specific, many of their underlying principles are universal in that they can be adapted and applied for compressing different types of data. In this wor…

    Submitted 27 July, 2019; originally announced July 2019.

  27. arXiv:1906.10651  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Interpretable Image Recognition with Hierarchical Prototypes

    Authors: Peter Hase, Chaofan Chen, Oscar Li, Cynthia Rudin

    Abstract: Vision models are interpretable when they classify objects on the basis of features that a person can directly understand. Recently, methods relying on visual feature prototypes have been developed for this purpose. However, in contrast to how humans categorize objects, these approaches have not yet made use of any taxonomical organization of class labels. With such an approach, for instance, we m…

    Submitted 24 August, 2019; v1 submitted 25 June, 2019; originally announced June 2019.

    Comments: Published as a full paper at HCOMP 2019

  28. arXiv:1905.08318  [pdf, other]

    cs.LG cs.AI cs.IT

    DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression

    Authors: Simon Wiedemann, Heiner Kirchhoffer, Stefan Matlage, Paul Haase, Arturo Marban, Talmaj Marinc, David Neumann, Ahmed Osman, Detlev Marpe, Heiko Schwarz, Thomas Wiegand, Wojciech Samek

    Abstract: We present DeepCABAC, a novel context-adaptive binary arithmetic coder for compressing deep neural networks. It quantizes each weight parameter by minimizing a weighted rate-distortion function, which implicitly takes the impact of quantization on to the accuracy of the network into account. Subsequently, it compresses the quantized values into a bitstream representation with minimal redundancies.…

    Submitted 15 May, 2019; originally announced May 2019.

    Comments: ICML 2019, Joint Workshop on On-Device Machine Learning and Compact Deep Neural Network Representations (ODML-CDNNR)

  29. arXiv:1811.05067  [pdf, ps, other]

    cs.AI cs.CL

    Shall I Compare Thee to a Machine-Written Sonnet? An Approach to Algorithmic Sonnet Generation

    Authors: John Benhardt, Peter Hase, Liuyi Zhu, Cynthia Rudin

    Abstract: We provide an approach for generating beautiful poetry. Our sonnet-generation algorithm includes several novel elements that improve over the state of the art, leading to metrical, rhyming poetry with many human-like qualities. These novel elements include in-line punctuation, part of speech restrictions, and more appropriate training corpora. Our work is the winner of the 2018 PoetiX Literary Tur…

    Submitted 10 September, 2019; v1 submitted 12 November, 2018; originally announced November 2018.

    Comments: 4 pages

  30. arXiv:1210.5403  [pdf, other]

    cs.DB

    An Experience Report of Large Scale Federations

    Authors: Andreas Schwarte, Peter Haase, Michael Schmidt, Katja Hose, Ralf Schenkel

    Abstract: We present an experimental study of large-scale RDF federations on top of the Bio2RDF data sources, involving 29 data sets with more than four billion RDF triples deployed in a local federation. Our federation is driven by FedX, a highly optimized federation mediator for Linked Data. We discuss design decisions, technical aspects, and experiences made in setting up and optimizing the Bio2RDF feder…

    Submitted 19 October, 2012; originally announced October 2012.

    ACM Class: H.2.3; H.2.4; H.3.4