Skip to main content

Showing 1–10 of 10 results for author: Scheurer, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.04694  [pdf, other

    cs.CL cs.AI cs.LG

    Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

    Authors: Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jeremy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans

    Abstract: AI assistants such as ChatGPT are trained to respond to users by saying, "I am a large language model". This raises questions. Do such models know that they are LLMs and reliably act on this knowledge? Are they aware of their current circumstances, such as being deployed to the public? We refer to a model's knowledge of itself and its circumstances as situational awareness. To quantify situational… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 page main body, 98 page appendix, 58 figures

  2. arXiv:2401.14446  [pdf, other

    cs.CY cs.AI cs.CR

    Black-Box Access is Insufficient for Rigorous AI Audits

    Authors: Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell

    Abstract: External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workin… ▽ More

    Submitted 29 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: FAccT 2024

    Journal ref: The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24), June 3-6, 2024, Rio de Janeiro, Brazil

  3. arXiv:2311.07590  [pdf, other

    cs.CL cs.AI cs.LG

    Large Language Models can Strategically Deceive their Users when Put Under Pressure

    Authors: Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn

    Abstract: We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so. Concretely, we deploy GPT-4 as an agent in a realistic, simulated environment, where it assumes the role of an autonomous stock trading agent. Within this environment, the… ▽ More

    Submitted 15 July, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

  4. arXiv:2307.15217  [pdf, other

    cs.AI cs.CL cs.LG

    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

    Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen , et al. (7 additional authors not shown)

    Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and rel… ▽ More

    Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  5. arXiv:2303.16755  [pdf, other

    cs.CL cs.AI cs.LG

    Training Language Models with Language Feedback at Scale

    Authors: Jérémy Scheurer, Jon Ander Campos, Tomasz Korbak, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez

    Abstract: Pretrained language models often generate outputs that are not in line with human preferences, such as harmful text or factually incorrect summaries. Recent work approaches the above issues by learning from a simple form of human feedback: comparisons between pairs of model-generated outputs. However, comparison feedback only conveys limited information about human preferences. In this paper, we i… ▽ More

    Submitted 22 February, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Published in TMLR: https://openreview.net/forum?id=xo3hI5MwvU

  6. arXiv:2303.16749  [pdf, other

    cs.SE cs.AI cs.CL cs.LG

    Improving Code Generation by Training with Natural Language Feedback

    Authors: Angelica Chen, Jérémy Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez

    Abstract: The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedbac… ▽ More

    Submitted 22 February, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Published in (and superceded by) TMLR: https://openreview.net/forum?id=xo3hI5MwvU

  7. arXiv:2208.01009  [pdf, other

    cs.CL cs.AI cs.LG

    Few-shot Adaptation Works with UnpredicTable Data

    Authors: Jun Shern Chan, Michael Pieler, Jonathan Jao, Jérémy Scheurer, Ethan Perez

    Abstract: Prior work on language models (LMs) shows that training on a large number of diverse tasks improves few-shot learning (FSL) performance on new tasks. We take this to the extreme, automatically extracting 413,299 tasks from internet tables - orders of magnitude more than the next-largest public datasets. Finetuning on the resulting dataset leads to improved FSL performance on Natural Language Proce… ▽ More

    Submitted 7 August, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

    Comments: Code at https://github.com/JunShern/few-shot-adaptation

  8. arXiv:2204.14146  [pdf, other

    cs.CL cs.AI cs.LG

    Training Language Models with Language Feedback

    Authors: Jérémy Scheurer, Jon Ander Campos, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez

    Abstract: Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human e… ▽ More

    Submitted 17 November, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: The First Workshop on Learning with Natural Language Supervision at ACL 2022

  9. arXiv:2202.04910  [pdf, other

    cs.LG math.OC

    Instance-wise algorithm configuration with graph neural networks

    Authors: Romeo Valentin, Claudio Ferrari, Jérémy Scheurer, Andisheh Amrollahi, Chris Wendler, Max B. Paulus

    Abstract: We present our submission for the configuration task of the Machine Learning for Combinatorial Optimization (ML4CO) NeurIPS 2021 competition. The configuration task is to predict a good configuration of the open-source solver SCIP to solve a mixed integer linear program (MILP) efficiently. We pose this task as a supervised learning problem: First, we compile a large dataset of the solver performan… ▽ More

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: 5 pages, 3 figures

  10. Semantic Segmentation of Histopathological Slides for the Classification of Cutaneous Lymphoma and Eczema

    Authors: Jérémy Scheurer, Claudio Ferrari, Luis Berenguer Todo Bom, Michaela Beer, Werner Kempf, Luis Haug

    Abstract: Mycosis fungoides (MF) is a rare, potentially life threatening skin disease, which in early stages clinically and histologically strongly resembles Eczema, a very common and benign skin condition. In order to increase the survival rate, one needs to provide the appropriate treatment early on. To this end, one crucial step for specialists is the evaluation of histopathological slides (glass slides)… ▽ More

    Submitted 10 September, 2020; originally announced September 2020.

    Comments: Submitted to https://link.springer.com/chapter/10.1007/978-3-030-52791-4_3

    Journal ref: Papież B., Namburete A., Yaqub M., Noble J. (eds) Medical Image Understanding and Analysis. MIUA 2020. Communications in Computer and Information Science, vol 1248. Springer, Cham