Showing 1–12 of 12 results for author: Pfau, J

Searching in archive cs.

  1. arXiv:2406.15518  [pdf, other]

    cs.CL cs.LG

    Steering Without Side Effects: Improving Post-Deployment Control of Language Models

    Authors: Asa Cooper Stickland, Alexander Lyzhov, Jacob Pfau, Salsabila Mahdi, Samuel R. Bowman

    Abstract: Language models (LMs) have been shown to behave unexpectedly post-deployment. For example, new jailbreaks continually arise, allowing model misuse, despite extensive red-teaming and adversarial training from developers. Given most model queries are unproblematic and frequent retraining results in unstable user experience, methods for mitigation of worst-case behavior should be targeted. One such m…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2404.15758  [pdf, other]

    cs.CL cs.AI

    Let's Think Dot by Dot: Hidden Computation in Transformer Language Models

    Authors: Jacob Pfau, William Merrill, Samuel R. Bowman

    Abstract: Chain-of-thought responses from language models improve performance across most benchmarks. However, it remains unclear to what extent these performance gains can be attributed to human-like task decomposition or simply the greater computation that additional tokens allow. We show that transformers can use meaningless filler tokens (e.g., '......') in place of a chain of thought to solve two hard…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 17 pages, 10 figures

    ACM Class: I.2.6

  3. arXiv:2310.13439  [pdf, other]

    cs.CL cs.AI

    Self-Consistency of Large Language Models under Ambiguity

    Authors: Henning Bartsch, Ole Jorgensen, Domenic Rosati, Jason Hoelscher-Obermaier, Jacob Pfau

    Abstract: Large language models (LLMs) that do not give consistent answers across contexts are problematic when used for tasks with expectations of consistency, e.g., question-answering, explanations, etc. Our work presents an evaluation benchmark for self-consistency in cases of under-specification where two or more answers can be correct. We conduct a series of behavioral experiments on the OpenAI model s…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: BlackboxNLP @ EMNLP 2023

  4. arXiv:2309.13214  [pdf, ps, other]

    cs.AI cs.HC cs.LG

    Assessing the Impact of Personality on Affective States from Video Game Communication

    Authors: Atieh Kashani, Johannes Pfau, Magy Seif El-Nasr

    Abstract: Individual differences in personality determine our preferences, traits and values, which should similarly hold for the way we express ourselves. With current advancements and transformations of technology and society, text-based communication has become ordinary and often even surpasses natural voice conversations -- with distinct challenges and opportunities. In this exploratory work, we investi…

    Submitted 22 September, 2023; originally announced September 2023.

  5. arXiv:2308.14224  [pdf, ps, other]

    cs.AI cs.HC cs.LG

    Modeling Player Personality Factors from In-Game Behavior and Affective Expression

    Authors: Reza Habibi, Johannes Pfau, Magy Seif El-Nasr

    Abstract: Developing a thorough understanding of the target audience (and/or single individuals) is a key factor for success - which is exceptionally important and powerful for the domain of video games that can not only benefit from informed decision making during development, but ideally even tailor game content, difficulty and player experience while playing. The granular assessment of individual persona…

    Submitted 27 August, 2023; originally announced August 2023.

  6. arXiv:2308.07576  [pdf, other]

    cs.HC

    On Video Game Balancing: Joining Player- and Data-Driven Analytics

    Authors: Johannes Pfau, Magy Seif El-Nasr

    Abstract: Balancing is, especially among players, a highly debated topic of video games. Whether a game is sufficiently balanced greatly influences its reception, player satisfaction, churn rates and success. Yet, conceptions about the definition of balance diverge across industry, academia and players, and different understandings of designing balance can lead to worse player experiences than actual imbala…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 25 pages, 5 figures

  7. arXiv:2307.15217  [pdf, other]

    cs.AI cs.CL cs.LG

    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

    Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

    Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and rel…

    Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  8. arXiv:2302.09070  [pdf, other]

    cs.HC cs.AI

    Empathetic AI for Empowering Resilience in Games

    Authors: Reza Habibi, Johannes Pfau, Jonattan Holmes, Magy Seif El-Nasr

    Abstract: Failure and resilience are important aspects of gameplay. This is especially important for serious and competitive games, where players need to adapt and cope with failure frequently. In such situations, emotion regulation -- the active process of modulating one's emotions to cope and adapt to challenging situations -- becomes essential. It is one of the prominent aspects of human intelligence and…

    Submitted 16 February, 2023; originally announced February 2023.

  9. arXiv:2207.13749  [pdf, other]

    cs.HC

    Nutzungsverhalten und Funktionsanforderungen digitaler Trainingsanwendungen während der Pandemie [Usage Behavior and Functional Requirements of Digital Training Applications During the Pandemic]

    Authors: Freya Pfau, Johannes Pfau, Bastian Dänekas, Robert Porzel, Rainer Malaka, Melanie Krüger

    Abstract: Due to contact restrictions, closure of fitness centers and quarantine measures, the SARS-CoV-2 pandemic led to a considerable decline of sporting activities. The first relaxation of these restrictions allowed German citizens to mostly return to their normal training and exercise behavior, yet the long-term impact of the recurring measures (i.e. the "Lockdown", "Lockdown light" as well as the "Cor…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: in German language

  10. arXiv:2105.14111  [pdf, other]

    cs.LG cs.AI

    Goal Misgeneralization in Deep Reinforcement Learning

    Authors: Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, Laurent Orseau, David Krueger

    Abstract: We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL). Goal misgeneralization failures occur when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused…

    Submitted 9 January, 2023; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: Published in ICML 2022. 9 pages

  11. arXiv:2104.02768  [pdf, other]

    stat.ML cs.AI cs.CV cs.LG

    Robust Semantic Interpretability: Revisiting Concept Activation Vectors

    Authors: Jacob Pfau, Albert T. Young, Jerome Wei, Maria L. Wei, Michael J. Keiser

    Abstract: Interpretability methods for image classification assess model trustworthiness by attempting to expose whether the model is systematically biased or attending to the same cues as a human would. Saliency methods for feature attribution dominate the interpretability literature, but these methods do not address semantic concepts such as the textures, colors, or genders of objects within an image. Our…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: ICML WHI 2020

  12. arXiv:1910.07604  [pdf, other]

    cs.CV cs.LG

    Global Saliency: Aggregating Saliency Maps to Assess Dataset Artefact Bias

    Authors: Jacob Pfau, Albert T. Young, Maria L. Wei, Michael J. Keiser

    Abstract: In high-stakes applications of machine learning models, interpretability methods provide guarantees that models are right for the right reasons. In medical imaging, saliency maps have become the standard tool for determining whether a neural model has learned relevant robust features, rather than artefactual noise. However, saliency maps are limited to local model explanation because they interpre…

    Submitted 3 December, 2019; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract