
Showing 1–50 of 51 results for author: Krueger, D

Searching in archive cs.
  1. arXiv:2406.15753  [pdf, other]

    cs.LG cs.AI stat.ML

    The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

    Authors: Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse

    Abstract: In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main so…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 58 pages, 1 figure

  2. arXiv:2406.15371  [pdf, ps, other]

    cs.CY cs.AI

    Affirmative safety: An approach to risk management for high-risk AI

    Authors: Akash R. Wasil, Joshua Clymer, David Krueger, Emily Dardaman, Simeon Campos, Evan R. Murphy

    Abstract: Prominent AI experts have suggested that companies developing high-risk AI systems should be required to show that such systems are safe before they can be developed or deployed. The goal of this paper is to expand on this idea and explore its implications for risk management. We argue that entities developing or deploying high-risk AI systems should be required to present evidence of affirmative…

    Submitted 14 April, 2024; originally announced June 2024.

  3. arXiv:2406.12137  [pdf, other]

    cs.AI

    IDs for AI Systems

    Authors: Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, Markus Anderljung

    Abstract: AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system has certain safety certifications. An investigator may not know whom to investigate when a system causes an incident. It may not be clear whom to contact to shut down a malfunctioning system. Across a number of…

    Submitted 18 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Work-in-progress

  4. arXiv:2405.19550  [pdf, other]

    cs.LG cs.CL

    Stress-Testing Capability Elicitation With Password-Locked Models

    Authors: Ryan Greenblatt, Fabien Roger, Dmitrii Krasheninnikov, David Krueger

    Abstract: To determine the safety of large language models (LLMs), AI developers must be able to assess their dangerous capabilities. But simple prompting strategies often fail to elicit an LLM's full capabilities. One way to elicit capabilities more robustly is to fine-tune the LLM to complete the task. In this paper, we investigate the conditions under which fine-tuning-based elicitation suffices to elici…

    Submitted 29 May, 2024; originally announced May 2024.

  5. arXiv:2404.09932  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Foundational Challenges in Assuring Alignment and Safety of Large Language Models

    Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, et al. (13 additional authors not shown)

    Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions.

    Submitted 15 April, 2024; originally announced April 2024.

  6. arXiv:2403.10462  [pdf, other]

    cs.CY cs.AI

    Safety Cases: How to Justify the Safety of Advanced AI Systems

    Authors: Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen

    Abstract: As AI systems become more advanced, companies and regulators will make difficult decisions about whether it is safe to train and deploy them. To prepare for these decisions, we investigate how developers could make a 'safety case,' which is a structured rationale that AI systems are unlikely to cause a catastrophe. We propose a framework for organizing a safety case and discuss four categories of…

    Submitted 18 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  7. arXiv:2403.01946  [pdf, other]

    cs.LG

    A Generative Model of Symmetry Transformations

    Authors: James Urquhart Allingham, Bruno Kacper Mlodozeniec, Shreyas Padhy, Javier Antorán, David Krueger, Richard E. Turner, Eric Nalisnick, José Miguel Hernández-Lobato

    Abstract: Correctly capturing the symmetry transformations of data can lead to efficient models with strong generalization capabilities, though methods incorporating symmetries often require prior knowledge. While recent advancements have been made in learning those symmetries directly from the dataset, most of this work has focused on the discriminative setting. In this paper, we take inspiration from grou…

    Submitted 20 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  8. arXiv:2401.14446  [pdf, other]

    cs.CY cs.AI cs.CR

    Black-Box Access is Insufficient for Rigorous AI Audits

    Authors: Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell

    Abstract: External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workin…

    Submitted 29 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: FAccT 2024

    Journal ref: The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24), June 3-6, 2024, Rio de Janeiro, Brazil

  9. arXiv:2401.13138  [pdf, other]

    cs.CY cs.AI

    Visibility into AI Agents

    Authors: Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung

    Abstract: Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks. Understanding and mitigating these risks involves critically evaluating existing governance structures, revising and adapting these structures where needed, and ens…

    Submitted 17 May, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted to ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT 2024)

  10. arXiv:2312.14751  [pdf, other]

    cs.LG cs.CY

    Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

    Authors: Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger

    Abstract: Public release of the weights of pretrained foundation models, otherwise known as downloadable access \citep{solaiman_gradient_2023}, enables fine-tuning without the prohibitive expense of pretraining. Our work argues that increasingly accessible fine-tuning of downloadable models may increase hazards. First, we highlight research to improve the accessibility of fine-tuning. We split our discussio…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted as a spotlight workshop paper at the Socially Responsible Language Modelling Research (SoLaR) workshop, held at NeurIPS 2023

  11. arXiv:2311.12786  [pdf, other]

    cs.LG

    Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

    Authors: Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger

    Abstract: Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy. Despite its clear importance, there has been minimal work that explains how fine-tuning alters the underlying capabilities learned by a model during pretraining: does fine-tuning yield entirely nov…

    Submitted 21 November, 2023; originally announced November 2023.

  12. arXiv:2310.17688  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Managing extreme AI risks amid rapid progress

    Authors: Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

    Abstract: Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although rese…

    Submitted 22 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Published in Science: https://www.science.org/doi/10.1126/science.adn0117

  13. arXiv:2310.15047  [pdf, other]

    cs.LG cs.AI

    Implicit meta-learning may lead language models to trust more reliable sources

    Authors: Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, Tegan Maharaj, David Krueger

    Abstract: We demonstrate that LLMs may learn indicators of document usefulness and modulate their updates accordingly. We introduce random strings ("tags") as indicators of usefulness in a synthetic fine-tuning dataset. Fine-tuning on this dataset leads to implicit meta-learning (IML): in further fine-tuning, the model updates to make more use of text that is tagged as useful. We perform a thorough empirica…

    Submitted 12 July, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  14. arXiv:2310.02743  [pdf, other]

    cs.LG

    Reward Model Ensembles Help Mitigate Overoptimization

    Authors: Thomas Coste, Usman Anwar, Robert Kirk, David Krueger

    Abstract: Reinforcement learning from human feedback (RLHF) is a standard approach for fine-tuning large language models to follow instructions. As part of this process, learned reward models are used to approximately model human preferences. However, as imperfect representations of the "true" reward, these learned reward models are susceptible to overoptimization. Gao et al. (2023) studied this phenomenon…

    Submitted 10 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024

  15. arXiv:2307.15217  [pdf, other]

    cs.AI cs.CL cs.LG

    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

    Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

    Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and rel…

    Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  16. arXiv:2307.14993  [pdf, other]

    cs.AI cs.LG

    Thinker: Learning to Plan and Act

    Authors: Stephen Chung, Ivan Anokhin, David Krueger

    Abstract: We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the…

    Submitted 26 October, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: 38 pages

    ACM Class: I.2.6; I.2.8; I.5.1

  17. arXiv:2304.09358  [pdf, other]

    cs.CV cs.AI cs.LG

    Investigating the Nature of 3D Generalization in Deep Neural Networks

    Authors: Shoaib Ahmed Siddiqui, David Krueger, Thomas Breuel

    Abstract: Visual object recognition systems need to generalize from a set of 2D training views to novel views. The question of how the human visual system can generalize to novel views has been studied and modeled in psychology, computer vision, and neuroscience. Modern deep learning architectures for object recognition generalize well to novel views, but the mechanisms are not well understood. In this pape…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: 15 pages, 15 figures, CVPR format

  18. arXiv:2303.09387  [pdf, other]

    cs.CY

    Characterizing Manipulation from AI Systems

    Authors: Micah Carroll, Alan Chan, Henry Ashton, David Krueger

    Abstract: Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interactions with the world, it is important to understand the degree to which AI systems might manipulate humans without the intent of the system designers. Our work clarifies challenges in defining and measuring manipulation in the context of AI systems. Firstly, w…

    Submitted 30 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Presented at EAAMO 2023; The first two authors contributed equally; author order was decided with a coin flip

  19. arXiv:2303.06173  [pdf, other]

    cs.LG cs.AI

    Unifying Grokking and Double Descent

    Authors: Xander Davies, Lauro Langosco, David Krueger

    Abstract: A principled understanding of generalization in deep learning may require unifying disparate observations under a single conceptual framework. Previous work has studied \emph{grokking}, a training dynamic in which a sustained period of near-perfect training performance and near-chance test performance is eventually followed by generalization, as well as the superficially similar \emph{double desce…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: ML Safety Workshop, 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  20. Harms from Increasingly Agentic Algorithmic Systems

    Authors: Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, Alex Mayhew, Katherine Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj

    Abstract: Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems…

    Submitted 11 May, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted at FAccT 2023

  21. arXiv:2302.01647  [pdf, other]

    cs.CV cs.AI cs.LG

    Blockwise Self-Supervised Learning at Scale

    Authors: Shoaib Ahmed Siddiqui, David Krueger, Yann LeCun, Stéphane Deny

    Abstract: Current state-of-the-art deep networks are all powered by backpropagation. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. We show that a blockwise pretraining procedure consisting of training independently the 4 main blocks of layers of a ResNet-50 with Barlow Twins' loss functi…

    Submitted 3 February, 2023; originally announced February 2023.

  22. arXiv:2301.03652  [pdf, other]

    cs.LG cs.AI

    On The Fragility of Learned Reward Functions

    Authors: Lev McKinney, Yawen Duan, David Krueger, Adam Gleave

    Abstract: Reward functions are notoriously difficult to specify, especially for tasks with complex goals. Reward learning approaches attempt to infer reward functions from human feedback and preferences. Prior works on reward learning have mainly focused on the performance of policies trained alongside the reward function. This practice, however, may fail to detect learned rewards that are not capable of tr…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: 5 pages, 2 figures, presented at the NeurIPS Deep RL and ML Safety Workshops

  23. arXiv:2211.14827  [pdf, other]

    cs.LG cs.AI stat.ML

    Domain Generalization for Robust Model-Based Offline Reinforcement Learning

    Authors: Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen Chung, David Krueger

    Abstract: Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin. We consider multi-demonstrator offline RL, a middle ground where we know which demonstrators generated each dataset, but make no assumptions about the underlying policies of the demonstrators. This is the most natural setting when…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: Accepted to the NeurIPS 2022 Workshops on Distribution Shifts and Offline Reinforcement Learning

  24. arXiv:2211.08422  [pdf, other]

    cs.LG cs.CV

    Mechanistic Mode Connectivity

    Authors: Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka

    Abstract: We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss. Specifically, we ask the following question: are minimizers that rely on different mechanisms for making their predictions connected via simple paths of low loss? We provide a definition of…

    Submitted 1 June, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: ICML, 2023

  25. arXiv:2210.14891  [pdf, other]

    cs.LG cs.AI

    Broken Neural Scaling Laws

    Authors: Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger

    Abstract: We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models & extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or…

    Submitted 23 July, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Published as a conference paper at International Conference on Learning Representations (ICLR) 2023

    Journal ref: International Conference on Learning Representations (ICLR), 2023

  26. arXiv:2210.03150  [pdf, other]

    cs.LG cs.AI

    Towards Out-of-Distribution Adversarial Robustness

    Authors: Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan

    Abstract: Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fails to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different $L_p$ norms, we show that there is potential for improvement against many commonly used attacks by adopting a domain generalisation approach. C…

    Submitted 26 June, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: Version of NeurIPS 2023 submission

  27. arXiv:2209.13085  [pdf, other]

    cs.LG stat.ML

    Defining and Characterizing Reward Hacking

    Authors: Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David Krueger

    Abstract: We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$. We say that a proxy is unhackable if increasing the expected proxy return can never decrease the expected true return. Intuitively, it might be possible to create an unhacka…

    Submitted 26 September, 2022; originally announced September 2022.

  28. arXiv:2209.10015  [pdf, other]

    cs.LG cs.AI

    Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics

    Authors: Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David Krueger, Sara Hooker

    Abstract: Modern machine learning research relies on relatively few carefully curated datasets. Even in these datasets, and typically in 'untidy' or raw data, practitioners are faced with significant issues of data quality and diversity which can be prohibitively labor intensive to address. Existing methods for dealing with these challenges tend to make strong assumptions about the particular issues at play…

    Submitted 20 September, 2022; originally announced September 2022.

  29. arXiv:2204.14226  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG physics.med-ph

    Recommendations on test datasets for evaluating AI solutions in pathology

    Authors: André Homeyer, Christian Geißler, Lars Ole Schwen, Falk Zakrzewski, Theodore Evans, Klaus Strohmenger, Max Westphal, Roman David Bülow, Michaela Kargl, Aray Karjauv, Isidre Munné-Bertran, Carl Orge Retzlaff, Adrià Romero-López, Tomasz Sołtysiński, Markus Plass, Rita Carvalho, Peter Steinbach, Yu-Chia Lan, Nassim Bouteldja, David Haber, Mateo Rojas-Carulla, Alireza Vafaei Sadr, Matthias Kraft, Daniel Krüger, Rutger Fick, et al. (5 additional authors not shown)

    Abstract: Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recom…

    Submitted 21 April, 2022; originally announced April 2022.

    Journal ref: Mod Pathol (2022)

  30. arXiv:2112.13734  [pdf, ps, other]

    cs.CV

    Multi-Domain Balanced Sampling Improves Out-of-Distribution Generalization of Chest X-ray Pathology Prediction Models

    Authors: Enoch Tetteh, Joseph Viviano, Yoshua Bengio, David Krueger, Joseph Paul Cohen

    Abstract: Learning models that generalize under different distribution shifts in medical imaging has been a long-standing research challenge. There have been several proposals for efficient and robust visual representation learning among vision research practitioners, especially in the sensitive and critical biomedical domain. In this paper, we propose an idea for out-of-distribution generalization of chest…

    Submitted 27 December, 2021; v1 submitted 27 December, 2021; originally announced December 2021.

    Comments: MED-NEURIPS 2021

  31. Filling gaps in trustworthy development of AI

    Authors: Shahar Avin, Haydn Belfield, Miles Brundage, Gretchen Krueger, Jasmine Wang, Adrian Weller, Markus Anderljung, Igor Krawczuk, David Krueger, Jonathan Lebensold, Tegan Maharaj, Noa Zilberman

    Abstract: The range of application of artificial intelligence (AI) is vast, as is the potential for harm. Growing awareness of potential risks from AI systems has spurred action to address those risks, while eroding confidence in AI systems and the organizations that develop them. A 2019 study found over 80 organizations that published and adopted "AI ethics principles", and more have joined since. But the…

    Submitted 14 December, 2021; originally announced December 2021.

    Journal ref: Science (2021) Vol 374, Issue 6573, pp. 1327-1329

  32. arXiv:2105.14111  [pdf, other]

    cs.LG cs.AI

    Goal Misgeneralization in Deep Reinforcement Learning

    Authors: Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, Laurent Orseau, David Krueger

    Abstract: We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL). Goal misgeneralization failures occur when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused…

    Submitted 9 January, 2023; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: Published in ICML 2022. 9 Pages

  33. arXiv:2011.06709  [pdf, other]

    cs.LG cs.AI stat.ML

    Active Reinforcement Learning: Observing Rewards at a Cost

    Authors: David Krueger, Jan Leike, Owain Evans, John Salvatier

    Abstract: Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics. We propose and evaluate sev…

    Submitted 24 November, 2020; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: Originally appeared at the NeurIPS 2016 "Future of Interactive Learning Machines (FILM)" workshop

  34. arXiv:2009.09153  [pdf, other]

    cs.LG cs.AI stat.ML

    Hidden Incentives for Auto-Induced Distributional Shift

    Authors: David Krueger, Tegan Maharaj, Jan Leike

    Abstract: Decisions made by machine learning systems have increasing influence on the world, yet it is common for machine learning algorithms to assume that no such influence exists. An example is the use of the i.i.d. assumption in content recommendation. In fact, the (choice of) content displayed can change users' perceptions and preferences, or even drive them away, causing a shift in the distribution of…

    Submitted 18 September, 2020; originally announced September 2020.

  35. arXiv:2006.04948  [pdf, other]

    cs.CY cs.AI cs.LG

    AI Research Considerations for Human Existential Safety (ARCHES)

    Authors: Andrew Critch, David Krueger

    Abstract: Framed in positive terms, this report examines how technical AI research might be steered in a manner that is more attentive to humanity's long-term prospects for survival as a species. In negative terms, we ask what existential risks humanity might face from AI development in the next century, and by what principles contemporary technical research might be directed to address those risks. A key…

    Submitted 29 May, 2020; originally announced June 2020.

    MSC Class: 68T01 ACM Class: I.2.0

  36. arXiv:2004.07213  [pdf, ps, other]

    cs.CY

    Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

    Authors: Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, et al. (34 additional authors not shown)

    Abstract: With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they…

    Submitted 20 April, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

  37. arXiv:2003.00688  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Out-of-Distribution Generalization via Risk Extrapolation (REx)

    Authors: David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, Aaron Courville

    Abstract: Distributional shift is one of the major obstacles when transferring machine learning prediction systems from the lab to the real world. To tackle this problem, we assume that variation across training domains is representative of the variation we might encounter at test time, but also that shifts at test time may be more extreme in magnitude. In particular, we show that reducing differences in ri…

    Submitted 25 February, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

  38. arXiv:1811.07871  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Scalable agent alignment via reward modeling: a research direction

    Authors: Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

    Abstract: One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task objective. This gives rise to the agent alignment problem: how do we create agents that behave in accordance with the user's intentions? We outline a high-leve…

    Submitted 19 November, 2018; originally announced November 2018.

  39. arXiv:1806.07528  [pdf, other]

    stat.ML cs.LG

    Uncertainty in Multitask Transfer Learning

    Authors: Alexandre Lacoste, Boris Oreshkin, Wonchang Chung, Thomas Boquet, Negar Rostamzadeh, David Krueger

    Abstract: Using variational Bayes neural networks, we develop an algorithm capable of accumulating knowledge into a prior from multiple different tasks. The result is a rich and meaningful prior capable of few-shot learning on new tasks. The posterior can go beyond the mean field approximation and yields good uncertainty on the performed experiments. Analysis on toy tasks shows that it can learn from signif…

    Submitted 6 July, 2018; v1 submitted 19 June, 2018; originally announced June 2018.

  40. arXiv:1804.00779  [pdf, other]

    cs.LG stat.ML

    Neural Autoregressive Flows

    Authors: Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville

    Abstract: Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate…

    Submitted 2 April, 2018; originally announced April 2018.

    Comments: 16 pages, 10 figures, 3 tables

  41. arXiv:1801.10308  [pdf, other]

    cs.CL cs.LG

    Nested LSTMs

    Authors: Joel Ruben Antony Moniz, David Krueger

    Abstract: We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell, which has its own inner memory cell. Specifically, instead of computing the value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t$, NLSTM memor…

    Submitted 31 January, 2018; originally announced January 2018.

    Comments: Accepted at ACML 2017

    Journal ref: Proceedings of the Ninth Asian Conference on Machine Learning, PMLR 77:530-544, 2017

  42. arXiv:1712.05016  [pdf, other]

    stat.ML cs.LG

    Deep Prior

    Authors: Alexandre Lacoste, Thomas Boquet, Negar Rostamzadeh, Boris Oreshkin, Wonchang Chung, David Krueger

    Abstract: The recent literature on deep learning offers new tools to learn a rich probability distribution over high dimensional data such as images or sounds. In this work we investigate the possibility of learning the prior distribution over neural network parameters using such tools. Our resulting variational Bayes algorithm generalizes well to new tasks, even when very few training examples are provided…

    Submitted 15 December, 2017; v1 submitted 13 December, 2017; originally announced December 2017.

    Comments: Workshop paper, Accepted at Bayesian Deep Learning workshop, NIPS 2017

  43. arXiv:1710.04759  [pdf, other]

    stat.ML cs.AI cs.LG

    Bayesian Hypernetworks

    Authors: David Krueger, Chin-Wei Huang, Riashat Islam, Ryan Turner, Alexandre Lacoste, Aaron Courville

    Abstract: We study Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork $h$ is a neural network which learns to transform a simple noise distribution, $p(\vec{\epsilon}) = \mathcal{N}(\vec{0}, I)$, to a distribution $q(\theta) := q(h(\vec{\epsilon}))$ over the parameters $\theta$ of another neural network (the "primary network"). We train $q$ with variational inference, us…

    Submitted 24 April, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

    Comments: David Krueger and Chin-Wei Huang contributed equally

  44. arXiv:1706.05394  [pdf, other]

    stat.ML cs.LG

    A Closer Look at Memorization in Deep Networks

    Authors: Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. r…

    Submitted 1 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

    Comments: Appears in Proceedings of the 34th International Conference on Machine Learning (ICML 2017). Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, and David Krueger contributed equally to this work

  45. arXiv:1606.01305  [pdf, other]

    cs.NE cs.CL cs.LG

    Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

    Authors: David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal

    Abstract: We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feed…

    Submitted 22 September, 2017; v1 submitted 3 June, 2016; originally announced June 2016.

    Comments: David Krueger and Tegan Maharaj contributed equally to this work

  46. arXiv:1511.08400  [pdf, other]

    cs.NE cs.CL cs.LG stat.ML

    Regularizing RNNs by Stabilizing Activations

    Authors: David Krueger, Roland Memisevic

    Abstract: We stabilize the activations of Recurrent Neural Networks (RNNs) by penalizing the squared distance between successive hidden states' norms. This penalty term is an effective regularizer for RNNs including LSTMs and IRNNs, improving performance on character-level language modeling and phoneme recognition, and outperforming weight noise and dropout. We achieve competitive performance (18.6\% PE…

    Submitted 26 April, 2016; v1 submitted 26 November, 2015; originally announced November 2015.

  47. arXiv:1510.08949  [pdf, other]

    cs.LG

    Testing Visual Attention in Dynamic Environments

    Authors: Philip Bachman, David Krueger, Doina Precup

    Abstract: We investigate attention as the active pursuit of useful information. This contrasts with attention as a mechanism for the attenuation of irrelevant information. We also consider the role of short-term memory, whose use is critical to any model incapable of simultaneously perceiving all information on which its output depends. We present several simple synthetic tasks, which become considerably mo…

    Submitted 29 October, 2015; originally announced October 2015.

  48. arXiv:1509.08628  [pdf, ps, other]

    cs.CC

    Often harder than in the Constructive Case: Destructive Bribery in CP-nets

    Authors: Britta Dorn, Dominikus Krüger, Patrick Scharpfenecker

    Abstract: We study the complexity of the destructive bribery problem---an external agent tries to prevent a disliked candidate from winning by bribery actions---in voting over combinatorial domains, where the set of candidates is the Cartesian product of several issues. This problem is related to the concept of the margin of victory of an election which constitutes a measure of robustness of the election ou…

    Submitted 29 September, 2015; originally announced September 2015.

    Comments: 22 pages

  49. arXiv:1410.8516  [pdf, other]

    cs.LG

    NICE: Non-linear Independent Components Estimation

    Authors: Laurent Dinh, David Krueger, Yoshua Bengio

    Abstract: We propose a deep learning framework for modeling complex high-dimensional densities called Non-linear Independent Component Estimation (NICE). It is based on the idea that a good representation is one in which the data has a distribution that is easy to model. For this purpose, a non-linear deterministic transformation of the data is learned that maps it to a latent space so as to make the transf…

    Submitted 10 April, 2015; v1 submitted 30 October, 2014; originally announced October 2014.

    Comments: 11 pages and 2 pages Appendix, workshop paper at ICLR 2015

  50. On the Hardness of Bribery Variants in Voting with CP-Nets

    Authors: Britta Dorn, Dominikus Krüger

    Abstract: We continue previous work by Mattei et al. (Mattei, N., Pini, M., Rossi, F., Venable, K.: Bribery in voting with CP-nets. Ann. of Math. and Artif. Intell. pp. 1--26 (2013)) in which they study the computational complexity of bribery schemes when voters have conditional preferences that are modeled by CP-nets. For most of the cases they considered, they could show that the bribery problem is solvab…

    Submitted 18 May, 2016; v1 submitted 20 October, 2014; originally announced October 2014.

    Comments: improved readability; identified Cheapest Subsets to be the enumeration variant of K-th Largest Subset, so we renamed it to K-Smallest Subsets and point to the literature; some more typos fixed