Showing 1–34 of 34 results for author: Everitt, T

  1. arXiv:2404.15058  [pdf, other]

    cs.CY cs.AI

    A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI

    Authors: Seliem El-Sayed, Canfer Akbulut, Amanda McCroskery, Geoff Keeling, Zachary Kenton, Zaria Jalan, Nahema Marchal, Arianna Manzini, Toby Shevlane, Shannon Vallor, Daniel Susser, Matija Franklin, Sophie Bridgers, Harry Law, Matthew Rahtz, Murray Shanahan, Michael Henry Tessler, Arthur Douillard, Tom Everitt, Sasha Brown

    Abstract: Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due to the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, high…

    Submitted 23 April, 2024; originally announced April 2024.

  2. arXiv:2402.10877  [pdf, other]

    cs.AI cs.LG

    Robust agents learn causal world models

    Authors: Jonathan Richens, Tom Everitt

    Abstract: It has long been hypothesised that causal reasoning plays a fundamental role in robust and general intelligence. However, it is not known if agents must learn causal models in order to generalise to new domains, or if other inductive biases are sufficient. We answer this question, showing that any agent capable of satisfying a regret bound under a large set of distributional shifts must have learn…

    Submitted 19 July, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: ICLR 2024 (oral). Updated agents section, new corollary

  3. arXiv:2402.07221  [pdf, other]

    cs.AI

    The Reasons that Agents Act: Intention and Instrumental Goals

    Authors: Francis Rhys Ward, Matt MacDermott, Francesco Belardinelli, Francesca Toni, Tom Everitt

    Abstract: Intention is an important and challenging concept in AI. It is important because it underlies many other concepts we care about, such as agency, manipulation, legal responsibility, and blame. However, ascribing intent to AI systems is contentious, and there is no universally accepted theory of intention applicable to AI agents. We operationalise the intention with which an agent acts, relating to…

    Submitted 15 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: AAMAS24

  4. arXiv:2312.01350  [pdf, other]

    cs.AI

    Honesty Is the Best Policy: Defining and Mitigating AI Deception

    Authors: Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt

    Abstract: Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no o…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Accepted as a spotlight at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  5. arXiv:2307.10987  [pdf, ps, other]

    cs.AI cs.GT

    Characterising Decision Theories with Mechanised Causal Graphs

    Authors: Matt MacDermott, Tom Everitt, Francesco Belardinelli

    Abstract: How should my own decisions affect my beliefs about the outcomes I expect to achieve? If taking a certain action makes me view myself as a certain type of person, it might affect how I think others view me, and how I view others who are similar to me. This can influence my expected utility calculations and change which action I perceive to be best. Whether and how it should is subject to debate, w…

    Submitted 20 July, 2023; originally announced July 2023.

  6. arXiv:2305.19861  [pdf, other]

    cs.AI

    Human Control: Definitions and Algorithms

    Authors: Ryan Carey, Tom Everitt

    Abstract: How can humans stay in control of advanced artificial intelligence systems? One proposal is corrigibility, which requires the agent to follow the instructions of a human overseer, without inappropriately influencing them. In this paper, we formally define a variant of corrigibility called shutdown instructability, and show that it implies appropriate shutdown behavior, retention of human autonomy,…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: UAI 2023

  7. Reasoning about Causality in Games

    Authors: Lewis Hammond, James Fox, Tom Everitt, Ryan Carey, Alessandro Abate, Michael Wooldridge

    Abstract: Causal reasoning and game-theoretic reasoning are fundamental topics in artificial intelligence, among many other disciplines: this paper is concerned with their intersection. Despite their importance, a formal framework that supports both these forms of reasoning has, until now, been lacking. We offer a solution in the form of (structural) causal games, which can be seen as extending Pearl's caus…

    Submitted 17 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: Published in Artificial Intelligence (2023)

  8. arXiv:2208.08345  [pdf, other]

    cs.AI cs.LG

    Discovering Agents

    Authors: Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt

    Abstract: Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly that agents are systems that woul…

    Submitted 24 August, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: Some typos corrected

  9. arXiv:2204.10018  [pdf, other]

    cs.AI stat.ML

    Path-Specific Objectives for Safer Agent Incentives

    Authors: Sebastian Farquhar, Ryan Carey, Tom Everitt

    Abstract: We present a general framework for training safe agents whose naive incentives are unsafe. As an example, manipulative or deceptive behaviour can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with 'delicate' parts of the state which should not be used as a means to an end. We then train agents…

    Submitted 21 April, 2022; originally announced April 2022.

    Comments: Presented at AAAI 2022

  10. arXiv:2202.11629  [pdf, other]

    cs.AI stat.ML

    A Complete Criterion for Value of Information in Soluble Influence Diagrams

    Authors: Chris van Merwijk, Ryan Carey, Tom Everitt

    Abstract: Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems. A key building block for this analysis is a graphical criterion for value of information (VoI). This paper establishes the first complete graphical criterion for VoI in influence diagrams with multiple decisions. Along the way, we establish two important techniques for proving properties of mult…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: In Proceedings of the AAAI 2022 Conference

  11. arXiv:2202.10816  [pdf, other]

    cs.LG

    Why Fair Labels Can Yield Unfair Predictions: Graphical Conditions for Introduced Unfairness

    Authors: Carolyn Ashurst, Ryan Carey, Silvia Chiappa, Tom Everitt

    Abstract: In addition to reproducing discriminatory relationships in the training data, machine learning systems can also introduce or amplify discriminatory effects. We refer to this as introduced unfairness, and investigate the conditions under which it may arise. To this end, we propose introduced total variation as a measure of introduced unfairness, and establish graphical conditions under which it may…

    Submitted 23 February, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: In Proceedings of the AAAI 2022 Conference

  12. arXiv:2110.10819  [pdf, other]

    cs.LG cs.AI

    Shaking the foundations: delusions in sequence models for interaction and control

    Authors: Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg

    Abstract: The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive however is purposeful adaptive behavior. Currently there is a common perception that sequence models "lack the understanding of the cause and effect of…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: DeepMind Tech Report, 16 pages, 4 figures

  13. arXiv:2103.14659  [pdf, other]

    cs.AI cs.LG

    Alignment of Language Agents

    Authors: Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik, Geoffrey Irving

    Abstract: For artificial intelligence to be beneficial to humans the behaviour of AI agents needs to be aligned with what humans want. In this paper we discuss some behavioural issues for language agents, arising from accidental misspecification by the system designer. We highlight some ways that misspecification can occur and discuss some behavioural issues that could arise from misspecification, including…

    Submitted 26 March, 2021; originally announced March 2021.

  14. arXiv:2102.07716  [pdf, other]

    cs.AI

    How RL Agents Behave When Their Actions Are Modified

    Authors: Eric D. Langlois, Tom Everitt

    Abstract: Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions. As a result of supervisor intervention, the executed action may differ from the action specified by the policy. How does this affect learning? We present the Modified-Action Markov Decision Process, an extension of the MDP model that allows actions to differ from the polic…

    Submitted 30 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: 10 pages (+6 appendix); 7 figures. Published in the AAAI 2021 Conference on AI. Code is available at https://github.com/edlanglois/mamdp

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 11586-11594 (2021)

  15. arXiv:2102.05008  [pdf, other]

    cs.MA cs.AI cs.GT

    Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

    Authors: Lewis Hammond, James Fox, Tom Everitt, Alessandro Abate, Michael Wooldridge

    Abstract: Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibri…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: Accepted to the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

  16. arXiv:2102.01685  [pdf, ps, other]

    cs.AI cs.LG

    Agent Incentives: A Causal Perspective

    Authors: Tom Everitt, Ryan Carey, Eric Langlois, Pedro A Ortega, Shane Legg

    Abstract: We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an…

    Submitted 15 March, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: In Proceedings of the AAAI 2021 Conference. Supersedes arXiv:1902.09980, arXiv:2001.07118

  17. arXiv:2011.08827  [pdf, other]

    cs.LG cs.AI

    Avoiding Tampering Incentives in Deep RL via Decoupled Approval

    Authors: Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg

    Abstract: How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent? Standard RL algorithms assume a secure reward function, and can thus perform poorly in settings where agents can tamper with the reward-generating mechanism. We present a principled solution to the problem of learning from influenceable feedback, which combines approval with a decoup…

    Submitted 17 November, 2020; originally announced November 2020.

  18. arXiv:2011.08820  [pdf, other]

    cs.LG cs.AI

    REALab: An Embedded Perspective on Tampering

    Authors: Ramana Kumar, Jonathan Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg

    Abstract: This paper describes REALab, a platform for embedded agency research in reinforcement learning (RL). REALab is designed to model the structure of tampering problems that may arise in real-world deployments of RL. Standard Markov Decision Process (MDP) formulations of RL and simulated environments mirroring the MDP structure assume secure access to feedback (e.g., rewards). This may be unrealistic…

    Submitted 17 November, 2020; originally announced November 2020.

  19. arXiv:2001.07118  [pdf, ps, other]

    cs.AI cs.LG

    The Incentives that Shape Behaviour

    Authors: Ryan Carey, Eric Langlois, Tom Everitt, Shane Legg

    Abstract: Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to? We formalise these incentives, and demonstrate unique graphical criteria for detecting them in any single decision causal influence diagram. To this end, we introduce structural causal influence models, a hybrid of the influence diagram and structural causal mo…

    Submitted 15 March, 2021; v1 submitted 20 January, 2020; originally announced January 2020.

    Comments: In SafeAI workshop at AAAI. Superseded by arXiv:2102.01685

    ACM Class: I.2.6; I.2.8

  20. arXiv:1908.04734  [pdf, ps, other]

    cs.AI cs.LG

    Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective

    Authors: Tom Everitt, Marcus Hutter, Ramana Kumar, Victoria Krakovna

    Abstract: Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study…

    Submitted 26 March, 2021; v1 submitted 13 August, 2019; originally announced August 2019.

    Comments: Accepted to Synthese, March 2021

  21. arXiv:1906.08663  [pdf, other]

    cs.AI

    Modeling AGI Safety Frameworks with Causal Influence Diagrams

    Authors: Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg

    Abstract: Proposals for safe AGI systems are typically made at the level of frameworks, specifying how the components of the proposed system should be trained and interact with each other. In this paper, we model and compare the most promising AGI safety frameworks using causal influence diagrams. The diagrams show the optimization objective and causal assumptions of the framework. The unified representatio…

    Submitted 20 June, 2019; originally announced June 2019.

    Comments: IJCAI 2019 AI Safety Workshop

  22. arXiv:1902.09980  [pdf, ps, other]

    cs.AI cs.LG

    Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings

    Authors: Tom Everitt, Pedro A. Ortega, Elizabeth Barnes, Shane Legg

    Abstract: Agents are systems that optimize an objective function in an environment. Together, the goal and the environment induce secondary objectives, incentives. Modeling the agent-environment interaction using causal influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes can the agent have an incentive to observe, and (2) which n…

    Submitted 20 January, 2022; v1 submitted 26 February, 2019; originally announced February 2019.

    Comments: Mostly superseded by arXiv:2102.01685

    ACM Class: I.2.6; I.2.8

  23. arXiv:1811.07871  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Scalable agent alignment via reward modeling: a research direction

    Authors: Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

    Abstract: One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task objective. This gives rise to the agent alignment problem: how do we create agents that behave in accordance with the user's intentions? We outline a high-leve…

    Submitted 19 November, 2018; originally announced November 2018.

  24. arXiv:1805.01109  [pdf, other]

    cs.AI

    AGI Safety Literature Review

    Authors: Tom Everitt, Gary Lea, Marcus Hutter

    Abstract: The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety. A significant number of safety problems for AGI have been identified. We lis…

    Submitted 21 May, 2018; v1 submitted 3 May, 2018; originally announced May 2018.

    Comments: Published in International Joint Conference on Artificial Intelligence (IJCAI), 2018

  25. arXiv:1711.09883  [pdf, other]

    cs.LG cs.AI

    AI Safety Gridworlds

    Authors: Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

    Abstract: We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environ…

    Submitted 28 November, 2017; v1 submitted 27 November, 2017; originally announced November 2017.

  26. arXiv:1708.03871  [pdf, ps, other]

    cs.GT

    A Game-Theoretic Analysis of the Off-Switch Game

    Authors: Tobias Wängberg, Mikael Böörs, Elliot Catt, Tom Everitt, Marcus Hutter

    Abstract: The off-switch game is a game theoretic model of a highly intelligent robot interacting with a human. In the original paper by Hadfield-Menell et al. (2016), the analysis is not fully game-theoretic as the human is modelled as an irrational player, and the robot's best action is only calculated under unrealistic normality and soft-max assumptions. In this paper, we make the analysis fully game the…

    Submitted 13 August, 2017; originally announced August 2017.

    Journal ref: Artificial General Intelligence: 10th International Conference, AGI 2017, Melbourne, VIC, Australia, August 15-18, 2017, Proceedings, pages 167-177

  27. arXiv:1706.08090  [pdf, other]

    cs.AI

    Count-Based Exploration in Feature Space for Reinforcement Learning

    Authors: Jarryd Martin, Suraj Narayanan Sasikumar, Tom Everitt, Marcus Hutter

    Abstract: We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited sta…

    Submitted 25 June, 2017; originally announced June 2017.

    Comments: Conference: Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 8 pages, 1 figure

    ACM Class: I.2.6

  28. arXiv:1705.08417  [pdf, other]

    cs.AI cs.LG stat.ML

    Reinforcement Learning with a Corrupted Reward Channel

    Authors: Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg

    Abstract: No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward…

    Submitted 19 August, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: A shorter version of this report was accepted to IJCAI 2017 AI and Autonomy track

    ACM Class: I.2.6; I.2.8

  29. Free Lunch for Optimisation under the Universal Distribution

    Authors: Tom Everitt, Tor Lattimore, Marcus Hutter

    Abstract: Function optimisation is a major challenge in computer science. The No Free Lunch theorems state that if all functions with the same histogram are assumed to be equally probable then no algorithm outperforms any other in expectation. We argue against the uniform assumption and suggest a universal prior exists for which there is a free lunch, but where no particular class of functions is favoured o…

    Submitted 16 August, 2016; originally announced August 2016.

    ACM Class: G.1.6

    Journal ref: Proceedings of 2014 IEEE Congress on Evolutionary Computation (CEC), July 6-11, 2014, Beijing, China, pp. 167-174

  30. arXiv:1606.00652  [pdf, ps, other]

    cs.AI

    Death and Suicide in Universal Artificial Intelligence

    Authors: Jarryd Martin, Tom Everitt, Marcus Hutter

    Abstract: Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics. AIXI is a universal solution to the RL problem; it can learn any computable environment. A technical subtlety of AIXI is that it is defined using a mixture over semimeasures that need not sum to 1, rather than over proper probabil…

    Submitted 2 June, 2016; originally announced June 2016.

    Comments: Conference: Artificial General Intelligence (AGI) 2016, 13 pages, 2 figures

    ACM Class: I.2.0; I.2.6

  31. arXiv:1605.03143  [pdf, other]

    cs.AI

    Avoiding Wireheading with Value Reinforcement Learning

    Authors: Tom Everitt, Marcus Hutter

    Abstract: How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) is a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward -- the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL,…

    Submitted 10 May, 2016; originally announced May 2016.

    Comments: Artificial General Intelligence (AGI) 2016

  32. arXiv:1605.03142  [pdf, other]

    cs.AI

    Self-Modification of Policy and Utility Function in Rational Agents

    Authors: Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter

    Abstract: Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers), will in principle have the ability to self-modify -- for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intell…

    Submitted 10 May, 2016; originally announced May 2016.

    Comments: Artificial General Intelligence (AGI) 2016

  33. arXiv:1509.02709  [pdf, other]

    cs.AI

    A Topological Approach to Meta-heuristics: Analytical Results on the BFS vs. DFS Algorithm Selection Problem

    Authors: Tom Everitt, Marcus Hutter

    Abstract: Search is a central problem in artificial intelligence, and breadth-first search (BFS) and depth-first search (DFS) are the two most fundamental ways to search. In this paper we derive estimates for average BFS and DFS runtime. The average runtime estimates can be used to allocate resources or judge the hardness of a problem. They can also be used for selecting the best graph representation, and f…

    Submitted 12 April, 2018; v1 submitted 9 September, 2015; originally announced September 2015.

    Comments: Main results published in 28th Australian Joint Conference on Artificial Intelligence, 2015

    ACM Class: I.2.8

  34. arXiv:1506.07359  [pdf, ps, other]

    cs.AI

    Sequential Extensions of Causal and Evidential Decision Theory

    Authors: Tom Everitt, Jan Leike, Marcus Hutter

    Abstract: Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward. The non-dualistic decision theory literature is split between causal decision theory and evidential decision theory. We extend these decision algorithms to the sequential setting where the agent alternates betwe…

    Submitted 24 June, 2015; originally announced June 2015.

    Comments: ADT 2015