Search | arXiv e-print repository

arXiv:2207.02098 [pdf, other]

Neural Networks and the Chomsky Hierarchy

Authors: Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega

Abstract: Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20'910 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice… ▽ More Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20'910 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even extensive amounts of data and training time never lead to any non-trivial generalization, despite models having sufficient capacity to fit the training data perfectly. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks. △ Less

Submitted 28 February, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

arXiv:2111.02907 [pdf, other]

Model-Free Risk-Sensitive Reinforcement Learning

Authors: Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A. Ortega

Abstract: We extend temporal-difference (TD) learning in order to obtain risk-sensitive, model-free reinforcement learning algorithms. This extension can be regarded as modification of the Rescorla-Wagner rule, where the (sigmoidal) stimulus is taken to be either the event of over- or underestimating the TD target. As a result, one obtains a stochastic approximation rule for estimating the free energy from… ▽ More We extend temporal-difference (TD) learning in order to obtain risk-sensitive, model-free reinforcement learning algorithms. This extension can be regarded as modification of the Rescorla-Wagner rule, where the (sigmoidal) stimulus is taken to be either the event of over- or underestimating the TD target. As a result, one obtains a stochastic approximation rule for estimating the free energy from i.i.d. samples generated by a Gaussian distribution with unknown mean and variance. Since the Gaussian free energy is known to be a certainty-equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making. △ Less

Submitted 4 November, 2021; originally announced November 2021.

Comments: DeepMind Tech Report: 13 pages, 4 figures

arXiv:2110.10819 [pdf, other]

Shaking the foundations: delusions in sequence models for interaction and control

Authors: Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg

Abstract: The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive however is purposeful adaptive behavior. Currently there is a common perception that sequence models "lack the understanding of the cause and effect of… ▽ More The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive however is purposeful adaptive behavior. Currently there is a common perception that sequence models "lack the understanding of the cause and effect of their actions" leading them to draw incorrect inferences due to auto-suggestive delusions. In this report we explain where this mismatch originates, and show that it can be resolved by treating actions as causal interventions. Finally, we show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals respectively. △ Less

Submitted 20 October, 2021; originally announced October 2021.

Comments: DeepMind Tech Report, 16 pages, 4 figures

arXiv:2103.03938 [pdf, other]

Causal Analysis of Agent Behavior for AI Safety

Authors: Grégoire Déletang, Jordi Grau-Moya, Miljan Martic, Tim Genewein, Tom McGrath, Vladimir Mikulik, Markus Kunesch, Shane Legg, Pedro A. Ortega

Abstract: As machine learning systems become more powerful they also become increasingly unpredictable and opaque. Yet, finding human-understandable explanations of how they work is essential for their safe deployment. This technical report illustrates a methodology for investigating the causal mechanisms that drive the behaviour of artificial agents. Six use cases are covered, each addressing a typical que… ▽ More As machine learning systems become more powerful they also become increasingly unpredictable and opaque. Yet, finding human-understandable explanations of how they work is essential for their safe deployment. This technical report illustrates a methodology for investigating the causal mechanisms that drive the behaviour of artificial agents. Six use cases are covered, each addressing a typical question an analyst might ask about an agent. In particular, we show that each question cannot be addressed by pure observation alone, but instead requires conducting experiments with systematically chosen manipulations so as to generate the correct causal evidence. △ Less

Submitted 5 March, 2021; originally announced March 2021.

Comments: 16 pages, 16 figures, 6 tables

arXiv:2102.01685 [pdf, ps, other]

Agent Incentives: A Causal Perspective

Authors: Tom Everitt, Ryan Carey, Eric Langlois, Pedro A Ortega, Shane Legg

Abstract: We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an… ▽ More We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an optimal decision, while instrumental control incentives establish whether an agent can influence its utility via a variable X. For both new concepts, we provide sound and complete graphical criteria. We show by example how these results can help with evaluating the safety and fairness of an AI system. △ Less

Submitted 15 March, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

Comments: In Proceedings of the AAAI 2021 Conference. Supersedes arXiv:1902.09980, arXiv:2001.07118

arXiv:2010.12237 [pdf, other]

Algorithms for Causal Reasoning in Probability Trees

Authors: Tim Genewein, Tom McGrath, Grégoire Déletang, Vladimir Mikulik, Miljan Martic, Shane Legg, Pedro A. Ortega

Abstract: Probability trees are one of the simplest models of causal generative processes. They possess clean semantics and -- unlike causal Bayesian networks -- they can represent context-specific causal dependencies, which are necessary for e.g. causal induction. Yet, they have received little attention from the AI and ML community. Here we present concrete algorithms for causal reasoning in discrete prob… ▽ More Probability trees are one of the simplest models of causal generative processes. They possess clean semantics and -- unlike causal Bayesian networks -- they can represent context-specific causal dependencies, which are necessary for e.g. causal induction. Yet, they have received little attention from the AI and ML community. Here we present concrete algorithms for causal reasoning in discrete probability trees that cover the entire causal hierarchy (association, intervention, and counterfactuals), and operate on arbitrary propositional and causal events. Our work expands the domain of causal reasoning to a very general class of discrete stochastic processes. △ Less

Submitted 11 November, 2020; v1 submitted 23 October, 2020; originally announced October 2020.

Comments: (2nd version with correction to algorithm) 11 pages, 8 figures, 5 algorithms. A companion Colaboratory tutorial is available at https://github.com/deepmind/deepmind-research/tree/master/causal_reasoning

arXiv:2010.11223 [pdf, other]

Meta-trained agents implement Bayes-optimal agents

Authors: Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, Pedro A. Ortega

Abstract: Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical… ▽ More Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents - that is, even for task distributions for which we currently don't possess tractable models. △ Less

Submitted 21 October, 2020; originally announced October 2020.

Comments: Published at 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

arXiv:2009.01791 [pdf, other]

Action and Perception as Divergence Minimization

Authors: Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess

Abstract: To learn directed behaviors in complex environments, intelligent agents need to optimize objective functions. Various objectives are known for designing artificial agents, including task rewards and intrinsic motivation. However, it is unclear how the known objectives relate to each other, which objectives remain yet to be discovered, and which objectives better describe the behavior of humans. We… ▽ More To learn directed behaviors in complex environments, intelligent agents need to optimize objective functions. Various objectives are known for designing artificial agents, including task rewards and intrinsic motivation. However, it is unclear how the known objectives relate to each other, which objectives remain yet to be discovered, and which objectives better describe the behavior of humans. We introduce the Action Perception Divergence (APD), an approach for categorizing the space of possible objective functions for embodied agents. We show a spectrum that reaches from narrow to general objectives. While the narrow objectives correspond to domain-specific rewards as typical in reinforcement learning, the general objectives maximize information with the environment through latent variable models of input sequences. Intuitively, these agents use perception to align their beliefs with the world and use actions to align the world with their beliefs. They infer representations that are informative of past inputs, explore future inputs that are informative of their representations, and select actions or skills that maximally influence future inputs. This explains a wide range of unsupervised objectives from a single principle, including representation learning, information gain, empowerment, and skill discovery. Our findings suggest leveraging powerful world models for unsupervised exploration as a path toward highly adaptive agents that seek out large niches in their environments, rendering task rewards optional. △ Less

Submitted 12 February, 2022; v1 submitted 3 September, 2020; originally announced September 2020.

Comments: Website: https://danijar.com/apd

arXiv:1905.06424 [pdf, other]

Meta reinforcement learning as task inference

Authors: Jan Humplik, Alexandre Galashov, Leonard Hasenclever, Pedro A. Ortega, Yee Whye Teh, Nicolas Heess

Abstract: Humans achieve efficient learning by relying on prior knowledge about the structure of naturally occurring tasks. There is considerable interest in designing reinforcement learning (RL) algorithms with similar properties. This includes proposals to learn the learning algorithm itself, an idea also known as meta learning. One formal interpretation of this idea is as a partially observable multi-tas… ▽ More Humans achieve efficient learning by relying on prior knowledge about the structure of naturally occurring tasks. There is considerable interest in designing reinforcement learning (RL) algorithms with similar properties. This includes proposals to learn the learning algorithm itself, an idea also known as meta learning. One formal interpretation of this idea is as a partially observable multi-task RL problem in which task information is hidden from the agent. Such unknown task problems can be reduced to Markov decision processes (MDPs) by augmenting an agent's observations with an estimate of the belief about the task based on past experience. However estimating the belief state is intractable in most partially-observed MDPs. We propose a method that separately learns the policy and the task belief by taking advantage of various kinds of privileged information. Our approach can be very effective at solving standard meta-RL environments, as well as a complex continuous control environment with sparse rewards and requiring long-term memory. △ Less

Submitted 22 October, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

arXiv:1905.03030 [pdf, other]

Meta-learning of Sequential Strategies

Authors: Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg

Abstract: In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal pred… ▽ More In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state-machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem. △ Less

Submitted 18 July, 2019; v1 submitted 8 May, 2019; originally announced May 2019.

Comments: DeepMind Technical Report (15 pages, 6 figures). Version V1.1

arXiv:1902.09980 [pdf, ps, other]

Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings

Authors: Tom Everitt, Pedro A. Ortega, Elizabeth Barnes, Shane Legg

Abstract: Agents are systems that optimize an objective function in an environment. Together, the goal and the environment induce secondary objectives, incentives. Modeling the agent-environment interaction using causal influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes can the agent have an incentivize to observe, and (2) which n… ▽ More Agents are systems that optimize an objective function in an environment. Together, the goal and the environment induce secondary objectives, incentives. Modeling the agent-environment interaction using causal influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes can the agent have an incentivize to observe, and (2) which nodes can the agent have an incentivize to control? The answers tell us which information and influence points need extra protection. For example, we may want a classifier for job applications to not use the ethnicity of the candidate, and a reinforcement learning agent not to take direct control of its reward mechanism. Different algorithms and training paradigms can lead to different causal influence diagrams, so our method can be used to identify algorithms with problematic incentives and help in designing algorithms with better incentives. △ Less

Submitted 20 January, 2022; v1 submitted 26 February, 2019; originally announced February 2019.

Comments: Mostly superseded by arXiv:2102.01685

ACM Class: I.2.6; I.2.8

arXiv:1810.08647 [pdf, other]

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

Authors: Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

Abstract: We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agen… ▽ More We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agents. Actions that lead to bigger changes in other agents' behavior are considered influential and are rewarded. We show that this is equivalent to rewarding agents for having high mutual information between their actions. Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. The influence rewards for all agents can be computed in a decentralized way by enabling agents to learn a model of other agents using deep neural networks. In contrast, key previous works on emergent communication in the MARL setting were unable to learn diverse policies in a decentralized manner and had to resort to centralized training. Consequently, the influence reward opens up a window of new opportunities for research in this area. △ Less

Submitted 18 June, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

arXiv:1807.00196 [pdf, other]

Modeling Friends and Foes

Authors: Pedro A. Ortega, Shane Legg

Abstract: How can one detect friendly and adversarial behavior from raw data? Detecting whether an environment is a friend, a foe, or anything in between, remains a poorly understood yet desirable ability for safe and robust agents. This paper proposes a definition of these environmental "attitudes" based on an characterization of the environment's ability to react to the agent's private strategy. We define… ▽ More How can one detect friendly and adversarial behavior from raw data? Detecting whether an environment is a friend, a foe, or anything in between, remains a poorly understood yet desirable ability for safe and robust agents. This paper proposes a definition of these environmental "attitudes" based on an characterization of the environment's ability to react to the agent's private strategy. We define an objective function for a one-shot game that allows deriving the environment's probability distribution under friendly and adversarial assumptions alongside the agent's optimal strategy. Furthermore, we present an algorithm to compute these equilibrium strategies, and show experimentally that both friendly and adversarial environments possess non-trivial optimal strategies. △ Less

Submitted 30 June, 2018; originally announced July 2018.

Comments: 13 pages, 9 figures

arXiv:1711.09883 [pdf, other]

AI Safety Gridworlds

Authors: Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

Abstract: We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environ… ▽ More We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily. △ Less

Submitted 28 November, 2017; v1 submitted 27 November, 2017; originally announced November 2017.

arXiv:1610.01698 [pdf, other]

Human Decision-Making under Limited Time

Authors: Pedro A. Ortega, Alan A. Stocker

Abstract: Subjective expected utility theory assumes that decision-makers possess unlimited computational resources to reason about their choices; however, virtually all decisions in everyday life are made under resource constraints - i.e. decision-makers are bounded in their rationality. Here we experimentally tested the predictions made by a formalization of bounded rationality based on ideas from statist… ▽ More Subjective expected utility theory assumes that decision-makers possess unlimited computational resources to reason about their choices; however, virtually all decisions in everyday life are made under resource constraints - i.e. decision-makers are bounded in their rationality. Here we experimentally tested the predictions made by a formalization of bounded rationality based on ideas from statistical mechanics and information-theory. We systematically tested human subjects in their ability to solve combinatorial puzzles under different time limitations. We found that our bounded-rational model accounts well for the data. The decomposition of the fitted model parameter into the subjects' expected utility function and resource parameter provide interesting insight into the subjects' information capacity limits. Our results confirm that humans gradually fall back on their learned prior choice patterns when confronted with increasing resource limitations. △ Less

Submitted 5 October, 2016; originally announced October 2016.

Comments: 9 pages, 4 figures, NIPS Advances in Neural Information Processing Systems 29, 2016

arXiv:1604.05129 [pdf, other]

Memory shapes time perception and intertemporal choices

Authors: Pedro A. Ortega, Naftali Tishby

Abstract: There is a consensus that human and non-human subjects experience temporal distortions in many stages of their perceptual and decision-making systems. Similarly, intertemporal choice research has shown that decision-makers undervalue future outcomes relative to immediate ones. Here we combine techniques from information theory and artificial intelligence to show how both temporal distortions and i… ▽ More There is a consensus that human and non-human subjects experience temporal distortions in many stages of their perceptual and decision-making systems. Similarly, intertemporal choice research has shown that decision-makers undervalue future outcomes relative to immediate ones. Here we combine techniques from information theory and artificial intelligence to show how both temporal distortions and intertemporal choice preferences can be explained as a consequence of the coding efficiency of sensorimotor representation. In particular, the model implies that interactions that constrain future behavior are perceived as being both longer in duration and more valuable. Furthermore, using simulations of artificial agents, we investigate how memory constraints enforce a renormalization of the perceived timescales. Our results show that qualitatively different discount functions, such as exponential and hyperbolic discounting, arise as a consequence of an agent's probabilistic model of the world. △ Less

Submitted 29 May, 2016; v1 submitted 18 April, 2016; originally announced April 2016.

Comments: 24 pages, 4 figures, 2 tables. Submitted

arXiv:1512.06789 [pdf, other]

Information-Theoretic Bounded Rationality

Authors: Pedro A. Ortega, Daniel A. Braun, Justin Dyer, Kee-Eung Kim, Naftali Tishby

Abstract: Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the… ▽ More Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g. EXPECTIMAX and MINIMAX) as limit cases, as well as trust- and risk-sensitive planning. △ Less

Submitted 21 December, 2015; originally announced December 2015.

Comments: 47 pages, 19 figures

arXiv:1505.07067 [pdf, other]

Belief Flows of Robust Online Learning

Authors: Pedro A. Ortega, Koby Crammer, Daniel D. Lee

Abstract: This paper introduces a new probabilistic model for online learning which dynamically incorporates information from stochastic gradients of an arbitrary loss function. Similar to probabilistic filtering, the model maintains a Gaussian belief over the optimal weight parameters. Unlike traditional Bayesian updates, the model incorporates a small number of gradient evaluations at locations chosen usi… ▽ More This paper introduces a new probabilistic model for online learning which dynamically incorporates information from stochastic gradients of an arbitrary loss function. Similar to probabilistic filtering, the model maintains a Gaussian belief over the optimal weight parameters. Unlike traditional Bayesian updates, the model incorporates a small number of gradient evaluations at locations chosen using Thompson sampling, making it computationally tractable. The belief is then transformed via a linear flow field which optimally updates the belief distribution using rules derived from information theoretic principles. Several versions of the algorithm are shown using different constraints on the flow field and compared with conventional online learning algorithms. Results are given for several classification tasks including logistic regression and multilayer neural networks. △ Less

Submitted 26 May, 2015; originally announced May 2015.

Comments: Appears in Workshop on Information Theory and Applications (ITA), February 2015

arXiv:1407.4139 [pdf, other]

Subjectivity, Bayesianism, and Causality

Authors: Pedro A. Ortega

Abstract: Bayesian probability theory is one of the most successful frameworks to model reasoning under uncertainty. Its defining property is the interpretation of probabilities as degrees of belief in propositions about the state of the world relative to an inquiring subject. This essay examines the notion of subjectivity by drawing parallels between Lacanian theory and Bayesian probability theory, and con… ▽ More Bayesian probability theory is one of the most successful frameworks to model reasoning under uncertainty. Its defining property is the interpretation of probabilities as degrees of belief in propositions about the state of the world relative to an inquiring subject. This essay examines the notion of subjectivity by drawing parallels between Lacanian theory and Bayesian probability theory, and concludes that the latter must be enriched with causal interventions to model agency. The central contribution of this work is an abstract model of the subject that accommodates causal interventions in a measure-theoretic formalisation. This formalisation is obtained through a game-theoretic Ansatz based on modelling the inside and outside of the subject as an extensive-form game with imperfect information between two players. Finally, I illustrate the expressiveness of this model with an example of causal induction. △ Less

Submitted 24 April, 2015; v1 submitted 15 July, 2014; originally announced July 2014.

Comments: 21 pages, 21 figures. Submitted to Special Issue of Pattern Recognition Letters on "Philosophical aspects of pattern recognition"

arXiv:1404.5668 [pdf, other]

An Adversarial Interpretation of Information-Theoretic Bounded Rationality

Authors: Pedro A. Ortega, Daniel D. Lee

Abstract: Recently, there has been a growing interest in modeling planning with information constraints. Accordingly, an agent maximizes a regularized expected utility known as the free energy, where the regularizer is given by the information divergence from a prior to a posterior policy. While this approach can be justified in various ways, including from statistical mechanics and information theory, it i… ▽ More Recently, there has been a growing interest in modeling planning with information constraints. Accordingly, an agent maximizes a regularized expected utility known as the free energy, where the regularizer is given by the information divergence from a prior to a posterior policy. While this approach can be justified in various ways, including from statistical mechanics and information theory, it is still unclear how it relates to decision-making against adversarial environments. This connection has previously been suggested in work relating the free energy to risk-sensitive control and to extensive form games. Here, we show that a single-agent free energy optimization is equivalent to a game between the agent and an imaginary adversary. The adversary can, by paying an exponential penalty, generate costs that diminish the decision maker's payoffs. It turns out that the optimal strategy of the adversary consists in choosing costs so as to render the decision maker indifferent among its choices, which is a definining property of a Nash equilibrium, thus tightening the connection between free energy optimization and game theory. △ Less

Submitted 22 April, 2014; originally announced April 2014.

Comments: 7 pages, 4 figures. Proceedings of AAAI-14

arXiv:1303.4431 [pdf, ps, other]

doi 10.1186/2194-3206-2-2

Generalized Thompson Sampling for Sequential Decision-Making and Causal Inference

Authors: Pedro A. Ortega, Daniel A. Braun

Abstract: Recently, it has been shown how sampling actions from the predictive distribution over the optimal action-sometimes called Thompson sampling-can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior p… ▽ More Recently, it has been shown how sampling actions from the predictive distribution over the optimal action-sometimes called Thompson sampling-can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior probability that is updated by Bayesian inference and causal calculus. Here we discuss three important features of this approach. First, we discuss in how far such Thompson sampling can be regarded as a natural consequence of the Bayesian modeling of policy uncertainty. Second, we show how Thompson sampling can be used to study interactions between multiple adaptive agents, thus, opening up an avenue of game-theoretic analysis. Third, we show how Thompson sampling can be applied to infer causal relationships when interacting with an environment in a sequential fashion. In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference. △ Less

Submitted 18 March, 2013; originally announced March 2013.

Comments: 28 pages, 5 figures

Journal ref: Complex Adaptive Systems Modeling 2014, 2:2

arXiv:1206.1898 [pdf, ps, other]

A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

Authors: Pedro A. Ortega, Jordi Grau-Moya, Tim Genewein, David Balduzzi, Daniel A. Braun

Abstract: We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly… ▽ More We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior based on a kernel regressor. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function. We illustrate the effectiveness of our model by optimizing a noisy, high-dimensional, non-convex objective function. △ Less

Submitted 10 November, 2012; v1 submitted 8 June, 2012; originally announced June 2012.

Comments: 9 pages, 5 figures

Journal ref: Neural Information Processing Systems (NIPS) 2012

arXiv:1205.3997 [pdf, ps, other]

Free Energy and the Generalized Optimality Equations for Sequential Decision Making

Authors: Pedro A. Ortega, Daniel A. Braun

Abstract: The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments. We derive generalized se… ▽ More The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments. We derive generalized sequential optimality equations that not only include the Bellman optimality equations as a limit case, but also lead to well-known decision-rules such as Expectimax, Minimax and Expectiminimax. We show how these decision-rules can be derived from a single free energy principle that assigns a resource parameter to each node in the decision tree. These resource parameters express a concrete computational cost that can be measured as the amount of samples that are needed from the distribution that belongs to each node. The free energy principle therefore provides the normative basis for generalized optimality equations that account for both adversarial and stochastic environments. △ Less

Submitted 17 May, 2012; originally announced May 2012.

Comments: 10 pages, 2 figures

Journal ref: European Workshop on Reinforcement Learning 2012

arXiv:1204.6481 [pdf, ps, other]

doi 10.1098/rspa.2012.0683

Thermodynamics as a theory of decision-making with information processing costs

Authors: Pedro A. Ortega, Daniel A. Braun

Abstract: Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we propose an information-theoretic formalization of bounded rational decision-making where decision-makers trade off expected utility and information processing costs. Such bounded rational decision-makers can be thought of as thermodynamic machines… ▽ More Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we propose an information-theoretic formalization of bounded rational decision-making where decision-makers trade off expected utility and information processing costs. Such bounded rational decision-makers can be thought of as thermodynamic machines that undergo physical state changes when they compute. Their behavior is governed by a free energy functional that trades off changes in internal energy-as a proxy for utility-and entropic changes representing computational costs induced by changing states. As a result, the bounded rational decision-making problem can be rephrased in terms of well-known concepts from statistical physics. In the limit when computational costs are ignored, the maximum expected utility principle is recovered. We discuss the relation to satisficing decision-making procedures as well as links to existing theoretical frameworks and human decision-making experiments that describe deviations from expected utility theory. Since most of the mathematical machinery can be borrowed from statistical physics, the main contribution is to axiomatically derive and interpret the thermodynamic free energy as a model of bounded rational decision-making. △ Less

Submitted 30 July, 2012; v1 submitted 29 April, 2012; originally announced April 2012.

Comments: 26 pages, 5 figures, (under revision since February 2012)

arXiv:1202.4482 [pdf, other]

Metabolic cost as an organizing principle for cooperative learning

Authors: David Balduzzi, Pedro A Ortega, Michel Besserve

Abstract: This paper investigates how neurons can use metabolic cost to facilitate learning at a population level. Although decision-making by individual neurons has been extensively studied, questions regarding how neurons should behave to cooperate effectively remain largely unaddressed. Under assumptions that capture a few basic features of cortical neurons, we show that constraining reward maximization… ▽ More This paper investigates how neurons can use metabolic cost to facilitate learning at a population level. Although decision-making by individual neurons has been extensively studied, questions regarding how neurons should behave to cooperate effectively remain largely unaddressed. Under assumptions that capture a few basic features of cortical neurons, we show that constraining reward maximization by metabolic cost aligns the information content of actions with their expected reward. Thus, metabolic cost provides a mechanism whereby neurons encode expected reward into their outputs. Further, aside from reducing energy expenditures, imposing a tight metabolic constraint also increases the accuracy of empirical estimates of rewards, increasing the robustness of distributed learning. Finally, we present two implementations of metabolically constrained learning that confirm our theoretical finding. These results suggest that metabolic cost may be an organizing principle underlying the neural code, and may also provide a useful guide to the design and analysis of other cooperating populations. △ Less

Submitted 9 February, 2013; v1 submitted 20 February, 2012; originally announced February 2012.

Comments: 14 pages, 2 figures, to appear in Advances in Complex Systems

arXiv:1111.0708 [pdf, ps, other]

Bayesian Causal Induction

Authors: Pedro A. Ortega

Abstract: Discovering causal relationships is a hard task, often hindered by the need for intervention, and often requiring large amounts of data to resolve statistical uncertainty. However, humans quickly arrive at useful causal relationships. One possible reason is that humans extrapolate from past experience to new, unseen situations: that is, they encode beliefs over causal invariances, allowing for sou… ▽ More Discovering causal relationships is a hard task, often hindered by the need for intervention, and often requiring large amounts of data to resolve statistical uncertainty. However, humans quickly arrive at useful causal relationships. One possible reason is that humans extrapolate from past experience to new, unseen situations: that is, they encode beliefs over causal invariances, allowing for sound generalization from the observations they obtain from directly acting in the world. Here we outline a Bayesian model of causal induction where beliefs over competing causal hypotheses are modeled using probability trees. Based on this model, we illustrate why, in the general case, we need interventions plus constraints on our causal hypotheses in order to extract causal information from our experience. △ Less

Submitted 29 November, 2011; v1 submitted 2 November, 2011; originally announced November 2011.

Comments: 4 pages, 4 figures; 2011 NIPS Workshop on Philosophy and Machine Learning

arXiv:1107.5766 [pdf, ps, other]

Information, Utility & Bounded Rationality

Authors: Pedro A. Ortega, Daniel A. Braun

Abstract: Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we employ an axiomatic framework for bounded rational decision-making based on a thermodynamic interpretation of resource costs as information costs. This leads to a variational "free utility" principle akin to thermodynamical free energy that trades… ▽ More Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we employ an axiomatic framework for bounded rational decision-making based on a thermodynamic interpretation of resource costs as information costs. This leads to a variational "free utility" principle akin to thermodynamical free energy that trades off utility and information costs. We show that bounded optimal control solutions can be derived from this variational principle, which leads in general to stochastic policies. Furthermore, we show that risk-sensitive and robust (minimax) control schemes fall out naturally from this framework if the environment is considered as a bounded rational and perfectly rational opponent, respectively. When resource costs are ignored, the maximum expected utility principle is recovered. △ Less

Submitted 28 July, 2011; originally announced July 2011.

Comments: 10 pages. The original publication is available at www.springerlink.com

Journal ref: The Fourth Conference on General Artificial Intelligence (AGI-11), 2011

arXiv:1008.2780 [pdf, ps, other]

Logic, Reasoning under Uncertainty and Causality

Authors: Pedro A. Ortega

Abstract: A simple framework for reasoning under uncertainty and intervention is introduced. This is achieved in three steps. First, logic is restated in set-theoretic terms to obtain a framework for reasoning under certainty. Second, this framework is extended to model reasoning under uncertainty. Finally, causal spaces are introduced and shown how they provide enough information to model knowledge contain… ▽ More A simple framework for reasoning under uncertainty and intervention is introduced. This is achieved in three steps. First, logic is restated in set-theoretic terms to obtain a framework for reasoning under certainty. Second, this framework is extended to model reasoning under uncertainty. Finally, causal spaces are introduced and shown how they provide enough information to model knowledge containing causal information about the world. △ Less

Submitted 16 August, 2010; originally announced August 2010.

Comments: 13 pages, 6 figures

arXiv:1007.0940 [pdf, ps, other]

An axiomatic formalization of bounded rationality based on a utility-information equivalence

Authors: Pedro A. Ortega, Daniel A. Braun

Abstract: Classic decision-theory is based on the maximum expected utility (MEU) principle, but crucially ignores the resource costs incurred when determining optimal decisions. Here we propose an axiomatic framework for bounded decision-making that considers resource costs. Agents are formalized as probability measures over input-output streams. We postulate that any such probability measure can be assigne… ▽ More Classic decision-theory is based on the maximum expected utility (MEU) principle, but crucially ignores the resource costs incurred when determining optimal decisions. Here we propose an axiomatic framework for bounded decision-making that considers resource costs. Agents are formalized as probability measures over input-output streams. We postulate that any such probability measure can be assigned a corresponding conjugate utility function based on three axioms: utilities should be real-valued, additive and monotonic mappings of probabilities. We show that these axioms enforce a unique conversion law between utility and probability (and thereby, information). Moreover, we show that this relation can be characterized as a variational principle: given a utility function, its conjugate probability measure maximizes a free utility functional. Transformations of probability measures can then be formalized as a change in free utility due to the addition of new constraints expressed by a target utility function. Accordingly, one obtains a criterion to choose a probability measure that trades off the maximization of a target utility function and the cost of the deviation from a reference distribution. We show that optimal control, adaptive estimation and adaptive control problems can be solved this way in a resource-efficient way. When resource costs are ignored, the MEU principle is recovered. Our formalization might thus provide a principled approach to bounded rationality that establishes a close link to information theory. △ Less

Submitted 6 July, 2010; originally announced July 2010.

Comments: 22 pages, 4 figures, 1 table

arXiv:1002.3086 [pdf, ps, other]

Convergence of Bayesian Control Rule

Authors: Pedro A. Ortega, Daniel A. Braun

Abstract: Recently, new approaches to adaptive control have sought to reformulate the problem as a minimization of a relative entropy criterion to obtain tractable solutions. In particular, it has been shown that minimizing the expected deviation from the causal input-output dependencies of the true plant leads to a new promising stochastic control rule called the Bayesian control rule. This work proves t… ▽ More Recently, new approaches to adaptive control have sought to reformulate the problem as a minimization of a relative entropy criterion to obtain tractable solutions. In particular, it has been shown that minimizing the expected deviation from the causal input-output dependencies of the true plant leads to a new promising stochastic control rule called the Bayesian control rule. This work proves the convergence of the Bayesian control rule under two sufficient assumptions: boundedness, which is an ergodicity condition; and consistency, which is an instantiation of the sure-thing principle. △ Less

Submitted 16 February, 2010; originally announced February 2010.

Comments: 8 pages, 7 figures

arXiv:1002.1480 [pdf, ps, other]

A Minimum Relative Entropy Controller for Undiscounted Markov Decision Processes

Authors: Pedro A. Ortega, Daniel A. Braun

Abstract: Adaptive control problems are notoriously difficult to solve even in the presence of plant-specific controllers. One way to by-pass the intractable computation of the optimal policy is to restate the adaptive control as the minimization of the relative entropy of a controller that ignores the true plant dynamics from an informed controller. The solution is given by the Bayesian control rule-a se… ▽ More Adaptive control problems are notoriously difficult to solve even in the presence of plant-specific controllers. One way to by-pass the intractable computation of the optimal policy is to restate the adaptive control as the minimization of the relative entropy of a controller that ignores the true plant dynamics from an informed controller. The solution is given by the Bayesian control rule-a set of equations characterizing a stochastic adaptive controller for the class of possible plant dynamics. Here, the Bayesian control rule is applied to derive BCR-MDP, a controller to solve undiscounted Markov decision processes with finite state and action spaces and unknown dynamics. In particular, we derive a non-parametric conjugate prior distribution over the policy space that encapsulates the agent's whole relevant history and we present a Gibbs sampler to draw random policies from this distribution. Preliminary results show that BCR-MDP successfully avoids sub-optimal limit cycles due to its built-in mechanism to balance exploration versus exploitation. △ Less

Submitted 7 February, 2010; originally announced February 2010.

Comments: 8 pages, 3 figures, 3 tables

arXiv:0911.5106 [pdf, ps, other]

A conversion between utility and information

Authors: Pedro A. Ortega, Daniel A. Braun

Abstract: Rewards typically express desirabilities or preferences over a set of alternatives. Here we propose that rewards can be defined for any probability distribution based on three desiderata, namely that rewards should be real-valued, additive and order-preserving, where the latter implies that more probable events should also be more desirable. Our main result states that rewards are then uniquely… ▽ More Rewards typically express desirabilities or preferences over a set of alternatives. Here we propose that rewards can be defined for any probability distribution based on three desiderata, namely that rewards should be real-valued, additive and order-preserving, where the latter implies that more probable events should also be more desirable. Our main result states that rewards are then uniquely determined by the negative information content. To analyze stochastic processes, we define the utility of a realization as its reward rate. Under this interpretation, we show that the expected utility of a stochastic process is its negative entropy rate. Furthermore, we apply our results to analyze agent-environment interactions. We show that the expected utility that will actually be achieved by the agent is given by the negative cross-entropy from the input-output (I/O) distribution of the coupled interaction system and the agent's I/O distribution. Thus, our results allow for an information-theoretic interpretation of the notion of utility and the characterization of agent-environment interactions in terms of entropy dynamics. △ Less

Submitted 30 December, 2009; v1 submitted 26 November, 2009; originally announced November 2009.

Comments: AGI-2010. 6 pages, 1 figure

Journal ref: AGI-2010

arXiv:0911.5104 [pdf, ps, other]

A Bayesian Rule for Adaptive Control based on Causal Interventions

Authors: Pedro A. Ortega, Daniel A. Braun

Abstract: Explaining adaptive behavior is a central problem in artificial intelligence research. Here we formalize adaptive agents as mixture distributions over sequences of inputs and outputs (I/O). Each distribution of the mixture constitutes a `possible world', but the agent does not know which of the possible worlds it is actually facing. The problem is to adapt the I/O stream in a way that is compati… ▽ More Explaining adaptive behavior is a central problem in artificial intelligence research. Here we formalize adaptive agents as mixture distributions over sequences of inputs and outputs (I/O). Each distribution of the mixture constitutes a `possible world', but the agent does not know which of the possible worlds it is actually facing. The problem is to adapt the I/O stream in a way that is compatible with the true world. A natural measure of adaptation can be obtained by the Kullback-Leibler (KL) divergence between the I/O distribution of the true world and the I/O distribution expected by the agent that is uncertain about possible worlds. In the case of pure input streams, the Bayesian mixture provides a well-known solution for this problem. We show, however, that in the case of I/O streams this solution breaks down, because outputs are issued by the agent itself and require a different probabilistic syntax as provided by intervention calculus. Based on this calculus, we obtain a Bayesian control rule that allows modeling adaptive behavior with mixture distributions over I/O streams. This rule might allow for a novel approach to adaptive control based on a minimum KL-principle. △ Less

Submitted 30 December, 2009; v1 submitted 26 November, 2009; originally announced November 2009.

Comments: AGI-2010. 6 pages, 2 figures

Journal ref: AGI-2010

arXiv:0810.3605 [pdf, ps, other]

A Minimum Relative Entropy Principle for Learning and Acting

Authors: Pedro A. Ortega, Daniel A. Braun

Abstract: This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is an agent that has been designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. I… ▽ More This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is an agent that has been designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than normal probability conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert. △ Less

Submitted 10 April, 2010; v1 submitted 20 October, 2008; originally announced October 2008.

Comments: 36 pages, 11 figures

Showing 1–34 of 34 results for author: Ortega, P A