-
Using deep reinforcement learning to promote sustainable human behaviour on a common pool resource problem
Authors:
Raphael Koster,
Miruna Pîslar,
Andrea Tacchetti,
Jan Balaguer,
Leqi Liu,
Romuald Elie,
Oliver P. Hauser,
Karl Tuyls,
Matt Botvinick,
Christopher Summerfield
Abstract:
A canonical social dilemma arises when finite resources are allocated to a group of people, who can choose to either reciprocate with interest, or keep the proceeds for themselves. What resource allocation mechanisms will encourage levels of reciprocation that sustain the commons? Here, in an iterated multiplayer trust game, we use deep reinforcement learning (RL) to design an allocation mechanism…
▽ More
A canonical social dilemma arises when finite resources are allocated to a group of people, who can choose to either reciprocate with interest, or keep the proceeds for themselves. What resource allocation mechanisms will encourage levels of reciprocation that sustain the commons? Here, in an iterated multiplayer trust game, we use deep reinforcement learning (RL) to design an allocation mechanism that endogenously promotes sustainable contributions from human participants to a common pool resource. We first trained neural networks to behave like human players, creating a stimulated economy that allowed us to study how different mechanisms influenced the dynamics of receipt and reciprocation. We then used RL to train a social planner to maximise aggregate return to players. The social planner discovered a redistributive policy that led to a large surplus and an inclusive economy, in which players made roughly equal gains. The RL agent increased human surplus over baseline mechanisms based on unrestricted welfare or conditional cooperation, by conditioning its generosity on available resources and temporarily sanctioning defectors by allocating fewer resources to them. Examining the AI policy allowed us to develop an explainable mechanism that performed similarly and was more popular among players. Deep reinforcement learning can be used to discover mechanisms that promote sustainable human behaviour.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
How should the advent of large language models affect the practice of science?
Authors:
Marcel Binz,
Stephan Alaniz,
Adina Roskies,
Balazs Aczel,
Carl T. Bergstrom,
Colin Allen,
Daniel Schad,
Dirk Wulff,
Jevin D. West,
Qiong Zhang,
Richard M. Shiffrin,
Samuel J. Gershman,
Ven Popov,
Emily M. Bender,
Marco Marelli,
Matthew M. Botvinick,
Zeynep Akata,
Eric Schulz
Abstract:
Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advent of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schu…
▽ More
Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advent of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schulz et al. make the argument that working with LLMs is not fundamentally different from working with human collaborators, while Bender et al. argue that LLMs are often misused and over-hyped, and that their limitations warrant a focus on more specialized, easily interpretable tools. Marelli et al. emphasize the importance of transparent attribution and responsible use of LLMs. Finally, Botvinick and Gershman advocate that humans should retain responsibility for determining the scientific roadmap. To facilitate the discussion, the four perspectives are complemented with a response from each group. By putting these different perspectives in conversation, we aim to bring attention to important considerations within the academic community regarding the adoption of LLMs and their impact on both current and future scientific practices.
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
Meta-in-context learning in large language models
Authors:
Julian Coda-Forno,
Marcel Binz,
Zeynep Akata,
Matthew Botvinick,
Jane X. Wang,
Eric Schulz
Abstract:
Large language models have shown tremendous performance in a variety of tasks. In-context learning -- the ability to improve at a task after being provided with a number of demonstrations -- is seen as one of the main contributors to their success. In the present paper, we demonstrate that the in-context learning abilities of large language models can be recursively improved via in-context learnin…
▽ More
Large language models have shown tremendous performance in a variety of tasks. In-context learning -- the ability to improve at a task after being provided with a number of demonstrations -- is seen as one of the main contributors to their success. In the present paper, we demonstrate that the in-context learning abilities of large language models can be recursively improved via in-context learning itself. We coin this phenomenon meta-in-context learning. Looking at two idealized domains, a one-dimensional regression task and a two-armed bandit task, we show that meta-in-context learning adaptively reshapes a large language model's priors over expected tasks. Furthermore, we find that meta-in-context learning modifies the in-context learning strategies of such models. Finally, we extend our approach to a benchmark of real-world regression problems where we observe competitive performance to traditional learning algorithms. Taken together, our work improves our understanding of in-context learning and paves the way toward adapting large language models to the environment they are applied purely through meta-in-context learning rather than traditional finetuning.
△ Less
Submitted 22 May, 2023;
originally announced May 2023.
-
Meta-Learned Models of Cognition
Authors:
Marcel Binz,
Ishita Dasgupta,
Akshay Jagadish,
Matthew Botvinick,
Jane X. Wang,
Eric Schulz
Abstract:
Meta-learning is a framework for learning learning algorithms through repeated interactions with an environment as opposed to designing them by hand. In recent years, this framework has established itself as a promising tool for building models of human cognition. Yet, a coherent research program around meta-learned models of cognition is still missing. The purpose of this article is to synthesize…
▽ More
Meta-learning is a framework for learning learning algorithms through repeated interactions with an environment as opposed to designing them by hand. In recent years, this framework has established itself as a promising tool for building models of human cognition. Yet, a coherent research program around meta-learned models of cognition is still missing. The purpose of this article is to synthesize previous work in this field and establish such a research program. We rely on three key pillars to accomplish this goal. We first point out that meta-learning can be used to construct Bayes-optimal learning algorithms. This result not only implies that any behavioral phenomenon that can be explained by a Bayesian model can also be explained by a meta-learned model but also allows us to draw strong connections to the rational analysis of cognition. We then discuss several advantages of the meta-learning framework over traditional Bayesian methods. In particular, we argue that meta-learning can be applied to situations where Bayesian inference is impossible and that it enables us to make rational models of cognition more realistic, either by incorporating limited computational resources or neuroscientific knowledge. Finally, we reexamine prior studies from psychology and neuroscience that have applied meta-learning and put them into the context of these new insights. In summary, our work highlights that meta-learning considerably extends the scope of rational analysis and thereby of cognitive theories more generally.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
DiscoGen: Learning to Discover Gene Regulatory Networks
Authors:
Nan Rosemary Ke,
Sara-Jane Dunn,
Jorg Bornschein,
Silvia Chiappa,
Melanie Rey,
Jean-Baptiste Lespiau,
Albin Cassirer,
Jane Wang,
Theophane Weber,
David Barrett,
Matthew Botvinick,
Anirudh Goyal,
Mike Mozer,
Danilo Rezende
Abstract:
Accurately inferring Gene Regulatory Networks (GRNs) is a critical and challenging task in biology. GRNs model the activatory and inhibitory interactions between genes and are inherently causal in nature. To accurately identify GRNs, perturbational data is required. However, most GRN discovery methods only operate on observational data. Recent advances in neural network-based causal discovery meth…
▽ More
Accurately inferring Gene Regulatory Networks (GRNs) is a critical and challenging task in biology. GRNs model the activatory and inhibitory interactions between genes and are inherently causal in nature. To accurately identify GRNs, perturbational data is required. However, most GRN discovery methods only operate on observational data. Recent advances in neural network-based causal discovery methods have significantly improved causal discovery, including handling interventional data, improvements in performance and scalability. However, applying state-of-the-art (SOTA) causal discovery methods in biology poses challenges, such as noisy data and a large number of samples. Thus, adapting the causal discovery methods is necessary to handle these challenges. In this paper, we introduce DiscoGen, a neural network-based GRN discovery method that can denoise gene expression measurements and handle interventional data. We demonstrate that our model outperforms SOTA neural network-based causal discovery methods.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
Fine-tuning language models to find agreement among humans with diverse preferences
Authors:
Michiel A. Bakker,
Martin J. Chadwick,
Hannah R. Sheahan,
Michael Henry Tessler,
Lucy Campbell-Gillingham,
Jan Balaguer,
Nat McAleese,
Amelia Glaese,
John Aslanides,
Matthew M. Botvinick,
Christopher Summerfield
Abstract:
Recent work in large language modeling (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might…
▽ More
Recent work in large language modeling (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
△ Less
Submitted 27 November, 2022;
originally announced November 2022.
-
A Unified Theory of Dual-Process Control
Authors:
Ted Moskovitz,
Kevin Miller,
Maneesh Sahani,
Matthew M. Botvinick
Abstract:
Dual-process theories play a central role in both psychology and neuroscience, figuring prominently in fields ranging from executive control to reward-based learning to judgment and decision making. In each of these domains, two mechanisms appear to operate concurrently, one relatively high in computational complexity, the other relatively simple. Why is neural information processing organized in…
▽ More
Dual-process theories play a central role in both psychology and neuroscience, figuring prominently in fields ranging from executive control to reward-based learning to judgment and decision making. In each of these domains, two mechanisms appear to operate concurrently, one relatively high in computational complexity, the other relatively simple. Why is neural information processing organized in this way? We propose an answer to this question based on the notion of compression. The key insight is that dual-process structure can enhance adaptive behavior by allowing an agent to minimize the description length of its own behavior. We apply a single model based on this observation to findings from research on executive control, reward-based learning, and judgment and decision making, showing that seemingly diverse dual-process phenomena can be understood as domain-specific consequences of a single underlying set of computational principles.
△ Less
Submitted 10 October, 2023; v1 submitted 13 November, 2022;
originally announced November 2022.
-
Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolution
Authors:
Anthony Zador,
Sean Escola,
Blake Richards,
Bence Ölveczky,
Yoshua Bengio,
Kwabena Boahen,
Matthew Botvinick,
Dmitri Chklovskii,
Anne Churchland,
Claudia Clopath,
James DiCarlo,
Surya Ganguli,
Jeff Hawkins,
Konrad Koerding,
Alexei Koulakov,
Yann LeCun,
Timothy Lillicrap,
Adam Marblestone,
Bruno Olshausen,
Alexandre Pouget,
Cristina Savin,
Terrence Sejnowski,
Eero Simoncelli,
Sara Solla,
David Sussillo
, et al. (2 additional authors not shown)
Abstract:
Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts…
▽ More
Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts the focus from those capabilities like game playing and language that are especially well-developed or uniquely human to those capabilities, inherited from over 500 million years of evolution, that are shared with all animals. Building models that can pass the embodied Turing test will provide a roadmap for the next generation of AI.
△ Less
Submitted 22 February, 2023; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Adaptive patch foraging in deep reinforcement learning agents
Authors:
Nathan J. Wispinski,
Andrew Butcher,
Kory W. Mathewson,
Craig S. Chapman,
Matthew M. Botvinick,
Patrick M. Pilarski
Abstract:
Patch foraging is one of the most heavily studied behavioral optimization challenges in biology. However, despite its importance to biological intelligence, this behavioral optimization problem is understudied in artificial intelligence research. Patch foraging is especially amenable to study given that it has a known optimal solution, which may be difficult to discover given current techniques in…
▽ More
Patch foraging is one of the most heavily studied behavioral optimization challenges in biology. However, despite its importance to biological intelligence, this behavioral optimization problem is understudied in artificial intelligence research. Patch foraging is especially amenable to study given that it has a known optimal solution, which may be difficult to discover given current techniques in deep reinforcement learning. Here, we investigate deep reinforcement learning agents in an ecological patch foraging task. For the first time, we show that machine learning agents can learn to patch forage adaptively in patterns similar to biological foragers, and approach optimal patch foraging behavior when accounting for temporal discounting. Finally, we show emergent internal dynamics in these agents that resemble single-cell recordings from foraging non-human primates, which complements experimental and theoretical work on the neural mechanisms of biological foraging. This work suggests that agents interacting in complex environments with ecologically valid pressures arrive at common solutions, suggesting the emergence of foundational computations behind adaptive, intelligent behavior in both biological and artificial agents.
△ Less
Submitted 21 April, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Minimum Description Length Control
Authors:
Ted Moskovitz,
Ta-Chu Kao,
Maneesh Sahani,
Matthew M. Botvinick
Abstract:
We propose a novel framework for multitask reinforcement learning based on the minimum description length (MDL) principle. In this approach, which we term MDL-control (MDL-C), the agent learns the common structure among the tasks with which it is faced and then distills it into a simpler representation which facilitates faster convergence and generalization to new tasks. In doing so, MDL-C natural…
▽ More
We propose a novel framework for multitask reinforcement learning based on the minimum description length (MDL) principle. In this approach, which we term MDL-control (MDL-C), the agent learns the common structure among the tasks with which it is faced and then distills it into a simpler representation which facilitates faster convergence and generalization to new tasks. In doing so, MDL-C naturally balances adaptation to each task with epistemic uncertainty about the task distribution. We motivate MDL-C via formal connections between the MDL principle and Bayesian inference, derive theoretical performance guarantees, and demonstrate MDL-C's empirical effectiveness on both discrete and high-dimensional continuous control tasks.
△ Less
Submitted 24 July, 2022; v1 submitted 17 July, 2022;
originally announced July 2022.
-
The Frost Hollow Experiments: Pavlovian Signalling as a Path to Coordination and Communication Between Agents
Authors:
Patrick M. Pilarski,
Andrew Butcher,
Elnaz Davoodi,
Michael Bradley Johanson,
Dylan J. A. Brenneis,
Adam S. R. Parker,
Leslie Acker,
Matthew M. Botvinick,
Joseph Modayil,
Adam White
Abstract:
Learned communication between agents is a powerful tool when approaching decision-making problems that are hard to overcome by any single agent in isolation. However, continual coordination and communication learning between machine agents or human-machine partnerships remains a challenging open problem. As a stepping stone toward solving the continual communication learning problem, in this paper…
▽ More
Learned communication between agents is a powerful tool when approaching decision-making problems that are hard to overcome by any single agent in isolation. However, continual coordination and communication learning between machine agents or human-machine partnerships remains a challenging open problem. As a stepping stone toward solving the continual communication learning problem, in this paper we contribute a multi-faceted study into what we term Pavlovian signalling -- a process by which learned, temporally extended predictions made by one agent inform decision-making by another agent with different perceptual access to their shared environment. We seek to establish how different temporal processes and representational choices impact Pavlovian signalling between learning agents. To do so, we introduce a partially observable decision-making domain we call the Frost Hollow. In this domain a prediction learning agent and a reinforcement learning agent are coupled into a two-part decision-making system that seeks to acquire sparse reward while avoiding time-conditional hazards. We evaluate two domain variations: 1) machine prediction and control learning in a linear walk, and 2) a prediction learning machine interacting with a human participant in a virtual reality environment. Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing impacts agent-agent and human-agent interactions differently. As a main contribution, we establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning. Our results therefore point to an actionable, constructivist path towards continual communication learning between reinforcement learning agents, with potential impact in a range of real-world settings.
△ Less
Submitted 17 March, 2022;
originally announced March 2022.
-
HiP: Hierarchical Perceiver
Authors:
Joao Carreira,
Skanda Koppula,
Daniel Zoran,
Adria Recasens,
Catalin Ionescu,
Olivier Henaff,
Evan Shelhamer,
Relja Arandjelovic,
Matt Botvinick,
Oriol Vinyals,
Karen Simonyan,
Andrew Zisserman,
Andrew Jaegle
Abstract:
General perception systems such as Perceivers can process arbitrary modalities in any combination and are able to handle up to a few hundred thousand inputs. They achieve this generality by using exclusively global attention operations. This however hinders them from scaling up to the inputs sizes required to process raw high-resolution images or video. In this paper, we show that some degree of l…
▽ More
General perception systems such as Perceivers can process arbitrary modalities in any combination and are able to handle up to a few hundred thousand inputs. They achieve this generality by using exclusively global attention operations. This however hinders them from scaling up to the inputs sizes required to process raw high-resolution images or video. In this paper, we show that some degree of locality can be introduced back into these models, greatly improving their efficiency while preserving their generality. To scale them further, we introduce a self-supervised approach that enables learning dense low-dimensional positional embeddings for very large signals. We call the resulting model a Hierarchical Perceiver (HiP). In sum our contributions are: 1) scaling Perceiver-type models to raw high-resolution images and audio+video, 2) showing the feasibility of learning 1M+ positional embeddings from scratch using masked auto-encoding, 3) demonstrating competitive performance on raw data from ImageNet, AudioSet, PASCAL VOC, ModelNet40 and Kinetics datasets with the same exact, unchanged model and without specialized preprocessing or any tokenization.
△ Less
Submitted 3 November, 2022; v1 submitted 22 February, 2022;
originally announced February 2022.
-
HCMD-zero: Learning Value Aligned Mechanisms from Data
Authors:
Jan Balaguer,
Raphael Koster,
Ari Weinstein,
Lucy Campbell-Gillingham,
Christopher Summerfield,
Matthew Botvinick,
Andrea Tacchetti
Abstract:
Artificial learning agents are mediating a larger and larger number of interactions among humans, firms, and organizations, and the intersection between mechanism design and machine learning has been heavily investigated in recent years. However, mechanism design methods often make strong assumptions on how participants behave (e.g. rationality), on the kind of knowledge designers have access to a…
▽ More
Artificial learning agents are mediating a larger and larger number of interactions among humans, firms, and organizations, and the intersection between mechanism design and machine learning has been heavily investigated in recent years. However, mechanism design methods often make strong assumptions on how participants behave (e.g. rationality), on the kind of knowledge designers have access to a priori (e.g. access to strong baseline mechanisms), or on what the goal of the mechanism should be (e.g. total welfare). Here we introduce HCMD-zero, a general purpose method to construct mechanisms making none of these three assumptions. HCMD-zero learns to mediate interactions among participants and adjusts the mechanism parameters to make itself more likely to be preferred by participants. It does so by remaining engaged in an electoral contest with copies of itself, thereby accessing direct feedback from participants. We test our method on a stylized resource allocation game that highlights the tension between productivity, equality and the temptation to free ride. HCMD-zero produces a mechanism that is preferred by human participants over a strong baseline, it does so automatically, without requiring prior knowledge, and using human behavioral trajectories sparingly and effectively. Our analysis shows HCMD-zero consistently makes the mechanism policy more and more likely to be preferred by human participants over the course of training, and that it results in a mechanism with an interpretable and intuitive policy.
△ Less
Submitted 20 May, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
General-purpose, long-context autoregressive modeling with Perceiver AR
Authors:
Curtis Hawthorne,
Andrew Jaegle,
Cătălina Cangea,
Sebastian Borgeaud,
Charlie Nash,
Mateusz Malinowski,
Sander Dieleman,
Oriol Vinyals,
Matthew Botvinick,
Ian Simon,
Hannah Sheahan,
Neil Zeghidour,
Jean-Baptiste Alayrac,
João Carreira,
Jesse Engel
Abstract:
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic…
▽ More
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64 x 64 ImageNet images and PG-19 books.
△ Less
Submitted 14 June, 2022; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Human-centered mechanism design with Democratic AI
Authors:
Raphael Koster,
Jan Balaguer,
Andrea Tacchetti,
Ari Weinstein,
Tina Zhu,
Oliver Hauser,
Duncan Williams,
Lucy Campbell-Gillingham,
Phoebe Thacker,
Matthew Botvinick,
Christopher Summerfield
Abstract:
Building artificial intelligence (AI) that aligns with human values is an unsolved problem. Here, we developed a human-in-the-loop research pipeline called Democratic AI, in which reinforcement learning is used to design a social mechanism that humans prefer by majority. A large group of humans played an online investment game that involved deciding whether to keep a monetary endowment or to share…
▽ More
Building artificial intelligence (AI) that aligns with human values is an unsolved problem. Here, we developed a human-in-the-loop research pipeline called Democratic AI, in which reinforcement learning is used to design a social mechanism that humans prefer by majority. A large group of humans played an online investment game that involved deciding whether to keep a monetary endowment or to share it with others for collective benefit. Shared revenue was returned to players under two different redistribution mechanisms, one designed by the AI and the other by humans. The AI discovered a mechanism that redressed initial wealth imbalance, sanctioned free riders, and successfully won the majority vote. By optimizing for human preferences, Democratic AI may be a promising method for value-aligned policy innovation.
△ Less
Submitted 27 January, 2022;
originally announced January 2022.
-
Assessing Human Interaction in Virtual Reality With Continually Learning Prediction Agents Based on Reinforcement Learning Algorithms: A Pilot Study
Authors:
Dylan J. A. Brenneis,
Adam S. Parker,
Michael Bradley Johanson,
Andrew Butcher,
Elnaz Davoodi,
Leslie Acker,
Matthew M. Botvinick,
Joseph Modayil,
Adam White,
Patrick M. Pilarski
Abstract:
Artificial intelligence systems increasingly involve continual learning to enable flexibility in general situations that are not encountered during system training. Human interaction with autonomous systems is broadly studied, but research has hitherto under-explored interactions that occur while the system is actively learning, and can noticeably change its behaviour in minutes. In this pilot stu…
▽ More
Artificial intelligence systems increasingly involve continual learning to enable flexibility in general situations that are not encountered during system training. Human interaction with autonomous systems is broadly studied, but research has hitherto under-explored interactions that occur while the system is actively learning, and can noticeably change its behaviour in minutes. In this pilot study, we investigate how the interaction between a human and a continually learning prediction agent develops as the agent develops competency. Additionally, we compare two different agent architectures to assess how representational choices in agent design affect the human-agent interaction. We develop a virtual reality environment and a time-based prediction task wherein learned predictions from a reinforcement learning (RL) algorithm augment human predictions. We assess how a participant's performance and behaviour in this task differs across agent types, using both quantitative and qualitative analyses. Our findings suggest that human trust of the system may be influenced by early interactions with the agent, and that trust in turn affects strategic behaviour, but limitations of the pilot study rule out any conclusive statement. We identify trust as a key feature of interaction to focus on when considering RL-based technologies, and make several recommendations for modification to this study in preparation for a larger-scale investigation. A video summary of this paper can be found at https://youtu.be/oVYJdnBqTwQ .
△ Less
Submitted 22 April, 2022; v1 submitted 14 December, 2021;
originally announced December 2021.
-
Collaborating with Humans without Human Data
Authors:
DJ Strouse,
Kevin R. McKee,
Matt Botvinick,
Edward Hughes,
Richard Everett
Abstract:
Collaborating with humans requires rapidly adapting to their individual strengths, weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement learning techniques, such as self-play (SP) or population play (PP), produce agents that overfit to their training partners and do not generalize well to humans. Alternatively, researchers can collect human data, train a human model…
▽ More
Collaborating with humans requires rapidly adapting to their individual strengths, weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement learning techniques, such as self-play (SP) or population play (PP), produce agents that overfit to their training partners and do not generalize well to humans. Alternatively, researchers can collect human data, train a human model using behavioral cloning, and then use that model to train "human-aware" agents ("behavioral cloning play", or BCP). While such an approach can improve the generalization of agents to new human co-players, it involves the onerous and expensive step of collecting large amounts of human data first. Here, we study the problem of how to train agents that collaborate well with human partners without using human data. We argue that the crux of the problem is to produce a diverse set of training partners. Drawing inspiration from successful multi-agent approaches in competitive domains, we find that a surprisingly simple approach is highly effective. We train our agent partner as the best response to a population of self-play agents and their past checkpoints taken throughout training, a method we call Fictitious Co-Play (FCP). Our experiments focus on a two-player collaborative cooking simulator that has recently been proposed as a challenge problem for coordination with humans. We find that FCP agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners. Furthermore, humans also report a strong subjective preference to partnering with FCP agents over all baselines.
△ Less
Submitted 7 January, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Authors:
Andrew Jaegle,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Carl Doersch,
Catalin Ionescu,
David Ding,
Skanda Koppula,
Daniel Zoran,
Andrew Brock,
Evan Shelhamer,
Olivier Hénaff,
Matthew M. Botvinick,
Andrew Zisserman,
Oriol Vinyals,
Joāo Carreira
Abstract:
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data f…
▽ More
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.
△ Less
Submitted 15 March, 2022; v1 submitted 30 July, 2021;
originally announced July 2021.
-
SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition
Authors:
Rishabh Kabra,
Daniel Zoran,
Goker Erdogan,
Loic Matthey,
Antonia Creswell,
Matthew Botvinick,
Alexander Lerchner,
Christopher P. Burgess
Abstract:
To help agents reason about scenes in terms of their building blocks, we wish to extract the compositional structure of any given scene (in particular, the configuration and characteristics of objects comprising the scene). This problem is especially difficult when scene structure needs to be inferred while also estimating the agent's location/viewpoint, as the two variables jointly give rise to t…
▽ More
To help agents reason about scenes in terms of their building blocks, we wish to extract the compositional structure of any given scene (in particular, the configuration and characteristics of objects comprising the scene). This problem is especially difficult when scene structure needs to be inferred while also estimating the agent's location/viewpoint, as the two variables jointly give rise to the agent's observations. We present an unsupervised variational approach to this problem. Leveraging the shared structure that exists across different scenes, our model learns to infer two sets of latent representations from RGB video input alone: a set of "object" latents, corresponding to the time-invariant, object-level contents of the scene, as well as a set of "frame" latents, corresponding to global time-varying elements such as viewpoint. This factorization of latents allows our model, SIMONe, to represent object attributes in an allocentric manner which does not depend on viewpoint. Moreover, it allows us to disentangle object dynamics and summarize their trajectories as time-abstracted, view-invariant, per-object properties. We demonstrate these capabilities, as well as the model's performance in terms of view synthesis and instance segmentation, across three procedurally generated video datasets.
△ Less
Submitted 6 December, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
A multi-agent reinforcement learning model of reputation and cooperation in human groups
Authors:
Kevin R. McKee,
Edward Hughes,
Tina O. Zhu,
Martin J. Chadwick,
Raphael Koster,
Antonio Garcia Castaneda,
Charlie Beattie,
Thore Graepel,
Matt Botvinick,
Joel Z. Leibo
Abstract:
Collective action demands that individuals efficiently coordinate how much, where, and when to cooperate. Laboratory experiments have extensively explored the first part of this process, demonstrating that a variety of social-cognitive mechanisms influence how much individuals choose to invest in group efforts. However, experimental research has been unable to shed light on how social cognitive me…
▽ More
Collective action demands that individuals efficiently coordinate how much, where, and when to cooperate. Laboratory experiments have extensively explored the first part of this process, demonstrating that a variety of social-cognitive mechanisms influence how much individuals choose to invest in group efforts. However, experimental research has been unable to shed light on how social cognitive mechanisms contribute to the where and when of collective action. We build and test a computational model of human behavior in Clean Up, a social dilemma task popular in multi-agent reinforcement learning research. We show that human groups effectively cooperate in Clean Up when they can identify group members and track reputations over time, but fail to organize under conditions of anonymity. A multi-agent reinforcement learning model of reputation demonstrates the same difference in cooperation under conditions of identifiability and anonymity. In addition, the model accurately predicts spatial and temporal patterns of group behavior: in this public goods dilemma, the intrinsic motivation for reputation catalyzes the development of a non-territorial, turn-taking strategy to coordinate collective action.
△ Less
Submitted 22 February, 2023; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Synthetic Returns for Long-Term Credit Assignment
Authors:
David Raposo,
Sam Ritter,
Adam Santoro,
Greg Wayne,
Theophane Weber,
Matt Botvinick,
Hado van Hasselt,
Francis Song
Abstract:
Since the earliest days of reinforcement learning, the workhorse method for assigning credit to actions over time has been temporal-difference (TD) learning, which propagates credit backward timestep-by-timestep. This approach suffers when delays between actions and rewards are long and when intervening unrelated events contribute variance to long-term returns. We propose state-associative (SA) le…
▽ More
Since the earliest days of reinforcement learning, the workhorse method for assigning credit to actions over time has been temporal-difference (TD) learning, which propagates credit backward timestep-by-timestep. This approach suffers when delays between actions and rewards are long and when intervening unrelated events contribute variance to long-term returns. We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two. In this work, we use SA-learning to model the contribution of past states to the current reward. With this model we can predict each state's contribution to the far future, a quantity we call "synthetic returns". TD-learning can then be applied to select actions that maximize these synthetic returns (SRs). We demonstrate the effectiveness of augmenting agents with SRs across a range of tasks on which TD-learning alone fails. We show that the learned SRs are interpretable: they spike for states that occur after critical actions are taken. Finally, we show that our IMPALA-based SR agent solves Atari Skiing -- a game with a lengthy reward delay that posed a major hurdle to deep-RL agents -- 25 times faster than the published state-of-the-art.
△ Less
Submitted 24 February, 2021;
originally announced February 2021.
-
Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents
Authors:
Jane X. Wang,
Michael King,
Nicolas Porcel,
Zeb Kurth-Nelson,
Tina Zhu,
Charlie Deck,
Peter Choy,
Mary Cassin,
Malcolm Reynolds,
Francis Song,
Gavin Buttimore,
David P. Reichert,
Neil Rabinowitz,
Loic Matthey,
Demis Hassabis,
Alexander Lerchner,
Matthew Botvinick
Abstract:
There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled anal…
▽ More
There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, emphasizing transparency and potential for in-depth analysis as well as structural richness. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as public resource, together with a suite of analysis tools and sample agent trajectories.
△ Less
Submitted 20 October, 2021; v1 submitted 4 February, 2021;
originally announced February 2021.
-
Attention over learned object embeddings enables complex visual reasoning
Authors:
David Ding,
Felix Hill,
Adam Santoro,
Malcolm Reynolds,
Matt Botvinick
Abstract:
Neural networks have achieved success in a wide array of perceptual tasks but often fail at tasks involving both perception and higher-level reasoning. On these more challenging tasks, bespoke approaches (such as modular symbolic components, independent dynamics models or semantic parsers) targeted towards that specific type of task have typically performed better. The downside to these targeted a…
▽ More
Neural networks have achieved success in a wide array of perceptual tasks but often fail at tasks involving both perception and higher-level reasoning. On these more challenging tasks, bespoke approaches (such as modular symbolic components, independent dynamics models or semantic parsers) targeted towards that specific type of task have typically performed better. The downside to these targeted approaches, however, is that they can be more brittle than general-purpose neural networks, requiring significant modification or even redesign according to the particular task at hand. Here, we propose a more general neural-network-based approach to dynamic visual reasoning problems that obtains state-of-the-art performance on three different domains, in each case outperforming bespoke modular approaches tailored specifically to the task. Our method relies on learned object-centric representations, self-attention and self-supervised dynamics learning, and all three elements together are required for strong performance to emerge. The success of this combination suggests that there may be no need to trade off flexibility for performance on problems involving spatio-temporal or causal-style reasoning. With the right soft biases and learning objectives in a neural network we may be able to attain the best of both worlds.
△ Less
Submitted 26 October, 2021; v1 submitted 15 December, 2020;
originally announced December 2020.
-
Model-free conventions in multi-agent reinforcement learning with heterogeneous preferences
Authors:
Raphael Köster,
Kevin R. McKee,
Richard Everett,
Laura Weidinger,
William S. Isaac,
Edward Hughes,
Edgar A. Duéñez-Guzmán,
Thore Graepel,
Matthew Botvinick,
Joel Z. Leibo
Abstract:
Game theoretic views of convention generally rest on notions of common knowledge and hyper-rational models of individual behavior. However, decades of work in behavioral economics have questioned the validity of both foundations. Meanwhile, computational neuroscience has contributed a modernized 'dual process' account of decision-making where model-free (MF) reinforcement learning trades off with…
▽ More
Game theoretic views of convention generally rest on notions of common knowledge and hyper-rational models of individual behavior. However, decades of work in behavioral economics have questioned the validity of both foundations. Meanwhile, computational neuroscience has contributed a modernized 'dual process' account of decision-making where model-free (MF) reinforcement learning trades off with model-based (MB) reinforcement learning. The former captures habitual and procedural learning while the latter captures choices taken via explicit planning and deduction. Some conventions (e.g. international treaties) are likely supported by cognition that resonates with the game theoretic and MB accounts. However, convention formation may also occur via MF mechanisms like habit learning; though this possibility has been understudied. Here, we demonstrate that complex, large-scale conventions can emerge from MF learning mechanisms. This suggests that some conventions may be supported by habit-like cognition rather than explicit reasoning. We apply MF multi-agent reinforcement learning to a temporo-spatially extended game with incomplete information. In this game, large parts of the state space are reachable only by collective action. However, heterogeneity of tastes makes such coordinated action difficult: multiple equilibria are desirable for all players, but subgroups prefer a particular equilibrium over all others. This creates a coordination problem that can be solved by establishing a convention. We investigate start-up and free rider subproblems as well as the effects of group size, intensity of intrinsic preference, and salience on the emergence dynamics of coordination conventions. Results of our simulations show agents establish and switch between conventions, even working against their own preferred outcome when doing so is necessary for effective coordination.
△ Less
Submitted 14 December, 2020; v1 submitted 18 October, 2020;
originally announced October 2020.
-
Deep Reinforcement Learning and its Neuroscientific Implications
Authors:
Matthew Botvinick,
Jane X. Wang,
Will Dabney,
Kevin J. Miller,
Zeb Kurth-Nelson
Abstract:
The emergence of powerful artificial intelligence is defining new research directions in neuroscience. To date, this research has focused largely on deep neural networks trained using supervised learning, in tasks such as image classification. However, there is another area of recent AI work which has so far received less attention from neuroscientists, but which may have profound neuroscientific…
▽ More
The emergence of powerful artificial intelligence is defining new research directions in neuroscience. To date, this research has focused largely on deep neural networks trained using supervised learning, in tasks such as image classification. However, there is another area of recent AI work which has so far received less attention from neuroscientists, but which may have profound neuroscientific implications: deep reinforcement learning. Deep RL offers a comprehensive framework for studying the interplay among learning, representation and decision-making, offering to the brain sciences a new set of research tools and a wide range of novel hypotheses. In the present review, we provide a high-level introduction to deep RL, discuss some of its initial applications to neuroscience, and survey its wider implications for research on brain and behavior, concluding with a list of opportunities for next-stage research.
△ Less
Submitted 7 July, 2020;
originally announced July 2020.
-
Unsupervised deep learning identifies semantic disentanglement in single inferotemporal neurons
Authors:
Irina Higgins,
Le Chang,
Victoria Langston,
Demis Hassabis,
Christopher Summerfield,
Doris Tsao,
Matthew Botvinick
Abstract:
Deep supervised neural networks trained to classify objects have emerged as popular models of computation in the primate ventral stream. These models represent information with a high-dimensional distributed population code, implying that inferotemporal (IT) responses are also too complex to interpret at the single-neuron level. We challenge this view by modelling neural responses to faces in the…
▽ More
Deep supervised neural networks trained to classify objects have emerged as popular models of computation in the primate ventral stream. These models represent information with a high-dimensional distributed population code, implying that inferotemporal (IT) responses are also too complex to interpret at the single-neuron level. We challenge this view by modelling neural responses to faces in the macaque IT with a deep unsupervised generative model, beta-VAE. Unlike deep classifiers, beta-VAE "disentangles" sensory data into interpretable latent factors, such as gender or hair length. We found a remarkable correspondence between the generative factors discovered by the model and those coded by single IT neurons. Moreover, we were able to reconstruct face images using the signals from just a handful of cells. This suggests that the ventral visual stream may be optimising the disentangling objective, producing a neural code that is low-dimensional and semantically interpretable at the single-unit level.
△ Less
Submitted 25 June, 2020;
originally announced June 2020.
-
Rapid Task-Solving in Novel Environments
Authors:
Sam Ritter,
Ryan Faulkner,
Laurent Sartran,
Adam Santoro,
Matt Botvinick,
David Raposo
Abstract:
We propose the challenge of rapid task-solving in novel environments (RTS), wherein an agent must solve a series of tasks as rapidly as possible in an unfamiliar environment. An effective RTS agent must balance between exploring the unfamiliar environment and solving its current task, all while building a model of the new environment over which it can plan when faced with later tasks. While modern…
▽ More
We propose the challenge of rapid task-solving in novel environments (RTS), wherein an agent must solve a series of tasks as rapidly as possible in an unfamiliar environment. An effective RTS agent must balance between exploring the unfamiliar environment and solving its current task, all while building a model of the new environment over which it can plan when faced with later tasks. While modern deep RL agents exhibit some of these abilities in isolation, none are suitable for the full RTS challenge. To enable progress toward RTS, we introduce two challenge domains: (1) a minimal RTS challenge called the Memory&Planning Game and (2) One-Shot StreetLearn Navigation, which introduces scale and complexity from real-world data. We demonstrate that state-of-the-art deep RL agents fail at RTS in both domains, and that this failure is due to an inability to plan over gathered knowledge. We develop Episodic Planning Networks (EPNs) and show that deep-RL agents with EPNs excel at RTS, outperforming the nearest baseline by factors of 2-3 and learning to navigate held-out StreetLearn maps within a single episode. We show that EPNs learn to execute a value iteration-like planning algorithm and that they generalize to situations beyond their training experience. algorithm and that they generalize to situations beyond their training experience.
△ Less
Submitted 19 April, 2021; v1 submitted 5 June, 2020;
originally announced June 2020.
-
The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget
Authors:
Anirudh Goyal,
Yoshua Bengio,
Matthew Botvinick,
Sergey Levine
Abstract:
In many applications, it is desirable to extract only the relevant information from complex input data, which involves making a decision about which input features are relevant. The information bottleneck method formalizes this as an information-theoretic optimization problem by maintaining an optimal tradeoff between compression (throwing away irrelevant input information), and predicting the tar…
▽ More
In many applications, it is desirable to extract only the relevant information from complex input data, which involves making a decision about which input features are relevant. The information bottleneck method formalizes this as an information-theoretic optimization problem by maintaining an optimal tradeoff between compression (throwing away irrelevant input information), and predicting the target. In many problem settings, including the reinforcement learning problems we consider in this work, we might prefer to compress only part of the input. This is typically the case when we have a standard conditioning input, such as a state observation, and a "privileged" input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent. In such cases, we might prefer to compress the privileged input, either to achieve better generalization (e.g., with respect to goals) or to minimize access to costly information (e.g., in the case of communication). Practical implementations of the information bottleneck based on variational inference require access to the privileged input in order to compute the bottleneck variable, so although they perform compression, this compression operation itself needs unrestricted, lossless access. In this work, we propose the variational bandwidth bottleneck, which decides for each example on the estimated value of the privileged information before seeing it, i.e., only based on the standard input, and then accordingly chooses stochastically, whether to access the privileged input or not. We formulate a tractable approximation to this framework and demonstrate in a series of reinforcement learning experiments that it can improve generalization and reduce access to computationally costly information.
△ Less
Submitted 24 April, 2020;
originally announced April 2020.
-
MEMO: A Deep Network for Flexible Combination of Episodic Memories
Authors:
Andrea Banino,
Adrià Puigdomènech Badia,
Raphael Köster,
Martin J. Chadwick,
Vinicius Zambaldi,
Demis Hassabis,
Caswell Barry,
Matthew Botvinick,
Dharshan Kumaran,
Charles Blundell
Abstract:
Recent research developing neural network architectures with external memory have often used the benchmark bAbI question and answering dataset which provides a challenging number of tasks requiring reasoning. Here we employed a classic associative inference task from the memory-based reasoning neuroscience literature in order to more carefully probe the reasoning capacity of existing memory-augmen…
▽ More
Recent research developing neural network architectures with external memory have often used the benchmark bAbI question and answering dataset which provides a challenging number of tasks requiring reasoning. Here we employed a classic associative inference task from the memory-based reasoning neuroscience literature in order to more carefully probe the reasoning capacity of existing memory-augmented architectures. This task is thought to capture the essence of reasoning -- the appreciation of distant relationships among elements distributed across multiple facts or memories. Surprisingly, we found that current architectures struggle to reason over long distance associations. Similar results were obtained on a more complex task involving finding the shortest path between nodes in a path. We therefore developed MEMO, an architecture endowed with the capacity to reason over longer distances. This was accomplished with the addition of two novel components. First, it introduces a separation between memories (facts) stored in external memory and the items that comprise these facts in external memory. Second, it makes use of an adaptive retrieval mechanism, allowing a variable number of "memory hops" before the answer is produced. MEMO is capable of solving our novel reasoning tasks, as well as match state of the art results in bAbI.
△ Less
Submitted 29 January, 2020;
originally announced January 2020.
-
Stabilizing Transformers for Reinforcement Learning
Authors:
Emilio Parisotto,
H. Francis Song,
Jack W. Rae,
Razvan Pascanu,
Caglar Gulcehre,
Siddhant M. Jayakumar,
Max Jaderberg,
Raphael Lopez Kaufman,
Aidan Clark,
Seb Noury,
Matthew M. Botvinick,
Nicolas Heess,
Raia Hadsell
Abstract:
Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer's ability to process long time horizons o…
▽ More
Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer's ability to process long time horizons of information could provide a similar performance boost in partially observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting. In this work we demonstrate that the standard transformer architecture is difficult to optimize, which was previously observed in the supervised learning setting but becomes especially pronounced with RL objectives. We propose architectural modifications that substantially improve the stability and learning speed of the original Transformer and XL variant. The proposed architecture, the Gated Transformer-XL (GTrXL), surpasses LSTMs on challenging memory environments and achieves state-of-the-art results on the multi-task DMLab-30 benchmark suite, exceeding the performance of an external memory architecture. We show that the GTrXL, trained using the same losses, has stability and performance that consistently matches or exceeds a competitive LSTM baseline, including on more reactive tasks where memory is less critical. GTrXL offers an easy-to-train, simple-to-implement but substantially more expressive architectural alternative to the standard multi-layer LSTM ubiquitously used for RL agents in partially observable environments.
△ Less
Submitted 13 October, 2019;
originally announced October 2019.
-
Environmental drivers of systematicity and generalization in a situated agent
Authors:
Felix Hill,
Andrew Lampinen,
Rosalia Schneider,
Stephen Clark,
Matthew Botvinick,
James L. McClelland,
Adam Santoro
Abstract:
The question of whether deep neural networks are good at generalising beyond their immediate training experience is of critical importance for learning-based approaches to AI. Here, we consider tests of out-of-sample generalisation that require an agent to respond to never-seen-before instructions by manipulating and positioning objects in a 3D Unity simulated room. We first describe a comparative…
▽ More
The question of whether deep neural networks are good at generalising beyond their immediate training experience is of critical importance for learning-based approaches to AI. Here, we consider tests of out-of-sample generalisation that require an agent to respond to never-seen-before instructions by manipulating and positioning objects in a 3D Unity simulated room. We first describe a comparatively generic agent architecture that exhibits strong performance on these tests. We then identify three aspects of the training regime and environment that make a significant difference to its performance: (a) the number of object/word experiences in the training set; (b) the visual invariances afforded by the agent's perspective, or frame of reference; and (c) the variety of visual input inherent in the perceptual aspect of the agent's perception. Our findings indicate that the degree of generalisation that networks exhibit can depend critically on particulars of the environment in which a given task is instantiated. They further suggest that the propensity for neural networks to generalise in systematic ways may increase if, like human children, those networks have access to many frames of richly varying, multi-modal observations as they learn.
△ Less
Submitted 19 February, 2020; v1 submitted 1 October, 2019;
originally announced October 2019.
-
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
Authors:
H. Francis Song,
Abbas Abdolmaleki,
Jost Tobias Springenberg,
Aidan Clark,
Hubert Soyer,
Jack W. Rae,
Seb Noury,
Arun Ahuja,
Siqi Liu,
Dhruva Tirumala,
Nicolas Heess,
Dan Belov,
Martin Riedmiller,
Matthew M. Botvinick
Abstract:
Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradie…
▽ More
Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algorithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state-value function. We show that V-MPO surpasses previously reported scores for both the Atari-57 and DMLab-30 benchmark suites in the multi-task setting, and does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters. On individual DMLab and Atari levels, the proposed algorithm can achieve scores that are substantially higher than has previously been reported. V-MPO is also applicable to problems with high-dimensional, continuous action spaces, which we demonstrate in the context of learning to control simulated humanoids with 22 degrees of freedom from full state observations and 56 degrees of freedom from pixel observations, as well as example OpenAI Gym tasks where V-MPO achieves substantially higher asymptotic scores than previously reported.
△ Less
Submitted 26 September, 2019;
originally announced September 2019.
-
Meta-learning of Sequential Strategies
Authors:
Pedro A. Ortega,
Jane X. Wang,
Mark Rowland,
Tim Genewein,
Zeb Kurth-Nelson,
Razvan Pascanu,
Nicolas Heess,
Joel Veness,
Alex Pritzel,
Pablo Sprechmann,
Siddhant M. Jayakumar,
Tom McGrath,
Kevin Miller,
Mohammad Azar,
Ian Osband,
Neil Rabinowitz,
András György,
Silvia Chiappa,
Simon Osindero,
Yee Whye Teh,
Hado van Hasselt,
Nando de Freitas,
Matthew Botvinick,
Shane Legg
Abstract:
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal pred…
▽ More
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state-machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem.
△ Less
Submitted 18 July, 2019; v1 submitted 8 May, 2019;
originally announced May 2019.
-
Learned human-agent decision-making, communication and joint action in a virtual reality environment
Authors:
Patrick M. Pilarski,
Andrew Butcher,
Michael Johanson,
Matthew M. Botvinick,
Andrew Bolt,
Adam S. R. Parker
Abstract:
Humans make decisions and act alongside other humans to pursue both short-term and long-term goals. As a result of ongoing progress in areas such as computing science and automation, humans now also interact with non-human agents of varying complexity as part of their day-to-day activities; substantial work is being done to integrate increasingly intelligent machine agents into human work and play…
▽ More
Humans make decisions and act alongside other humans to pursue both short-term and long-term goals. As a result of ongoing progress in areas such as computing science and automation, humans now also interact with non-human agents of varying complexity as part of their day-to-day activities; substantial work is being done to integrate increasingly intelligent machine agents into human work and play. With increases in the cognitive, sensory, and motor capacity of these agents, intelligent machinery for human assistance can now reasonably be considered to engage in joint action with humans---i.e., two or more agents adapting their behaviour and their understanding of each other so as to progress in shared objectives or goals. The mechanisms, conditions, and opportunities for skillful joint action in human-machine partnerships is of great interest to multiple communities. Despite this, human-machine joint action is as yet under-explored, especially in cases where a human and an intelligent machine interact in a persistent way during the course of real-time, daily-life experience. In this work, we contribute a virtual reality environment wherein a human and an agent can adapt their predictions, their actions, and their communication so as to pursue a simple foraging task. In a case study with a single participant, we provide an example of human-agent coordination and decision-making involving prediction learning on the part of the human and the machine agent, and control learning on the part of the machine agent wherein audio communication signals are used to cue its human partner in service of acquiring shared reward. These comparisons suggest the utility of studying human-machine coordination in a virtual reality environment, and identify further research that will expand our understanding of persistent human-machine joint action.
△ Less
Submitted 7 May, 2019;
originally announced May 2019.
-
Is coding a relevant metaphor for building AI? A commentary on "Is coding a relevant metaphor for the brain?", by Romain Brette
Authors:
Adam Santoro,
Felix Hill,
David Barrett,
David Raposo,
Matthew Botvinick,
Timothy Lillicrap
Abstract:
Brette contends that the neural coding metaphor is an invalid basis for theories of what the brain does. Here, we argue that it is an insufficient guide for building an artificial intelligence that learns to accomplish short- and long-term goals in a complex, changing environment.
Brette contends that the neural coding metaphor is an invalid basis for theories of what the brain does. Here, we argue that it is an insufficient guide for building an artificial intelligence that learns to accomplish short- and long-term goals in a complex, changing environment.
△ Less
Submitted 18 April, 2019;
originally announced April 2019.
-
Multi-Object Representation Learning with Iterative Variational Inference
Authors:
Klaus Greff,
Raphaël Lopez Kaufman,
Rishabh Kabra,
Nick Watters,
Chris Burgess,
Daniel Zoran,
Loic Matthey,
Matthew Botvinick,
Alexander Lerchner
Abstract:
Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and repres…
▽ More
Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and represent objects jointly. We demonstrate that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations. Our method learns -- without supervision -- to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations. We also show that, due to the use of iterative variational inference, our system is able to learn multi-modal posteriors for ambiguous inputs and extends naturally to sequences.
△ Less
Submitted 27 July, 2020; v1 submitted 1 March, 2019;
originally announced March 2019.
-
MONet: Unsupervised Scene Decomposition and Representation
Authors:
Christopher P. Burgess,
Loic Matthey,
Nicholas Watters,
Rishabh Kabra,
Irina Higgins,
Matt Botvinick,
Alexander Lerchner
Abstract:
The ability to decompose scenes in terms of abstract building blocks is crucial for general intelligence. Where those basic building blocks share meaningful properties, interactions and other regularities across scenes, such decompositions can simplify reasoning and facilitate imagination of novel scenarios. In particular, representing perceptual observations in terms of entities should improve da…
▽ More
The ability to decompose scenes in terms of abstract building blocks is crucial for general intelligence. Where those basic building blocks share meaningful properties, interactions and other regularities across scenes, such decompositions can simplify reasoning and facilitate imagination of novel scenarios. In particular, representing perceptual observations in terms of entities should improve data efficiency and transfer performance on a wide range of tasks. Thus we need models capable of discovering useful decompositions of scenes by identifying units with such regularities and representing them in a common format. To address this problem, we have developed the Multi-Object Network (MONet). In this model, a VAE is trained end-to-end together with a recurrent attention network -- in a purely unsupervised manner -- to provide attention masks around, and reconstructions of, regions of images. We show that this model is capable of learning to decompose and represent challenging 3D scenes into semantically meaningful components, such as objects and background elements.
△ Less
Submitted 22 January, 2019;
originally announced January 2019.
-
InfoBot: Transfer and Exploration via the Information Bottleneck
Authors:
Anirudh Goyal,
Riashat Islam,
Daniel Strouse,
Zafarali Ahmed,
Matthew Botvinick,
Hugo Larochelle,
Yoshua Bengio,
Sergey Levine
Abstract:
A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed. We postulate that in the absence of useful reward signals, an effective exploration strategy should seek out {\it decision states}. These states lie at critical junctions in the state space from where the agent can transition to new, potentially unexplored regions. We p…
▽ More
A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed. We postulate that in the absence of useful reward signals, an effective exploration strategy should seek out {\it decision states}. These states lie at critical junctions in the state space from where the agent can transition to new, potentially unexplored regions. We propose to learn about decision states from prior experience. By training a goal-conditioned policy with an information bottleneck, we can identify decision states by examining where the model actually leverages the goal state. We find that this simple mechanism effectively identifies decision states, even in partially observed settings. In effect, the model learns the sensory cues that correlate with potential subgoals. In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.
△ Less
Submitted 5 December, 2023; v1 submitted 30 January, 2019;
originally announced January 2019.
-
Causal Reasoning from Meta-reinforcement Learning
Authors:
Ishita Dasgupta,
Jane Wang,
Silvia Chiappa,
Jovana Mitrovic,
Pedro Ortega,
David Raposo,
Edward Hughes,
Peter Battaglia,
Matthew Botvinick,
Zeb Kurth-Nelson
Abstract:
Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel…
▽ More
Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel situations in order to obtain rewards. The agent can select informative interventions, draw causal inferences from observational data, and make counterfactual predictions. Although established formal causal reasoning algorithms also exist, in this paper we show that such reasoning can arise from model-free reinforcement learning, and suggest that causal reasoning in complex settings may benefit from the more end-to-end learning-based approaches presented here. This work also offers new strategies for structured exploration in reinforcement learning, by providing agents with the ability to perform -- and interpret -- experiments.
△ Less
Submitted 23 January, 2019;
originally announced January 2019.
-
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Authors:
Jakob N. Foerster,
Francis Song,
Edward Hughes,
Neil Burch,
Iain Dunning,
Shimon Whiteson,
Matthew Botvinick,
Michael Bowling
Abstract:
When observing the actions of others, humans make inferences about why they acted as they did, and what this implies about the world; humans also use the fact that their actions will be interpreted in this manner, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player,…
▽ More
When observing the actions of others, humans make inferences about why they acted as they did, and what this implies about the world; humans also use the fact that their actions will be interpreted in this manner, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player, zero-sum games, scalable multi-agent reinforcement learning algorithms that can discover effective strategies and conventions in complex, partially observable settings have proven elusive. We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment. BAD introduces a new Markov decision process, the public belief MDP, in which the action space consists of all deterministic partial policies, and exploits the fact that an agent acting only on this public belief state can still learn to use its private information if the action space is augmented to be over all partial policies mapping private information into environment actions. The Bayesian update is closely related to the theory of mind reasoning that humans carry out when observing others' actions. We first validate BAD on a proof-of-principle two-step matrix game, where it outperforms policy gradient methods; we then evaluate BAD on the challenging, cooperative partial-information card game Hanabi, where, in the two-player setting, it surpasses all previously published learning and hand-coded approaches, establishing a new state of the art.
△ Less
Submitted 10 September, 2019; v1 submitted 4 November, 2018;
originally announced November 2018.
-
Relational Forward Models for Multi-Agent Learning
Authors:
Andrea Tacchetti,
H. Francis Song,
Pedro A. M. Mediano,
Vinicius Zambaldi,
Neil C. Rabinowitz,
Thore Graepel,
Matthew Botvinick,
Peter W. Battaglia
Abstract:
The behavioral dynamics of multi-agent systems have a rich and orderly structure, which can be leveraged to understand these systems, and to improve how artificial agents learn to operate in them. Here we introduce Relational Forward Models (RFM) for multi-agent learning, networks that can learn to make accurate predictions of agents' future behavior in multi-agent environments. Because these mode…
▽ More
The behavioral dynamics of multi-agent systems have a rich and orderly structure, which can be leveraged to understand these systems, and to improve how artificial agents learn to operate in them. Here we introduce Relational Forward Models (RFM) for multi-agent learning, networks that can learn to make accurate predictions of agents' future behavior in multi-agent environments. Because these models operate on the discrete entities and relations present in the environment, they produce interpretable intermediate representations which offer insights into what drives agents' behavior, and what events mediate the intensity and valence of social interactions. Furthermore, we show that embedding RFM modules inside agents results in faster learning systems compared to non-augmented baselines. As more and more of the autonomous systems we develop and interact with become multi-agent in nature, developing richer analysis tools for characterizing how and why agents make decisions is increasingly necessary. Moreover, developing artificial agents that quickly and safely learn to coordinate with one another, and with humans in shared environments, is crucial.
△ Less
Submitted 28 September, 2018;
originally announced September 2018.
-
Learning to Share and Hide Intentions using Information Regularization
Authors:
DJ Strouse,
Max Kleiman-Weiner,
Josh Tenenbaum,
Matt Botvinick,
David Schwab
Abstract:
Learning to cooperate with friends and compete with foes is a key component of multi-agent reinforcement learning. Typically to do so, one requires access to either a model of or interaction with the other agent(s). Here we show how to learn effective strategies for cooperation and competition in an asymmetric information game with no such model or interaction. Our approach is to encourage an agen…
▽ More
Learning to cooperate with friends and compete with foes is a key component of multi-agent reinforcement learning. Typically to do so, one requires access to either a model of or interaction with the other agent(s). Here we show how to learn effective strategies for cooperation and competition in an asymmetric information game with no such model or interaction. Our approach is to encourage an agent to reveal or hide their intentions using an information-theoretic regularizer. We consider both the mutual information between goal and action given state, as well as the mutual information between goal and state. We show how to optimize these regularizers in a way that is easy to integrate with policy gradient reinforcement learning. Finally, we demonstrate that cooperative (competitive) policies learned with our approach lead to more (less) reward for a second agent in two simple asymmetric information games.
△ Less
Submitted 1 January, 2019; v1 submitted 6 August, 2018;
originally announced August 2018.
-
Relational Deep Reinforcement Learning
Authors:
Vinicius Zambaldi,
David Raposo,
Adam Santoro,
Victor Bapst,
Yujia Li,
Igor Babuschkin,
Karl Tuyls,
David Reichert,
Timothy Lillicrap,
Edward Lockhart,
Murray Shanahan,
Victoria Langston,
Razvan Pascanu,
Matthew Botvinick,
Oriol Vinyals,
Peter Battaglia
Abstract:
We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and…
▽ More
We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
△ Less
Submitted 28 June, 2018; v1 submitted 5 June, 2018;
originally announced June 2018.
-
Relational inductive biases, deep learning, and graph networks
Authors:
Peter W. Battaglia,
Jessica B. Hamrick,
Victor Bapst,
Alvaro Sanchez-Gonzalez,
Vinicius Zambaldi,
Mateusz Malinowski,
Andrea Tacchetti,
David Raposo,
Adam Santoro,
Ryan Faulkner,
Caglar Gulcehre,
Francis Song,
Andrew Ballard,
Justin Gilmer,
George Dahl,
Ashish Vaswani,
Kelsey Allen,
Charles Nash,
Victoria Langston,
Chris Dyer,
Nicolas Heess,
Daan Wierstra,
Pushmeet Kohli,
Matt Botvinick,
Oriol Vinyals
, et al. (2 additional authors not shown)
Abstract:
Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, rema…
▽ More
Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one's experiences--a hallmark of human intelligence from infancy--remains a formidable challenge for modern AI.
The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between "hand-engineering" and "end-to-end" learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias--the graph network--which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.
△ Less
Submitted 17 October, 2018; v1 submitted 4 June, 2018;
originally announced June 2018.
-
Been There, Done That: Meta-Learning with Episodic Recall
Authors:
Samuel Ritter,
Jane X. Wang,
Zeb Kurth-Nelson,
Siddhant M. Jayakumar,
Charles Blundell,
Razvan Pascanu,
Matthew Botvinick
Abstract:
Meta-learning agents excel at rapidly learning new tasks from open-ended task distributions; yet, they forget what they learn about each task as soon as the next begins. When tasks reoccur - as they do in natural environments - metalearning agents must explore again instead of immediately exploiting previously discovered solutions. We propose a formalism for generating open-ended yet repetitious e…
▽ More
Meta-learning agents excel at rapidly learning new tasks from open-ended task distributions; yet, they forget what they learn about each task as soon as the next begins. When tasks reoccur - as they do in natural environments - metalearning agents must explore again instead of immediately exploiting previously discovered solutions. We propose a formalism for generating open-ended yet repetitious environments, then develop a meta-learning architecture for solving these environments. This architecture melds the standard LSTM working memory with a differentiable neural episodic memory. We explore the capabilities of agents with this episodic LSTM in five meta-learning environments with reoccurring tasks, ranging from bandits to navigation and stochastic sequential decision problems.
△ Less
Submitted 6 July, 2018; v1 submitted 24 May, 2018;
originally announced May 2018.
-
Probing Physics Knowledge Using Tools from Developmental Psychology
Authors:
Luis Piloto,
Ari Weinstein,
Dhruva TB,
Arun Ahuja,
Mehdi Mirza,
Greg Wayne,
David Amos,
Chia-chun Hung,
Matt Botvinick
Abstract:
In order to build agents with a rich understanding of their environment, one key objective is to endow them with a grasp of intuitive physics; an ability to reason about three-dimensional objects, their dynamic interactions, and responses to forces. While some work on this problem has taken the approach of building in components such as ready-made physics engines, other research aims to extract ge…
▽ More
In order to build agents with a rich understanding of their environment, one key objective is to endow them with a grasp of intuitive physics; an ability to reason about three-dimensional objects, their dynamic interactions, and responses to forces. While some work on this problem has taken the approach of building in components such as ready-made physics engines, other research aims to extract general physical concepts directly from sensory data. In the latter case, one challenge that arises is evaluating the learning system. Research on intuitive physics knowledge in children has long employed a violation of expectations (VOE) method to assess children's mastery of specific physical concepts. We take the novel step of applying this method to artificial learning systems. In addition to introducing the VOE technique, we describe a set of probe datasets inspired by classic test stimuli from developmental psychology. We test a baseline deep learning system on this battery, as well as on a physics learning dataset ("IntPhys") recently posed by another research group. Our results show how the VOE technique may provide a useful tool for tracking physics knowledge in future research.
△ Less
Submitted 3 April, 2018;
originally announced April 2018.
-
Unsupervised Predictive Memory in a Goal-Directed Agent
Authors:
Greg Wayne,
Chia-Chun Hung,
David Amos,
Mehdi Mirza,
Arun Ahuja,
Agnieszka Grabska-Barwinska,
Jack Rae,
Piotr Mirowski,
Joel Z. Leibo,
Adam Santoro,
Mevlana Gemici,
Malcolm Reynolds,
Tim Harley,
Josh Abramson,
Shakir Mohamed,
Danilo Rezende,
David Saxton,
Adam Cain,
Chloe Hillier,
David Silver,
Koray Kavukcuoglu,
Matt Botvinick,
Demis Hassabis,
Timothy Lillicrap
Abstract:
Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement l…
▽ More
Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement learning (RL) algorithms with deep neural networks, and the excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms struggle to solve simple tasks when enough information is concealed from the sensors of the agent, a property called "partial observability". An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not enough; it is critical that the right information be stored in the right format. We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling. MERLIN facilitates the solution of tasks in 3D virtual reality environments for which partial observability is severe and memories must be maintained over long durations. Our model demonstrates a single learning agent architecture that can solve canonical behavioural tasks in psychology and neurobiology without strong simplifying assumptions about the dimensionality of sensory input or the duration of experiences.
△ Less
Submitted 28 March, 2018;
originally announced March 2018.
-
On the importance of single directions for generalization
Authors:
Ari S. Morcos,
David G. T. Barrett,
Neil C. Rabinowitz,
Matthew Botvinick
Abstract:
Despite their ability to memorize large datasets, deep neural networks often achieve good generalization performance. However, the differences between the learned solutions of networks which generalize and those which do not remain unclear. Additionally, the tuning properties of single directions (defined as the activation of a single unit or some linear combination of units in response to some in…
▽ More
Despite their ability to memorize large datasets, deep neural networks often achieve good generalization performance. However, the differences between the learned solutions of networks which generalize and those which do not remain unclear. Additionally, the tuning properties of single directions (defined as the activation of a single unit or some linear combination of units in response to some input) have been highlighted, but their importance has not been evaluated. Here, we connect these lines of inquiry to demonstrate that a network's reliance on single directions is a good predictor of its generalization performance, across networks trained on datasets with different fractions of corrupted labels, across ensembles of networks trained on datasets with unmodified labels, across different hyperparameters, and over the course of training. While dropout only regularizes this quantity up to a point, batch normalization implicitly discourages single direction reliance, in part by decreasing the class selectivity of individual units. Finally, we find that class selectivity is a poor predictor of task importance, suggesting not only that networks which generalize well minimize their dependence on individual units by reducing their selectivity, but also that individually selective units may not be necessary for strong network performance.
△ Less
Submitted 22 May, 2018; v1 submitted 19 March, 2018;
originally announced March 2018.
-
Machine Theory of Mind
Authors:
Neil C. Rabinowitz,
Frank Perbet,
H. Francis Song,
Chiyuan Zhang,
S. M. Ali Eslami,
Matthew Botvinick
Abstract:
Theory of mind (ToM; Premack & Woodruff, 1978) broadly refers to humans' ability to represent the mental states of others, including their desires, beliefs, and intentions. We propose to train a machine to build such models too. We design a Theory of Mind neural network -- a ToMnet -- which uses meta-learning to build models of the agents it encounters, from observations of their behaviour alone.…
▽ More
Theory of mind (ToM; Premack & Woodruff, 1978) broadly refers to humans' ability to represent the mental states of others, including their desires, beliefs, and intentions. We propose to train a machine to build such models too. We design a Theory of Mind neural network -- a ToMnet -- which uses meta-learning to build models of the agents it encounters, from observations of their behaviour alone. Through this process, it acquires a strong prior model for agents' behaviour, as well as the ability to bootstrap to richer predictions about agents' characteristics and mental states using only a small number of behavioural observations. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep reinforcement learning agents from varied populations, and that it passes classic ToM tasks such as the "Sally-Anne" test (Wimmer & Perner, 1983; Baron-Cohen et al., 1985) of recognising that others can hold false beliefs about the world. We argue that this system -- which autonomously learns how to model other agents in its world -- is an important step forward for developing multi-agent AI systems, for building intermediating technology for machine-human interaction, and for advancing the progress on interpretable AI.
△ Less
Submitted 12 March, 2018; v1 submitted 21 February, 2018;
originally announced February 2018.
-
Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents
Authors:
Joel Z. Leibo,
Cyprien de Masson d'Autume,
Daniel Zoran,
David Amos,
Charles Beattie,
Keith Anderson,
Antonio García Castañeda,
Manuel Sanchez,
Simon Green,
Audrunas Gruslys,
Shane Legg,
Demis Hassabis,
Matthew M. Botvinick
Abstract:
Psychlab is a simulated psychology laboratory inside the first-person 3D game world of DeepMind Lab (Beattie et al. 2016). Psychlab enables implementations of classical laboratory psychological experiments so that they work with both human and artificial agents. Psychlab has a simple and flexible API that enables users to easily create their own tasks. As examples, we are releasing Psychlab implem…
▽ More
Psychlab is a simulated psychology laboratory inside the first-person 3D game world of DeepMind Lab (Beattie et al. 2016). Psychlab enables implementations of classical laboratory psychological experiments so that they work with both human and artificial agents. Psychlab has a simple and flexible API that enables users to easily create their own tasks. As examples, we are releasing Psychlab implementations of several classical experimental paradigms including visual search, change detection, random dot motion discrimination, and multiple object tracking. We also contribute a study of the visual psychophysics of a specific state-of-the-art deep reinforcement learning agent: UNREAL (Jaderberg et al. 2016). This study leads to the surprising conclusion that UNREAL learns more quickly about larger target stimuli than it does about smaller stimuli. In turn, this insight motivates a specific improvement in the form of a simple model of foveal vision that turns out to significantly boost UNREAL's performance, both on Psychlab tasks, and on standard DeepMind Lab tasks. By open-sourcing Psychlab we hope to facilitate a range of future such studies that simultaneously advance deep reinforcement learning and improve its links with cognitive science.
△ Less
Submitted 4 February, 2018; v1 submitted 24 January, 2018;
originally announced January 2018.