Showing 1–50 of 63 results for author: Leibo, J

Searching in archive cs.
  1. arXiv:2406.00392

    cs.AI

    Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning

    Authors: Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster

    Abstract: Cultural accumulation drives the open-ended and diverse progress in capabilities spanning human history. It builds an expanding body of knowledge and skills by combining individual exploration with inter-generational information transmission. Despite its widespread success among humans, the capacity for artificial learning agents to accumulate culture remains under-explored. In particular, approac…

    Submitted 1 June, 2024; originally announced June 2024.

  2. A social path to human-like artificial intelligence

    Authors: Edgar A. Duéñez-Guzmán, Suzanne Sadedin, Jane X. Wang, Kevin R. McKee, Joel Z. Leibo

    Abstract: Traditionally, cognitive and computer scientists have viewed intelligence solipsistically, as a property of unitary agents devoid of social context. Given the success of contemporary learning algorithms, we argue that the bottleneck in artificial intelligence (AI) progress is shifting from data assimilation to novel data generation. We bring together evidence showing that natural intelligence emer…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 17 pages, 2 figures, 1 box

    MSC Class: 68T05 ACM Class: I.2.6

  3. arXiv:2402.16985

    cs.GT cs.SE

    Visualizing 2x2 Normal-Form Games: twoxtwogame LaTeX Package

    Authors: Luke Marris, Ian Gemp, Siqi Liu, Joel Z. Leibo, Georgios Piliouras

    Abstract: Normal-form games with two players, each with two strategies, are the most studied class of games. These so-called 2x2 games are used to model a variety of strategic interactions. They appear in game theory, economics, and artificial intelligence research. However, tools for describing and visualizing such games are lacking. This work introduces a LaTeX package for visualizing 2x2 games. This work…

    Submitted 26 February, 2024; originally announced February 2024.

  4. arXiv:2401.05133

    cs.AI cs.MA

    Neural Population Learning beyond Symmetric Zero-sum Games

    Authors: Siqi Liu, Luke Marris, Marc Lanctot, Georgios Piliouras, Joel Z. Leibo, Nicolas Heess

    Abstract: We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically ones that afford complex visuomotor skills. We show how existing methods would struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Corre…

    Submitted 10 January, 2024; originally announced January 2024.

  5. arXiv:2312.05162

    cs.MA cs.AI cs.GT cs.LG

    A Review of Cooperation in Multi-agent Learning

    Authors: Yali Du, Joel Z. Leibo, Usman Islam, Richard Willis, Peter Sunehag

    Abstract: Cooperation in multi-agent learning (MAL) is a topic at the intersection of numerous disciplines, including game theory, economics, social sciences, and evolutionary biology. Research in this area aims to understand both how agents can coordinate effectively when goals are aligned and how they may cooperate in settings where gains from working together are possible but possibilities for conflict a…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 29 pages, 3 figures

  6. arXiv:2312.03664

    cs.AI cs.CL

    Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia

    Authors: Alexander Sasha Vezhnevets, John P. Agapiou, Avia Aharon, Ron Ziv, Jayd Matyas, Edgar A. Duéñez-Guzmán, William A. Cunningham, Simon Osindero, Danny Karmon, Joel Z. Leibo

    Abstract: Agent-based modeling has been around for decades, and applied widely across the social and natural sciences. The scope of this research method is now poised to grow dramatically as it absorbs the new affordances provided by Large Language Models (LLMs). Generative Agent-Based Models (GABMs) are not just classic Agent-Based Models (ABMs) where the agents talk to one another. Rather, GABMs are constr…

    Submitted 13 December, 2023; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: 32 pages, 5 figures

  7. Machine Culture

    Authors: Levin Brinkmann, Fabian Baumann, Jean-François Bonnefon, Maxime Derex, Thomas F. Müller, Anne-Marie Nussberger, Agnieszka Czaplicka, Alberto Acerbi, Thomas L. Griffiths, Joseph Henrich, Joel Z. Leibo, Richard McElreath, Pierre-Yves Oudeyer, Jonathan Stray, Iyad Rahwan

    Abstract: The ability of humans to create and disseminate culture is often credited as the single most important factor of our success as a species. In this Perspective, we explore the notion of machine culture, culture mediated or generated by machines. We argue that intelligent machines simultaneously transform the cultural evolutionary processes of variation, transmission, and selection. Recommender algo…

    Submitted 22 November, 2023; v1 submitted 19 November, 2023; originally announced November 2023.

    Journal ref: Nat Hum Behav 7, 1855-1868 (2023)

  8. arXiv:2310.12928

    cs.GT

    Resolving social dilemmas with minimal reward transfer

    Authors: Richard Willis, Yali Du, Joel Z Leibo, Michael Luck

    Abstract: Social dilemmas present a significant challenge in multi-agent cooperation because individuals are incentivised to behave in ways that undermine socially optimal outcomes. Consequently, self-interested agents often avoid collective behaviour. In response, we formalise social dilemmas and introduce a novel metric, the general self-interest level, to quantify the disparity between individual and gro…

    Submitted 31 July, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 35 pages, 15 tables, 2 figures. Submitted to the Journal of Autonomous Agents and Multi-Agent Systems: Special Issue on Citizen-Centric AI Systems

  9. arXiv:2309.06364

    cs.CL cs.AI

    Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity

    Authors: Aliya Amirova, Theodora Fteropoulli, Nafiso Ahmed, Martin R. Cowie, Joel Z. Leibo

    Abstract: Today, using Large-scale generative Language Models (LLMs), it is possible to simulate free responses to interview questions like those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial…

    Submitted 4 February, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: 52 pages, 5 tables, 5 figures

  10. arXiv:2305.18269

    cs.AI cs.CY

    Doing the right thing for the right reason: Evaluating artificial moral cognition by probing cost insensitivity

    Authors: Yiran Mao, Madeline G. Reinecke, Markus Kunesch, Edgar A. Duéñez-Guzmán, Ramona Comanescu, Julia Haas, Joel Z. Leibo

    Abstract: Is it possible to evaluate the moral cognition of complex artificial agents? In this work, we take a look at one aspect of morality: 'doing the right thing for the right reasons.' We propose a behavior-based analysis of artificial moral cognition which could also be applied to humans to facilitate like-for-like comparison. Morally-motivated behavior should persist despite mounting cost; by measuri…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 11 pages, 3 figures

  11. arXiv:2305.00768

    cs.MA stat.ML

    Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas

    Authors: Udari Madhushani, Kevin R. McKee, John P. Agapiou, Joel Z. Leibo, Richard Everett, Thomas Anthony, Edward Hughes, Karl Tuyls, Edgar A. Duéñez-Guzmán

    Abstract: In social psychology, Social Value Orientation (SVO) describes an individual's propensity to allocate resources between themself and others. In reinforcement learning, SVO has been instantiated as an intrinsic motivation that remaps an agent's rewards based on particular target distributions of group reward. Prior studies show that groups of agents endowed with heterogeneous SVO learn diverse poli…

    Submitted 1 May, 2023; originally announced May 2023.

  12. arXiv:2302.01180

    cs.AI cs.NE

    Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition

    Authors: Peter Sunehag, Alexander Sasha Vezhnevets, Edgar Duéñez-Guzmán, Igor Mordatch, Joel Z. Leibo

    Abstract: Many environments contain numerous available niches of variable value, each associated with a different local optimum in the space of behaviors (policy space). In such situations it is often difficult to design a learning process capable of evading distraction by poor local optima long enough to stumble upon the best available niche. In this work we propose a generic reinforcement learning (RL) al…

    Submitted 3 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: Full length paper accompanying short format appearing at AAMAS 2023

    ACM Class: I.2

  13. arXiv:2211.13746

    cs.MA cs.AI cs.GT cs.NE

    Melting Pot 2.0

    Authors: John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Jayd Matyas, Yiran Mao, Peter Sunehag, Raphael Köster, Udari Madhushani, Kavya Kopparapu, Ramona Comanescu, DJ Strouse, Michael B. Johanson, Sukhdeep Singh, Julia Haas, Igor Mordatch, Dean Mobbs, Joel Z. Leibo

    Abstract: Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures ge…

    Submitted 30 October, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: 69 pages, 54 figures. arXiv admin note: text overlap with arXiv:2107.06857

  14. arXiv:2208.05568

    cs.MA cs.AI cs.NE

    The emergence of division of labor through decentralized social sanctioning

    Authors: Anil Yaman, Joel Z. Leibo, Giovanni Iacca, Sang Wan Lee

    Abstract: Human ecological success relies on our characteristic ability to flexibly self-organize into cooperative social groups, the most successful of which employ substantial specialization and division of labor. Unlike most other animals, humans learn by trial and error during their lives what role to take on. However, when some critical roles are more attractive than others, and individuals are self-in…

    Submitted 30 September, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

  15. arXiv:2205.06760

    cs.AI cs.LG cs.MA

    Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

    Authors: Michael Bradley Johanson, Edward Hughes, Finbarr Timbers, Joel Z. Leibo

    Abstract: Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefe…

    Submitted 13 May, 2022; originally announced May 2022.

  16. arXiv:2201.01816

    cs.AI cs.LG cs.MA

    Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria

    Authors: Kavya Kopparapu, Edgar A. Duéñez-Guzmán, Jayd Matyas, Alexander Sasha Vezhnevets, John P. Agapiou, Kevin R. McKee, Richard Everett, Janusz Marecki, Joel Z. Leibo, Thore Graepel

    Abstract: A key challenge in the study of multiagent cooperation is the need for individual agents not only to cooperate effectively, but to decide with whom to cooperate. This is particularly critical in situations when other agents have hidden, possibly misaligned motivations and goals. Social deduction games offer an avenue to study how individuals might learn to synthesize potentially unreliable informa…

    Submitted 5 January, 2022; originally announced January 2022.

  17. arXiv:2110.11404

    cs.LG cs.AI cs.GT cs.MA

    Statistical discrimination in learning agents

    Authors: Edgar A. Duéñez-Guzmán, Kevin R. McKee, Yiran Mao, Ben Coppin, Silvia Chiappa, Alexander Sasha Vezhnevets, Michiel A. Bakker, Yoram Bachrach, Suzanne Sadedin, William Isaac, Karl Tuyls, Joel Z. Leibo

    Abstract: Undesired bias afflicts both human and algorithmic decision making, and may be especially prevalent when information processing trade-offs incentivize the use of heuristics. One primary example is statistical discrimination -- selecting social partners based not on their underlying attributes, but on readily perceptible characteristics that covary with their suitability for the task at ha…

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: 29 pages, 10 figures

    MSC Class: 68T07 (Primary) 91A26; 91-10; 93A16 (Secondary) ACM Class: I.2.11; I.2.0

  18. arXiv:2107.06857

    cs.MA cs.AI

    Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

    Authors: Joel Z. Leibo, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charles Beattie, Igor Mordatch, Thore Graepel

    Abstract: Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap, and uses reinforcement learning to reduce the human labor required to create novel test scenarios. This works because one agent's b…

    Submitted 14 July, 2021; originally announced July 2021.

    Comments: Accepted to ICML 2021 and presented as a long talk; 33 pages; 9 figures

    Journal ref: In International Conference on Machine Learning 2021 (pp. 6187-6199). PMLR

  19. arXiv:2106.10015

    cs.SI cs.AI cs.MA cs.NE

    Meta-control of social learning strategies

    Authors: Anil Yaman, Nicolas Bredeche, Onur Çaylak, Joel Z. Leibo, Sang Wan Lee

    Abstract: Social learning, copying others' behavior without actual experience, offers a cost-effective means of knowledge acquisition. However, it raises the fundamental question of which individuals have reliable information: successful individuals versus the majority. The former and the latter are known respectively as success-based and conformist social learning strategies. We show here that while the su…

    Submitted 7 March, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

    Journal ref: PLoS Comput Biol 18(2): e1009882 (2022)

  20. arXiv:2106.09012

    cs.MA

    A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings

    Authors: Eugene Vinitsky, Raphael Köster, John P. Agapiou, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo

    Abstract: Society is characterized by the presence of a variety of social norms: collective patterns of sanctioning that can prevent miscoordination and free-riding. Inspired by this, we aim to construct learning dynamics where potentially beneficial social norms can emerge. Since social norms are underpinned by sanctioning, we introduce a training regime where agents can access all sanctioning events but l…

    Submitted 27 September, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

  21. arXiv:2103.04982

    cs.MA cs.AI cs.GT

    A multi-agent reinforcement learning model of reputation and cooperation in human groups

    Authors: Kevin R. McKee, Edward Hughes, Tina O. Zhu, Martin J. Chadwick, Raphael Koster, Antonio Garcia Castaneda, Charlie Beattie, Thore Graepel, Matt Botvinick, Joel Z. Leibo

    Abstract: Collective action demands that individuals efficiently coordinate how much, where, and when to cooperate. Laboratory experiments have extensively explored the first part of this process, demonstrating that a variety of social-cognitive mechanisms influence how much individuals choose to invest in group efforts. However, experimental research has been unable to shed light on how social cognitive me…

    Submitted 22 February, 2023; v1 submitted 8 March, 2021; originally announced March 2021.

  22. Quantifying the effects of environment and population diversity in multi-agent reinforcement learning

    Authors: Kevin R. McKee, Joel Z. Leibo, Charlie Beattie, Richard Everett

    Abstract: Generalization is a major challenge for multi-agent reinforcement learning. How well does an agent perform when placed in novel environments and in interactions with new co-players? In this paper, we investigate and quantify the relationship between generalization and diversity in the multi-agent domain. Across the range of multi-agent environments considered here, procedurally generating training…

    Submitted 4 March, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: Accepted at Autonomous Agents and Multi-Agent Systems

  23. arXiv:2102.06911

    cs.MA cs.AI

    Modelling Cooperation in Network Games with Spatio-Temporal Complexity

    Authors: Michiel A. Bakker, Richard Everett, Laura Weidinger, Iason Gabriel, William S. Isaac, Joel Z. Leibo, Edward Hughes

    Abstract: The real world is awash with multi-agent problems that require collective action by self-interested agents, from the routing of packets across a computer network to the management of irrigation systems. Such systems have local incentives for individuals, whose behavior has an impact on the global outcome for the group. Given appropriate mechanisms describing agent interaction, groups may achieve s…

    Submitted 13 February, 2021; originally announced February 2021.

    Comments: AAMAS 2021

  24. arXiv:2012.08630

    cs.AI cs.MA

    Open Problems in Cooperative AI

    Authors: Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, Thore Graepel

    Abstract: Problems of cooperation--in which agents seek ways to jointly improve their welfare--are ubiquitous and important. They can be found at scales ranging from our daily routines--such as driving on highways, scheduling meetings, and working collaboratively--to our global challenges--such as peace, commerce, and pandemic preparedness. Arguably, the success of the human species is rooted in our ability…

    Submitted 15 December, 2020; originally announced December 2020.

  25. arXiv:2011.07027

    cs.AI

    DeepMind Lab2D

    Authors: Charles Beattie, Thomas Köppe, Edgar A. Duéñez-Guzmán, Joel Z. Leibo

    Abstract: We present DeepMind Lab2D, a scalable environment simulator for artificial intelligence research that facilitates researcher-led experimentation with environment design. DeepMind Lab2D was built with the specific needs of multi-agent deep reinforcement learning researchers in mind, but it may also be useful beyond that particular subfield.

    Submitted 12 December, 2020; v1 submitted 13 November, 2020; originally announced November 2020.

    Comments: 7 pages, 2 figures

  26. arXiv:2010.10380

    cs.LG cs.AI cs.MA

    Negotiating Team Formation Using Deep Reinforcement Learning

    Authors: Yoram Bachrach, Richard Everett, Edward Hughes, Angeliki Lazaridou, Joel Z. Leibo, Marc Lanctot, Michael Johanson, Wojciech M. Czarnecki, Thore Graepel

    Abstract: When autonomous agents interact in the same environment, they must often cooperate to achieve their goals. One way for agents to cooperate effectively is to form a team, make a binding agreement on a joint plan, and execute it. However, when agents are self-interested, the gains from team formation must be allocated appropriately to incentivize agreement. Various approaches for multi-agent negotia…

    Submitted 20 October, 2020; originally announced October 2020.

    ACM Class: I.2.6

    Journal ref: Artificial Intelligence 288 (2020): 103356

  27. arXiv:2010.09054

    cs.MA

    Model-free conventions in multi-agent reinforcement learning with heterogeneous preferences

    Authors: Raphael Köster, Kevin R. McKee, Richard Everett, Laura Weidinger, William S. Isaac, Edward Hughes, Edgar A. Duéñez-Guzmán, Thore Graepel, Matthew Botvinick, Joel Z. Leibo

    Abstract: Game theoretic views of convention generally rest on notions of common knowledge and hyper-rational models of individual behavior. However, decades of work in behavioral economics have questioned the validity of both foundations. Meanwhile, computational neuroscience has contributed a modernized 'dual process' account of decision-making where model-free (MF) reinforcement learning trades off with…

    Submitted 14 December, 2020; v1 submitted 18 October, 2020; originally announced October 2020.

  28. arXiv:2003.00799

    cs.GT cs.LG cs.MA stat.ML

    Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games

    Authors: Edward Hughes, Thomas W. Anthony, Tom Eccles, Joel Z. Leibo, David Balduzzi, Yoram Bachrach

    Abstract: Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric. What's more, competition is a vital mechanism in many real-world multi-agent systems capable of generating intelligent innovations: Darwinian evolution, the market economy and the AlphaZero algorithm, to name a few. In two-player zero-sum…

    Submitted 27 February, 2020; originally announced March 2020.

    Comments: Accepted for publication at AAMAS 2020

  29. arXiv:2002.02325

    cs.MA cs.AI

    Social diversity and social preferences in mixed-motive reinforcement learning

    Authors: Kevin R. McKee, Ian Gemp, Brian McWilliams, Edgar A. Duéñez-Guzmán, Edward Hughes, Joel Z. Leibo

    Abstract: Recent research on reinforcement learning in pure-conflict and pure-common interest games has emphasized the importance of population heterogeneity. In contrast, studies of reinforcement learning in mixed-motive games have primarily leveraged homogeneous approaches. Given the defining characteristic of mixed-motive games--the imperfect correlation of incentives between group members--we study the…

    Submitted 12 February, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020)

  30. arXiv:2001.09318

    cs.MA cs.AI

    Silly rules improve the capacity of agents to learn stable enforcement and compliance behaviors

    Authors: Raphael Köster, Dylan Hadfield-Menell, Gillian K. Hadfield, Joel Z. Leibo

    Abstract: How can societies learn to enforce and comply with social norms? Here we investigate the learning dynamics and emergence of compliance and enforcement of social norms in a foraging game, implemented in a multi-agent reinforcement learning setting. In this spatiotemporally extended game, individuals are incentivized to implement complex berry-foraging policies and punish transgressions against soci…

    Submitted 25 January, 2020; originally announced January 2020.

  31. arXiv:2001.04678

    cs.LG cs.AI cs.GT cs.MA stat.ML

    Smooth markets: A basic mechanism for organizing gradient-based learners

    Authors: David Balduzzi, Wojciech M Czarnecki, Thomas W Anthony, Ian M Gemp, Edward Hughes, Joel Z Leibo, Georgios Piliouras, Thore Graepel

    Abstract: With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions. SM-games codi…

    Submitted 18 January, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

    Comments: 18 pages, 3 figures

    Journal ref: ICLR 2020

  32. arXiv:1910.13406

    cs.LG cs.AI stat.ML

    Generalization of Reinforcement Learners with Working and Episodic Memory

    Authors: Meire Fortunato, Melissa Tan, Ryan Faulkner, Steven Hansen, Adrià Puigdomènech Badia, Gavin Buttimore, Charlie Deck, Joel Z Leibo, Charles Blundell

    Abstract: Memory is an important aspect of intelligence and plays a role in many deep reinforcement learning models. However, little progress has been made in understanding when specific memory systems help more than others and how well they generalize. The field also has yet to see a prevalent consistent and rigorous approach for evaluating agent performance on holdout data. In this paper, we aim to develo…

    Submitted 18 February, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019. Equal contribution of first 4 authors

    Journal ref: 33rd Conference on Neural Information Processing Systems (Neurips 2019)

  33. arXiv:1906.01470

    cs.LG cs.AI cs.MA cs.NE stat.ML

    Options as responses: Grounding behavioural hierarchies in multi-agent RL

    Authors: Alexander Sasha Vezhnevets, Yuhuai Wu, Remi Leblond, Joel Z. Leibo

    Abstract: This paper investigates generalisation in multi-agent games, where the generality of the agent can be evaluated by playing against opponents it hasn't seen during training. We propose two new games with concealed information and complex, non-transitive reward structure (think rock/paper/scissors). It turns out that most current deep reinforcement learning methods fail to efficiently explore the st…

    Submitted 10 July, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

    Comments: First two authors contributed equally

    Journal ref: International Conference on Machine Learning 2020

  34. arXiv:1905.13469

    cs.LG cs.AI cs.NE

    Interval timing in deep reinforcement learning agents

    Authors: Ben Deverett, Ryan Faulkner, Meire Fortunato, Greg Wayne, Joel Z. Leibo

    Abstract: The measurement of time is central to intelligent behavior. We know that both animals and artificial agents can successfully use temporal dependencies to select actions. In artificial agents, little work has directly addressed (1) which architectural components are necessary for successful development of this ability, (2) how this timing ability comes to be represented in the units and actions of…

    Submitted 7 December, 2019; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: 11 pages, 7 figures

  35. arXiv:1903.08082

    cs.MA cs.LG

    Learning Reciprocity in Complex Sequential Social Dilemmas

    Authors: Tom Eccles, Edward Hughes, János Kramár, Steven Wheelwright, Joel Z. Leibo

    Abstract: Reciprocity is an important feature of human social interaction and underpins our cooperative nature. What is more, simple forms of reciprocity have proved remarkably resilient in matrix game social dilemmas. Most famously, the tit-for-tat strategy performs very well in tournaments of Prisoner's Dilemma. Unfortunately this strategy is not readily applicable to the real world, in which options to c…

    Submitted 19 March, 2019; originally announced March 2019.

  36. arXiv:1903.00742

    cs.AI cs.GT cs.MA cs.NE q-bio.NC

    Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research

    Authors: Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel

    Abstract: Evolution has produced a multi-scale mosaic of interacting adaptive units. Innovations arise when perturbations push parts of the system away from stable equilibria into new regimes where previously well-adapted solutions no longer work. Here we explore the hypothesis that multi-agent systems sometimes display intrinsic dynamics arising from competition and cooperation that provide a naturally eme…

    Submitted 11 March, 2019; v1 submitted 2 March, 2019; originally announced March 2019.

    Comments: 16 pages, 2 figures

  37. arXiv:1812.07019

    cs.NE cs.MA q-bio.PE

    Malthusian Reinforcement Learning

    Authors: Joel Z. Leibo, Julien Perolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel

    Abstract: Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, increases in a subpopulation's average return drive subsequent increases in its size, just as Thomas Malthus argued in 1798 was the relationship betwe…

    Submitted 3 March, 2019; v1 submitted 17 December, 2018; originally announced December 2018.

    Comments: 9 pages, 2 tables, 4 figures

  38. arXiv:1811.05931

    cs.MA

    Evolving intrinsic motivations for altruistic behavior

    Authors: Jane X. Wang, Edward Hughes, Chrisantha Fernando, Wojciech M. Czarnecki, Edgar A. Duenez-Guzman, Joel Z. Leibo

    Abstract: Multi-agent cooperation is an important feature of the natural world. Many tasks involve individual incentives that are misaligned with the common good, yet a wide range of organisms from bacteria to insects and humans are able to overcome their differences and collaborate. Therefore, the emergence of cooperative behavior amongst self-interested individuals is an important question for the fields…

    Submitted 11 March, 2019; v1 submitted 14 November, 2018; originally announced November 2018.

    Comments: 10 pages, 6 figures. In Proc. of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019)

  39. arXiv:1810.08647

    cs.LG cs.AI cs.MA stat.ML

    Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

    Authors: Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

    Abstract: We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agen…

    Submitted 18 June, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

  40. arXiv:1807.01281

    cs.LG cs.AI stat.ML

    Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

    Authors: Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel

    Abstract: Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games. However, the real-world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. I…

    Submitted 3 July, 2018; originally announced July 2018.

  41. arXiv:1804.03980  [pdf, other]

    cs.AI cs.CL cs.LG cs.MA

    Emergent Communication through Negotiation

    Authors: Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark

    Abstract: Multi-agent reinforcement learning offers a way to study how communication could emerge in communities of agents needing to solve specific problems. In this paper, we study the emergence of communication in the negotiation environment, a semi-cooperative model of agent interaction. We introduce two communication protocols -- one grounded in the semantics of the game, and one which is \textit{a pri…

    Submitted 11 April, 2018; originally announced April 2018.

    Comments: Published as a conference paper at ICLR 2018

  42. arXiv:1803.10760  [pdf, other]

    cs.LG stat.ML

    Unsupervised Predictive Memory in a Goal-Directed Agent

    Authors: Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap

    Abstract: Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement l…

    Submitted 28 March, 2018; originally announced March 2018.

  43. arXiv:1803.08884  [pdf, other]

    cs.NE cs.AI cs.GT cs.MA q-bio.PE

    Inequity aversion improves cooperation in intertemporal social dilemmas

    Authors: Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel

    Abstract: Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However…

    Submitted 27 September, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: 15 pages, 8 figures

  44. arXiv:1803.06376  [pdf, other]

    cs.GT cs.MA

    A Generalised Method for Empirical Game Theoretic Analysis

    Authors: Karl Tuyls, Julien Perolat, Marc Lanctot, Joel Z Leibo, Thore Graepel

    Abstract: This paper provides theoretical bounds for empirical game theoretical analysis of complex multi-agent interactions. We provide insights in the empirical meta game showing that a Nash equilibrium of the meta-game is an approximate Nash equilibrium of the true underlying game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game. Ad…

    Submitted 16 March, 2018; originally announced March 2018.

    Comments: will appear at AAMAS'18

  45. arXiv:1803.03835  [pdf, other]

    cs.LG

    Kickstarting Deep Reinforcement Learning

    Authors: Simon Schmitt, Jonathan J. Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech M. Czarnecki, Joel Z. Leibo, Heinrich Kuttler, Andrew Zisserman, Karen Simonyan, S. M. Ali Eslami

    Abstract: We present a method for using previously-trained 'teacher' agents to kickstart the training of a new 'student' agent. To this end, we leverage ideas from policy distillation and population based training. Our method places no constraints on the architecture of the teacher or student agents, and it regulates itself to allow the students to surpass their teachers in performance. We show that, on a c…

    Submitted 10 March, 2018; originally announced March 2018.

  46. arXiv:1801.08116  [pdf, other]

    cs.AI cs.NE q-bio.NC

    Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents

    Authors: Joel Z. Leibo, Cyprien de Masson d'Autume, Daniel Zoran, David Amos, Charles Beattie, Keith Anderson, Antonio García Castañeda, Manuel Sanchez, Simon Green, Audrunas Gruslys, Shane Legg, Demis Hassabis, Matthew M. Botvinick

    Abstract: Psychlab is a simulated psychology laboratory inside the first-person 3D game world of DeepMind Lab (Beattie et al. 2016). Psychlab enables implementations of classical laboratory psychological experiments so that they work with both human and artificial agents. Psychlab has a simple and flexible API that enables users to easily create their own tasks. As examples, we are releasing Psychlab implem…

    Submitted 4 February, 2018; v1 submitted 24 January, 2018; originally announced January 2018.

    Comments: 28 pages, 11 figures

  47. arXiv:1711.08378  [pdf]

    cs.AI

    Building Machines that Learn and Think for Themselves: Commentary on Lake et al., Behavioral and Brain Sciences, 2017

    Authors: M. Botvinick, D. G. T. Barrett, P. Battaglia, N. de Freitas, D. Kumaran, J. Z Leibo, T. Lillicrap, J. Modayil, S. Mohamed, N. C. Rabinowitz, D. J. Rezende, A. Santoro, T. Schaul, C. Summerfield, G. Wayne, T. Weber, D. Wierstra, S. Legg, D. Hassabis

    Abstract: We agree with Lake and colleagues on their list of key ingredients for building humanlike intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand-engineering. We believe an approac…

    Submitted 22 November, 2017; originally announced November 2017.

  48. arXiv:1711.05074  [pdf, other]

    cs.GT cs.MA

    Symmetric Decomposition of Asymmetric Games

    Authors: Karl Tuyls, Julien Perolat, Marc Lanctot, Georg Ostrovski, Rahul Savani, Joel Leibo, Toby Ord, Thore Graepel, Shane Legg

    Abstract: We introduce new theoretical insights into two-population asymmetric games allowing for an elegant symmetric decomposition into two single population symmetric games. Specifically, we show how an asymmetric bimatrix game (A,B) can be decomposed into its symmetric counterparts by envisioning and investigating the payoff tables (A and B) that constitute the asymmetric game, as two independent, singl…

    Submitted 17 January, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: Paper is published in Scientific Reports; https://www.nature.com/articles/s41598-018-19194-4, 2018

  49. arXiv:1707.06600  [pdf, other]

    cs.MA cs.NE q-bio.PE

    A multi-agent reinforcement learning model of common-pool resource appropriation

    Authors: Julien Perolat, Joel Z. Leibo, Vinicius Zambaldi, Charles Beattie, Karl Tuyls, Thore Graepel

    Abstract: Humanity faces numerous problems of common-pool resource appropriation. This class of multi-agent social dilemma includes the problems of ensuring sustainable use of fresh water, common fisheries, grazing pastures, and irrigation systems. Abstract models of common-pool resource appropriation based on non-cooperative game theory predict that self-interested agents will generally fail to find social…

    Submitted 6 September, 2017; v1 submitted 20 July, 2017; originally announced July 2017.

    Comments: 15 pages, 11 figures

  50. arXiv:1706.05296  [pdf, other]

    cs.AI

    Value-Decomposition Networks For Cooperative Multi-Agent Learning

    Authors: Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel

    Abstract: We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to partial observab…

    Submitted 16 June, 2017; originally announced June 2017.

    ACM Class: I.2.11