-
Approximating the Core via Iterative Coalition Sampling
Authors:
Ian Gemp,
Marc Lanctot,
Luke Marris,
Yiran Mao,
Edgar Duéñez-Guzmán,
Sarah Perrin,
Andras Gyorgy,
Romuald Elie,
Georgios Piliouras,
Michael Kaisers,
Daniel Hennes,
Kalesha Bullard,
Kate Larson,
Yoram Bachrach
Abstract:
The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has an incentive to break away and form their own subgroup or coalition. However, it has long been known that the core and its approximations, such as the least-core, are hard to compute. This limits our ability to analyze cooperative games in general, and to fully embrace cooperative game theory contributions in domains such as explainable AI (XAI), where the core can complement the Shapley values to identify influential features or instances supporting predictions by black-box models. We propose novel iterative algorithms for computing variants of the core, which avoid the computational bottleneck of many other approaches: namely, solving large linear programs. As such, they scale better to very large problems, as we demonstrate across different classes of cooperative games, including weighted voting games, induced subgraph games, and marginal contribution networks. We also explore our algorithms in the context of XAI, providing further evidence of the power of the core for such applications.
Submitted 6 February, 2024;
originally announced February 2024.
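To make the core definition in the abstract above concrete, the following Python sketch checks core membership in a small weighted voting game by computing the largest coalition excess v(S) - x(S). An allocation x is in the core when it distributes v(N) exactly and no coalition has positive excess; the least-core instead minimizes that largest excess. The exhaustive check enumerates all 2^n - 1 coalitions, which is the kind of blow-up the abstract refers to, while the sampled variant only inspects random coalitions and therefore yields a lower bound on the true maximum excess. This is a naive illustration, not the iterative algorithm proposed in the paper; the function names and the example game are hypothetical.

```python
import itertools
import random

def weighted_voting_game(weights, quota):
    """Return v(S): 1 if the total weight of coalition S meets the quota, else 0."""
    def v(coalition):
        return 1.0 if sum(weights[i] for i in coalition) >= quota else 0.0
    return v

def max_excess_exact(v, x):
    """Largest excess v(S) - x(S) over all 2^n - 1 non-empty coalitions."""
    n = len(x)
    return max(
        v(s) - sum(x[i] for i in s)
        for r in range(1, n + 1)
        for s in itertools.combinations(range(n), r)
    )

def max_excess_sampled(v, x, num_samples, seed=0):
    """Lower bound on the maximum excess, using uniformly random coalitions only."""
    rng = random.Random(seed)
    n = len(x)
    best = float("-inf")
    for _ in range(num_samples):
        s = [i for i in range(n) if rng.random() < 0.5]
        if s:  # skip the empty coalition
            best = max(best, v(s) - sum(x[i] for i in s))
    return best

def in_core(v, x, tol=1e-9):
    """Efficiency (x sums to v(N)) plus coalitional rationality (no positive excess)."""
    grand = tuple(range(len(x)))
    return abs(sum(x) - v(grand)) <= tol and max_excess_exact(v, x) <= tol

# Hypothetical game: weights (3, 1, 1), quota 4, so player 0 belongs to every winning coalition.
v = weighted_voting_game([3, 1, 1], 4)
print(in_core(v, [1.0, 0.0, 0.0]))   # the veto player takes everything
print(in_core(v, [1/3, 1/3, 1/3]))   # coalition {0, 1} has excess 1/3
```

In this example player 0 is a veto player, so giving it the whole payoff is stable, while the equal split is blocked by any two-player coalition containing player 0.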
-
TacticAI: an AI assistant for football tactics
Authors:
Zhe Wang,
Petar Veličković,
Daniel Hennes,
Nenad Tomašev,
Laurel Prince,
Michael Kaisers,
Yoram Bachrach,
Romuald Elie,
Li Kevin Wenliang,
Federico Piccinini,
William Spearman,
Ian Graham,
Jerome Connor,
Yi Yang,
Adrià Recasens,
Mina Khan,
Nathalie Beauguerlange,
Pablo Sprechmann,
Pol Moreno,
Nicolas Heess,
Michael Bowling,
Demis Hassabis,
Karl Tuyls
Abstract:
Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing corner kicks, as they offer coaches the most direct opportunities for interventions and improvements. TacticAI incorporates both a predictive and a generative component, allowing the coaches to effectively sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success. We validate TacticAI on a number of relevant benchmark tasks: predicting receivers and shot attempts and recommending player position adjustments. The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC. We show that TacticAI's model suggestions are not only indistinguishable from real tactics, but also favoured over existing tactics 90% of the time, and that TacticAI offers an effective corner kick retrieval system. TacticAI achieves these results despite the limited availability of gold-standard data, achieving data efficiency through geometric deep learning.
Submitted 17 October, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning
Authors:
Marc Lanctot,
John Schultz,
Neil Burch,
Max Olan Smith,
Daniel Hennes,
Thomas Anthony,
Julien Perolat
Abstract:
Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to a few interactions against experts, with the aim of reaching some desired level of performance (e.g. beating a human professional player). We propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors, along with a population of forty-three tournament entries, some of which are intentionally sub-optimal. We describe metrics to measure the quality of agents based on both average returns and exploitability. We then show that several RL, online learning, and language model approaches can learn good counter-strategies and generalize well, but ultimately lose to the top-performing bots, creating an opportunity for research in multiagent learning.
Submitted 31 October, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
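The two metric families named in the abstract above can be illustrated on Rock, Paper, Scissors. In the sketch below, which is our own illustration and not the benchmark's code, the exploitability of a fixed mixed strategy is its opponent's best-response value against it (the game value is 0, so only the uniform strategy scores 0), and average return is measured for a hypothetical frequency-counting bot that plays whatever beats the opponent's most common past action, facing an i.i.d. biased opponent.

```python
import random

# Row player's payoff; index 0 = Rock, 1 = Paper, 2 = Scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def exploitability(q):
    """Best-response value against the mixed strategy q (the game value is 0)."""
    return max(sum(RPS[i][j] * q[j] for j in range(3)) for i in range(3))

def counter_most_frequent(counts):
    """Hypothetical bot: play the action that beats the opponent's most frequent action."""
    most = max(range(3), key=lambda a: counts[a])
    return (most + 1) % 3  # Paper beats Rock, Scissors beats Paper, Rock beats Scissors

def average_return(num_rounds, opponent_probs, seed=0):
    """Average per-round payoff of the counting bot against an i.i.d. biased opponent."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    total = 0
    for _ in range(num_rounds):
        mine = counter_most_frequent(counts)
        theirs = rng.choices(range(3), weights=opponent_probs)[0]
        total += RPS[mine][theirs]
        counts[theirs] += 1
    return total / num_rounds

print(exploitability([1/3, 1/3, 1/3]))  # uniform play cannot be exploited
print(exploitability([1.0, 0.0, 0.0]))  # "always Rock" loses 1 per round to Paper
print(average_return(1000, [0.5, 0.25, 0.25]))
```

The counting bot earns well against a biased opponent, yet it is deterministic given the history and so could itself be exploited by an adaptive opponent, which is one reason to report both metrics.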
-
Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning
Authors:
Zun Li,
Marc Lanctot,
Kevin R. McKee,
Luke Marris,
Ian Gemp,
Daniel Hennes,
Paul Muller,
Kate Larson,
Yoram Bachrach,
Michael P. Wellman
Abstract:
Multiagent reinforcement learning (MARL) has benefited significantly from population-based and game-theoretic training regimes. One approach, Policy-Space Response Oracles (PSRO), employs standard reinforcement learning to compute response policies via approximate best responses and combines them via meta-strategy selection. We augment PSRO by adding a novel search procedure with generative sampling of world states, and introduce two new meta-strategy solvers based on the Nash bargaining solution. We evaluate PSRO's ability to compute approximate Nash equilibria, and its performance in two negotiation games: Colored Trails, and Deal or No Deal. We conduct behavioral studies where human participants negotiate with our agents ($N = 346$). We find that search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that, when negotiating with humans, achieve social welfare comparable to that of humans trading among themselves.
Submitted 1 February, 2023;
originally announced February 2023.
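The Nash bargaining solution mentioned in the abstract above selects the feasible payoff vector that maximizes the product of the players' gains over a disagreement point d, i.e. it maximizes (u_1 - d_1)(u_2 - d_2) subject to u_i >= d_i. The sketch below computes a grid approximation for two players over lotteries between pairs of outcomes; in two dimensions the Pareto frontier of the convex hull of finitely many outcomes consists of such segments, so this covers the frontier up to grid resolution. The outcome lists are hypothetical, and a PSRO meta-strategy solver works on the empirical meta-game built from the policy population rather than on a bare list of payoff vectors, so this only illustrates the objective.

```python
import itertools

def nash_bargaining(outcomes, disagreement, grid=100):
    """Approximate 2-player Nash bargaining solution over mixtures of outcome pairs.

    outcomes: list of (u1, u2) payoff pairs; disagreement: (d1, d2).
    Returns the payoff pair maximizing (u1 - d1) * (u2 - d2), or None if none is feasible.
    """
    d1, d2 = disagreement
    best, best_product = None, float("-inf")
    # Pairs include (a, a), so pure outcomes are covered too.
    for a, b in itertools.combinations_with_replacement(outcomes, 2):
        for k in range(grid + 1):
            w = k / grid
            u1 = w * a[0] + (1 - w) * b[0]
            u2 = w * a[1] + (1 - w) * b[1]
            if u1 < d1 or u2 < d2:
                continue  # individually irrational for some player
            product = (u1 - d1) * (u2 - d2)
            if product > best_product:
                best, best_product = (u1, u2), product
    return best

# Splitting four identical items: the symmetric split wins.
print(nash_bargaining([(4, 0), (3, 1), (2, 2), (1, 3), (0, 4)], (0, 0)))
# Asymmetric frontier between (6, 0) and (0, 3): the solution is the midpoint (3, 1.5).
print(nash_bargaining([(6, 0), (0, 3)], (0, 0)))
```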
-
Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments
Authors:
Ian Gemp,
Thomas Anthony,
Yoram Bachrach,
Avishkar Bhoopchand,
Kalesha Bullard,
Jerome Connor,
Vibhavari Dasagi,
Bart De Vylder,
Edgar Duenez-Guzman,
Romuald Elie,
Richard Everett,
Daniel Hennes,
Edward Hughes,
Mina Khan,
Marc Lanctot,
Kate Larson,
Guy Lever,
Siqi Liu,
Luke Marris,
Kevin R. McKee,
Paul Muller,
Julien Perolat,
Florian Strub,
Andrea Tacchetti,
Eugene Tarassov
, et al. (2 additional authors not shown)
Abstract:
The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.
Submitted 22 September, 2022;
originally announced September 2022.
-
Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Authors:
Julien Perolat,
Bart de Vylder,
Daniel Hennes,
Eugene Tarassov,
Florian Strub,
Vincent de Boer,
Paul Muller,
Jerome T. Connor,
Neil Burch,
Thomas Anthony,
Stephen McAleer,
Romuald Elie,
Sarah H. Cen,
Zhe Wang,
Audrunas Gruslys,
Aleksandra Malysheva,
Mina Khan,
Sherjil Ozair,
Finbarr Timbers,
Toby Pohlen,
Tom Eccles,
Mark Rowland,
Marc Lanctot,
Jean-Baptiste Lespiau,
Bilal Piot
, et al. (9 additional authors not shown)
Abstract:
We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, often with hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
Submitted 30 June, 2022;
originally announced June 2022.
-
NeuPL: Neural Population Learning
Authors:
Siqi Liu,
Luke Marris,
Daniel Hennes,
Josh Merel,
Nicolas Heess,
Thore Graepel
Abstract:
Learning in strategy games (e.g. StarCraft, poker) requires the discovery of diverse policies. This is often achieved by iteratively training new policies against existing ones, growing a policy population that is robust to exploitation. This iterative approach suffers from two issues in real-world games: a) under a finite budget, the approximate best-response operator at each iteration needs to be truncated, leaving under-trained good-responses in the population; b) repeated learning of basic skills at each iteration is wasteful and becomes intractable in the presence of increasingly strong opponents. In this work, we propose Neural Population Learning (NeuPL) as a solution to both issues. NeuPL offers convergence guarantees to a population of best-responses under mild assumptions. By representing a population of policies within a single conditional model, NeuPL enables transfer learning across policies. Empirically, we show the generality, improved performance and efficiency of NeuPL across several test domains. Most interestingly, we show that novel strategies become more accessible, not less, as the neural population expands.
Submitted 15 February, 2022;
originally announced February 2022.
-
Which priors matter? Benchmarking models for learning latent dynamics
Authors:
Aleksandar Botev,
Andrew Jaegle,
Peter Wirnsberger,
Daniel Hennes,
Irina Higgins
Abstract:
Learning dynamics is at the heart of many important applications of machine learning (ML), such as robotics and autonomous driving. In these settings, ML algorithms typically need to reason about a physical system using high dimensional observations, such as images, without access to the underlying state. Recently, several methods have been proposed that integrate priors from classical mechanics into ML models to address the challenge of physical reasoning from images. In this work, we take a sober look at the current capabilities of these models. To this end, we introduce a suite consisting of 17 datasets with visual observations based on physical systems exhibiting a wide range of dynamics. We conduct a thorough and detailed comparison of the major classes of physically inspired methods alongside several strong baselines. While models that incorporate physical priors can often learn latent spaces with desirable properties, our results demonstrate that these methods fail to significantly improve upon standard techniques. Nonetheless, we find that the use of continuous and time-reversible dynamics benefits models of all classes.
Submitted 9 November, 2021;
originally announced November 2021.
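Time-reversible dynamics, which the abstract above reports as beneficial, can be illustrated with the leapfrog (Störmer-Verlet) integrator on a unit-mass harmonic oscillator with potential U(q) = q^2/2: integrating forward, negating the momentum, and integrating forward again for the same number of steps returns to the initial state up to floating-point error. This is a generic numerical illustration with hypothetical names, not one of the models compared in the paper.

```python
def leapfrog(q, p, dt, steps, grad_u):
    """Leapfrog integration of dq/dt = p, dp/dt = -grad_u(q) for a unit mass."""
    for _ in range(steps):
        p = p - 0.5 * dt * grad_u(q)  # half-step momentum update
        q = q + dt * p                # full-step position update
        p = p - 0.5 * dt * grad_u(q)  # second half-step momentum update
    return q, p

def harmonic_grad(q):
    """Gradient of the potential U(q) = q**2 / 2."""
    return q

q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, dt=0.1, steps=200, grad_u=harmonic_grad)
# Reverse time by flipping the momentum, then integrate forward for the same number of steps.
q2, p2 = leapfrog(q1, -p1, dt=0.1, steps=200, grad_u=harmonic_grad)
print(abs(q2 - q0), abs(-p2 - p0))  # both essentially zero
```

Because each leapfrog step is symmetric, it is exactly reversible in exact arithmetic, and its energy error stays bounded rather than drifting, which is why it is a common choice for Hamiltonian-style models.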
-
Evolutionary Dynamics and $Φ$-Regret Minimization in Games
Authors:
Georgios Piliouras,
Mark Rowland,
Shayegan Omidshafiei,
Romuald Elie,
Daniel Hennes,
Jerome Connor,
Karl Tuyls
Abstract:
Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. Regret quantifies the difference between a learner's performance and that of a baseline in hindsight. It is well-known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations to deterministic actions or strategies. In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full \emph{mixed} strategy space (i.e., probability distributions over pure strategies), under the lens of the previously-established $Φ$-regret framework, which provides a continuum of stronger regret measures. Importantly, $Φ$-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove here that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of $Φ$-regret in generic $2 \times 2$ games, without any modification of the underlying algorithm itself. We subsequently conduct experiments validating our theoretical results in a suite of 144 $2 \times 2$ games wherein RD exhibits a diverse set of behaviors. We conclude by providing empirical evidence of $Φ$-regret minimization by RD in some larger games, hinting at further opportunity for $Φ$-regret based study of such algorithms from both a theoretical and empirical perspective.
Submitted 28 June, 2021;
originally announced June 2021.
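Replicator dynamics grows each action's probability in proportion to how much its payoff exceeds the current average; for the row player, dx_i/dt = x_i (f_i(y) - x^T A y), where f_i(y) = (A y)_i. The Python sketch below, our own Euler discretization rather than the paper's code, simulates two-population RD in a 2x2 bimatrix game and tracks the row player's time-averaged external regret. External regret (deviations to fixed actions) is only one special case of $Φ$-regret, so this illustrates how regret is measured along an RD trajectory, not the paper's stronger result. The Prisoner's Dilemma payoffs are a hypothetical example.

```python
def replicator_step(x, y, A, B, dt):
    """One Euler step of two-population replicator dynamics in a 2x2 bimatrix game.

    A[i][j] and B[i][j] are the row and column players' payoffs when row plays i and column plays j.
    Returns the new strategies plus the row player's action fitnesses and average fitness.
    """
    fx = [A[i][0] * y[0] + A[i][1] * y[1] for i in range(2)]
    fy = [B[0][j] * x[0] + B[1][j] * x[1] for j in range(2)]
    avg_x = x[0] * fx[0] + x[1] * fx[1]
    avg_y = y[0] * fy[0] + y[1] * fy[1]
    new_x = [x[i] + dt * x[i] * (fx[i] - avg_x) for i in range(2)]
    new_y = [y[j] + dt * y[j] * (fy[j] - avg_y) for j in range(2)]
    return new_x, new_y, fx, avg_x

def run_replicator(A, B, x0, y0, dt=0.01, steps=5000):
    """Simulate RD; return final strategies and the row player's time-averaged external regret."""
    x, y = list(x0), list(y0)
    cum_action = [0.0, 0.0]  # payoff each fixed action would have earned
    cum_realized = 0.0       # payoff actually earned by the mixed strategy x(t)
    for _ in range(steps):
        x, y, fx, avg_x = replicator_step(x, y, A, B, dt)
        cum_action = [c + dt * f for c, f in zip(cum_action, fx)]
        cum_realized += dt * avg_x
    return x, y, (max(cum_action) - cum_realized) / (dt * steps)

# Prisoner's Dilemma: action 0 = Cooperate, 1 = Defect (Defect strictly dominates).
A = [[3, 0], [5, 1]]
B = [[3, 5], [0, 1]]
x, y, avg_regret = run_replicator(A, B, [0.5, 0.5], [0.5, 0.5])
print(x, y, avg_regret)  # the dominated action dies out and average regret is small
```

Each Euler step preserves the sum of probabilities exactly, since the weighted deviations from the average fitness sum to zero; a small step size keeps them positive.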
-
Time-series Imputation of Temporally-occluded Multiagent Trajectories
Authors:
Shayegan Omidshafiei,
Daniel Hennes,
Marta Garnelo,
Eugene Tarassov,
Zhe Wang,
Romuald Elie,
Jerome T. Connor,
Paul Muller,
Ian Graham,
William Spearman,
Karl Tuyls
Abstract:
In multiagent environments, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents' decision-making processes, make such systems complex and interesting to study from a dynamical perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision-avoidance in self-driving cars. However, in many settings, only sporadic observations of agents may be available in a given trajectory sequence. For instance, in football, subsets of players may come in and out of view of broadcast video footage, while unobserved players continue to interact off-screen. In this paper, we study the problem of multiagent time-series imputation, where available past and future observations of subsets of agents are used to estimate missing observations for other agents. Our approach, called the Graph Imputer, uses forward- and backward-information in combination with graph networks and variational autoencoders to enable learning of a distribution of imputed trajectories. We evaluate our approach on a dataset of football matches, using a projective camera module to train and evaluate our model for the off-screen player state estimation setting. We illustrate that our method outperforms several state-of-the-art approaches, including those hand-crafted for football.
Submitted 8 June, 2021;
originally announced June 2021.
-
From Motor Control to Team Play in Simulated Humanoid Football
Authors:
Siqi Liu,
Guy Lever,
Zhe Wang,
Josh Merel,
S. M. Ali Eslami,
Daniel Hennes,
Wojciech M. Czarnecki,
Yuval Tassa,
Shayegan Omidshafiei,
Abbas Abdolmaleki,
Noah Y. Siegel,
Leonard Hasenclever,
Luke Marris,
Saran Tunyasuvunakool,
H. Francis Song,
Markus Wulfmeier,
Paul Muller,
Tuomas Haarnoja,
Brendan D. Tracey,
Karl Tuyls,
Thore Graepel,
Nicolas Heess
Abstract:
Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment. We develop a method that combines imitation learning, single- and multi-agent reinforcement learning and population-based training, and makes use of transferable representations of behaviour for decision making at different levels of abstraction. In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds. We investigate the emergence of behaviours at different levels of abstraction, as well as the representations that underlie these behaviours using several analysis techniques, including statistics from real-world sports analytics. Our work constitutes a complete demonstration of integrated decision-making at multiple scales in a physically embodied multi-agent setting. See project video at https://youtu.be/KHMwq9pv7mg.
Submitted 25 May, 2021;
originally announced May 2021.
-
Game Plan: What AI can do for Football, and What Football can do for AI
Authors:
Karl Tuyls,
Shayegan Omidshafiei,
Paul Muller,
Zhe Wang,
Jerome Connor,
Daniel Hennes,
Ian Graham,
William Spearman,
Tim Waskett,
Dafydd Steele,
Pauline Luc,
Adria Recasens,
Alexandre Galashov,
Gregory Thornton,
Romuald Elie,
Pablo Sprechmann,
Pol Moreno,
Kris Cao,
Marta Garnelo,
Praneet Dutta,
Michal Valko,
Nicolas Heess,
Alex Bridgland,
Julien Perolat,
Bart De Vylder
, et al. (11 additional authors not shown)
Abstract:
The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players' and coordinated teams' behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).
Submitted 18 November, 2020;
originally announced November 2020.
-
Navigating the Landscape of Multiplayer Games
Authors:
Shayegan Omidshafiei,
Karl Tuyls,
Wojciech M. Czarnecki,
Francisco C. Santos,
Mark Rowland,
Jerome Connor,
Daniel Hennes,
Paul Muller,
Julien Perolat,
Bart De Vylder,
Audrunas Gruslys,
Remi Munos
Abstract:
Multiplayer games have long been used as testbeds in artificial intelligence research, aptly referred to as the Drosophila of artificial intelligence. Traditionally, researchers have focused on using well-known games to build strong agents. This progress, however, can be better informed by characterizing games and their topological landscape. Tackling this latter question can facilitate understanding of agents and help determine what game an agent should target next as part of its training. Here, we show how network measures applied to response graphs of large-scale games enable the creation of a landscape of games, quantifying relationships between games of varying sizes and characteristics. We illustrate our findings in domains ranging from canonical games to complex empirical games capturing the performance of trained agents pitted against one another. Our results culminate in a demonstration leveraging this information to generate new and interesting games, including mixtures of empirical games synthesized from real world games.
Submitted 17 November, 2020; v1 submitted 4 May, 2020;
originally announced May 2020.
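Under one common definition, a response graph has one node per pure strategy profile and a directed edge whenever a single player can strictly improve by deviating; sink nodes are then exactly the pure Nash equilibria. The sketch below builds such a graph for a two-player game with payoff matrices A (row player) and B (column player), and reports its sinks and out-degrees as two very simple network measures. The measures actually used to build the landscape in the paper are not reproduced here, and the helper names are hypothetical.

```python
def response_graph(A, B):
    """Map each pure profile (i, j) to the profiles reachable by one strictly improving deviation."""
    rows, cols = len(A), len(A[0])
    graph = {}
    for i in range(rows):
        for j in range(cols):
            out = [(k, j) for k in range(rows) if A[k][j] > A[i][j]]   # row player deviates
            out += [(i, k) for k in range(cols) if B[i][k] > B[i][j]]  # column player deviates
            graph[(i, j)] = out
    return graph

def sinks(graph):
    """Profiles with no improving deviation, i.e. the pure Nash equilibria."""
    return sorted(node for node, out in graph.items() if not out)

def out_degrees(graph):
    """Number of improving deviations available at each profile."""
    return {node: len(out) for node, out in graph.items()}

# A coordination game has two sinks; zero-sum Rock, Paper, Scissors has none (only cycles).
coordination = response_graph([[1, 0], [0, 1]], [[1, 0], [0, 1]])
rps_a = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
rps = response_graph(rps_a, [[-a for a in row] for row in rps_a])
print(sinks(coordination), sinks(rps))
print(out_degrees(coordination))
```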
-
A Generalized Training Approach for Multiagent Learning
Authors:
Paul Muller,
Shayegan Omidshafiei,
Mark Rowland,
Karl Tuyls,
Julien Perolat,
Siqi Liu,
Daniel Hennes,
Luke Marris,
Marc Lanctot,
Edward Hughes,
Zhe Wang,
Guy Lever,
Nicolas Heess,
Thore Graepel,
Remi Munos
Abstract:
This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have been focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, $α$-Rank, which is unique (thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings. We establish convergence guarantees in several game classes, and identify links between Nash equilibria and $α$-Rank. We demonstrate the competitive performance of $α$-Rank-based PSRO against an exact Nash solver-based PSRO in 2-player Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where $α$-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable general games solver. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.
Submitted 14 February, 2020; v1 submitted 27 September, 2019;
originally announced September 2019.
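The PSRO loop alternates between solving a meta-game over the current policy population and adding a best response to the resulting meta-strategy. The sketch below runs that loop on Rock, Paper, Scissors with exact best responses and a uniform meta-solver over the population (duplicates kept), the configuration that recovers fictitious play; a Nash meta-solver restricted to the population would give double oracle instead, and an $α$-Rank meta-solver gives the variant studied in the paper. Everything here is a toy stand-in under those assumptions: policies are pure actions, and the function names are hypothetical.

```python
# Row player's payoffs in Rock, Paper, Scissors (symmetric zero-sum, game value 0).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def uniform_meta_strategy(population, num_actions):
    """Uniform weight on every population member (duplicates count), expressed over actions."""
    probs = [0.0] * num_actions
    for action in population:
        probs[action] += 1.0 / len(population)
    return probs

def best_response(payoff, mixture):
    """Exact best-response oracle: returns (action, value) against the opponent mixture."""
    values = [sum(row[j] * mixture[j] for j in range(len(mixture))) for row in payoff]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]

def psro_uniform(payoff, iterations, initial_policy=0):
    """Toy PSRO on a symmetric game: policies are pure actions, meta-solver is uniform."""
    population = [initial_policy]
    for _ in range(iterations):
        meta = uniform_meta_strategy(population, len(payoff))
        new_policy, _ = best_response(payoff, meta)
        population.append(new_policy)
    return population

population = psro_uniform(RPS, iterations=1000)
meta = uniform_meta_strategy(population, 3)
_, exploitability = best_response(RPS, meta)  # equals exploitability because the game value is 0
print(meta, exploitability)  # the meta-strategy drifts toward uniform and exploitability shrinks
```

Starting from a population containing only Rock, the first best response is Paper, then Scissors enters, and the population keeps cycling while its empirical mixture approaches the uniform equilibrium.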
-
OpenSpiel: A Framework for Reinforcement Learning in Games
Authors:
Marc Lanctot,
Edward Lockhart,
Jean-Baptiste Lespiau,
Vinicius Zambaldi,
Satyaki Upadhyay,
Julien Pérolat,
Sriram Srinivasan,
Finbarr Timbers,
Karl Tuyls,
Shayegan Omidshafiei,
Daniel Hennes,
Dustin Morrill,
Paul Muller,
Timo Ewalds,
Ryan Faulkner,
János Kramár,
Bart De Vylder,
Brennan Saeta,
James Bradbury,
David Ding,
Sebastian Borgeaud,
Matthew Lai,
Julian Schrittwieser,
Thomas Anthony,
Edward Hughes
, et al. (2 additional authors not shown)
Abstract:
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi-agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partially and fully observable) grid worlds and social dilemmas. OpenSpiel also includes tools to analyze learning dynamics and other common evaluation metrics. This document serves both as an overview of the code base and an introduction to the terminology, core concepts, and algorithms across the fields of reinforcement learning, computational game theory, and search.
Submitted 26 September, 2020; v1 submitted 25 August, 2019;
originally announced August 2019.
-
Neural Replicator Dynamics
Authors:
Daniel Hennes,
Dustin Morrill,
Shayegan Omidshafiei,
Remi Munos,
Julien Perolat,
Marc Lanctot,
Audrunas Gruslys,
Jean-Baptiste Lespiau,
Paavo Parmas,
Edgar Duenez-Guzman,
Karl Tuyls
Abstract:
Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor performance in the presence of even the most benign nonstationarity. By contrast, it is known that the replicator dynamics, a well-studied model from evolutionary game theory, eliminates dominated strategies and exhibits convergence of the time-averaged trajectories to interior Nash equilibria in zero-sum games. Thus, using the replicator dynamics as a foundation, we derive an elegant one-line change to policy gradient methods that simply bypasses the gradient step through the softmax, yielding a new algorithm titled Neural Replicator Dynamics (NeuRD). NeuRD reduces to the exponential weights/Hedge algorithm in the single-state all-actions case. Additionally, NeuRD has formal equivalence to softmax counterfactual regret minimization, which guarantees convergence in the sequential tabular case. Importantly, our algorithm provides a straightforward way of extending the replicator dynamics to the function approximation setting. Empirical results show that NeuRD quickly adapts to nonstationarities, outperforming policy gradient significantly in both tabular and function approximation settings, when evaluated on the standard imperfect information benchmarks of Kuhn Poker, Leduc Poker, and Goofspiel.
Submitted 26 February, 2020; v1 submitted 1 June, 2019;
originally announced June 2019.
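The abstract describes NeuRD as a one-line change to softmax policy gradient: the update bypasses the gradient step through the softmax. It also states that NeuRD reduces to exponential weights/Hedge in the single-state, all-actions case. The sketch below contrasts the two logit updates for one state with known action values `q`. It assumes an all-actions update with advantage `q_a - v`, where `v` is the expected value under the current policy. All function names are hypothetical illustrations, not the authors' code or OpenSpiel's API.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(y - m) for y in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_pg_step(logits: List[float], q: List[float], lr: float) -> List[float]:
    """All-actions softmax policy gradient: the gradient passes through the softmax.

    d/dy_a of sum_b pi_b * q_b equals pi_a * (q_a - v).
    """
    pi = softmax(logits)
    v = sum(p * qa for p, qa in zip(pi, q))  # expected value of the current policy
    return [y + lr * p * (qa - v) for y, p, qa in zip(logits, pi, q)]


def neurd_step(logits: List[float], q: List[float], lr: float) -> List[float]:
    """NeuRD-style step (sketch): skip the softmax Jacobian, move logits by the advantage."""
    pi = softmax(logits)
    v = sum(p * qa for p, qa in zip(pi, q))
    return [y + lr * (qa - v) for y, qa in zip(logits, q)]


def hedge_policy(q_history: List[List[float]], lr: float, n_actions: int) -> List[float]:
    """Exponential weights / Hedge: softmax of the scaled cumulative action values."""
    cumulative = [0.0] * n_actions
    for q in q_history:
        cumulative = [c + lr * qa for c, qa in zip(cumulative, q)]
    return softmax(cumulative)


if __name__ == "__main__":
    lr = 0.5
    # Hypothetical nonstationary payoffs: action 0 is best, then action 1 is best.
    history = [[1.0, 0.0, 0.0]] * 50 + [[0.0, 1.0, 0.0]] * 50
    pg, nd = [0.0] * 3, [0.0] * 3
    for q in history:
        pg = softmax_pg_step(pg, q, lr)
        nd = neurd_step(nd, q, lr)
    print("softmax PG:", softmax(pg))
    print("NeuRD:     ", softmax(nd))
    print("Hedge:     ", hedge_policy(history, lr, 3))
```

Subtracting `v` shifts every logit by the same amount, and softmax ignores such shifts. The NeuRD policy therefore matches Hedge's softmax of cumulative values. Softmax policy gradient instead scales each logit's step by `pi_a`, so an action whose probability has collapsed recovers only slowly after the payoffs change.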
-
Persistent self-supervised learning principle: from stereo to monocular vision for obstacle avoidance
Authors:
Kevin van Hecke,
Guido de Croon,
Laurens van der Maaten,
Daniel Hennes,
Dario Izzo
Abstract:
Self-Supervised Learning (SSL) is a reliable learning mechanism in which a robot uses an original, trusted sensor cue for training to recognize an additional, complementary sensor cue. We study for the first time in SSL how a robot's learning behavior should be organized so that the robot can keep performing its task if the original cue becomes unavailable. We study this persistent form of SSL in the context of a flying robot that has to avoid obstacles based on distance estimates from the visual cue of stereo vision. Over time, it learns to also estimate distances based on monocular appearance cues. A strategy is introduced in which the robot switches from stereo-vision-based flight to monocular flight, with stereo vision used purely as 'training wheels' to avoid imminent collisions. This strategy is shown to be an effective approach to the 'feedback-induced data bias' problem, which is also encountered in learning from demonstration. Both simulations and real-world experiments with a stereo-vision-equipped AR drone 2.0 show the feasibility of this approach, with the robot successfully using monocular vision to avoid obstacles in a 5 x 5 room. The experiments show the potential of persistent SSL as a robust learning approach to enhance the capabilities of robots. Moreover, the abundant training data coming from the robot's own sensors make it possible to gather the large data sets needed for deep learning approaches.
Submitted 25 March, 2016;
originally announced March 2016.
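The abstract describes a switching strategy. Stereo distances supervise an online monocular estimator. Once that estimator is good enough, the robot flies on monocular estimates, and stereo acts only as 'training wheels' to prevent imminent collisions. The simulation below sketches that control logic under strong assumptions that do not come from the paper: a 1-D synthetic "appearance feature" proportional to distance plus noise, a linear regressor trained by SGD, an error threshold for switching, and a fixed safety margin. All names and numbers are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class OnlineLinearRegressor:
    """Hypothetical monocular distance estimator: dist ~ w * feature + b, trained by SGD."""
    w: float = 0.0
    b: float = 0.0
    lr: float = 0.05

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, target: float) -> None:
        err = self.predict(x) - target  # gradient factor of the squared loss
        self.w -= self.lr * err * x
        self.b -= self.lr * err


def choose_distance(mono_est: float, stereo_dist: float, use_mono: bool,
                    safety_margin: float) -> Tuple[float, str]:
    """Fly on the monocular estimate once enabled; stereo overrides near obstacles."""
    if not use_mono:
        return stereo_dist, "stereo"
    if stereo_dist < safety_margin:
        return stereo_dist, "stereo-override"  # 'training wheels' intervention
    return mono_est, "mono"


def simulate(steps: int = 5000, seed: int = 0, switch_error: float = 0.3,
             safety_margin: float = 1.0):
    rng = random.Random(seed)
    model = OnlineLinearRegressor()
    recent_err = None  # exponential moving average of |mono - stereo|
    use_mono = False
    modes: Dict[str, int] = {"stereo": 0, "stereo-override": 0, "mono": 0}
    for _ in range(steps):
        true_dist = rng.uniform(0.5, 5.0)
        stereo_dist = true_dist  # assume the trusted cue is exact
        feature = true_dist / 5.0 + rng.gauss(0.0, 0.01)  # synthetic appearance cue
        mono_est = model.predict(feature)
        _, mode = choose_distance(mono_est, stereo_dist, use_mono, safety_margin)
        modes[mode] += 1
        err = abs(mono_est - stereo_dist)
        recent_err = err if recent_err is None else 0.99 * recent_err + 0.01 * err
        use_mono = use_mono or recent_err < switch_error  # switch once, persistently
        model.update(feature, stereo_dist)  # stereo keeps supervising the learner
    return model, use_mono, modes


if __name__ == "__main__":
    model, use_mono, modes = simulate()
    print(f"w={model.w:.3f} b={model.b:.3f} use_mono={use_mono} modes={modes}")
```

The learner keeps receiving stereo labels after the switch, so the estimator continues training while the robot flies on monocular estimates.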
-
GTOC8: Results and Methods of ESA Advanced Concepts Team and JAXA-ISAS
Authors:
Dario Izzo,
Daniel Hennes,
Marcus Märtens,
Ingmar Getzner,
Krzysztof Nowak,
Anna Heffernan,
Stefano Campagnola,
Chit Hong Yam,
Naoya Ozaki,
Yoshihide Sugimoto
Abstract:
We consider the interplanetary trajectory design problem posed by the 8th edition of the Global Trajectory Optimization Competition and present the end-to-end strategy developed by the team ACT-ISAS (a collaboration between the European Space Agency's Advanced Concepts Team and JAXA's Institute of Space and Astronautical Science). The resulting interplanetary trajectory won 1st place in the competition, achieving a final mission value of $J=146.33$ [Mkm]. Several new algorithms were developed in this context; because their interest goes beyond the particular problem considered, they are discussed in some detail. These include the Moon-targeting technique, allowing one to target a Moon encounter from a low Earth orbit; the 1-$k$ and 2-$k$ fly-by targeting techniques, enabling one to design resonant fly-bys while ensuring that a targeted future formation plane is acquired at some point after the manoeuvre; the distributed low-thrust targeting technique, allowing one to control the spacecraft formation plane at 1,000,000 [km]; and the low-thrust optimization technique, permitting one to enforce the formation plane's orientations as path constraints.
Submitted 3 February, 2016; v1 submitted 2 February, 2016;
originally announced February 2016.
-
Designing Complex Interplanetary Trajectories for the Global Trajectory Optimization Competitions
Authors:
Dario Izzo,
Daniel Hennes,
Luís F. Simões,
Marcus Märtens
Abstract:
The design of interplanetary trajectories often involves a preliminary search for options that are later refined and assembled into one final trajectory. It is this broad search, often intractable, that inspires the international event called the Global Trajectory Optimization Competition. In the first part of this chapter, we introduce some fundamental problems of space flight mechanics, which are building blocks of any attempt to participate successfully in these competitions, and we describe the use of the open source software PyKEP to solve them. In the second part, we formulate an instance of a multiple asteroid rendezvous problem, related to the 7th edition of the competition, and we show step by step how to build a possible solution strategy. In doing so, we introduce two new techniques useful in the design of this particular mission type: the use of an asteroid phasing value and its surrogates, and the efficient computation of asteroid clusters. We show how the basic building blocks, combined with these innovative ideas, allow one to design an effective global search for possible trajectories.
Submitted 10 March, 2016; v1 submitted 3 November, 2015;
originally announced November 2015.