
Using deep reinforcement learning to promote sustainable human behaviour on a common pool resource problem

Authors: Raphael Koster, Miruna Pîslar, Andrea Tacchetti, Jan Balaguer, Leqi Liu, Romuald Elie, Oliver P. Hauser, Karl Tuyls, Matt Botvinick, Christopher Summerfield

Abstract: A canonical social dilemma arises when finite resources are allocated to a group of people, who can choose to either reciprocate with interest, or keep the proceeds for themselves. What resource allocation mechanisms will encourage levels of reciprocation that sustain the commons? Here, in an iterated multiplayer trust game, we use deep reinforcement learning (RL) to design an allocation mechanism that endogenously promotes sustainable contributions from human participants to a common pool resource. We first trained neural networks to behave like human players, creating a simulated economy that allowed us to study how different mechanisms influenced the dynamics of receipt and reciprocation. We then used RL to train a social planner to maximise aggregate return to players. The social planner discovered a redistributive policy that led to a large surplus and an inclusive economy, in which players made roughly equal gains. The RL agent increased human surplus over baseline mechanisms based on unrestricted welfare or conditional cooperation, by conditioning its generosity on available resources and temporarily sanctioning defectors by allocating fewer resources to them. Examining the AI policy allowed us to develop an explainable mechanism that performed similarly and was more popular among players. Deep reinforcement learning can be used to discover mechanisms that promote sustainable human behaviour.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2402.03928 [pdf, other]

Approximating the Core via Iterative Coalition Sampling

Authors: Ian Gemp, Marc Lanctot, Luke Marris, Yiran Mao, Edgar Duéñez-Guzmán, Sarah Perrin, Andras Gyorgy, Romuald Elie, Georgios Piliouras, Michael Kaisers, Daniel Hennes, Kalesha Bullard, Kate Larson, Yoram Bachrach

Abstract: The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, and to fully embrace cooperative game theory contributions in domains such as explainable AI (XAI), where the core can complement the Shapley values to identify influential features or instances supporting predictions by black-box models. We propose novel iterative algorithms for computing variants of the core, which avoid the computational bottleneck of many other approaches; namely solving large linear programs. As such, they scale better to very large problems as we demonstrate across different classes of cooperative games, including weighted voting games, induced subgraph games, and marginal contribution networks. We also explore our algorithms in the context of XAI, providing further evidence of the power of the core for such applications.

Submitted 6 February, 2024; originally announced February 2024.

Comments: Published in AAMAS 2024
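The core definition above can be made concrete with a brute-force check. The sketch below is a hypothetical helper, not the paper's iterative coalition-sampling algorithm: it enumerates every coalition of a small game and reports the largest excess $v(S) - x(S)$. An allocation is in the core when it pays out exactly $v(N)$ and that excess is at most zero; the least-core minimises it. Enumeration is exponential in the number of players, which is the bottleneck the paper's approach is designed to avoid. The weighted voting game used here is an illustrative choice.

```python
from itertools import combinations

def max_excess(players, value, allocation):
    """Largest excess v(S) - x(S) over all non-empty proper coalitions S.

    `value` maps a frozenset of players to the coalition's worth;
    `allocation` maps each player to their payment.
    """
    worst = float("-inf")
    for size in range(1, len(players)):
        for coalition in combinations(players, size):
            s = frozenset(coalition)
            excess = value(s) - sum(allocation[p] for p in s)
            worst = max(worst, excess)
    return worst

def in_core(players, value, allocation, tol=1e-9):
    """Efficient (pays out v(N) exactly) and no coalition can gain by leaving."""
    grand = frozenset(players)
    efficient = abs(sum(allocation.values()) - value(grand)) <= tol
    return efficient and max_excess(players, value, allocation) <= tol

# Hypothetical weighted voting game: quota 3, weights a=2, b=1, c=1.
weights = {"a": 2, "b": 1, "c": 1}

def vote_value(s, quota=3):
    return 1.0 if sum(weights[p] for p in s) >= quota else 0.0

players = ["a", "b", "c"]
print(in_core(players, vote_value, {"a": 1.0, "b": 0.0, "c": 0.0}))    # True: a holds a veto
print(in_core(players, vote_value, {"a": 0.5, "b": 0.25, "c": 0.25}))  # False: {a, b} can block
```

Because player a belongs to every winning coalition, the only core allocation gives a the whole payoff; any split lets a two-player winning coalition improve on what it receives.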

arXiv:2310.10553 [pdf, other]

TacticAI: an AI assistant for football tactics

Authors: Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, William Spearman, Ian Graham, Jerome Connor, Yi Yang, Adrià Recasens, Mina Khan, Nathalie Beauguerlange, Pablo Sprechmann, Pol Moreno, Nicolas Heess, Michael Bowling, Demis Hassabis, Karl Tuyls

Abstract: Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing corner kicks, as they offer coaches the most direct opportunities for interventions and improvements. TacticAI incorporates both a predictive and a generative component, allowing the coaches to effectively sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success. We validate TacticAI on a number of relevant benchmark tasks: predicting receivers and shot attempts and recommending player position adjustments. The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC. We show that TacticAI's model suggestions are not only indistinguishable from real tactics, but also favoured over existing tactics 90% of the time, and that TacticAI offers an effective corner kick retrieval system. TacticAI achieves these results despite the limited availability of gold-standard data, achieving data efficiency through geometric deep learning.

Submitted 17 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 32 pages, 10 figures

arXiv:2302.06607 [pdf, other]

Generative Adversarial Equilibrium Solvers

Authors: Denizalp Goktas, David C. Parkes, Ian Gemp, Luke Marris, Georgios Piliouras, Romuald Elie, Guy Lever, Andrea Tacchetti

Abstract: We introduce the use of generative adversarial learning to compute equilibria in general game-theoretic settings, specifically the generalized Nash equilibrium (GNE) in pseudo-games, and its specific instantiation as the competitive equilibrium (CE) in Arrow-Debreu competitive economies. Pseudo-games are a generalization of games in which players' actions affect not only the payoffs of other players but also their feasible action spaces. Although the computation of GNE and CE is intractable in the worst-case, i.e., PPAD-hard, in practice, many applications only require solutions with high accuracy in expectation over a distribution of problem instances. We introduce Generative Adversarial Equilibrium Solvers (GAES): a family of generative adversarial neural networks that can learn GNE and CE from only a sample of problem instances. We provide computational and sample complexity bounds, and apply the framework to finding Nash equilibria in normal-form games, CE in Arrow-Debreu competitive economies, and GNE in an environmental economic model of the Kyoto mechanism.

Submitted 20 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: 41 pages, 13 figures
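Any learned equilibrium solver needs a measure of how far a candidate strategy profile is from equilibrium. A standard one for normal-form games is exploitability (also called NashConv): the total gain every player could obtain by switching unilaterally to a best response. It is zero exactly at a Nash equilibrium. The sketch below computes it for a two-player game; it is an illustrative assumption about the kind of quantity an adversarial solver targets, not the paper's pseudo-game loss, and the function names are hypothetical.

```python
def expected_payoffs(matrix, opponent_strategy):
    """Expected payoff of each own action (rows of `matrix`) against a mixed strategy."""
    return [sum(m * q for m, q in zip(row, opponent_strategy)) for row in matrix]

def exploitability(row_payoff, col_payoff, x, y):
    """NashConv: summed gain both players get from best-responding.

    row_payoff[i][j] and col_payoff[i][j] are the payoffs when the row player
    plays i and the column player plays j; x and y are mixed strategies.
    """
    row_values = expected_payoffs(row_payoff, y)
    # Transpose so the column player's actions index the rows.
    col_t = [list(c) for c in zip(*col_payoff)]
    col_values = expected_payoffs(col_t, x)
    row_current = sum(p * v for p, v in zip(x, row_values))
    col_current = sum(q * v for q, v in zip(y, col_values))
    return (max(row_values) - row_current) + (max(col_values) - col_current)

# Rock-paper-scissors (zero-sum): row payoff A, column payoff -A.
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
B = [[-a for a in row] for row in A]
uniform = [1 / 3] * 3
print(exploitability(A, B, uniform, uniform))      # 0.0: uniform play is the Nash equilibrium
print(exploitability(A, B, [1, 0, 0], [1, 0, 0]))  # 2: each player gains 1 by switching to paper
```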

arXiv:2209.10958 [pdf, ps, other]

Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

Authors: Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, Siqi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov , et al. (2 additional authors not shown)

Abstract: The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.

Submitted 22 September, 2022; originally announced September 2022.

Comments: Published in AI Communications 2022

arXiv:2208.10138 [pdf, other]

Learning Correlated Equilibria in Mean-Field Games

Authors: Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, Karl Tuyls

Abstract: The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in all games, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.

Submitted 22 August, 2022; originally announced August 2022.
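In a finite game, the correlated-equilibrium notion that this work lifts to the mean-field limit is a set of linear incentive constraints: a mediator draws a joint action profile, and no player told to play action $a$ gains in expectation by switching to another action. The sketch below checks those constraints for a two-player normal-form game. It illustrates the finite-player concept only, not the paper's mean-field construction; the game of chicken and the distributions are illustrative choices, and the helper name is hypothetical.

```python
def max_deviation_gain(joint, row_payoff, col_payoff):
    """Largest expected gain any player gets by deviating from a recommendation.

    joint[i][j] is the probability of recommending (row action i, column action j).
    The distribution is a correlated equilibrium iff the result is <= 0.
    """
    n_row, n_col = len(row_payoff), len(row_payoff[0])
    best = 0.0  # following the recommendation always gains exactly 0
    for i in range(n_row):          # row player told to play i ...
        for k in range(n_row):      # ... considers switching to k
            gain = sum(joint[i][j] * (row_payoff[k][j] - row_payoff[i][j]) for j in range(n_col))
            best = max(best, gain)
    for j in range(n_col):          # column player told to play j ...
        for k in range(n_col):      # ... considers switching to k
            gain = sum(joint[i][j] * (col_payoff[i][k] - col_payoff[i][j]) for i in range(n_row))
            best = max(best, gain)
    return best

# Chicken with actions [Dare, Swerve]; mutual daring is the worst outcome.
row = [[0, 7], [2, 6]]
col = [[0, 2], [7, 6]]
correlated = [[0, 1 / 3], [1 / 3, 1 / 3]]  # never recommend (Dare, Dare)
both_dare = [[1, 0], [0, 0]]
print(max_deviation_gain(correlated, row, col))  # 0.0: a correlated equilibrium
print(max_deviation_gain(both_dare, row, col))   # 2: either player would rather swerve
```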

arXiv:2206.15378 [pdf, other]

doi 10.1126/science.add4679

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Authors: Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot , et al. (9 additional authors not shown)

Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.

Submitted 30 June, 2022; originally announced June 2022.
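The contrast between cycling and converging can be seen in a toy setting. In matching pennies, plain replicator dynamics orbit the mixed Nash equilibrium forever; penalising each action's fitness by $\eta \log(\pi / \pi_{\text{reg}})$ toward a uniform reference policy turns the orbit into a spiral that settles at the equilibrium. The sketch below is a simplified illustration of that one idea under stated assumptions (Euler integration, a fixed uniform reference, a hypothetical strength `eta`). It is not DeepNash's R-NaD, which among other things updates its reference policy iteratively and learns with deep networks.

```python
import math

def simulate(eta, steps=10_000, dt=0.01, x=0.9, y=0.5):
    """Euler-integrate two-population replicator dynamics on matching pennies.

    x, y are the probabilities that the row and column players pick 'heads'.
    The row player wins on a match, the column player on a mismatch.  With
    eta > 0 each action's fitness is penalised by eta * log(pi / 0.5), i.e.
    pulled toward a uniform reference policy.
    """
    for _ in range(steps):
        # Fitness advantage of 'heads' over 'tails' for each player.
        row_adv = 2 * (2 * y - 1) - eta * math.log(x / (1 - x))
        col_adv = -2 * (2 * x - 1) - eta * math.log(y / (1 - y))
        x, y = x + dt * x * (1 - x) * row_adv, y + dt * y * (1 - y) * col_adv
    return math.hypot(x - 0.5, y - 0.5)  # distance from the Nash equilibrium

print(simulate(eta=0.0))  # stays far from (0.5, 0.5): the dynamics cycle
print(simulate(eta=0.2))  # close to 0: the regularisation damps the cycle
```

Without the penalty the quantity $x(1-x)y(1-y)$ is conserved along trajectories, so the distance from equilibrium never shrinks; with it, the linearised dynamics have eigenvalues $-\eta \pm i$ and decay toward the equilibrium.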

arXiv:2205.12944 [pdf, other]

Learning in Mean Field Games: A Survey

Authors: Mathieu Laurière, Sarah Perrin, Julien Pérolat, Sertan Girgin, Paul Muller, Romuald Élie, Matthieu Geist, Olivier Pietquin

Abstract: Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases. Introduced by Lasry and Lions, and Huang, Caines and Malhamé, Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity. Traditional methods for solving these games generally rely on solving partial or stochastic differential equations with a full knowledge of the model. Recently, Reinforcement Learning (RL) has appeared promising to solve complex problems at scale. The combination of RL and MFGs is promising to solve games at a very large scale both in terms of population size and environment complexity. In this survey, we review the quickly growing recent literature on RL methods to learn equilibria and social optima in MFGs. We first identify the most common settings (static, stationary, and evolutive) of MFGs. We then present a general framework for classical iterative methods (based on best-response computation or policy evaluation) to solve MFGs in an exact way. Building on these algorithms and the connection with Markov Decision Processes, we explain how RL can be used to learn MFG solutions in a model-free way. Last, we present numerical illustrations on a benchmark problem, and conclude with some perspectives.

Submitted 26 July, 2024; v1 submitted 25 May, 2022; originally announced May 2022.
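To make the best-response family of iterative methods concrete, here is fictitious play on a toy static mean-field game (an illustrative assumption, not the survey's benchmark). A continuum of players each choose one of three locations, with reward $r(x, \mu) = b(x) - \mu(x)$: base attractiveness minus crowding. Each iteration computes a best response to the running average distribution and folds it into that average with weight $1/(k+1)$. For $b = (1, 0.5, 0)$ the equilibrium is $\mu^* = (0.75, 0.25, 0)$: both occupied locations then yield reward $0.25$, while the empty one yields $0$.

```python
def fictitious_play(base, iterations=10_000):
    """Fictitious play for a static mean-field game with reward base[x] - mu[x]."""
    n = len(base)
    mu_bar = [1.0 / n] * n  # running average of past best responses
    for k in range(1, iterations + 1):
        rewards = [b - m for b, m in zip(base, mu_bar)]
        best = max(range(n), key=rewards.__getitem__)  # pure best response
        # Average the new (one-hot) best response into the population distribution.
        mu_bar = [(k * m + (1.0 if x == best else 0.0)) / (k + 1) for x, m in enumerate(mu_bar)]
    return mu_bar

mu = fictitious_play([1.0, 0.5, 0.0])
print([round(m, 3) for m in mu])  # approximately [0.75, 0.25, 0.0]
```

Each individual best response is a pure strategy; only the average converges, which is why this method needs the "mixing" of past policies that becomes awkward once policies are neural networks.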

arXiv:2203.11973 [pdf, other]

Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

Authors: Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Élie, Olivier Pietquin, Matthieu Geist

Abstract: Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values. This is far from being trivial in the case of non-linear function approximators that enjoy good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm. The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates. It is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform SotA baselines from the literature.

Submitted 17 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.
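The second method extends Online Mirror Descent. In its simplest tabular form (an illustrative assumption; the paper's contribution is making this work with neural networks), OMD adds the reward each action earns against the current population to a running score and plays the softmax of those scores, so no explicit average of past policies has to be stored. On a toy static game with reward $b(x) - \mu(x)$ and $b = (1, 0.5, 0)$, whose equilibrium is $\mu^* = (0.75, 0.25, 0)$, the last iterate itself converges:

```python
import math

def online_mirror_descent(base, iterations=2_000, lr=0.1):
    """Tabular OMD for a static mean-field game with reward base[x] - mu[x]."""
    scores = [0.0] * len(base)  # cumulative (scaled) rewards per action
    for _ in range(iterations):
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # shift for numerical stability
        total = sum(weights)
        policy = [w / total for w in weights]
        # In a static game the population distribution equals the current policy.
        rewards = [b - p for b, p in zip(base, policy)]
        scores = [s + lr * r for s, r in zip(scores, rewards)]
    return policy

policy = online_mirror_descent([1.0, 0.5, 0.0])
print([round(p, 3) for p in policy])  # approximately [0.75, 0.25, 0.0]
```

The only state carried between iterations is `scores`, which is the sense in which this style of update avoids memorising historical policies.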

arXiv:2112.09466 [pdf, other]

Fair Active Learning: Solving the Labeling Problem in Insurance

Authors: Romuald Elie, Caroline Hillairet, François Hu, Marc Juillard

Abstract: This paper addresses significant obstacles that arise from the widespread use of machine learning models in the insurance industry, with a specific focus on promoting fairness. The initial challenge lies in effectively leveraging unlabeled data in insurance while reducing the labeling effort and emphasizing data relevance through active learning techniques. The paper explores various active learning sampling methodologies and evaluates their impact on both synthetic and real insurance datasets. This analysis highlights the difficulty of achieving fair model inferences, as machine learning models may replicate biases and discrimination found in the underlying data. To tackle these interconnected challenges, the paper introduces an innovative fair active learning method. The proposed approach samples informative and fair instances, achieving a good balance between model predictive performance and fairness, as confirmed by numerical experiments on insurance datasets.

Submitted 20 May, 2024; v1 submitted 17 December, 2021; originally announced December 2021.

arXiv:2111.08350 [pdf, other]

Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO

Authors: Paul Muller, Mark Rowland, Romuald Elie, Georgios Piliouras, Julien Perolat, Mathieu Lauriere, Raphael Marinier, Olivier Pietquin, Karl Tuyls

Abstract: Recent advances in multiagent learning have seen the introduction of a family of algorithms that revolve around the population-based training method PSRO, showing convergence to Nash, correlated and coarse correlated equilibria. Notably, when the number of agents increases, learning best-responses becomes exponentially more difficult, and as such hampers PSRO training methods. The paradigm of mean-field games provides an asymptotic solution to this problem when the considered games are anonymous-symmetric. Unfortunately, the mean-field approximation introduces non-linearities which prevent a straightforward adaptation of PSRO. Building upon optimization and adversarial regret minimization, this paper sidesteps this issue and introduces mean-field PSRO, an adaptation of PSRO which learns Nash, coarse correlated and correlated equilibria in mean-field games. The key is to replace the exact distribution computation step by newly-defined mean-field no-adversarial-regret learners, or by black-box optimization. We compare the asymptotic complexity of the approach to standard PSRO, greatly improve empirical bandit convergence speed by compressing temporal mixture weights, and ensure it is theoretically robust to payoff noise. Finally, we illustrate the speed and accuracy of mean-field PSRO on several mean-field games, demonstrating convergence to strong and weak equilibria.

Submitted 29 August, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: AAMAS

arXiv:2110.11943 [pdf, other]

Solving N-player dynamic routing games with congestion: a mean field approach

Authors: Theophile Cabannes, Mathieu Lauriere, Julien Perolat, Raphael Marinier, Sertan Girgin, Sarah Perrin, Olivier Pietquin, Alexandre M. Bayen, Eric Goubault, Romuald Elie

Abstract: The recent emergence of navigational tools has changed traffic patterns and has now enabled new types of congestion-aware routing control like dynamic road pricing. Using the fundamental diagram of traffic flows - applied in macroscopic and mesoscopic traffic modeling - the article introduces a new N-player dynamic routing game with explicit congestion dynamics. The model is well-posed and can reproduce heterogeneous departure times and congestion spill back phenomena. However, as Nash equilibrium computations are PPAD-complete, solving the game becomes intractable for large but realistic numbers of vehicles N. Therefore, the corresponding mean field game is also introduced. Experiments were performed on several classical benchmark networks of the traffic community: the Pigou, Braess, and Sioux Falls networks with heterogeneous origin, destination and departure time tuples. The Pigou and the Braess examples reveal that the mean field approximation is generally very accurate and computationally efficient as soon as the number of vehicles exceeds a few dozen. On the Sioux Falls network (76 links, 100 time steps), this approach enables learning traffic dynamics with more than 14,000 vehicles.

Submitted 27 October, 2021; v1 submitted 22 October, 2021; originally announced October 2021.
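The Pigou network mentioned above is the classic two-road example, and its static version already shows why equilibrium and efficiency differ. Assume, for illustration only (the paper studies a dynamic variant with explicit congestion dynamics), a unit mass of drivers choosing between a road with constant travel time 1 and one whose travel time equals the fraction $f$ of drivers using it. The congestible road is never slower, so at equilibrium everyone takes it ($f = 1$, average time 1), whereas the average time $f^2 + (1 - f)$ is minimised at $f = 1/2$, giving $3/4$ and the well-known price of anarchy of $4/3$:

```python
from fractions import Fraction

def average_travel_time(f):
    """Mean travel time when a fraction f of drivers takes the congestible road."""
    return f * f + (1 - f) * 1  # congestible road costs f; the other road costs 1

# Equilibrium: the congestible road is never slower than 1, so everyone takes it.
equilibrium = average_travel_time(Fraction(1))
# Social optimum, searched on an exact grid of fractions (the minimiser is f = 1/2).
optimum = min(average_travel_time(Fraction(k, 100)) for k in range(101))
print(equilibrium, optimum, equilibrium / optimum)  # 1 3/4 4/3
```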

arXiv:2109.13642 [pdf, other]

Fairness guarantee in multi-class classification

Authors: Christophe Denis, Romuald Elie, Mohamed Hebiri, François Hu

Abstract: Algorithmic Fairness is an established area of machine learning that aims to reduce the influence of hidden bias in the data. Yet, despite its wide range of applications, very few works consider the multi-class classification setting from the fairness perspective. We focus on this question and extend the definition of approximate fairness in the case of Demographic Parity to multi-class classification. We specify the corresponding expressions of the optimal fair classifiers. This suggests a plug-in data-driven procedure, for which we establish theoretical guarantees. The enhanced estimator is proved to mimic the behavior of the optimal rule both in terms of fairness and risk. Notably, fairness guarantees are distribution-free. The approach is evaluated on both synthetic and real datasets and proves very effective in decision making with a preset level of unfairness. In addition, our method is competitive (if not better) with the state-of-the-art in binary and multi-class tasks.

Submitted 10 March, 2023; v1 submitted 28 September, 2021; originally announced September 2021.
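Demographic Parity asks that the distribution of predicted classes not depend on the sensitive attribute. One simple way to quantify a violation in the multi-class case is the largest spread, over classes $k$, of the group rates $P(\hat{Y} = k \mid S = s)$. This is an illustrative assumption; the paper's exact definition of approximate fairness may differ, and the helper name is hypothetical.

```python
from collections import Counter, defaultdict

def demographic_parity_gap(predictions, groups):
    """Max over classes k of (max_s - min_s) P(pred = k | group = s)."""
    by_group = defaultdict(list)
    for pred, group in zip(predictions, groups):
        by_group[group].append(pred)
    # Per-group class frequencies, computed once.
    counts = {g: (Counter(preds), len(preds)) for g, preds in by_group.items()}
    gap = 0.0
    for k in set(predictions):
        rates = [c[k] / n for c, n in counts.values()]
        gap = max(gap, max(rates) - min(rates))
    return gap

preds = ["a", "b", "c", "a", "a", "b", "c", "c"]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_gap(preds, groups))  # 0.25: class "a" is 0.5 vs 0.25 across groups
```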

arXiv:2109.09717 [pdf, other]

Generalization in Mean Field Games by Learning Master Policies

Authors: Sarah Perrin, Mathieu Laurière, Julien Pérolat, Romuald Élie, Matthieu Geist, Olivier Pietquin

Abstract: Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents. Yet, most of the literature assumes a single initial distribution for the agents, which limits the practical applications of MFGs. Machine Learning has the potential to solve a wider diversity of MFG problems thanks to its generalization capacities. We study how to leverage these generalization properties to learn policies enabling a typical agent to behave optimally against any population distribution. In reference to the Master equation in MFGs, we coin the term "Master policies" to describe them and we prove that a single Master policy provides a Nash equilibrium, whatever the initial distribution. We propose a method to learn such Master policies. Our approach relies on three ingredients: adding the current population distribution as part of the observation, approximating Master policies with neural networks, and training via Reinforcement Learning and Fictitious Play. We illustrate on numerical examples not only the efficiency of the learned Master policy but also its generalization capabilities beyond the distributions used for training.

Submitted 20 September, 2021; originally announced September 2021.

arXiv:2106.14668 [pdf, other]

Evolutionary Dynamics and $Φ$-Regret Minimization in Games

Authors: Georgios Piliouras, Mark Rowland, Shayegan Omidshafiei, Romuald Elie, Daniel Hennes, Jerome Connor, Karl Tuyls

Abstract: Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. Regret quantifies the difference between a learner's performance against a baseline in hindsight. It is well-known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations to deterministic actions or strategies. In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full mixed strategy space (i.e., probability distributions over pure strategies), under the lens of the previously-established $Φ$-regret framework, which provides a continuum of stronger regret measures. Importantly, $Φ$-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove here that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of $Φ$-regret in generic $2 \times 2$ games, without any modification of the underlying algorithm itself. We subsequently conduct experiments validating our theoretical results in a suite of 144 $2 \times 2$ games wherein RD exhibits a diverse set of behaviors. We conclude by providing empirical evidence of $Φ$-regret minimization by RD in some larger games, hinting at further opportunity for $Φ$-regret based study of such algorithms from both a theoretical and empirical perspective.

Submitted 28 June, 2021; originally announced June 2021.
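Replicator dynamics and regret can be connected numerically. The sketch below integrates two-population RD on a $2 \times 2$ game with Euler steps and tracks the row player's external regret: the gap between the payoff of the best fixed action in hindsight and the payoff actually earned. External regret is the weakest member of the $Φ$-regret family, so this is a simplified stand-in for the stronger measures the paper studies; the Prisoner's Dilemma payoffs and the helper name are illustrative choices.

```python
def rd_average_regret(A, B, x=0.5, y=0.5, horizon=100.0, dt=0.01):
    """Euler-integrate replicator dynamics on a 2x2 game and return the row
    player's time-averaged external regret.

    A[i][j] / B[i][j]: row / column payoffs; x, y: probabilities of action 0.
    """
    cum_action = [0.0, 0.0]  # payoff each fixed row action would have earned
    cum_earned = 0.0         # expected payoff of the row player's mixed strategy
    for _ in range(round(horizon / dt)):
        row_vals = [A[i][0] * y + A[i][1] * (1 - y) for i in range(2)]
        col_vals = [B[0][j] * x + B[1][j] * (1 - x) for j in range(2)]
        cum_action = [c + v * dt for c, v in zip(cum_action, row_vals)]
        cum_earned += (x * row_vals[0] + (1 - x) * row_vals[1]) * dt
        # Simultaneous replicator updates (col_vals already uses the old x).
        x += dt * x * (1 - x) * (row_vals[0] - row_vals[1])
        y += dt * y * (1 - y) * (col_vals[0] - col_vals[1])
    return (max(cum_action) - cum_earned) / horizon

# Prisoner's Dilemma, actions [cooperate, defect].
A = [[3, 0], [5, 1]]
B = [[3, 5], [0, 1]]
print(rd_average_regret(A, B))  # a small non-negative number
```

Here defection strictly dominates, so the probability of cooperating decays quickly, cumulative regret stays bounded, and the time average shrinks as the horizon grows.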

arXiv:2106.04219 [pdf, other]

Time-series Imputation of Temporally-occluded Multiagent Trajectories

Authors: Shayegan Omidshafiei, Daniel Hennes, Marta Garnelo, Eugene Tarassov, Zhe Wang, Romuald Elie, Jerome T. Connor, Paul Muller, Ian Graham, William Spearman, Karl Tuyls

Abstract: In multiagent environments, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents' decision-making processes, make such systems complex and interesting to study from a dynamical perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision-avoidance in self-driving cars. However, in many settings, only sporadic observations of agents may be available in a given trajectory sequence. For instance, in football, subsets of players may come in and out of view of broadcast video footage, while unobserved players continue to interact off-screen. In this paper, we study the problem of multiagent time-series imputation, where available past and future observations of subsets of agents are used to estimate missing observations for other agents. Our approach, called the Graph Imputer, uses forward- and backward-information in combination with graph networks and variational autoencoders to enable learning of a distribution of imputed trajectories. We evaluate our approach on a dataset of football matches, using a projective camera module to train and evaluate our model for the off-screen player state estimation setting. We illustrate that our method outperforms several state-of-the-art approaches, including those hand-crafted for football.

Submitted 8 June, 2021; originally announced June 2021.

arXiv:2106.03787 [pdf, other]

Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint

Authors: Matthieu Geist, Julien Pérolat, Mathieu Laurière, Romuald Elie, Sarah Perrin, Olivier Bachem, Rémi Munos, Olivier Pietquin

Abstract: Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of many-agent RL. They consider the limit case of a continuous distribution of identical agents, anonymous with symmetric interests, and reduce the problem to the study of a single representative agent in interaction with the full population. Our core contribution is to show that CURL is a subclass of MFGs. We think this is important for bridging the two communities. It also sheds light on aspects of both fields: we show the equivalence between concavity in CURL and monotonicity in the associated MFG, between optimality conditions in CURL and Nash equilibrium in MFG, and that Fictitious Play (FP) for this class of MFGs is simply Frank-Wolfe, bringing the first convergence rate for discrete-time FP for MFGs. We also experimentally demonstrate that, using algorithms recently introduced for solving MFGs, we can address the CURL problem more efficiently.

Submitted 16 February, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: AAMAS 2022
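The statement that Fictitious Play for this class of MFGs is Frank-Wolfe can be illustrated in the simplest possible case. The sketch below assumes a single state and a single step, so the occupancy measure reduces to a distribution `d` over actions on the probability simplex, and the Frank-Wolfe linear step reduces to picking the action with the largest gradient entry (a best response). The concave utility F(d) = r·d − (λ/2)‖d‖², the vector `r`, and the function name `frank_wolfe_simplex` are hypothetical and not taken from the paper; in the general CURL setting the linear step would instead be a standard RL problem whose reward is the gradient of the utility.

```python
import numpy as np

def frank_wolfe_simplex(grad, d0, n_iters=2000):
    """Frank-Wolfe with step size 1/(k+1) on the probability simplex (toy sketch).

    With this step size, the iterate is the running average of the starting
    point and every vertex returned by the linear step, i.e. a
    fictitious-play average of best responses.
    """
    d = np.asarray(d0, dtype=float)
    for k in range(1, n_iters + 1):
        g = grad(d)
        s = np.zeros_like(d)
        s[np.argmax(g)] = 1.0          # linear maximisation over the simplex: best vertex
        gamma = 1.0 / (k + 1)
        d = (1.0 - gamma) * d + gamma * s
    return d

# Hypothetical concave utility F(d) = r.d - (lam / 2) * ||d||^2, gradient r - lam * d.
r = np.array([1.0, 0.5, 0.0])
lam = 1.0
d_star = frank_wolfe_simplex(lambda d: r - lam * d, np.ones(3) / 3)
```

For this toy utility the maximiser is the Euclidean projection of `r` onto the simplex, (0.75, 0.25, 0), and the averaged iterate settles within roughly 1/k of it.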

arXiv:2105.07933 [pdf, other]

Mean Field Games Flock! The Reinforcement Learning Way

Authors: Sarah Perrin, Mathieu Laurière, Julien Pérolat, Matthieu Geist, Romuald Élie, Olivier Pietquin

Abstract: We present a method enabling a large number of agents to learn how to flock, which is a natural behavior observed in large populations of animals. This problem has drawn a lot of interest but requires many structural assumptions and is tractable only in small dimensions. We phrase this problem as a Mean Field Game (MFG), where each individual chooses its acceleration depending on the population behavior. Combining Deep Reinforcement Learning (RL) and Normalizing Flows (NF), we obtain a tractable solution requiring only very weak assumptions. Our algorithm finds a Nash Equilibrium, and the agents adapt their velocity to match the average velocity of the neighboring flock. We use Fictitious Play and alternate: (1) computing an approximate best response with Deep RL, and (2) estimating the next population distribution with NF. We show numerically that our algorithm learns multi-group or high-dimensional flocking with obstacles.

Submitted 17 May, 2021; originally announced May 2021.

arXiv:2103.00623 [pdf, other]

Scaling up Mean Field Games with Online Mirror Descent

Authors: Julien Perolat, Sarah Perrin, Romuald Elie, Mathieu Laurière, Georgios Piliouras, Matthieu Geist, Karl Tuyls, Olivier Pietquin

Abstract: We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on various single and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP). We empirically show that OMD scales up and converges significantly faster than FP by solving, for the first time to our knowledge, examples of MFGs with hundreds of billions of states. This study establishes the state-of-the-art for learning in large-scale multi-agent and multi-population games.

Submitted 28 February, 2021; originally announced March 2021.
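A minimal sketch of the Online Mirror Descent update, assuming a one-step (static) mean field game with three actions, in which the population distribution simply equals the common policy and the Q-function reduces to the immediate reward. Each iteration adds the current Q-values to a running score `y`, and the policy is the softmax of `y`. The crowd-averse reward `b - mu`, the values in `b`, and all function names are hypothetical illustrations; this reward is monotone in the Lasry-Lions sense because agents are penalised for choosing crowded actions.

```python
import numpy as np

def softmax(y):
    z = np.exp(y - y.max())  # shift by the max for numerical stability
    return z / z.sum()

def omd_static_mfg(reward, n_actions, lr=0.1, n_iters=2000):
    """OMD for a one-step MFG: y <- y + lr * Q(pi), pi = softmax(y)."""
    y = np.zeros(n_actions)
    for _ in range(n_iters):
        mu = softmax(y)        # one-step game: population distribution = policy
        y += lr * reward(mu)   # Q-values are just the immediate rewards here
    return softmax(y)

def exploitability(reward, mu):
    """Gain of the best single-agent deviation against the distribution mu."""
    r = reward(mu)
    return r.max() - mu @ r

b = np.array([1.0, 0.5, 0.0])       # hypothetical intrinsic action values
crowd_averse = lambda mu: b - mu    # payoff decreases as an action gets crowded
pi_eq = omd_static_mfg(crowd_averse, n_actions=3)
```

For these rewards the unique equilibrium puts mass 0.75 and 0.25 on the first two actions (both then earn 0.25) and none on the third (which earns 0), so the exploitability of `pi_eq` should be close to zero.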

arXiv:2102.05313 [pdf, other]

Conditional Loss and Deep Euler Scheme for Time Series Generation

Authors: Carl Remlinger, Joseph Mikael, Romuald Elie

Abstract: We introduce three new generative models for time series that are based on Euler discretization of Stochastic Differential Equations (SDEs) and Wasserstein metrics. Two of these methods rely on the adaptation of generative adversarial networks (GANs) to time series. The third algorithm, called Conditional Euler Generator (CEGEN), minimizes a dedicated distance between the transition probability distributions over all time steps. In the context of Ito processes, we provide theoretical guarantees that minimizing this criterion implies accurate estimations of the drift and volatility parameters. We demonstrate empirically that CEGEN outperforms state-of-the-art and GAN generators on both marginal and temporal dynamics metrics. In addition, it identifies accurate correlation structures in high dimension. When few data points are available, we verify the effectiveness of CEGEN combined with transfer learning methods on Monte Carlo simulations. Finally, we illustrate the robustness of our method on various real-world datasets.

Submitted 6 October, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: 14 pages, 9 figures
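The Euler scheme treats each time step of an Itô process dX_t = b(X_t) dt + σ(X_t) dW_t as a Gaussian transition with mean X + b(X)Δt and variance σ(X)²Δt, which is why matching transition distributions can pin down drift and volatility. The sketch below is not CEGEN (there is no generator network and no Wasserstein loss); it only simulates a hypothetical Ornstein-Uhlenbeck process with Euler-Maruyama and recovers its drift and volatility parameters by least squares on the one-step increments. The parameter values and the helper name `euler_maruyama` are assumptions for illustration.

```python
import numpy as np

def euler_maruyama(drift, vol, x0, dt, n_steps, n_paths, rng):
    """Simulate dX = drift(X) dt + vol(X) dW with the Euler-Maruyama scheme."""
    x = np.full(n_paths, x0, dtype=float)
    path = np.empty((n_steps + 1, n_paths))
    path[0] = x
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Brownian increments
        x = x + drift(x) * dt + vol(x) * dw
        path[k + 1] = x
    return path

rng = np.random.default_rng(0)
theta, sigma, dt = 1.0, 0.5, 0.01   # hypothetical OU parameters: dX = -theta X dt + sigma dW
paths = euler_maruyama(lambda x: -theta * x, lambda x: sigma,
                       x0=1.0, dt=dt, n_steps=500, n_paths=1000, rng=rng)

x_prev = paths[:-1].ravel()
dx = np.diff(paths, axis=0).ravel()
# Drift is linear in x for OU, so fit dx ~ -theta_hat * x_prev * dt by least squares.
theta_hat = -np.dot(x_prev, dx) / (np.dot(x_prev, x_prev) * dt)
# What remains after removing the fitted drift is ~ sigma * dW, with std sigma * sqrt(dt).
resid = dx + theta_hat * x_prev * dt
sigma_hat = resid.std() / np.sqrt(dt)
```

Because the data are generated by the same Euler transition that the regression assumes, `theta_hat` and `sigma_hat` should land close to 1.0 and 0.5.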

arXiv:2011.09192 [pdf, other]

Game Plan: What AI can do for Football, and What Football can do for AI

Authors: Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, et al. (11 additional authors not shown)

Abstract: The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players' and coordinated teams' behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes.
We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).

Submitted 18 November, 2020; originally announced November 2020.
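As a toy illustration of the game-theoretic side of penalty-kick analysis, the sketch below solves a 2x2 zero-sum game between a kicker and a goalkeeper using the standard indifference conditions. The scoring probabilities are hypothetical numbers chosen for illustration, not data or estimates from the paper, and the solver assumes the matrix has no pure-strategy saddle point. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def solve_2x2_zero_sum(a, b, c, d):
    """Mixed equilibrium of a 2x2 zero-sum game with scoring matrix [[a, b], [c, d]].

    Rows: kicker aims left / right. Columns: keeper dives left / right.
    Entries: probability of scoring (kicker maximises, keeper minimises).
    Assumes there is no pure-strategy saddle point.
    """
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate matrix")
    p = (d - c) / denom   # kicker's P(left) that makes the keeper indifferent
    q = (d - b) / denom   # keeper's P(left) that makes the kicker indifferent
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("matrix has a pure-strategy saddle point")
    value = (a * d - b * c) / denom   # equilibrium scoring probability
    return p, q, value

# Hypothetical scoring probabilities (percent -> exact fractions).
p, q, v = solve_2x2_zero_sum(Fraction(60, 100), Fraction(95, 100),
                             Fraction(90, 100), Fraction(70, 100))
```

With these numbers the kicker aims left with probability 4/11, the keeper dives left with probability 5/11, and the kicker scores with probability 87/110 (about 0.79) at equilibrium.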

arXiv:2007.03458 [pdf, other]

Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications

Authors: Sarah Perrin, Julien Perolat, Mathieu Laurière, Matthieu Geist, Romuald Elie, Olivier Pietquin

Abstract: In this paper, we extend the analysis of the continuous-time Fictitious Play learning algorithm to various finite-state Mean Field Game settings (finite horizon, $γ$-discounted), allowing in particular for the introduction of an additional common noise. We first present a theoretical convergence analysis of the continuous-time Fictitious Play process and prove that the induced exploitability decreases at a rate $O(\frac{1}{t})$. This analysis emphasizes the use of exploitability as a relevant metric for evaluating convergence towards a Nash equilibrium in the context of Mean Field Games. These theoretical contributions are supported by numerical experiments in both model-based and model-free settings. We thereby provide, for the first time, converging learning dynamics for Mean Field Games in the presence of common noise.

Submitted 26 October, 2020; v1 submitted 5 July, 2020; originally announced July 2020.
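The exploitability metric can be illustrated with discrete-time fictitious play on a hypothetical one-step, three-action mean field game whose reward `b - log(mu)` penalises crowded actions. Each iteration computes a best response to the averaged distribution and folds it into the average with weight 1/(k+1); exploitability measures how much a single agent could gain by deviating from the population. This is a discrete-time sketch under these assumptions, not the paper's continuous-time analysis, and it has no common noise. For this reward the equilibrium is known in closed form, mu* = softmax(b), which provides a check.

```python
import numpy as np

def reward(b, mu):
    return b - np.log(mu)  # crowd aversion: crowded actions pay less

def exploitability(b, mu):
    """Best deviation payoff minus the population's average payoff."""
    r = reward(b, mu)
    return r.max() - mu @ r

def fictitious_play(b, n_iters=5000):
    mu_bar = np.ones_like(b) / len(b)          # start from the uniform distribution
    for k in range(1, n_iters + 1):
        br = np.zeros_like(b)
        br[np.argmax(reward(b, mu_bar))] = 1.0  # pure best response to mu_bar
        mu_bar += (br - mu_bar) / (k + 1)       # running average of past play
    return mu_bar

b = np.array([1.0, 0.0, -1.0])                 # hypothetical action values
mu_fp = fictitious_play(b)
mu_star = np.exp(b) / np.exp(b).sum()          # closed-form equilibrium softmax(b)
```

Starting from the uniform distribution the exploitability is exactly 1.0 for these values of `b`; after the averaging loop it is close to zero. In this toy game it shrinks roughly like 1/k, loosely mirroring the O(1/t) rate stated for the continuous-time process.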

arXiv:2005.06526 [pdf, other]

COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability

Authors: Arthur Charpentier, Romuald Elie, Mathieu Laurière, Viet Chi Tran

Abstract: We consider here an extended SIR model, including several features of the recent COVID-19 outbreak: in particular the infected and recovered individuals can either be detected (+) or undetected (-) and we also integrate an intensive care unit (ICU) capacity. Our model enables a tractable quantitative analysis of the optimal policy for the control of the epidemic dynamics using both lockdown and detection intervention levers. With a parametric specification based on the COVID-19 literature, we investigate the sensitivities of various quantities on the optimal strategies, taking into account the subtle trade-off between the sanitary and the socio-economic cost of the pandemic, together with the limited capacity level of ICU. We identify the optimal lockdown policy as an intervention structured in 4 successive phases: first, a quick and strong lockdown intervention to stop the exponential growth of the contagion; second, a short transition phase to reduce the prevalence of the virus; third, a long period with full ICU capacity and stable virus prevalence; finally, a return to normal social interactions with disappearance of the virus. The optimal scenario thereby avoids a second wave of infection, provided the lockdown is released sufficiently slowly. We also provide optimal intervention measures with increasing ICU capacity, as well as optimization over the effort on detection of infectious and immune individuals.
Whenever massive resources are introduced to detect infected individuals, the pressure on social distancing can be released, whereas the impact of detection of immune individuals turns out to be more moderate.

Submitted 21 May, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

MSC Class: 49N90; 92D30; 34H05
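A minimal simulator for evaluating a lockdown schedule, assuming a plain SIR model in population fractions (no detected/undetected split and no ICU compartment) integrated with forward Euler. The transmission rate `beta`, recovery rate `gamma`, the piecewise-constant schedule, and the function name `simulate_sir` are hypothetical, not the paper's calibration, and nothing here optimises the policy; it only evaluates a given one.

```python
import numpy as np

def simulate_sir(beta, gamma, lockdown, i0=1e-3, days=300.0, dt=0.1):
    """Forward-Euler SIR in population fractions.

    lockdown(t) in [0, 1] is the fraction of contacts removed at time t (days).
    Returns an array of shape (n_steps + 1, 3) with columns S, I, R.
    """
    n_steps = int(round(days / dt))
    s, i, r = 1.0 - i0, i0, 0.0
    traj = np.empty((n_steps + 1, 3))
    traj[0] = (s, i, r)
    for k in range(n_steps):
        t = k * dt
        new_inf = beta * (1.0 - lockdown(t)) * s * i * dt  # contacts scaled by lockdown
        new_rec = gamma * i * dt
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        traj[k + 1] = (s, i, r)
    return traj

beta, gamma = 0.3, 0.1   # hypothetical rates, basic reproduction number 3
no_lockdown = simulate_sir(beta, gamma, lambda t: 0.0)
with_lockdown = simulate_sir(beta, gamma, lambda t: 0.5 if 20 <= t < 80 else 0.0)
peak_free = no_lockdown[:, 1].max()
peak_lock = with_lockdown[:, 1].max()
```

Each step only moves mass between compartments, so S + I + R stays equal to 1 up to rounding; comparing the `I` columns of the two runs shows how the schedule changes the infection peak.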

arXiv:2004.08221 [pdf, other]

doi 10.1051/mmnp/2020022

Contact rate epidemic control of COVID-19: an equilibrium view

Authors: Romuald Elie, Emma Hubert, Gabriel Turinici

Abstract: We consider the control of the COVID-19 pandemic through a standard SIR compartmental model. This control is induced by the aggregation of individuals' decisions to limit their social interactions: when the epidemic is ongoing, an individual can diminish his/her contact rate in order to avoid getting infected, but this effort comes at a social cost. If each individual lowers his/her contact rate, the epidemic vanishes faster, but the effort cost may be high. A Mean Field Nash equilibrium at the population level is formed, resulting in a lower effective transmission rate of the virus. We prove theoretically that equilibrium exists and compute it numerically. However, this equilibrium selects a sub-optimal solution in comparison to the societal optimum (a centralized decision respected fully by all individuals), meaning that the cost of anarchy is strictly positive. We provide numerical examples and a sensitivity analysis, as well as an extension to a SEIR compartmental model to account for the relatively long latent phase of the COVID-19 disease. In all the scenarios considered, the divergence between the individual and societal strategies happens both before the peak of the epidemic, due to individuals' fears, and after, when a significant propagation is still underway.

Submitted 10 May, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

MSC Class: 92D30; 92Bxx; 91A16

arXiv:2003.10014 [pdf, other]

Reinforcement Learning in Economics and Finance

Authors: Arthur Charpentier, Romuald Elie, Carl Remlinger

Abstract: Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent policy provides him with running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when an agent picks an action, he cannot infer ex post the rewards induced by other action choices. In reinforcement learning, his actions have consequences: they influence not only rewards, but also future states of the world. The goal of reinforcement learning is to find an optimal policy, a mapping from the states of the world to the set of actions, in order to maximize cumulative reward, which is a long-term strategy. Exploring might be sub-optimal on a short-term horizon but could lead to optimal long-term strategies. Many problems of optimal control, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, provided in particular by deep learning algorithms, can be used by economists in order to solve complex behavioral problems. In this article, we present a survey of state-of-the-art reinforcement learning techniques, and present applications in economics, game theory, operations research and finance.

Submitted 22 March, 2020; originally announced March 2020.
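To make the sequential-decision vocabulary concrete, here is tabular Q-learning with ε-greedy exploration on a hypothetical five-state chain: the agent starts at the left end, can step left or right, and receives a reward of 1 only on reaching the right end, so exploration pays off only over several steps. The environment, the hyperparameters, and the function name `q_learning_chain` are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a chain; actions are 0 = left, 1 = right."""
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _step in range(100):                 # cap on episode length
            if rng.random() < eps:
                a = int(rng.integers(2))         # explore uniformly
            else:
                best = np.flatnonzero(q[s] == q[s].max())
                a = int(rng.choice(best))        # greedy, random tie-breaking
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == goal
            reward = 1.0 if done else 0.0
            # Bootstrapped target: no future value after the terminal state.
            target = reward if done else reward + gamma * q[s_next].max()
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
            if done:
                break
    return q

q = q_learning_chain()
policy = q[:-1].argmax(axis=1)   # greedy action in each non-terminal state
```

The learned greedy policy should step right in every non-terminal state, with values decaying by a factor of `gamma` per step away from the goal.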

arXiv:2001.10206 [pdf, other]

Large Banking Systems with Default and Recovery: A Mean Field Game Model

Authors: Romuald Élie, Tomoyuki Ichiba, Mathieu Laurière

Abstract: We consider a mean-field model for large banking systems, which takes into account default and recovery of the institutions. Building on models used for groups of interacting neurons, we first study a McKean-Vlasov dynamics and its evolutionary Fokker-Planck equation in which the mean-field interactions occur through a mean-reverting term and through a hitting time corresponding to a default level. The latter feature reflects the impact of a financial institution's default on the global distribution of reserves in the banking system. The systemic risk problem of financial institutions is understood as a blow-up phenomenon of the Fokker-Planck equation. Then, we incorporate in the model an optimization component by letting the institutions control part of their dynamics in order to minimize their expected risk. Phrasing this optimization problem as a mean-field game, we provide an explicit solution in a special case and, in the general case, we report numerical experiments based on a finite difference scheme.

Submitted 28 January, 2020; originally announced January 2020.

arXiv:1911.06079 [pdf, ps, other]

Mean-field reflected backward stochastic differential equations

Authors: Boualem Djehiche, Romuald Elie, Said Hamadène

Abstract: In this paper, we study a class of reflected backward stochastic differential equations (BSDEs) of mean-field type, where the mean-field interaction in terms of the distribution of the $Y$-component of the solution enters in both the driver and the lower obstacle. We consider in detail the case where the lower obstacle is a deterministic function of $(Y,\mathbb{E}[Y])$ and discuss the more general dependence on the distribution of $Y$. Under mild Lipschitz and integrability conditions on the coefficients, we obtain the well-posedness of such a class of equations. Under further monotonicity conditions, we show convergence of the standard penalization scheme to the solution of the equation, which hence satisfies a minimality property. This class of equations is motivated by applications in pricing life insurance contracts with surrender options.

Submitted 14 November, 2019; originally announced November 2019.

arXiv:1907.02633 [pdf, other]

On the Convergence of Model Free Learning in Mean Field Games

Authors: Romuald Elie, Julien Pérolat, Mathieu Laurière, Matthieu Geist, Olivier Pietquin

Abstract: Learning by experience in Multi-Agent Systems (MAS) is a difficult and exciting task, due to the lack of stationarity of the environment, whose dynamics evolve as the population learns. In order to design scalable algorithms for systems with a large population of interacting agents (e.g., swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite. Recently, a very active, burgeoning field has studied the effects of diverse reinforcement learning algorithms for agents that have no prior information on a stationary Mean Field Game (MFG) and learn their policy through repeated experience. We adopt a high-level perspective on this problem and analyze in full generality the convergence of a fictitious iterative scheme using any single-agent learning algorithm at each step. We quantify the quality of the computed approximate Nash equilibrium, in terms of the accumulated errors arising at each learning iteration step. Notably, we show for the first time convergence of model-free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space environment, where the approximate best response of the iterative fictitious play scheme is computed with a deep RL algorithm.

Submitted 20 February, 2020; v1 submitted 4 July, 2019; originally announced July 2019.

Journal ref: AAAI 2020 conference proceedings

arXiv:1902.10405 [pdf, other]

Mean-field moral hazard for optimal energy demand response management

Authors: Romuald Elie, Emma Hubert, Thibaut Mastrolia, Dylan Possamaï

Abstract: We study the problem of demand response contracts in electricity markets by quantifying the impact of considering a mean-field of consumers, whose consumption is impacted by a common noise. We formulate the problem as a Principal-Agent problem with moral hazard in which the Principal (she) is an electricity producer who continuously observes the consumption of a continuum of risk-averse consumers, and designs contracts in order to reduce her production costs. More precisely, the producer incentivises the consumers to reduce the average and the volatility of their consumption in different usages, without observing the efforts they make. We prove that the producer can benefit from considering the mean-field of consumers by indexing contracts on the consumption of one Agent and aggregate consumption statistics from the distribution of the entire population of consumers. In the case of linear energy valuation, we provide a closed-form expression for this new type of optimal contract that maximises the utility of the producer. In most cases, we show that this new type of contract allows the Principal to choose the risks she wants to bear, and to reduce the problem at hand to an uncorrelated one.

Submitted 24 March, 2020; v1 submitted 27 February, 2019; originally announced February 2019.

Comments: 54 pages, 7 figures

arXiv:1708.05957 [pdf, ps, other]

A new Mertens decomposition of $\mathscr{Y}^{g,ξ}$-submartingale systems. Application to BSDEs with weak constraints at stopping times

Authors: Roxana Dumitrescu, Romuald Elie, Wissal Sabbagh, Chao Zhou

Abstract: We first introduce the concept of $\mathscr{Y}^{g,ξ}$-submartingale systems, where the nonlinear operator $\mathscr{Y}^{g,ξ}$ corresponds to the first component of the solution of a reflected BSDE with generator $g$ and lower obstacle $ξ$. We show that, in the case of a left-limited right-continuous obstacle, any $\mathscr{Y}^{g,ξ}$-submartingale system can be aggregated by a process which is right-lower semicontinuous. We then prove a Mertens decomposition, by using an original approach which does not make use of the standard penalization technique. These results are in particular useful for the treatment of control/stopping game problems and, to the best of our knowledge, they are completely new in the literature. As an application, we introduce a new class of Backward Stochastic Differential Equations (BSDEs for short) with weak constraints at stopping times, which are related to the partial hedging of American options. We study the well-posedness of such equations and, using the $\mathscr{Y}^{g,ξ}$-Mertens decomposition, we show that the family of minimal time-$t$-values $Y_t$, with $(Y,Z)$ a supersolution of the BSDE with weak constraints, admits a representation in terms of a reflected backward stochastic differential equation.

Submitted 25 May, 2023; v1 submitted 20 August, 2017; originally announced August 2017.

arXiv:1706.01934 [pdf, other]

An adverse selection approach to power pricing

Authors: Clémence Alasseur, Ivar Ekeland, Romuald Elie, Nicolás Hernández Santibáñez, Dylan Possamaï

Abstract: We study the optimal design of electricity contracts among a population of consumers with different needs. This question is tackled within the framework of Principal-Agent problems in the presence of adverse selection. The particular features of electricity induce an unusual structure on the production cost, with no decreasing return to scale. We are nevertheless able to provide an explicit solution for the problem at hand. The optimal contracts are either linear or polynomial with respect to the consumption. Whenever the outside options offered by competitors are not uniform among the different types of consumers, we exhibit situations where the electricity provider should contract with consumers with either low or high appetite for electricity.

Submitted 15 September, 2019; v1 submitted 6 June, 2017; originally announced June 2017.

Comments: 39 pages, 9 figures

arXiv:1701.08861 [pdf, ps, other]

On a class of path-dependent singular stochastic control problems

Authors: Romuald Elie, Ludovic Moreau, Dylan Possamaï

Abstract: This paper studies a class of non-Markovian singular stochastic control problems, for which we provide a novel probabilistic representation. The solution of such a control problem is proved to coincide with the solution of a $Z$-constrained BSDE, with dynamics associated with a non-singular underlying forward process. Due to the non-Markovian environment, our main argument relies on comparison arguments for path-dependent PDEs. Our representation allows us in particular to quantify the regularity of the solution to the singular stochastic control problem in terms of the space and time initial data. Our framework also extends to degenerate diffusions, leading to the representation of the solution as the infimum of solutions to $Z$-constrained BSDEs. As an application, we study the utility maximisation problem with transaction costs for non-Markovian dynamics.

Submitted 24 February, 2018; v1 submitted 30 January, 2017; originally announced January 2017.

Comments: 33 pages

arXiv:1608.05226 [pdf, ps, other]

A tale of a Principal and many many Agents

Authors: Romuald Elie, Thibaut Mastrolia, Dylan Possamaï

Abstract: In this paper, we investigate a moral hazard problem in finite time with lump-sum and continuous payments, involving infinitely many Agents with mean-field type interactions, hired by one Principal. By reinterpreting the mean-field game faced by each Agent in terms of a mean-field forward-backward stochastic differential equation (FBSDE for short), we are able to rewrite the Principal's problem as a control problem of McKean-Vlasov SDEs. We review one general approach to tackle it, introduced recently in [1, 43, 44, 45, 46] using dynamic programming and Hamilton-Jacobi-Bellman (HJB for short) equations, and mention a second one based on the stochastic Pontryagin maximum principle, which follows [10]. We solve the problem completely and explicitly in special cases, going beyond the usual linear-quadratic framework. We finally show in our examples that the optimal contract in the $N$-player model converges to the mean-field optimal contract when the number of agents goes to $+\infty$, thus illustrating in our specific setting the general results of [8].

Submitted 24 February, 2018; v1 submitted 18 August, 2016; originally announced August 2016.

Comments: 38 pages

arXiv:1605.08099 [pdf, ps, other]

Contracting theory with competitive interacting agents

Authors: Romuald Elie, Dylan Possamaï

Abstract: In a framework close to the one developed by Holmström and Milgrom [44], we study the optimal contracting scheme between a Principal and several Agents. Each hired Agent is in charge of one project, and can make efforts towards managing his own project, as well as impact (positively or negatively) the projects of the other Agents. Considering economic Agents in competition with relative performance concerns, we derive the optimal contracts in both first-best and moral hazard settings. The enhanced resolution methodology relies heavily on the connection between Nash equilibria and multidimensional quadratic BSDEs. The optimal contracts are linear and each agent is paid a fixed proportion of the terminal value of all the projects of the firm. Moreover, each Agent receives his reservation utility, and those with high competitive appetence are assigned less volatile projects, and even receive help from the other Agents. From the Principal's point of view, it is in the firm's interest, in our model, to strongly diversify the competitive appetence of the Agents.

Submitted 25 May, 2016; originally announced May 2016.

Comments: 36 pages
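As a point of reference for the Holmström and Milgrom framework cited in this abstract, the following minimal sketch treats only the classical single-Agent case, not the paper's multi-Agent model with competition. All modelling choices are our assumptions for illustration: output $X = a + σε$ with standard Gaussian $ε$, a CARA Agent with risk aversion $r$ and effort cost $k a^2/2$, a risk-neutral Principal, a reservation utility of 0 and a linear wage $α + βX$. Under these assumptions the Agent's effort is $a = β/k$ and the optimal piece rate is $β^* = 1/(1 + rkσ^2)$; the code checks this closed form against a brute-force search. The function names are hypothetical.

```python
def agent_effort(beta: float, k: float) -> float:
    """Agent's best response to piece rate beta: maximizes beta * a - k * a**2 / 2."""
    return beta / k


def principal_profit(beta: float, r: float, k: float, sigma: float) -> float:
    """Principal's expected profit when the fixed wage alpha is set so that the
    Agent's certainty equivalent equals an assumed reservation utility of 0."""
    a = agent_effort(beta, k)
    effort_cost = k * a * a / 2.0
    risk_premium = r * beta * beta * sigma * sigma / 2.0  # CARA certainty-equivalent loss
    return a - effort_cost - risk_premium


def optimal_beta_closed_form(r: float, k: float, sigma: float) -> float:
    """Solves the first-order condition 1/k - beta/k - r * beta * sigma**2 = 0."""
    return 1.0 / (1.0 + r * k * sigma * sigma)


def optimal_beta_grid(r: float, k: float, sigma: float, n: int = 100_000) -> float:
    """Brute-force check: maximize the Principal's profit over beta in [0, 1]."""
    grid = (i / n for i in range(n + 1))
    return max(grid, key=lambda b: principal_profit(b, r, k, sigma))
```

With $r = 2$, $k = 1$ and $σ = 0.5$, the closed form gives $β^* = 2/3$, and the grid search agrees up to the grid spacing. The abstract reports that linearity persists in the multi-Agent setting, where each Agent is paid a fixed proportion of the terminal value of all projects.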

arXiv:1605.06301 [pdf, ps, other]

doi 10.1214/17-AAP1310

BSDEs with mean reflection

Authors: Philippe Briand, Romuald Elie, Ying Hu

Abstract: In this paper, we study a new type of BSDE, where the distribution of the Y-component of the solution is required to satisfy an additional constraint, written in terms of the expectation of a loss function. This constraint is imposed at any deterministic time t and is typically weaker than the classical pointwise one associated with reflected BSDEs. Focusing on solutions (Y, Z, K) with deterministic K, we obtain the well-posedness of such equations, in the presence of a natural Skorokhod-type condition. This condition indeed ensures the minimality of the enhanced solution, under an additional structural condition on the driver. Our results extend to the more general framework where the constraint is written in terms of a static risk measure on Y. In particular, we provide an application to the super-hedging of claims under a running risk management constraint.

Submitted 20 May, 2016; originally announced May 2016.

Journal ref: The Annals of Applied Probability 2018, Vol. 24, No. 1, 1129-1171

arXiv:1409.5369 [pdf, ps, other]

Regularity of BSDEs with a convex constraint on the gains-process

Authors: Bruno Bouchard, Romuald Elie, Ludovic Moreau

Abstract: We consider the minimal super-solution of a backward stochastic differential equation with constraint on the gains-process. The terminal condition is given by a function of the terminal value of a forward stochastic differential equation. Under boundedness assumptions on the coefficients, we show that the first component of the solution is Lipschitz in space and 1/2-Hölder in time with respect to the initial data of the forward process. Its path is continuous before the time horizon at which its left-limit is given by a face-lifted version of its natural boundary condition. This first component is actually equal to its own face-lift. We only use probabilistic arguments. In particular, our results can be extended to certain non-Markovian settings.

Submitted 18 September, 2014; originally announced September 2014.

arXiv:1310.1181 [pdf, other]

On the expectation of normalized Brownian functionals up to first hitting times

Authors: Romuald Elie, Mathieu Rosenbaum, Marc Yor

Abstract: Let $B$ be a Brownian motion and $T$ its first hitting time of the level 1. For $U$ a uniform random variable independent of $B$, we study in depth the distribution of $T^{-1/2}B_{UT}$, that is, the rescaled Brownian motion sampled at a uniform time. In particular, we show that this variable is centered.

Submitted 4 October, 2013; originally announced October 2013.

arXiv:1307.6020 [pdf, other]

When terminal facelift enforces Delta constraints

Authors: Jean-François Chassagneux, Romuald Elie, Idris Kharroubi

Abstract: This paper deals with the super-replication of non-path-dependent European claims under additional convex constraints on the number of shares held in the portfolio. The corresponding super-replication price of a given claim has been widely studied in the literature, and its terminal value, which dominates the claim of interest, is the so-called facelift transform of the claim. We investigate under which conditions the super-replication price and strategy of a large class of claims coincide with the exact replication price and strategy of the facelift transform of this claim. In one dimension, we observe that this property is satisfied for any local volatility model. In any dimension, we exhibit an analytical necessary and sufficient condition for this property, which combines the dynamics of the stock together with the characteristics of the closed convex set of constraints. To obtain this condition, we introduce the notion of first-order viability property for linear parabolic PDEs. We investigate in detail several practical cases of interest: the multidimensional Black-Scholes model, non-tradable assets, and short selling restrictions.

Submitted 23 July, 2013; originally announced July 2013.

Comments: 37 pages, 1 figure

MSC Class: 93E20; 91G20; 60H30
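To make the facelift transform concrete, here is a minimal grid-based sketch in one dimension. It assumes the constraint is an interval $K = [a, b]$ on the number of shares, in which case the facelift of a payoff $g$ can be written as $\hat g(x) = \sup_y \{ g(x+y) - δ_K(y) \}$ with support function $δ_K(y) = \sup_{k ∈ K} k y$, that is, the smallest function above $g$ whose slopes stay in $K$. The function names and the restriction of the shifts $y$ to a finite grid are our own simplifications, not the paper's construction.

```python
from typing import Callable, List


def support_function(y: float, a: float, b: float) -> float:
    """delta_K(y) = sup_{k in [a, b]} k * y for the interval K = [a, b]."""
    return b * y if y >= 0.0 else a * y


def facelift_on_grid(xs: List[float], g: Callable[[float], float],
                     a: float, b: float) -> List[float]:
    """Grid approximation of g_hat(x) = sup_y { g(x + y) - delta_K(y) }:
    the shift y ranges only over values that keep x + y on the grid xs."""
    return [max(g(z) - support_function(z - x, a, b) for z in xs) for x in xs]


def capped_call(x: float) -> float:
    """Illustrative payoff: a call spread min(max(x - 1, 0), 1)."""
    return min(max(x - 1.0, 0.0), 1.0)
```

For the capped call on the grid $x ∈ \{0, 0.01, \dots, 3\}$ with $K = [0, 0.5]$, the sketch returns $\min(x/2, 1)$: since the position may hold at most half a share, the value must already be raised to the left of the cap at $x = 2$. Restricting $y$ to the grid can only underestimate the supremum, so the grid should extend past the points where it is attained.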

arXiv:1210.5364 [pdf, ps, other]

BSDEs with weak terminal condition

Authors: Bruno Bouchard, Romuald Elie, Anthony Réveillac

Abstract: We introduce a new class of backward stochastic differential equations in which the $T$-terminal value $Y_{T}$ of the solution $(Y,Z)$ is not fixed as a random variable, but only satisfies a weak constraint of the form $E[Ψ(Y_{T})]\ge m$, for some (possibly random) non-decreasing map $Ψ$ and some threshold $m$. We name them BSDEs with weak terminal condition and obtain a representation of the minimal time-$t$ values $Y_{t}$ such that $(Y,Z)$ is a supersolution of the BSDE with weak terminal condition. It provides a non-Markovian BSDE formulation of the PDE characterization obtained for Markovian stochastic target problems under controlled loss in Bouchard, Elie and Touzi [BoElTo09]. We then study the main properties of this minimal value. In particular, we analyze its continuity and convexity with respect to the $m$-parameter appearing in the weak terminal condition, and show how it can be related to a dual optimal control problem in Mayer form. These last properties generalize to a non-Markovian framework previous results on quantile hedging and hedging under loss constraints obtained in Föllmer and Leukert [FoLe99, FoLe00], and in Bouchard, Elie and Touzi [BoElTo09].

Submitted 24 February, 2014; v1 submitted 19 October, 2012; originally announced October 2012.

arXiv:1210.1407 [pdf, ps, other]

doi 10.1214/11-AAP771

Discrete-time approximation of multidimensional BSDEs with oblique reflections

Authors: Jean-Francois Chassagneux, Romuald Elie, Idris Kharroubi

Abstract: In this paper, we study the discrete-time approximation of multidimensional reflected BSDEs of the type of those presented by Hu and Tang [Probab. Theory Related Fields 147 (2010) 89-121] and generalized by Hamadène and Zhang [Stochastic Process. Appl. 120 (2010) 403-426]. In contrast to the penalizing approach followed by Hamadène and Jeanblanc [Math. Oper. Res. 32 (2007) 182-192] or Elie and Kharroubi [Statist. Probab. Lett. 80 (2010) 1388-1396], we study a more natural scheme based on oblique projections. We control the error of the algorithm by introducing and studying the notion of multidimensional discretely reflected BSDE. In the particular case where the driver does not depend on the variable $Z$, the error on the grid points is of order $1/2-\varepsilon$, for any $\varepsilon>0$.

Submitted 4 October, 2012; originally announced October 2012.

Comments: Published at http://dx.doi.org/10.1214/11-AAP771 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP771

Journal ref: Annals of Applied Probability 2012, Vol. 22, No. 3, 971-1007
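The oblique projection step can be illustrated with a toy optimal switching problem, the standard example of such multidimensional reflections. In this minimal sketch, which is our own simplified scheme and not the paper's algorithm, the driver of mode $i$ depends only on $X$ (a running gain $f_i(X)$), $X$ is approximated by a symmetric binomial walk, switching from mode $i$ to mode $j$ costs $c_{ij}$ with $c_{ii} = 0$, and after each Euler step the vector $Y$ is projected by $Y^i \leftarrow \max_j (Y^j - c_{ij})$. The helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def oblique_projection(y: Sequence[float], cost: Sequence[Sequence[float]]) -> List[float]:
    """Map y to (max_j (y[j] - cost[i][j]))_i, assuming cost[i][i] == 0.

    If cost[i][j] <= cost[i][k] + cost[k][j] for all i, j, k, the output satisfies
    out[i] >= out[j] - cost[i][j], i.e. it lies in the switching domain."""
    d = len(y)
    return [max(y[j] - cost[i][j] for j in range(d)) for i in range(d)]


def switching_y0(gains: Sequence[Callable[[float], float]],
                 terminal: Sequence[Callable[[float], float]],
                 cost: Sequence[Sequence[float]],
                 horizon: float = 1.0, n_steps: int = 50) -> List[float]:
    """Backward scheme on a binomial walk approximating Brownian motion:
    an explicit Euler step for each mode, then an oblique projection at every grid time."""
    dt = horizon / n_steps
    dx = math.sqrt(dt)
    d = len(gains)
    # Level n has n + 1 nodes x = (2k - n) * dx; start from the terminal level.
    layer = [oblique_projection([terminal[i]((2 * k - n_steps) * dx) for i in range(d)], cost)
             for k in range(n_steps + 1)]
    for n in range(n_steps - 1, -1, -1):
        new_layer = []
        for k in range(n + 1):
            x = (2 * k - n) * dx
            # Node k at level n moves to nodes k (down) and k + 1 (up), each with prob. 1/2.
            cont = [0.5 * (layer[k][i] + layer[k + 1][i]) + gains[i](x) * dt
                    for i in range(d)]
            new_layer.append(oblique_projection(cont, cost))
        layer = new_layer
    return layer[0]
```

If the costs satisfy the triangle inequality $c_{ij} \le c_{ik} + c_{kj}$, a single projection lands in the domain $\{y : y^i \ge y^j - c_{ij}\}$, so no iteration is needed. For example, with constant gains 1 and 2, zero terminal values, horizon 1 and switching cost 0.1, mode 0 switches immediately and its value is $2 - 0.1 = 1.9$, while with very large costs each mode keeps its own value $f_i T$.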

arXiv:0909.2624 [pdf, ps, other]

Double Kernel estimation of sensitivities

Authors: Romuald Elie

Abstract: This paper addresses the general issue of estimating the sensitivity of the expectation of a random variable with respect to a parameter characterizing its evolution. In finance, for example, the sensitivities of the price of a contingent claim are called the Greeks. A new way of estimating the Greeks was recently introduced by Elie, Fermanian and Touzi through a randomization of the parameter of interest combined with nonparametric estimation techniques. This paper studies another estimator of this type, whose interest is that it is closely related to the score function, which is well known to be the optimal Greek weight. This estimator relies on two distinct kernel functions, and the main contribution of this paper is to derive its asymptotic properties. Under a slightly more stringent condition, its rate of convergence equals that of the estimators introduced by Elie, Fermanian and Touzi and is better than that of the finite differences estimator. Beyond the technical interest of the proofs, this result encourages the design of new types of estimators for sensitivities.

Submitted 14 September, 2009; originally announced September 2009.

MSC Class: 62G08 (Primary); 11K45 (Secondary)

Journal ref: Journal of Applied Probability 46, 3 (2009)
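To illustrate the score function as a Greek weight and the finite differences benchmark mentioned above, the following sketch estimates the sensitivity $\partial_λ E[φ(Z(λ))]$ in a toy model where the density is known. It is not the double kernel estimator of the paper, whose point is precisely to avoid an explicit density. Our assumptions: $Z(λ) = λ + W$ with $W \sim N(0,1)$ and the digital payoff $φ(z) = 1_{\{z > 0\}}$, so that the score weight is $π = Z - λ$ and the exact sensitivity is the standard normal density at $λ$. The function names are hypothetical.

```python
import math
import random


def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def delta_estimates(lam: float, n: int = 200_000, eps: float = 0.05,
                    seed: int = 0) -> tuple:
    """Monte Carlo estimates of d/d(lam) E[phi(Z(lam))] for Z(lam) = lam + W,
    W ~ N(0, 1), phi(z) = 1{z > 0}. Returns (score-weight estimate, FD estimate)."""
    rng = random.Random(seed)
    score_sum = 0.0
    fd_sum = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        z = lam + w
        # Score weight: d/d(lam) of log N(z; lam, 1) equals z - lam = w.
        score_sum += (1.0 if z > 0.0 else 0.0) * w
        # Central finite difference with common random numbers.
        up = 1.0 if lam + eps + w > 0.0 else 0.0
        down = 1.0 if lam - eps + w > 0.0 else 0.0
        fd_sum += (up - down) / (2.0 * eps)
    return score_sum / n, fd_sum / n
```

With $λ = 0.5$ both estimates should be close to the exact value $e^{-1/8}/\sqrt{2π} \approx 0.352$; the finite difference estimate has the larger variance because the digital payoff is discontinuous.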

arXiv:0909.0998 [pdf, ps, other]

Probabilistic Representation and Approximation for Coupled Systems of Variational Inequalities

Authors: Romuald Elie, Idris Kharroubi

Abstract: Our study is dedicated to the probabilistic representation and numerical approximation of solutions to coupled systems of variational inequalities. The dynamics of each component of the solution is driven by a different linear parabolic operator and depends non-linearly on all the components of the solution. These dynamics are combined with a global structural constraint between all the components of the solution, which covers the practical example of optimal switching problems. In this paper, we interpret the unique viscosity solution to this type of coupled system of variational inequalities as the solution to a one-dimensional constrained BSDE with jumps, as introduced recently in [6]. In the spirit of [3], this new representation allows for the introduction of a natural, entirely probabilistic numerical scheme for the resolution of these systems.

Submitted 4 March, 2011; v1 submitted 5 September, 2009; originally announced September 2009.

Journal ref: Statistics and Probability Letters 80, 17-18 (2010) 1388-1396

arXiv:0903.3372 [pdf, ps, other]

Adding constraints to BSDEs with Jumps: an alternative to multidimensional reflections

Authors: Romuald Elie, Idris Kharroubi

Abstract: This paper is dedicated to the analysis of backward stochastic differential equations (BSDEs) with jumps, subject to an additional global constraint involving all the components of the solution. We study the existence and uniqueness of a minimal solution for these so-called constrained BSDEs with jumps via a penalization procedure. This new type of BSDE offers a practical unifying framework for the notions of constrained BSDEs presented in [19] and BSDEs with constrained jumps introduced in [14]. More remarkably, the solution of a multidimensional Brownian reflected BSDE studied in [11] and [13] can also be represented via a well-chosen one-dimensional constrained BSDE with jumps. This last result is very promising from a numerical point of view for the resolution of high-dimensional optimal switching problems and, more generally, for systems of coupled variational inequalities.

Submitted 9 March, 2011; v1 submitted 19 March, 2009; originally announced March 2009.

arXiv:0710.4392 [pdf, ps, other]

doi 10.1214/105051607000000186

Kernel estimation of Greek weights by parameter randomization

Authors: Romuald Elie, Jean-David Fermanian, Nizar Touzi

Abstract: A Greek weight associated with a parameterized random variable $Z(λ)$ is a random variable $π$ such that $\nabla_λE[φ(Z(λ))]=E[φ(Z(λ))π]$ for any function $φ$. The importance of the set of Greek weights for the purpose of Monte Carlo simulations has been highlighted in the recent literature. Our main concern in this paper is to devise methods which produce the optimal weight, which is well known to be given by the score, in a general context where the density of $Z(λ)$ is not explicitly known. To do this, we randomize the parameter $λ$ by introducing an a priori distribution, and we use classical kernel estimation techniques in order to estimate the score function. By an integration by parts argument on the limit of this first kernel estimator, we define an alternative, simpler kernel-based estimator which turns out to be closely related to the partial gradient of the kernel-based estimator of $E[φ(Z(λ))]$. Similarly to the finite differences technique, and unlike the so-called Malliavin method, our estimators are biased, but their implementation does not require any advanced mathematical calculation. We provide an asymptotic analysis of the mean squared error of these estimators, as well as their asymptotic distributions. For a discontinuous payoff function, the kernel estimator outperforms the classical finite differences one in terms of the asymptotic rate of convergence. This result is confirmed by our numerical experiments.

Submitted 24 October, 2007; originally announced October 2007.

Comments: Published at http://dx.doi.org/10.1214/105051607000000186 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP431

MSC Class: 11K45; 62G08 (Primary); 60G07 (Secondary)

Journal ref: Annals of Applied Probability 2007, Vol. 17, No. 4, 1399-1423
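The parameter randomization idea can be sketched as follows, under our own simplifying assumptions: $Z(λ) = λ + W$ with $W \sim N(0,1)$, the digital payoff $φ(z) = 1_{\{z>0\}}$, a uniform a priori distribution for $λ$ on $[λ_0 - 1, λ_0 + 1]$ and a Gaussian kernel with bandwidth $h$. The sketch differentiates the Nadaraya-Watson estimator of $λ \mapsto E[φ(Z(λ))]$ at $λ_0$, that is, the partial gradient of a kernel-based estimator to which the abstract relates its simpler estimator; it is not the paper's exact estimator, and the function name, bandwidth and sample size are illustrative choices, not tuned values.

```python
import math
import random


def kernel_greek(lam0: float, h: float = 0.1, n: int = 1_000_000,
                 half_width: float = 1.0, seed: int = 1) -> float:
    """Derivative at lam0 of a Nadaraya-Watson estimate of m(lam) = E[phi(Z(lam))],
    built from samples in which lam is drawn from a uniform prior around lam0."""
    rng = random.Random(seed)
    s0 = s1 = t0 = t1 = 0.0
    for _ in range(n):
        lam = rng.uniform(lam0 - half_width, lam0 + half_width)
        phi = 1.0 if lam + rng.gauss(0.0, 1.0) > 0.0 else 0.0
        u = (lam0 - lam) / h
        k = math.exp(-0.5 * u * u)   # Gaussian kernel K(u); its normalization cancels
        dk = -u / h * k              # d/d(lam0) of K((lam0 - lam) / h)
        s0 += k
        s1 += dk
        t0 += k * phi
        t1 += dk * phi
    # m_hat(lam0) = t0 / s0, so by the quotient rule m_hat' = (t1 * s0 - t0 * s1) / s0**2.
    return (t1 * s0 - t0 * s1) / (s0 * s0)
```

The density of $Z(λ)$ is never evaluated; only simulated pairs $(λ_i, φ(Z(λ_i)))$ are used. With $λ_0 = 0.5$ the estimate should be close to the exact sensitivity, the standard normal density at 0.5 (about 0.352), up to a Monte Carlo error of order $(n h^3)^{-1/2}$ and a smoothing bias of order $h^2$.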

Showing 1–44 of 44 results for author: Elie, R