-
NeuroPapyri: A Deep Attention Embedding Network for Handwritten Papyri Retrieval
Authors:
Giuseppe De Gregorio,
Simon Perrin,
Rodrigo C. G. Pena,
Isabelle Marthot-Santaniello,
Harold Mouchère
Abstract:
The intersection of computer vision and machine learning has emerged as a promising avenue for advancing historical research, facilitating a more profound exploration of our past. However, the application of machine learning approaches in historical palaeography is often met with criticism due to their perceived ``black box'' nature. In response to this challenge, we introduce NeuroPapyri, an inno…
▽ More
The intersection of computer vision and machine learning has emerged as a promising avenue for advancing historical research, facilitating a more profound exploration of our past. However, the application of machine learning approaches in historical palaeography is often met with criticism due to their perceived ``black box'' nature. In response to this challenge, we introduce NeuroPapyri, an innovative deep learning-based model specifically designed for the analysis of images containing ancient Greek papyri. To address concerns related to transparency and interpretability, the model incorporates an attention mechanism. This attention mechanism not only enhances the model's performance but also provides a visual representation of the image regions that significantly contribute to the decision-making process. Specifically calibrated for processing images of papyrus documents with lines of handwritten text, the model utilizes individual attention maps to inform the presence or absence of specific characters in the input image. This paper presents the NeuroPapyri model, including its architecture and training methodology. Results from the evaluation demonstrate NeuroPapyri's efficacy in document retrieval, showcasing its potential to advance the analysis of historical manuscripts.
△ Less
Submitted 14 August, 2024;
originally announced August 2024.
-
Gemma 2: Improving Open Language Models at a Practical Size
Authors:
Gemma Team,
Morgane Riviere,
Shreya Pathak,
Pier Giuseppe Sessa,
Cassidy Hardin,
Surya Bhupatiraju,
Léonard Hussenot,
Thomas Mesnard,
Bobak Shahriari,
Alexandre Ramé,
Johan Ferret,
Peter Liu,
Pouya Tafti,
Abe Friesen,
Michelle Casbon,
Sabela Ramos,
Ravin Kumar,
Charline Le Lan,
Sammy Jerome,
Anton Tsitsulin,
Nino Vieillard,
Piotr Stanczyk,
Sertan Girgin,
Nikola Momchev,
Matt Hoffman
, et al. (172 additional authors not shown)
Abstract:
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…
▽ More
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
△ Less
Submitted 2 August, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
BOND: Aligning LLMs with Best-of-N Distillation
Authors:
Pier Giuseppe Sessa,
Robert Dadashi,
Léonard Hussenot,
Johan Ferret,
Nino Vieillard,
Alexandre Ramé,
Bobak Shariari,
Sarah Perrin,
Abe Friesen,
Geoffrey Cideron,
Sertan Girgin,
Piotr Stanczyk,
Andrea Michi,
Danila Sinopalnikov,
Sabela Ramos,
Amélie Héliou,
Aliaksei Severyn,
Matt Hoffman,
Nikola Momchev,
Olivier Bachem
Abstract:
Reinforcement learning from human feedback (RLHF) is a key driver of quality and safety in state-of-the-art large language models. Yet, a surprisingly simple and strong inference-time strategy is Best-of-N sampling that selects the best generation among N candidates. In this paper, we propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its sign…
▽ More
Reinforcement learning from human feedback (RLHF) is a key driver of quality and safety in state-of-the-art large language models. Yet, a surprisingly simple and strong inference-time strategy is Best-of-N sampling that selects the best generation among N candidates. In this paper, we propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time. Specifically, BOND is a distribution matching algorithm that forces the distribution of generations from the policy to get closer to the Best-of-N distribution. We use the Jeffreys divergence (a linear combination of forward and backward KL) to balance between mode-covering and mode-seeking behavior, and derive an iterative formulation that utilizes a moving anchor for efficiency. We demonstrate the effectiveness of our approach and several design choices through experiments on abstractive summarization and Gemma models. Aligning Gemma policies with BOND outperforms other RLHF algorithms by improving results on several benchmarks.
△ Less
Submitted 19 July, 2024;
originally announced July 2024.
-
Approximating the Core via Iterative Coalition Sampling
Authors:
Ian Gemp,
Marc Lanctot,
Luke Marris,
Yiran Mao,
Edgar Duéñez-Guzmán,
Sarah Perrin,
Andras Gyorgy,
Romuald Elie,
Georgios Piliouras,
Michael Kaisers,
Daniel Hennes,
Kalesha Bullard,
Kate Larson,
Yoram Bachrach
Abstract:
The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, a…
▽ More
The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, and to fully embrace cooperative game theory contributions in domains such as explainable AI (XAI), where the core can complement the Shapley values to identify influential features or instances supporting predictions by black-box models. We propose novel iterative algorithms for computing variants of the core, which avoid the computational bottleneck of many other approaches; namely solving large linear programs. As such, they scale better to very large problems as we demonstrate across different classes of cooperative games, including weighted voting games, induced subgraph games, and marginal contribution networks. We also explore our algorithms in the context of XAI, providing further evidence of the power of the core for such applications.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
Learning Correlated Equilibria in Mean-Field Games
Authors:
Paul Muller,
Romuald Elie,
Mark Rowland,
Mathieu Lauriere,
Julien Perolat,
Sarah Perrin,
Matthieu Geist,
Georgios Piliouras,
Olivier Pietquin,
Karl Tuyls
Abstract:
The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-…
▽ More
The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in \emph{all games}, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.
△ Less
Submitted 22 August, 2022;
originally announced August 2022.
-
Learning in Mean Field Games: A Survey
Authors:
Mathieu Laurière,
Sarah Perrin,
Julien Pérolat,
Sertan Girgin,
Paul Muller,
Romuald Élie,
Matthieu Geist,
Olivier Pietquin
Abstract:
Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases. Introduced by Lasry and Lions, and Huang, Caines and Malhamé, Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity. Traditional methods for solving these games generally rely…
▽ More
Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases. Introduced by Lasry and Lions, and Huang, Caines and Malhamé, Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity. Traditional methods for solving these games generally rely on solving partial or stochastic differential equations with a full knowledge of the model. Recently, Reinforcement Learning (RL) has appeared promising to solve complex problems at scale. The combination of RL and MFGs is promising to solve games at a very large scale both in terms of population size and environment complexity. In this survey, we review the quickly growing recent literature on RL methods to learn equilibria and social optima in MFGs. We first identify the most common settings (static, stationary, and evolutive) of MFGs. We then present a general framework for classical iterative methods (based on best-response computation or policy evaluation) to solve MFGs in an exact way. Building on these algorithms and the connection with Markov Decision Processes, we explain how RL can be used to learn MFG solutions in a model-free way. Last, we present numerical illustrations on a benchmark problem, and conclude with some perspectives.
△ Less
Submitted 26 July, 2024; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Scalable Deep Reinforcement Learning Algorithms for Mean Field Games
Authors:
Mathieu Laurière,
Sarah Perrin,
Sertan Girgin,
Paul Muller,
Ayush Jain,
Theophile Cabannes,
Georgios Piliouras,
Julien Pérolat,
Romuald Élie,
Olivier Pietquin,
Matthieu Geist
Abstract:
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quant…
▽ More
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values. This is far from being trivial in the case of non-linear function approximation that enjoy good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm. The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates. It is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform SotA baselines from the literature.
△ Less
Submitted 17 June, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Solving N-player dynamic routing games with congestion: a mean field approach
Authors:
Theophile Cabannes,
Mathieu Lauriere,
Julien Perolat,
Raphael Marinier,
Sertan Girgin,
Sarah Perrin,
Olivier Pietquin,
Alexandre M. Bayen,
Eric Goubault,
Romuald Elie
Abstract:
The recent emergence of navigational tools has changed traffic patterns and has now enabled new types of congestion-aware routing control like dynamic road pricing. Using the fundamental diagram of traffic flows - applied in macroscopic and mesoscopic traffic modeling - the article introduces a new N-player dynamic routing game with explicit congestion dynamics. The model is well-posed and can rep…
▽ More
The recent emergence of navigational tools has changed traffic patterns and has now enabled new types of congestion-aware routing control like dynamic road pricing. Using the fundamental diagram of traffic flows - applied in macroscopic and mesoscopic traffic modeling - the article introduces a new N-player dynamic routing game with explicit congestion dynamics. The model is well-posed and can reproduce heterogeneous departure times and congestion spill back phenomena. However, as Nash equilibrium computations are PPAD-complete, solving the game becomes intractable for large but realistic numbers of vehicles N. Therefore, the corresponding mean field game is also introduced. Experiments were performed on several classical benchmark networks of the traffic community: the Pigou, Braess, and Sioux Falls networks with heterogeneous origin, destination and departure time tuples. The Pigou and the Braess examples reveal that the mean field approximation is generally very accurate and computationally efficient as soon as the number of vehicles exceeds a few dozen. On the Sioux Falls network (76 links, 100 time steps), this approach enables learning traffic dynamics with more than 14,000 vehicles.
△ Less
Submitted 27 October, 2021; v1 submitted 22 October, 2021;
originally announced October 2021.
-
Generalization in Mean Field Games by Learning Master Policies
Authors:
Sarah Perrin,
Mathieu Laurière,
Julien Pérolat,
Romuald Élie,
Matthieu Geist,
Olivier Pietquin
Abstract:
Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents. Yet, most of the literature assumes a single initial distribution for the agents, which limits the practical applications of MFGs. Machine Learning has the potential to solve a wider diversity of MFG problems thanks to generalizations capacities. We study how to leverage these generalization…
▽ More
Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents. Yet, most of the literature assumes a single initial distribution for the agents, which limits the practical applications of MFGs. Machine Learning has the potential to solve a wider diversity of MFG problems thanks to generalizations capacities. We study how to leverage these generalization properties to learn policies enabling a typical agent to behave optimally against any population distribution. In reference to the Master equation in MFGs, we coin the term ``Master policies'' to describe them and we prove that a single Master policy provides a Nash equilibrium, whatever the initial distribution. We propose a method to learn such Master policies. Our approach relies on three ingredients: adding the current population distribution as part of the observation, approximating Master policies with neural networks, and training via Reinforcement Learning and Fictitious Play. We illustrate on numerical examples not only the efficiency of the learned Master policy but also its generalization capabilities beyond the distributions used for training.
△ Less
Submitted 20 September, 2021;
originally announced September 2021.
-
Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint
Authors:
Matthieu Geist,
Julien Pérolat,
Mathieu Laurière,
Romuald Elie,
Sarah Perrin,
Olivier Bachem,
Rémi Munos,
Olivier Pietquin
Abstract:
Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of m…
▽ More
Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of many-agent RL. They consider the limit case of a continuous distribution of identical agents, anonymous with symmetric interests, and reduce the problem to the study of a single representative agent in interaction with the full population. Our core contribution consists in showing that CURL is a subclass of MFGs. We think this important to bridge together both communities. It also allows to shed light on aspects of both fields: we show the equivalence between concavity in CURL and monotonicity in the associated MFG, between optimality conditions in CURL and Nash equilibrium in MFG, or that Fictitious Play (FP) for this class of MFGs is simply Frank-Wolfe, bringing the first convergence rate for discrete-time FP for MFGs. We also experimentally demonstrate that, using algorithms recently introduced for solving MFGs, we can address the CURL problem more efficiently.
△ Less
Submitted 16 February, 2022; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Mean Field Games Flock! The Reinforcement Learning Way
Authors:
Sarah Perrin,
Mathieu Laurière,
Julien Pérolat,
Matthieu Geist,
Romuald Élie,
Olivier Pietquin
Abstract:
We present a method enabling a large number of agents to learn how to flock, which is a natural behavior observed in large populations of animals. This problem has drawn a lot of interest but requires many structural assumptions and is tractable only in small dimensions. We phrase this problem as a Mean Field Game (MFG), where each individual chooses its acceleration depending on the population be…
▽ More
We present a method enabling a large number of agents to learn how to flock, which is a natural behavior observed in large populations of animals. This problem has drawn a lot of interest but requires many structural assumptions and is tractable only in small dimensions. We phrase this problem as a Mean Field Game (MFG), where each individual chooses its acceleration depending on the population behavior. Combining Deep Reinforcement Learning (RL) and Normalizing Flows (NF), we obtain a tractable solution requiring only very weak assumptions. Our algorithm finds a Nash Equilibrium and the agents adapt their velocity to match the neighboring flock's average one. We use Fictitious Play and alternate: (1) computing an approximate best response with Deep RL, and (2) estimating the next population distribution with NF. We show numerically that our algorithm learn multi-group or high-dimensional flocking with obstacles.
△ Less
Submitted 17 May, 2021;
originally announced May 2021.
-
Scaling up Mean Field Games with Online Mirror Descent
Authors:
Julien Perolat,
Sarah Perrin,
Romuald Elie,
Mathieu Laurière,
Georgios Piliouras,
Matthieu Geist,
Karl Tuyls,
Olivier Pietquin
Abstract:
We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on vari…
▽ More
We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on various single and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP). We empirically show that OMD scales up and converges significantly faster than FP by solving, for the first time to our knowledge, examples of MFGs with hundreds of billions states. This study establishes the state-of-the-art for learning in large-scale multi-agent and multi-population games.
△ Less
Submitted 28 February, 2021;
originally announced March 2021.
-
Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications
Authors:
Sarah Perrin,
Julien Perolat,
Mathieu Laurière,
Matthieu Geist,
Romuald Elie,
Olivier Pietquin
Abstract:
In this paper, we deepen the analysis of continuous time Fictitious Play learning algorithm to the consideration of various finite state Mean Field Game settings (finite horizon, $γ$-discounted), allowing in particular for the introduction of an additional common noise.
We first present a theoretical convergence analysis of the continuous time Fictitious Play process and prove that the induced e…
▽ More
In this paper, we deepen the analysis of continuous time Fictitious Play learning algorithm to the consideration of various finite state Mean Field Game settings (finite horizon, $γ$-discounted), allowing in particular for the introduction of an additional common noise.
We first present a theoretical convergence analysis of the continuous time Fictitious Play process and prove that the induced exploitability decreases at a rate $O(\frac{1}{t})$. Such analysis emphasizes the use of exploitability as a relevant metric for evaluating the convergence towards a Nash equilibrium in the context of Mean Field Games. These theoretical contributions are supported by numerical experiments provided in either model-based or model-free settings. We provide hereby for the first time converging learning dynamics for Mean Field Games in the presence of common noise.
△ Less
Submitted 26 October, 2020; v1 submitted 5 July, 2020;
originally announced July 2020.
-
Decision method choice in a human posture recognition context
Authors:
Stéphane Perrin,
Eric Benoit,
Didier Coquin
Abstract:
Human posture recognition provides a dynamic field that has produced many methods. Using fuzzy subsets based data fusion methods to aggregate the results given by different types of recognition processes is a convenient way to improve recognition methods. Nevertheless, choosing a defuzzification method to imple-ment the decision is a crucial point of this approach. The goal of this paper is to pre…
▽ More
Human posture recognition provides a dynamic field that has produced many methods. Using fuzzy subsets based data fusion methods to aggregate the results given by different types of recognition processes is a convenient way to improve recognition methods. Nevertheless, choosing a defuzzification method to imple-ment the decision is a crucial point of this approach. The goal of this paper is to present an approach where the choice of the defuzzification method is driven by the constraints of the final data user, which are expressed as limitations on indica-tors like confidence or accuracy. A practical experimentation illustrating this ap-proach is presented: from a depth camera sensor, human posture is interpreted and the defuzzification method is selected in accordance with the constraints of the final information consumer. The paper illustrates the interest of the approach in a context of postures based human robot communication.
△ Less
Submitted 11 July, 2018;
originally announced July 2018.