-
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
Authors:
William Berman,
Alexander Peysakhovich
Abstract:
We train a model to generate images from multimodal prompts of interleaved text and images such as "a <picture of a man> man and his <picture of a dog> dog in an <picture of a cartoon> animated style." We bootstrap a multimodal dataset by extracting semantically meaningful image crops corresponding to words in the image captions of synthetically generated and publicly available text-image data. Our model, MUMU, is composed of a vision-language model encoder with a diffusion decoder and is trained on a single 8xH100 GPU node. Despite being only trained on crops from the same image, MUMU learns to compose inputs from different images into a coherent output. For example, an input of a realistic person and a cartoon will output the same person in the cartoon style, and an input of a standing subject and a scooter will output the subject riding the scooter. As a result, our model generalizes to tasks such as style transfer and character consistency. Our results show the promise of using multimodal models as general purpose controllers for image generation.
Submitted 26 June, 2024;
originally announced June 2024.
-
Attention Sorting Combats Recency Bias In Long Context Language Models
Authors:
Alexander Peysakhovich,
Adam Lerer
Abstract:
Current language models often fail to incorporate long contexts efficiently during generation. We show that attention priors likely learned during pre-training are a major contributor to this issue: relevant information located earlier in context is attended to less on average. Yet even when models fail to use the information from a relevant document in their response, they still pay preferential attention to that document compared to an irrelevant document at the same position. We leverage this fact to introduce "attention sorting": perform one step of decoding, sort documents by the attention they receive (highest attention going last), repeat the process, and then generate the answer with the newly sorted context. We find that attention sorting improves performance of long context models. Our findings highlight some challenges in using off-the-shelf language models for retrieval augmented generation.
Submitted 28 September, 2023;
originally announced October 2023.
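The attention-sorting loop described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `attention_sort`, `toy_attention`, the `rounds` parameter, and the recency-weighted toy scores are all hypothetical stand-ins for the per-document attention a real model would produce after one decoding step.

```python
from typing import Callable, Dict, List, Sequence


def attention_sort(
    docs: Sequence[str],
    attention_fn: Callable[[List[str]], List[float]],
    rounds: int = 1,
) -> List[str]:
    """Reorder documents so that the most-attended ones come last.

    attention_fn is a hypothetical callable: given the documents in their
    current order, it returns one attention score per document.
    """
    ordered = list(docs)
    for _ in range(rounds):
        scores = attention_fn(ordered)
        # Ascending sort by score: the highest-attention document goes last.
        order = sorted(range(len(ordered)), key=lambda i: scores[i])
        ordered = [ordered[i] for i in order]
    return ordered


# Toy stand-in for a model (assumption): attention is the document's true
# relevance scaled by a recency weight that favors later positions.
RELEVANCE: Dict[str, float] = {"relevant": 1.0, "noise_a": 0.1, "noise_b": 0.2}


def toy_attention(docs: List[str]) -> List[float]:
    n = len(docs)
    return [RELEVANCE[d] * (pos + 1) / n for pos, d in enumerate(docs)]


if __name__ == "__main__":
    start = ["relevant", "noise_a", "noise_b"]
    print(attention_sort(start, toy_attention, rounds=2))
```

In this toy setup the relevant document starts in the least favorable position but still receives more attention than the distractors, so sorting moves it to the end of the context, where the recency weight is largest.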
-
Diagnosis Uncertain Models For Medical Risk Prediction
Authors:
Alexander Peysakhovich,
Rich Caruana,
Yin Aphinyanaphongs
Abstract:
We consider patient risk models that have access to patient features such as vital signs, lab values, and prior history but do not have access to a patient's diagnosis. For example, this occurs in a model deployed at intake time for triage purposes. We show that such 'all-cause' risk models have good generalization across diagnoses but have a predictable failure mode. When the same lab/vital/history profiles can result from diagnoses with different risk profiles (e.g. E. coli vs. MRSA) the risk estimate is a probability weighted average of these two profiles. This leads to an under-estimation of risk for rare but highly risky diagnoses. We propose a fix for this problem by explicitly modeling the uncertainty in risk prediction coming from uncertainty in patient diagnoses. This gives practitioners an interpretable way to understand patient risk beyond a single risk number.
Submitted 29 June, 2023;
originally announced June 2023.
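The failure mode described in the abstract above, where an all-cause estimate is a probability weighted average over possible diagnoses, can be illustrated with a small calculation. This is only a sketch: the function name `all_cause_risk` and every probability and risk value below are made-up numbers for illustration, not figures from the paper.

```python
from typing import Dict


def all_cause_risk(diagnosis_probs: Dict[str, float],
                   diagnosis_risks: Dict[str, float]) -> float:
    """Probability weighted average of per-diagnosis risks."""
    return sum(diagnosis_probs[d] * diagnosis_risks[d] for d in diagnosis_probs)


# Hypothetical numbers: the same lab/vital/history profile is usually
# explained by a low-risk diagnosis and rarely by a high-risk one.
probs = {"E. coli": 0.9, "MRSA": 0.1}
risks = {"E. coli": 0.05, "MRSA": 0.50}

blended = all_cause_risk(probs, risks)

if __name__ == "__main__":
    print(f"single all-cause risk estimate: {blended:.3f}")
    # Reporting the per-diagnosis breakdown keeps the rare, risky case visible.
    for d in probs:
        print(f"  {d}: probability {probs[d]:.2f}, risk {risks[d]:.2f}")
```

With these illustrative values the single number is about 0.095, far below the 0.50 risk a patient would face under the rare diagnosis, which is the under-estimation the abstract points to.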
-
Implementing Fairness Constraints in Markets Using Taxes and Subsidies
Authors:
Alexander Peysakhovich,
Christian Kroer,
Nicolas Usunier
Abstract:
Fisher markets are those where buyers with budgets compete for scarce items, a natural model for many real world markets including online advertising. A market equilibrium is a set of prices and allocations of items such that supply meets demand. We show how market designers can use taxes or subsidies in Fisher markets to ensure that market equilibrium outcomes fall within certain constraints. We show how these taxes and subsidies can be computed even in an online setting where the market designer does not have access to private valuations. We adapt various types of fairness constraints proposed in existing literature to the market case and show who benefits and who loses from these constraints, as well as the extent to which properties of markets including Pareto optimality, envy-freeness, and incentive compatibility are preserved. We find that some previously discussed constraints have few guarantees in terms of who is made better or worse off by their imposition.
Submitted 13 March, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Efficient Heterogeneous Treatment Effect Estimation With Multiple Experiments and Multiple Outcomes
Authors:
Leon Yao,
Caroline Lo,
Israel Nir,
Sarah Tan,
Ariel Evnine,
Adam Lerer,
Alex Peysakhovich
Abstract:
Learning heterogeneous treatment effects (HTEs) is an important problem across many fields. Most existing methods consider the setting with a single treatment arm and a single outcome metric. However, in many real world domains, experiments are run consistently - for example, in internet companies, A/B tests are run every day to measure the impacts of potential changes across many different metrics of interest. We show that even if an analyst cares only about the HTEs in one experiment for one metric, precision can be improved greatly by analyzing all of the data together to take advantage of cross-experiment and cross-outcome metric correlations. We formalize this idea in a tensor factorization framework and propose a simple and scalable model which we refer to as the low rank or LR-learner. Experiments in both synthetic and real data suggest that the LR-learner can be much more precise than independent HTE estimation.
Submitted 10 June, 2022;
originally announced June 2022.
-
Pseudo-Euclidean Attract-Repel Embeddings for Undirected Graphs
Authors:
Alexander Peysakhovich,
Anna Klimovskaia Susmel,
Leon Bottou
Abstract:
Dot product embeddings take a graph and construct vectors for nodes such that dot products between two vectors give the strength of the edge. Dot products make a strong transitivity assumption; however, many important forces generating graphs in the real world lead to non-transitive relationships. We remove the transitivity assumption by embedding nodes into a pseudo-Euclidean space - giving each node an attract and a repel vector. The inner product between two nodes is defined by taking the dot product in attract vectors and subtracting the dot product in repel vectors. Pseudo-Euclidean embeddings can compress networks efficiently, allow for multiple notions of nearest neighbors each with their own interpretation, and can be 'slotted' into existing models such as exponential family embeddings or graph neural networks for better link prediction.
Submitted 23 March, 2023; v1 submitted 17 June, 2021;
originally announced June 2021.
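The attract-repel inner product defined in the abstract above (dot product of attract vectors minus dot product of repel vectors) can be written out directly. The sketch below is illustrative only: the function names and the three hand-picked node vectors are assumptions, not learned embeddings or code from the paper.

```python
from typing import Dict, Sequence, Tuple


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def attract_repel_inner(
    attract_u: Sequence[float], repel_u: Sequence[float],
    attract_v: Sequence[float], repel_v: Sequence[float],
) -> float:
    """Pseudo-Euclidean inner product: attract dot product minus repel dot product."""
    return dot(attract_u, attract_v) - dot(repel_u, repel_v)


# Hand-picked toy vectors (assumption): each node maps to (attract, repel).
nodes: Dict[str, Tuple[Sequence[float], Sequence[float]]] = {
    "a": ([1.0], [1.5]),
    "b": ([1.0], [0.0]),
    "c": ([1.0], [1.5]),
}


def score(u: str, v: str) -> float:
    (au, ru), (av, rv) = nodes[u], nodes[v]
    return attract_repel_inner(au, ru, av, rv)


if __name__ == "__main__":
    for pair in [("a", "b"), ("b", "c"), ("a", "c")]:
        print(pair, score(*pair))
```

In this toy example a scores positively with b and b scores positively with c, yet a and c score negatively with each other, a non-transitive pattern. Node a also has a negative inner product with itself, which a plain dot product cannot produce.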
-
Online Market Equilibrium with Application to Fair Division
Authors:
Yuan Gao,
Christian Kroer,
Alex Peysakhovich
Abstract:
Computing market equilibria is a problem of both theoretical and applied interest. Much research to date focuses on the case of static Fisher markets with full information on buyers' utility functions and item supplies. Motivated by real-world markets, we consider an online setting: individuals have linear, additive utility functions; items arrive sequentially and must be allocated and priced irrevocably. We define the notion of an online market equilibrium in such a market as time-indexed allocations and prices which guarantee buyer optimality and market clearance in hindsight. We propose a simple, scalable and interpretable allocation and pricing dynamics termed as PACE. When items are drawn i.i.d. from an unknown distribution (with a possibly continuous support), we show that PACE leads to an online market equilibrium asymptotically. In particular, PACE ensures that buyers' time-averaged utilities converge to the equilibrium utilities w.r.t. a static market with item supplies being the unknown distribution and that buyers' time-averaged expenditures converge to their per-period budget. Hence, many desirable properties of market equilibrium-based fair division such as no envy, Pareto optimality, and the proportional-share guarantee are also attained asymptotically in the online setting. Next, we extend the dynamics to handle quasilinear buyer utilities, which gives the first online algorithm for computing first-price pacing equilibria. Finally, numerical experiments on real and synthetic datasets show that the dynamics converges quickly under various metrics.
Submitted 2 October, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian
Authors:
Jack Parker-Holder,
Luke Metz,
Cinjon Resnick,
Hengyuan Hu,
Adam Lerer,
Alistair Letcher,
Alex Peysakhovich,
Aldo Pacchiano,
Jakob Foerster
Abstract:
Over the last decade, a single algorithm has changed many facets of our lives - Stochastic Gradient Descent (SGD). In the era of ever decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs). While SGD is guaranteed to converge to a local optimum (under loose assumptions), in some cases it may matter which local optimum is found, and this is often context-dependent. Examples frequently arise in machine learning, from shape-versus-texture-features to ensemble methods and zero-shot coordination. In these settings, there are desired solutions which SGD on 'standard' loss functions will not find, since it instead converges to the 'easy' solutions. In this paper, we present a different approach. Rather than following the gradient, which corresponds to a locally greedy direction, we instead follow the eigenvectors of the Hessian, which we call "ridges". By iteratively following and branching amongst the ridges, we effectively span the loss surface to find qualitatively different solutions. We show both theoretically and experimentally that our method, called Ridge Rider (RR), offers a promising direction for a variety of challenging problems.
Submitted 12 November, 2020;
originally announced November 2020.
-
Evaluating and Rewarding Teamwork Using Cooperative Game Abstractions
Authors:
Tom Yan,
Christian Kroer,
Alexander Peysakhovich
Abstract:
Can we predict how well a team of individuals will perform together? How should individuals be rewarded for their contributions to the team performance? Cooperative game theory gives us a powerful set of tools for answering these questions: the Characteristic Function (CF) and solution concepts like the Shapley Value (SV). There are two major difficulties in applying these techniques to real world problems: first, the CF is rarely given to us and needs to be learned from data. Second, the SV is combinatorial in nature. We introduce a parametric model called cooperative game abstractions (CGAs) for estimating CFs from data. CGAs are easy to learn, readily interpretable, and crucially allow linear-time computation of the SV. We provide identification results and sample complexity bounds for CGA models as well as error bounds in the estimation of the SV using CGAs. We apply our methods to study teams of artificial RL agents as well as real world teams from professional sports.
Submitted 16 June, 2020;
originally announced June 2020.
-
"Other-Play" for Zero-Shot Coordination
Authors:
Hengyuan Hu,
Adam Lerer,
Alex Peysakhovich,
Jakob Foerster
Abstract:
We consider the problem of zero-shot coordination - constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP) that enhances self-play by looking for more robust strategies, exploiting the presence of known symmetries in the underlying problem. We characterize OP theoretically as well as experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents. In preliminary results we also show that our OP agents obtain higher average scores when paired with human players, compared to state-of-the-art SP agents.
Submitted 12 May, 2021; v1 submitted 5 March, 2020;
originally announced March 2020.
-
Robust Market Equilibria with Uncertain Preferences
Authors:
Riley Murray,
Christian Kroer,
Alex Peysakhovich,
Parikshit Shah
Abstract:
The problem of allocating scarce items to individuals is an important practical question in market design. An increasingly popular set of mechanisms for this task uses the concept of market equilibrium: individuals report their preferences, have a budget of real or fake currency, and a set of prices for items and allocations is computed that sets demand equal to supply. An important real world issue with such mechanisms is that individual valuations are often only imperfectly known. In this paper, we show how concepts from classical market equilibrium can be extended to reflect such uncertainty. We show that in linear, divisible Fisher markets a robust market equilibrium (RME) always exists; this also holds in settings where buyers may retain unspent money. We provide theoretical analysis of the allocative properties of RME in terms of envy and regret. Though RME are hard to compute for general uncertainty sets, we consider some natural and tractable uncertainty sets which lead to well behaved formulations of the problem that can be solved via modern convex programming methods. Finally, we show that very mild uncertainty about valuations can cause RME allocations to outperform those which take estimates as having no underlying uncertainty.
Submitted 10 December, 2019;
originally announced December 2019.
-
Scalable Fair Division for 'At Most One' Preferences
Authors:
Christian Kroer,
Alexander Peysakhovich
Abstract:
Allocating multiple scarce items across a set of individuals is an important practical problem. In the case of divisible goods and additive preferences a convex program can be used to find the solution that maximizes Nash welfare (MNW). The MNW solution is equivalent to finding the equilibrium of a market economy (a.k.a. the competitive equilibrium from equal incomes, CEEI) and thus has good properties such as Pareto optimality, envy-freeness, and incentive compatibility in the large. Unfortunately, this equivalence (and these nice properties) breaks down for general preference classes. Motivated by real world problems such as course allocation and recommender systems we study the case of additive 'at most one' (AMO) preferences - individuals want at most 1 of each item and lotteries are allowed. We show that in this case the MNW solution is still a convex program and importantly is a CEEI solution when the instance gets large but has a 'low rank' structure. Thus a polynomial time algorithm can be used to scale CEEI (which is in general PPAD-hard) for AMO preferences. We examine whether the properties guaranteed in the limit hold approximately in finite samples using several real datasets.
Submitted 24 September, 2019;
originally announced September 2019.
-
Fair Division Without Disparate Impact
Authors:
Alexander Peysakhovich,
Christian Kroer
Abstract:
We consider the problem of dividing items between individuals in a way that is fair both in the sense of distributional fairness and in the sense of not having disparate impact across protected classes. An important existing mechanism for distributionally fair division is competitive equilibrium from equal incomes (CEEI). Unfortunately, CEEI will not, in general, respect disparate impact constraints. We consider two types of disparate impact measures: requiring that allocations be similar across protected classes and requiring that average utility levels be similar across protected classes. We modify the standard CEEI algorithm in two ways: equitable equilibrium from equal incomes, which removes disparate impact in allocations, and competitive equilibrium from equitable incomes which removes disparate impact in attained utility levels. We show analytically that removing disparate impact in outcomes breaks several of CEEI's desirable properties such as envy, regret, Pareto optimality, and incentive compatibility. By contrast, we can remove disparate impact in attained utility levels without affecting these properties. Finally, we experimentally evaluate the tradeoffs between efficiency, equity, and disparate impact in a recommender-system based market.
Submitted 6 June, 2019;
originally announced June 2019.
-
Robust Multi-agent Counterfactual Prediction
Authors:
Alexander Peysakhovich,
Christian Kroer,
Adam Lerer
Abstract:
We consider the problem of using logged data to make predictions about what would happen if we changed the 'rules of the game' in a multi-agent system. This task is difficult because in many cases we observe actions individuals take but not their private information or their full reward functions. In addition, agents are strategic, so when the rules change, they will also change their actions. Existing methods (e.g. structural estimation, inverse reinforcement learning) make counterfactual predictions by constructing a model of the game, adding the assumption that agents' behavior comes from optimizing given some goals, and then inverting observed actions to learn each agent's underlying utility function (a.k.a. type). Once the agent types are known, making counterfactual predictions amounts to solving for the equilibrium of the counterfactual environment. This approach imposes heavy assumptions such as rationality of the agents being observed, correctness of the analyst's model of the environment/parametric form of the agents' utility functions, and various other conditions to make point identification possible. We propose a method for analyzing the sensitivity of counterfactual conclusions to violations of these assumptions. We refer to this method as robust multi-agent counterfactual prediction (RMAC). We apply our technique to investigating the robustness of counterfactual claims for classic environments in market design: auctions, school choice, and social choice. Importantly, we show RMAC can be used in regimes where point identification is impossible (e.g. those which have multiple equilibria or non-injective maps from type distributions to outcomes).
Submitted 3 April, 2019;
originally announced April 2019.
-
PyTorch-BigGraph: A Large-scale Graph Embedding System
Authors:
Adam Lerer,
Ledell Wu,
Jiajun Shen,
Timothee Lacroix,
Luca Wehrstedt,
Abhijit Bose,
Alex Peysakhovich
Abstract:
Graph embedding methods produce unsupervised node features from graphs that can then be used for a variety of machine learning tasks. Modern graphs, particularly in industrial applications, contain billions of nodes and trillions of edges, which exceeds the capability of existing embedding systems. We present PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges. PBG uses graph partitioning to train arbitrarily large embeddings on either a single machine or in a distributed environment. We demonstrate comparable performance with existing embedding systems on common benchmarks, while allowing for scaling to arbitrarily large graphs and parallelization on multiple machines. We train and evaluate embeddings on several large social network graphs as well as the full Freebase dataset, which contains over 100 million nodes and 2 billion edges.
Submitted 9 April, 2019; v1 submitted 28 March, 2019;
originally announced March 2019.
-
Discovering Context Effects from Raw Choice Data
Authors:
Arjun Seshadri,
Alexander Peysakhovich,
Johan Ugander
Abstract:
Many applications in preference learning assume that decisions come from the maximization of a stable utility function. Yet a large experimental literature shows that individual choices and judgements can be affected by "irrelevant" aspects of the context in which they are made. An important class of such contexts is the composition of the choice set. In this work, our goal is to discover such choice set effects from raw choice data. We introduce an extension of the Multinomial Logit (MNL) model, called the context dependent random utility model (CDM), which allows for a particular class of choice set effects. We show that the CDM can be thought of as a second-order approximation to a general choice system, can be inferred optimally using maximum likelihood and, importantly, is easily interpretable. We apply the CDM to both real and simulated choice data to perform principled exploratory analyses for the presence of choice set effects.
Submitted 31 January, 2020; v1 submitted 8 February, 2019;
originally announced February 2019.
-
Computing large market equilibria using abstractions
Authors:
Christian Kroer,
Alexander Peysakhovich,
Eric Sodomka,
Nicolas E. Stier-Moses
Abstract:
Computing market equilibria is an important practical problem for market design, for example in fair division of items. However, computing equilibria requires large amounts of information (typically the valuation of every buyer for every item) and computing power. We consider ameliorating these issues by applying a method used for solving complex games: constructing a coarsened abstraction of a given market, solving for the equilibrium in the abstraction, and lifting the prices and allocations back to the original market. We show how to bound important quantities such as regret, envy, Nash social welfare, Pareto optimality, and maximin share/proportionality when the abstracted prices and allocations are used in place of the real equilibrium. We then study two abstraction methods of interest for practitioners: (1) filling in unknown valuations using techniques from matrix completion, (2) reducing the problem size by aggregating groups of buyers/items into smaller numbers of representative buyers/items and solving for equilibrium in this coarsened market. We find that in real data allocations/prices that are relatively close to equilibria can be computed from even very coarse abstractions.
Submitted 3 September, 2021; v1 submitted 18 January, 2019;
originally announced January 2019.
-
Reinforcement Learning and Inverse Reinforcement Learning with System 1 and System 2
Authors:
Alexander Peysakhovich
Abstract:
Inferring a person's goal from their behavior is an important problem in applications of AI (e.g. automated assistants, recommender systems). The workhorse model for this task is the rational actor model - this amounts to assuming that people have stable reward functions, discount the future exponentially, and construct optimal plans. Under the rational actor assumption, techniques such as inverse reinforcement learning (IRL) can be used to infer a person's goals from their actions. A competing model is the dual-system model. Here decisions are the result of an interplay between a fast, automatic, heuristic-based system 1 and a slower, deliberate, calculating system 2. We generalize the dual-system framework to the case of Markov decision problems and show how to compute optimal plans for dual-system agents. We show that dual-system agents exhibit behaviors that are incompatible with the rational actor assumption. We show that naive applications of rational-actor IRL to the behavior of dual-system agents can generate wrong inferences about the agents' goals and suggest interventions that actually reduce the agent's overall utility. Finally, we adapt a simple IRL algorithm to correctly infer the goals of dual-system decision-makers. This allows us to make interventions that help, rather than hinder, the dual-system agent's ability to reach their true goals.
Submitted 13 March, 2019; v1 submitted 19 November, 2018;
originally announced November 2018.
-
Improving pairwise comparison models using Empirical Bayes shrinkage
Authors:
Stephen Ragain,
Alexander Peysakhovich,
Johan Ugander
Abstract:
Comparison data arises in many important contexts, e.g. shopping, web clicks, or sports competitions. Typically we are given a dataset of comparisons and wish to train a model to make predictions about the outcome of unseen comparisons. In many cases available datasets have relatively few comparisons (e.g. there are only so many NFL games per year) or efficiency is important (e.g. we want to quickly estimate the relative appeal of a product). In such settings it is well known that shrinkage estimators outperform maximum likelihood estimators. A complicating matter is that standard comparison models such as the conditional multinomial logit model are only models of conditional outcomes (who wins) and not of comparisons themselves (who competes). As such, different models of the comparison process lead to different shrinkage estimators. In this work we derive a collection of methods for estimating the pairwise uncertainty of pairwise predictions based on different assumptions about the comparison process. These uncertainty estimates allow us both to examine model uncertainty as well as perform Empirical Bayes shrinkage estimation of the model parameters. We demonstrate that our shrunk estimators outperform standard maximum likelihood methods on real comparison data from online comparison surveys as well as from several sports contexts.
Submitted 24 July, 2018;
originally announced July 2018.
-
Backplay: "Man muss immer umkehren"
Authors:
Cinjon Resnick,
Roberta Raileanu,
Sanyam Kapoor,
Alexander Peysakhovich,
Kyunghyun Cho,
Joan Bruna
Abstract:
Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
Submitted 21 April, 2022; v1 submitted 18 July, 2018;
originally announced July 2018.
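
The curriculum described in the Backplay abstract (start near the end of a demonstration, then move the starting point backwards) can be sketched as below. This is a minimal illustration, not the authors' implementation: the linear schedule, the function name `backplay_start_index`, and the toy demonstration are assumptions introduced here.

```python
def backplay_start_index(demo_length, epoch, total_epochs):
    """Return the demonstration index at which a training episode starts.

    Assumption (not from the paper): the start point moves backwards
    linearly, from the last demonstration state at epoch 0 to the
    environment's initial state (index 0) at the final epoch.
    """
    if demo_length < 1 or total_epochs < 1:
        raise ValueError("demo_length and total_epochs must be positive")
    last = demo_length - 1
    # Fraction of training completed, clipped to [0, 1].
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return last - round(progress * last)


# Hypothetical demonstration: recorded states s0..s9.
demo = [f"s{i}" for i in range(10)]
schedule = [backplay_start_index(len(demo), e, 5) for e in range(5)]
start_states = [demo[i] for i in schedule]
```

Early epochs begin close to the demonstrated goal, where reward is easy to reach; later epochs begin progressively closer to the environment's own initial state.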
-
Learning Existing Social Conventions via Observationally Augmented Self-Play
Authors:
Adam Lerer,
Alexander Peysakhovich
Abstract:
In order for artificial agents to coordinate effectively with people, they must act consistently with existing conventions (e.g. how to navigate in traffic, which language to speak, or how to coordinate with teammates). A group's conventions can be viewed as a choice of equilibrium in a coordination game. We consider the problem of an agent learning a policy for a coordination game in a simulated environment and then using this policy when it enters an existing group. When there are multiple possible conventions we show that learning a policy via multi-agent reinforcement learning (MARL) is likely to find policies which achieve high payoffs at training time but fail to coordinate with the real group into which the agent enters. We assume access to a small number of samples of behavior from the true convention and show that we can augment the MARL objective to help it find policies consistent with the real group's convention. In three environments from the literature - traffic, communication, and team coordination - we observe that augmenting MARL with a small amount of imitation learning greatly increases the probability that the strategy found by MARL fits well with the existing social convention. We show that this works even in an environment where standard training methods very rarely find the true convention of the agent's partners.
Submitted 13 March, 2019; v1 submitted 26 June, 2018;
originally announced June 2018.
-
Consequentialist conditional cooperation in social dilemmas with imperfect information
Authors:
Alexander Peysakhovich,
Adam Lerer
Abstract:
Social dilemmas, where mutual cooperation can lead to high payoffs but participants face incentives to cheat, are ubiquitous in multi-agent interaction. We wish to construct agents that cooperate with pure cooperators, avoid exploitation by pure defectors, and incentivize cooperation from the rest. However, often the actions taken by a partner are (partially) unobserved or the consequences of individual actions are hard to predict. We show that in a large class of games good strategies can be constructed by conditioning one's behavior solely on outcomes (i.e., one's past rewards). We call this consequentialist conditional cooperation. We show how to construct such strategies using deep reinforcement learning techniques and demonstrate, both analytically and experimentally, that they are effective in social dilemmas beyond simple matrix games. We also show the limitations of relying purely on consequences and discuss the need for understanding both the consequences of and the intentions behind an action.
Submitted 2 March, 2018; v1 submitted 18 October, 2017;
originally announced October 2017.
-
Prosocial learning agents solve generalized Stag Hunts better than selfish ones
Authors:
Alexander Peysakhovich,
Adam Lerer
Abstract:
Deep reinforcement learning has become an important paradigm for constructing agents that can enter complex multi-agent situations and improve their policies through experience. One commonly used technique is reactive training - applying standard RL methods while treating other agents as a part of the learner's environment. It is known that in general-sum games reactive training can lead groups of agents to converge to inefficient outcomes. We focus on one such class of environments: Stag Hunt games. Here agents either choose a risky cooperative policy (which leads to high payoffs if both choose it but low payoffs to an agent who attempts it alone) or a safe one (which leads to a safe payoff no matter what). We ask how we can change the learning rule of a single agent to improve its outcomes in Stag Hunts that include other reactive learners. We extend existing work on reward-shaping in multi-agent reinforcement learning and show that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes. Thus, even if we control a single agent in a group, making that agent prosocial can increase our agent's long-run payoff. We show experimentally that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics including ones where agents must learn from raw input pixels.
Submitted 8 December, 2017; v1 submitted 8 September, 2017;
originally announced September 2017.
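
To make the prosocial reward shaping in the Stag Hunt abstract concrete, the sketch below computes how confident an agent must be that its partner plays stag before stag becomes its best response. The payoff numbers, the additive shaping rule `own + prosociality * partner`, and all function names are illustrative assumptions, not values or code from the paper.

```python
# Hypothetical Stag Hunt payoffs as (own, partner) for (own action, partner action).
PAYOFFS = {
    ("stag", "stag"): (4.0, 4.0),
    ("stag", "hare"): (0.0, 3.0),
    ("hare", "stag"): (3.0, 0.0),
    ("hare", "hare"): (3.0, 3.0),
}


def shaped_reward(own, partner, prosociality):
    """Reward the learner optimizes: own payoff plus weighted partner payoff."""
    return own + prosociality * partner


def stag_threshold(prosociality):
    """Smallest belief that the partner plays stag at which stag is a best response."""
    r = {k: shaped_reward(v[0], v[1], prosociality) for k, v in PAYOFFS.items()}
    gain_if_partner_stag = r[("stag", "stag")] - r[("hare", "stag")]
    loss_if_partner_hare = r[("hare", "hare")] - r[("stag", "hare")]
    return loss_if_partner_hare / (gain_if_partner_stag + loss_if_partner_hare)


selfish = stag_threshold(0.0)    # agent cares only about its own payoff
prosocial = stag_threshold(1.0)  # agent weights partner payoff equally
```

Under these assumed payoffs the prosocial agent needs a much weaker belief in its partner's cooperation before attempting the risky action, which is one intuition for why a group containing such an agent may be more likely to reach the cooperative outcome.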
-
Maintaining cooperation in complex social dilemmas using deep reinforcement learning
Authors:
Adam Lerer,
Alexander Peysakhovich
Abstract:
Social dilemmas are situations where individuals face a temptation to increase their payoffs at a cost to total welfare. Building artificially intelligent agents that achieve good outcomes in these situations is important because many real-world interactions include a tension between selfish interests and the welfare of others. We show how to modify modern reinforcement learning methods to construct agents that act in ways that are simple to understand, nice (begin by cooperating), provokable (try to avoid being exploited), and forgiving (try to return to mutual cooperation). We show both theoretically and experimentally that such agents can maintain cooperation in Markov social dilemmas. Our construction does not require training methods beyond a modification of self-play; thus, if an environment is such that good strategies can be constructed in the zero-sum case (e.g., Atari), then we can construct agents that solve social dilemmas in this environment.
Submitted 2 March, 2018; v1 submitted 4 July, 2017;
originally announced July 2017.
-
Multi-Agent Cooperation and the Emergence of (Natural) Language
Authors:
Angeliki Lazaridou,
Alexander Peysakhovich,
Marco Baroni
Abstract:
The current mainstream approach to train natural language systems is to expose them to large amounts of text. This passive learning is problematic if we are interested in developing interactive machines, such as conversational agents. We propose a framework for language learning that relies on multi-agent communication. We study this learning in the context of referential games. In these games, a sender and a receiver see a pair of images. The sender is told one of them is the target and is allowed to send a message from a fixed, arbitrary vocabulary to the receiver. The receiver must rely on this message to identify the target. Thus, the agents develop their own language interactively out of the need to communicate. We show that two networks with simple configurations are able to learn to coordinate in the referential game. We further explore how to make changes to the game environment to cause the "word meanings" induced in the game to better reflect intuitive semantic properties of the images. In addition, we present a simple strategy for grounding the agents' code into natural language. Both of these are necessary steps towards developing machines that are able to communicate with humans productively.
Submitted 5 March, 2017; v1 submitted 21 December, 2016;
originally announced December 2016.
-
Combining observational and experimental data to find heterogeneous treatment effects
Authors:
Alexander Peysakhovich,
Akos Lada
Abstract:
Every design choice will have different effects on different units. However, traditional A/B tests are often underpowered to identify these heterogeneous effects. This is especially true when the set of unit-level attributes is high-dimensional and our priors are weak about which particular covariates are important. However, there are often observational data sets available that are orders of magnitude larger. We propose a method to combine these two data sources to estimate heterogeneous treatment effects. First, we use observational time series data to estimate a mapping from covariates to unit-level effects. These estimates are likely biased, but under some conditions the bias preserves unit-level relative rank orderings. If these conditions hold, we only need sufficient experimental data to identify a monotonic, one-dimensional transformation from observationally predicted treatment effects to real treatment effects. This reduces power demands greatly and makes the detection of heterogeneous effects much easier. As an application, we show how our method can be used to improve Facebook page recommendations.
Submitted 7 November, 2016;
originally announced November 2016.
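
The last step in the abstract above, fitting a monotonic one-dimensional map from observationally predicted effects to experimentally measured effects, can be illustrated with isotonic regression. The abstract does not say which estimator the authors use, so the pool-adjacent-violators routine below, the name `isotonic_fit`, and the toy numbers are all assumptions made for illustration.

```python
def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Hypothetical data: observationally predicted effects and noisy
# experimental effect estimates for the same four groups of units.
observational = [0.9, 0.1, 0.5, 0.3]
experimental = [4.0, 1.0, 2.0, 3.0]

# Sort groups by the observational prediction, fit a monotone curve to the
# experimental estimates in that order, then map back to the original order.
order = sorted(range(len(observational)), key=lambda i: observational[i])
fitted_sorted = isotonic_fit([experimental[i] for i in order])
calibrated = [0.0] * len(order)
for rank, i in enumerate(order):
    calibrated[i] = fitted_sorted[rank]
```

Because only a one-dimensional monotone curve is estimated from the experiment, far fewer experimental observations are needed than for estimating effects separately for every covariate combination.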