
AI For Global Climate Cooperation 2023 Competition Proceedings

Authors: Yoshua Bengio, Prateek Gupta, Lu Li, Soham Phade, Sunil Srinivasa, Andrew Williams, Tianyu Zhang, Yang Zhang, Stephan Zheng

Abstract: The international community must collaborate to mitigate climate change and sustain economic growth. However, collaboration is hard to achieve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations offers a promising solution to design international frameworks, including negotiation protocols and climate agreements, that promote and incentivize collaboration. In addition, these frameworks should also ensure policy goal fulfillment and sustained commitment, taking into account climate-economic dynamics and strategic behaviors. These challenges require an interdisciplinary approach across machine learning, economics, climate science, law, policy, ethics, and other fields. Towards this objective, we organized AI for Global Climate Cooperation, a Mila competition in which teams submitted proposals and analyses of international frameworks, based on (modifications of) RICE-N, an AI-driven integrated assessment model (IAM). In particular, RICE-N supports modeling regional decision-making using AI agents. Furthermore, the IAM then models the climate-economic impact of those decisions into the future. Whereas the first track focused only on performance metrics, the proposals submitted to the second track were evaluated both quantitatively and qualitatively. The quantitative evaluation focused on a combination of (i) the degree of mitigation of global temperature rise and (ii) the increase in economic productivity. On the other hand, an interdisciplinary panel of human experts in law, policy, sociology, economics, and environmental science evaluated the solutions qualitatively. In particular, the panel considered the effectiveness, simplicity, feasibility, ethics, and notions of climate justice of the protocols. In the third track, the participants were asked to critique and improve RICE-N.

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2304.04668 [pdf, other]

MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning

Authors: Arundhati Banerjee, Soham Phade, Stefano Ermon, Stephan Zheng

Abstract: We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes. This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people. Moreover, the principal should be few-shot adaptable and minimize the number of interventions, because interventions are often costly. We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents with different learning strategies and reward functions. We validate this approach step-by-step. First, in a Stackelberg setting with a best-response agent, we show that meta-learning enables quick convergence to the theoretically known Stackelberg equilibrium at test time, although noisy observations severely increase the sample complexity. We then show that our model-based meta-learning approach is cost-effective in intervening on bandit agents with unseen explore-exploit strategies. Finally, we outperform baselines that use either meta-learning or agent behavior modeling, in both $0$-shot and $K=1$-shot settings with partial agent information.

Submitted 9 January, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: Published in TMLR

arXiv:2212.06891 [pdf, other]

Interactive Learning with Pricing for Optimal and Stable Allocations in Markets

Authors: Yigit Efe Erginbas, Soham Phade, Kannan Ramchandran

Abstract: Large-scale online recommendation systems must facilitate the allocation of a limited number of items among competing users while learning their preferences from user feedback. As a principled way of incorporating market constraints and user incentives in the design, we consider our objectives to be two-fold: maximal social welfare with minimal instability. To maximize social welfare, our proposed framework enhances the quality of recommendations by exploring allocations that optimistically maximize the rewards. To minimize instability, a measure of users' incentives to deviate from recommended allocations, the algorithm prices the items based on a scheme derived from the Walrasian equilibria. Though it is known that these equilibria yield stable prices for markets with known user preferences, our approach accounts for the inherent uncertainty in the preferences and further ensures that the users accept their recommendations under offered prices. To the best of our knowledge, our approach is the first to integrate techniques from combinatorial bandits, optimal resource allocation, and collaborative filtering to obtain an algorithm that achieves sub-linear social welfare regret as well as sub-linear instability. Empirical studies on synthetic and real-world data also demonstrate the efficacy of our strategy compared to approaches that do not fully incorporate all these aspects.

Submitted 13 December, 2022; originally announced December 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2207.04143

arXiv:2208.07004 [pdf, other]

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

Authors: Tianyu Zhang, Andrew Williams, Soham Phade, Sunil Srinivasa, Yang Zhang, Prateek Gupta, Yoshua Bengio, Stephan Zheng

Abstract: Comprehensive global cooperation is essential to limit global temperature increases while continuing economic development, e.g., reducing severe inequality or achieving long-term economic growth. Achieving long-term cooperation on climate change mitigation with n strategic agents poses a complex game-theoretic problem. For example, agents may negotiate and reach climate agreements, but there is no central authority to enforce adherence to those agreements. Hence, it is critical to design negotiation and agreement frameworks that foster cooperation, allow all agents to meet their individual policy objectives, and incentivize long-term adherence. This is an interdisciplinary challenge that calls for collaboration between researchers in machine learning, economics, climate science, law, policy, ethics, and other fields. In particular, we argue that machine learning is a critical tool to address the complexity of this domain. To facilitate this research, here we introduce RICE-N, a multi-region integrated assessment model that simulates the global climate and economy, and which can be used to design and evaluate the strategic outcomes for different negotiation and agreement frameworks. We also describe how to use multi-agent reinforcement learning to train rational agents using RICE-N. This framework underpins AI for Global Climate Cooperation, a working group collaboration and competition on climate negotiation and agreement design. Here, we invite the scientific community to design and evaluate their solutions using RICE-N, machine learning, economic intuition, and other domain knowledge. More information can be found on www.ai4climatecoop.org.

Submitted 15 August, 2022; originally announced August 2022.

Comments: 12 pages (21 with appendices), 5 figures. For associated working group, see https://www.ai4climatecoop.org/

MSC Class: 93A16; 91-10; 68T07 ACM Class: I.2.11; J.2; J.4

arXiv:2207.04143 [pdf, other]

Interactive Recommendations for Optimal Allocations in Markets with Constraints

Authors: Yigit Efe Erginbas, Soham Phade, Kannan Ramchandran

Abstract: Recommendation systems when employed in markets play a dual role: they assist users in selecting their most desired items from a large pool and they help in allocating a limited number of items to the users who desire them the most. Despite the prevalence of capacity constraints on allocations in many real-world recommendation settings, a principled way of incorporating them in the design of these systems has been lacking. Motivated by this, we propose an interactive framework where the system provider can enhance the quality of recommendations to the users by opportunistically exploring allocations that maximize user rewards and respect the capacity constraints using appropriate pricing mechanisms. We model the problem as an instance of a low-rank combinatorial multi-armed bandit problem with selection constraints on the arms. We employ an integrated approach using techniques from collaborative filtering, combinatorial bandits, and optimal resource allocation to provide an algorithm that provably achieves sub-linear regret, namely $\tilde{\mathcal{O}} ( \sqrt{N M (N+M) RT} )$ in $T$ rounds for a problem with $N$ users, $M$ items and rank $R$ mean reward matrix. Empirical studies on synthetic and real-world data also demonstrate the effectiveness and performance of our approach.

Submitted 28 July, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

arXiv:2201.01163 [pdf, other]

Analyzing Micro-Founded General Equilibrium Models with Many Agents using Deep Reinforcement Learning

Authors: Michael Curry, Alexander Trott, Soham Phade, Yu Bai, Stephan Zheng

Abstract: Real economies can be modeled as a sequential imperfect-information game with many heterogeneous agents, such as consumers, firms, and governments. Dynamic general equilibrium (DGE) models are often used for macroeconomic analysis in this setting. However, finding general equilibria is challenging using existing theoretical or computational methods, especially when using microfoundations to model individual agents. Here, we show how to use deep multi-agent reinforcement learning (MARL) to find $ε$-meta-equilibria over agent types in microfounded DGE models. Whereas standard MARL fails to learn non-trivial solutions, our structured learning curricula enable stable convergence to meaningful solutions. Conceptually, our approach is more flexible and does not need unrealistic assumptions, e.g., continuous market clearing, that are commonly used for analytical tractability. Furthermore, our end-to-end GPU implementation enables fast real-time convergence with a large number of RL economic agents. We showcase our approach in open and closed real-business-cycle (RBC) models with 100 worker-consumers, 10 firms, and a social planner who taxes and redistributes. We validate the learned solutions are $ε$-meta-equilibria through best-response analyses, show that they align with economic intuitions, and show our approach can learn a spectrum of qualitatively distinct $ε$-meta-equilibria in open RBC models. As such, we show that hardware-accelerated MARL is a promising framework for modeling the complexity of economies based on microfoundations.

Submitted 23 February, 2022; v1 submitted 3 January, 2022; originally announced January 2022.

arXiv:2101.08722 [pdf, other]

Mechanism Design for Cumulative Prospect Theoretic Agents: A General Framework and the Revelation Principle

Authors: Soham R. Phade, Venkat Anantharam

Abstract: This paper initiates a discussion of mechanism design when the participating agents exhibit preferences that deviate from expected utility theory (EUT). In particular, we consider mechanism design for systems where the agents are modeled as having cumulative prospect theory (CPT) preferences, which is a generalization of EUT preferences. We point out some of the key modifications needed in the theory of mechanism design that arise from agents having CPT preferences and some of the shortcomings of the classical mechanism design framework. In particular, we show that the revelation principle, which has traditionally played a fundamental role in mechanism design, does not continue to hold under CPT. We develop an appropriate framework that we call mediated mechanism design which allows us to recover the revelation principle for CPT agents. We conclude with some interesting directions for future work.

Submitted 21 January, 2021; originally announced January 2021.

arXiv:2012.02125 [pdf, other]

On the Impossibility of Convergence of Mixed Strategies with No Regret Learning

Authors: Vidya Muthukumar, Soham Phade, Anant Sahai

Abstract: We study the limiting behavior of the mixed strategies that result from optimal no-regret learning strategies in a repeated game setting where the stage game is any 2 by 2 competitive game. We consider optimal no-regret algorithms that are mean-based and monotonic in their argument. We show that for any such algorithm, the limiting mixed strategies of the players cannot converge almost surely to any Nash equilibrium. This negative result is also shown to hold under a broad relaxation of these assumptions, including popular variants of Online-Mirror-Descent with optimism and/or adaptive step-sizes. Finally, we conjecture that the monotonicity assumption can be removed, and provide partial evidence for this conjecture. Our results identify the inherent stochasticity in players' realizations as a critical factor underlying this divergence in outcomes between using the opponent's mixtures and realizations to make updates.

Submitted 2 March, 2022; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: 47 pages, 12 figures

arXiv:2008.07793 [pdf, other]

Utility-based Resource Allocation and Pricing for Serverless Computing

Authors: Vipul Gupta, Soham Phade, Thomas Courtade, Kannan Ramchandran

Abstract: Serverless computing platforms currently rely on basic pricing schemes that are static and do not reflect customer feedback. This leads to significant inefficiencies from a total utility perspective. As one of the fastest-growing cloud services, serverless computing provides an opportunity to better serve both users and providers through the incorporation of market-based strategies for pricing and resource allocation. With the help of utility functions to model the delay-sensitivity of customers, we propose a novel scheduler to allocate resources for serverless computing. The resulting resource allocation scheme is optimal in the sense that it maximizes the aggregate utility of all users across the system, thus maximizing social welfare. Our approach gives rise to a natural dynamic pricing scheme that is obtained by solving an optimization problem in its dual form. We further develop feedback mechanisms that allow the cloud provider to converge to optimal resource allocation, even when the users' utilities are private and unknown to the service provider. Simulations show that our approach can track market demand and achieve significantly higher social welfare (or, equivalently, cost savings for customers) compared to existing schemes.

Submitted 24 January, 2022; v1 submitted 18 August, 2020; originally announced August 2020.

Comments: 31 pages, 10 figures

arXiv:2004.09592 [pdf, other]

Black-Box Strategies and Equilibrium for Games with Cumulative Prospect Theoretic Players

Authors: Soham R. Phade, Venkat Anantharam

Abstract: The betweenness property of preference relations states that a probability mixture of two lotteries should lie between them in preference. It is a weakened form of the independence property and hence satisfied in expected utility theory (EUT). Experimental violations of betweenness are well-documented and several preference theories, notably cumulative prospect theory (CPT), do not satisfy betweenness. We prove that CPT preferences satisfy betweenness if and only if they conform with EUT preferences. In game theory, lack of betweenness in the players' preference relations makes it essential to distinguish between the two interpretations of a mixed action by a player - conscious randomizations by the player and the uncertainty in the beliefs of the opponents. We elaborate on this distinction and study its implication for the definition of Nash equilibrium. This results in four different notions of equilibrium, with pure and mixed action Nash equilibrium being two of them. We dub the other two pure and mixed black-box strategy Nash equilibrium respectively. We resolve the issue of existence of such equilibria and examine how these different notions of equilibrium compare with each other.

Submitted 20 April, 2020; originally announced April 2020.

arXiv:1812.00501 [pdf, ps, other]

doi 10.1007/978-3-030-16989-3_4

Optimal Resource Allocation over Networks via Lottery-Based Mechanisms

Authors: Soham R. Phade, Venkat Anantharam

Abstract: We show that, in a resource allocation problem, the ex ante aggregate utility of players with cumulative-prospect-theoretic preferences can be increased over deterministic allocations by implementing lotteries. We formulate an optimization problem, called the system problem, to find the optimal lottery allocation. The system problem exhibits a two-layer structure comprised of a permutation profile and optimal allocations given the permutation profile. For any fixed permutation profile, we provide a market-based mechanism to find the optimal allocations and prove the existence of equilibrium prices. We show that the system problem has a duality gap, in general, and that the primal problem is NP-hard. We then consider a relaxation of the system problem and derive some qualitative features of the optimal lottery structure.

Submitted 2 December, 2018; originally announced December 2018.

arXiv:1804.08005 [pdf, other]

Learning in Games with Cumulative Prospect Theoretic Preferences

Authors: Soham R. Phade, Venkat Anantharam

Abstract: We consider repeated games where the players behave according to cumulative prospect theory (CPT). We show that, when the players have calibrated strategies and behave according to CPT, the natural analog of the notion of correlated equilibrium in the CPT case, as defined by Keskin, is not enough to capture all subsequential limits of the empirical distribution of action play. We define the notion of a mediated CPT correlated equilibrium via an extension of the stage game to a so-called mediated game. We then show, along the lines of the result of Foster and Vohra about convergence to the set of correlated equilibria when the players behave according to expected utility theory that, in the CPT case, under calibrated learning the empirical distribution of action play converges to the set of all mediated CPT correlated equilibria. We also show that, in general, the set of CPT correlated equilibria is not approachable in the Blackwell approachability sense. We observe that a mediated game is a specific type of a game with communication, as introduced by Myerson, and as a consequence we get that the revelation principle does not hold under CPT.

Submitted 16 July, 2020; v1 submitted 21 April, 2018; originally announced April 2018.

arXiv:1712.00859 [pdf, other]

doi 10.1287/deca.2018.0378

On the Geometry of Nash and Correlated Equilibria with Cumulative Prospect Theoretic Preferences

Authors: Soham R. Phade, Venkat Anantharam

Abstract: It is known that the set of all correlated equilibria of an n-player non-cooperative game is a convex polytope and includes all the Nash equilibria. Further, the Nash equilibria all lie on the boundary of this polytope. We study the geometry of both these equilibrium notions when the players have cumulative prospect theoretic (CPT) preferences. The set of CPT correlated equilibria includes all the CPT Nash equilibria but it need not be a convex polytope. We show that it can, in fact, be disconnected. However, all the CPT Nash equilibria continue to lie on its boundary. We also characterize the sets of CPT correlated equilibria and CPT Nash equilibria for all 2x2 games.

Submitted 3 December, 2017; originally announced December 2017.

Showing 1–13 of 13 results for author: Phade, S