Showing 1–19 of 19 results for author: Sunehag, P

  1. arXiv:2312.05162

    cs.MA cs.AI cs.GT cs.LG

    A Review of Cooperation in Multi-agent Learning

    Authors: Yali Du, Joel Z. Leibo, Usman Islam, Richard Willis, Peter Sunehag

    Abstract: Cooperation in multi-agent learning (MAL) is a topic at the intersection of numerous disciplines, including game theory, economics, social sciences, and evolutionary biology. Research in this area aims to understand both how agents can coordinate effectively when goals are aligned and how they may cooperate in settings where gains from working together are possible but possibilities for conflict a…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 29 pages, 3 figures

  2. arXiv:2302.01180

    cs.AI cs.NE

    Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition

    Authors: Peter Sunehag, Alexander Sasha Vezhnevets, Edgar Duéñez-Guzmán, Igor Mordatch, Joel Z. Leibo

    Abstract: Many environments contain numerous available niches of variable value, each associated with a different local optimum in the space of behaviors (policy space). In such situations it is often difficult to design a learning process capable of evading distraction by poor local optima long enough to stumble upon the best available niche. In this work we propose a generic reinforcement learning (RL) al…

    Submitted 3 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: Full length paper accompanying short format appearing at AAMAS 2023

    ACM Class: I.2

  3. arXiv:2211.13746

    cs.MA cs.AI cs.GT cs.NE

    Melting Pot 2.0

    Authors: John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Jayd Matyas, Yiran Mao, Peter Sunehag, Raphael Köster, Udari Madhushani, Kavya Kopparapu, Ramona Comanescu, DJ Strouse, Michael B. Johanson, Sukhdeep Singh, Julia Haas, Igor Mordatch, Dean Mobbs, Joel Z. Leibo

    Abstract: Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures ge…

    Submitted 30 October, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: 69 pages, 54 figures. arXiv admin note: text overlap with arXiv:2107.06857

  4. arXiv:2107.06857

    cs.MA cs.AI

    Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

    Authors: Joel Z. Leibo, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Köster, Jayd Matyas, Charles Beattie, Igor Mordatch, Thore Graepel

    Abstract: Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap, and uses reinforcement learning to reduce the human labor required to create novel test scenarios. This works because one agent's b…

    Submitted 14 July, 2021; originally announced July 2021.

    Comments: Accepted to ICML 2021 and presented as a long talk; 33 pages; 9 figures

    Journal ref: In International Conference on Machine Learning 2021 (pp. 6187-6199). PMLR

  5. arXiv:2006.06051

    cs.LG cs.GT cs.MA stat.ML

    Learning to Incentivize Other Learning Agents

    Authors: Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes, Hongyuan Zha

    Abstract: The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years. Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined extrinsic reward function. However, a long-term question inevitably arises: how will such independent agents cooperate when they are continually learning and actin…

    Submitted 19 October, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: 20 pages, 11 figures. To appear in 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

  6. arXiv:1812.07019

    cs.NE cs.MA q-bio.PE

    Malthusian Reinforcement Learning

    Authors: Joel Z. Leibo, Julien Perolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel

    Abstract: Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, increases in a subpopulation's average return drive subsequent increases in its size, just as Thomas Malthus argued in 1798 was the relationship betwe…

    Submitted 3 March, 2019; v1 submitted 17 December, 2018; originally announced December 2018.

    Comments: 9 pages, 2 tables, 4 figures

  7. arXiv:1706.05296

    cs.AI

    Value-Decomposition Networks For Cooperative Multi-Agent Learning

    Authors: Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel

    Abstract: We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to partial observab…

    Submitted 16 June, 2017; originally announced June 2017.

    ACM Class: I.2.11

  8. arXiv:1512.07679

    cs.AI cs.LG cs.NE stat.ML

    Deep Reinforcement Learning in Large Discrete Action Spaces

    Authors: Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, Ben Coppin

    Abstract: Being able to reason in an environment with a large number of discrete actions is essential to bringing reinforcement learning to a larger class of problems. Recommender systems, industrial plants and language models are only some of the many real-world tasks involving large numbers of discrete actions for which current methods are difficult or even often impossible to apply. An ability to general…

    Submitted 4 April, 2016; v1 submitted 23 December, 2015; originally announced December 2015.

  9. arXiv:1512.01124

    cs.AI cs.HC cs.LG

    Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions

    Authors: Peter Sunehag, Richard Evans, Gabriel Dulac-Arnold, Yori Zwols, Daniel Visentin, Ben Coppin

    Abstract: Many real-world problems come with action spaces represented as feature vectors. Although high-dimensional control is a largely unsolved problem, there has recently been progress for modest dimensionalities. Here we report on a successful attempt at addressing problems of dimensionality as high as $2000$, of a particular form. Motivated by important applications such as recommendation systems that…

    Submitted 16 December, 2015; v1 submitted 3 December, 2015; originally announced December 2015.

  10. arXiv:1308.4828

    cs.LG

    The Sample-Complexity of General Reinforcement Learning

    Authors: Tor Lattimore, Marcus Hutter, Peter Sunehag

    Abstract: We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models. The algorithm is shown to be near-optimal for all but O(N log^2 N) time-steps with high probability. Infinite classes are also considered where we show that compactness is a key criterion for determining the existence of uniform sample-complexity boun…

    Submitted 22 August, 2013; originally announced August 2013.

    Comments: 16 pages

  11. arXiv:1307.3435

    cs.AI

    On Nicod's Condition, Rules of Induction and the Raven Paradox

    Authors: Hadi Mohasel Afshar, Peter Sunehag

    Abstract: Philosophers writing about the ravens paradox often note that Nicod's Condition (NC) holds given some set of background information, and fails to hold against others, but rarely go any further. That is, it is usually not explored which background information makes NC true or false. The present paper aims to fill this gap. For us, "(objective) background knowledge" is restricted to information that…

    Submitted 15 July, 2013; v1 submitted 12 July, 2013; originally announced July 2013.

    Comments: On raven paradox, Nicod's condition, projectability, induction

  12. arXiv:1307.0127

    cs.LG stat.ML

    Concentration and Confidence for Discrete Bayesian Sequence Predictors

    Authors: Tor Lattimore, Marcus Hutter, Peter Sunehag

    Abstract: Bayesian sequence prediction is a simple technique for predicting future symbols sampled from an unknown measure on infinite sequences over a countable alphabet. While strong bounds on the expected cumulative error are known, there are only limited results on the distribution of this error. We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-L…

    Submitted 29 June, 2013; originally announced July 2013.

    Comments: 17 pages

  13. arXiv:1210.0077

    cs.AI cs.LG

    Optimistic Agents are Asymptotically Optimal

    Authors: Peter Sunehag, Marcus Hutter

    Abstract: We use optimism to introduce generic asymptotically optimal reinforcement learning agents. They achieve, with an arbitrary finite or compact class of environments, asymptotically optimal behavior. Furthermore, in the finite deterministic case we provide finite error bounds.

    Submitted 29 September, 2012; originally announced October 2012.

    Comments: 13 LaTeX pages

    Journal ref: Proc. 25th Australasian Joint Conference on Artificial Intelligence (AusAI 2012) 15-26

  14. arXiv:1201.2056

    cs.IT cs.LG

    Adaptive Context Tree Weighting

    Authors: Alexander O'Neill, Marcus Hutter, Wen Shao, Peter Sunehag

    Abstract: We describe an adaptive context tree weighting (ACTW) algorithm, as an extension to the standard context tree weighting (CTW) algorithm. Unlike the standard CTW algorithm, which weights all observations equally regardless of the depth, ACTW gives increasing weight to more recent observations, aiming to improve performance in cases where the input sequence is from a non-stationary distribution. Dat…

    Submitted 10 January, 2012; originally announced January 2012.

    Comments: 11 LaTeX pages, 7 tables

  15. arXiv:1111.6117

    cs.AI

    Principles of Solomonoff Induction and AIXI

    Authors: Peter Sunehag, Marcus Hutter

    Abstract: We identify principles characterizing Solomonoff Induction by demands on an agent's external behaviour. Key concepts are rationality, computability, indifference and time consistency. Furthermore, we discuss extensions to the full AI case to derive AIXI.

    Submitted 25 November, 2011; originally announced November 2011.

    Comments: 14 LaTeX pages

    Journal ref: Proc. Solomonoff 85th Memorial Conference (SOL 2011) pages 386-398

  16. arXiv:1111.3854

    cs.IT

    (Non-)Equivalence of Universal Priors

    Authors: Ian Wood, Peter Sunehag, Marcus Hutter

    Abstract: Ray Solomonoff invented the notion of universal induction featuring an aptly termed "universal" prior probability function over all possible computable environments. The essential property of this prior was its ability to dominate all other such priors. Later, Levin introduced another construction: a mixture of all possible priors or "universal mixture". These priors are well known to be equiva…

    Submitted 16 November, 2011; originally announced November 2011.

    Comments: 10 LaTeX pages, 1 figure

  17. arXiv:1108.3614

    cs.AI cs.RO

    Feature Reinforcement Learning In Practice

    Authors: Phuong Nguyen, Peter Sunehag, Marcus Hutter

    Abstract: Following a recent surge in using history-based methods for resolving perceptual aliasing in reinforcement learning, we introduce an algorithm based on the feature reinforcement learning framework called PhiMDP. To create a practical algorithm we devise a stochastic search procedure for a class of context trees based on parallel tempering and a specialized proposal distribution. We provide the fir…

    Submitted 17 August, 2011; originally announced August 2011.

  18. arXiv:1107.5520

    cs.LG

    Axioms for Rational Reinforcement Learning

    Authors: Peter Sunehag, Marcus Hutter

    Abstract: We provide a formal, simple and intuitive theory of rational decision making including sequential decisions that affect the environment. The theory has a geometric flavor, which makes the arguments easy to visualize and understand. Our theory is for complete decision makers, which means that they have a complete set of preferences. Our main result shows that a complete rational decision maker impl…

    Submitted 27 July, 2011; originally announced July 2011.

    Comments: 16 LaTeX pages

    Journal ref: Proc. 22nd International Conf. on Algorithmic Learning Theory (ALT-2011) pages 338-352

  19. arXiv:1007.2075

    cs.LG cs.IT

    Consistency of Feature Markov Processes

    Authors: Peter Sunehag, Marcus Hutter

    Abstract: We are studying long term sequence prediction (forecasting). We approach this by investigating criteria for choosing a compact useful state representation. The state is supposed to summarize useful information from the history. We want a method that is asymptotically consistent in the sense it will provably eventually only choose between alternatives that satisfy an optimality property related to…

    Submitted 13 July, 2010; originally announced July 2010.

    Comments: 16 LaTeX pages

    Journal ref: Proc. 21st International Conf. on Algorithmic Learning Theory (ALT-2010) pages 360-374