
Showing 1–28 of 28 results for author: Guez, A

  1. arXiv:2406.02035  [pdf, other]

    cs.LG cs.AI

    A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

    Authors: Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana Borsa, Arthur Guez, Will Dabney

    Abstract: Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model for self-predictive representation le…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2306.10587  [pdf, other]

    cs.LG cs.AI stat.ML

    Acceleration in Policy Optimization

    Authors: Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag

    Abstract: We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates. Leveraging the connection between policy iteration and policy gradient methods, we view policy optimization algorithms as iteratively solving a sequence of surrogate objectives, local lower bound…

    Submitted 5 September, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

  5. arXiv:2206.05314  [pdf, other]

    cs.LG cs.AI

    Large-Scale Retrieval for Reinforcement Learning

    Authors: Peter C. Humphreys, Arthur Guez, Olivier Tieleman, Laurent Sifre, Théophane Weber, Timothy Lillicrap

    Abstract: Effective decision making involves flexibly relating past experiences and relevant contextual information to a novel situation. In deep reinforcement learning (RL), the dominant paradigm is for an agent to amortise information that helps decision making into its network weights via gradient descent on training losses. Here, we pursue an alternative approach in which agents can utilise large-scale…

    Submitted 16 December, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: Thirty-sixth Annual Conference on Neural Information Processing Systems (NeurIPS 2022), 16 pages

  6. arXiv:2204.08957  [pdf, other]

    cs.LG cs.AI

    COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

    Authors: Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, Arthur Guez

    Abstract: We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset. This problem setting is appealing in many real-world scenarios, where direct interaction with the environment is costly or risky, and where the resulting policy should…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: 24 pages, 6 figures, Accepted at ICLR 2022 (spotlight)

  7. arXiv:2202.08417  [pdf, other]

    cs.LG

    Retrieval-Augmented Reinforcement Learning

    Authors: Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Peter C. Humphreys, Ksenia Konyushkova, Laurent Sifre, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, Charles Blundell

    Abstract: Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the…

    Submitted 24 May, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

  8. arXiv:2104.06159  [pdf, other]

    cs.LG cs.AI

    Muesli: Combining Improvements in Policy Optimization

    Authors: Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt

    Abstract: We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by ex…

    Submitted 31 March, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

  9. arXiv:2011.09464  [pdf, other]

    cs.LG

    Counterfactual Credit Assignment in Model-Free Reinforcement Learning

    Authors: Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos

    Abstract: Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to…

    Submitted 14 December, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

  10. arXiv:2011.04021  [pdf, other]

    cs.AI cs.LG

    On the role of planning in model-based deep reinforcement learning

    Authors: Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veličković, Théophane Weber

    Abstract: Model-based planning is often thought to be necessary for deep, careful reasoning and generalization in artificial agents. While recent successes of model-based reinforcement learning (MBRL) with deep function approximation have strengthened this hypothesis, the resulting diversity of model-based methods has also made it difficult to track which components drive success and why. In this paper, we…

    Submitted 17 March, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

    Comments: Published at ICLR 2021

  11. arXiv:2010.01298  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban

    Authors: Peter Karkus, Mehdi Mirza, Arthur Guez, Andrew Jaegle, Timothy Lillicrap, Lars Buesing, Nicolas Heess, Theophane Weber

    Abstract: Intelligent robots need to achieve abstract objectives using concrete, spatiotemporally complex sensory information and motor control. Tabula rasa deep reinforcement learning (RL) has tackled demanding tasks in terms of either visual, abstract, or physical reasoning, but solving these jointly remains a formidable challenge. One recent, unsolved benchmark task that integrates these challenges is Mu…

    Submitted 3 October, 2020; originally announced October 2020.

  12. arXiv:2009.05524  [pdf, other]

    cs.AI cs.LG

    Physically Embedded Planning Problems: New Challenges for Reinforcement Learning

    Authors: Mehdi Mirza, Andrew Jaegle, Jonathan J. Hunt, Arthur Guez, Saran Tunyasuvunakool, Alistair Muldal, Théophane Weber, Peter Karkus, Sébastien Racanière, Lars Buesing, Timothy Lillicrap, Nicolas Heess

    Abstract: Recent work in deep reinforcement learning (RL) has produced algorithms capable of mastering challenging games such as Go, chess, or shogi. In these works the RL agent directly observes the natural state of the game and controls that state directly with its actions. However, when humans play such games, they do not just reason about the moves but also interact with their physical environment. They…

    Submitted 29 October, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

    Comments: 17 pages + appendix. Updated text and references

  13. arXiv:2002.08329  [pdf, other]

    cs.LG stat.ML

    Value-driven Hindsight Modelling

    Authors: Arthur Guez, Fabio Viola, Théophane Weber, Lars Buesing, Steven Kapturowski, Doina Precup, David Silver, Nicolas Heess

    Abstract: Value estimation is a critical component of the reinforcement learning (RL) paradigm. The question of how to effectively learn value predictors from data is one of the major problems studied by the RL community, and different approaches exploit structure in the problem domain in different ways. Model learning can make use of the rich transition structure present in sequences of observations, but t…

    Submitted 20 October, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: 9 pages + reference + appendix. NeurIPS 2020 version

  14. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

    Authors: Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver

    Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the…

    Submitted 21 February, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

  15. arXiv:1910.00528  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Augmenting learning using symmetry in a biologically-inspired domain

    Authors: Shruti Mishra, Abbas Abdolmaleki, Arthur Guez, Piotr Trochim, Doina Precup

    Abstract: Invariances to translation, rotation and other spatial transformations are a hallmark of the laws of motion, and have widespread use in the natural sciences to reduce the dimensionality of systems of equations. In supervised learning, such as in image classification tasks, rotation, translation and scale invariances are used to augment training datasets. In this work, we use data augmentation in a…

    Submitted 1 October, 2019; originally announced October 2019.

  16. arXiv:1901.03559  [pdf, other]

    cs.LG cs.AI stat.ML

    An investigation of model-free planning

    Authors: Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap

    Abstract: The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been propos…

    Submitted 20 May, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

  17. arXiv:1811.06272  [pdf, other]

    cs.LG stat.ML

    Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

    Authors: Lars Buesing, Theophane Weber, Yori Zwols, Sebastien Racaniere, Arthur Guez, Jean-Baptiste Lespiau, Nicolas Heess

    Abstract: Learning policies on data synthesized by models can in principle quench the thirst of reinforcement learning algorithms for large amounts of real experience, which is often costly to acquire. However, simulating plausible experience de novo is a hard problem for many complex environments, often resulting in biases for model-based policy evaluation and search. Instead of de novo synthesis of data,…

    Submitted 15 November, 2018; originally announced November 2018.

  18. arXiv:1802.04697  [pdf, other]

    cs.AI cs.LG stat.ML

    Learning to Search with MCTSnets

    Authors: Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, David Silver

    Abstract: Planning problems are among the most important and well-studied problems in artificial intelligence. They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree. Among these algorithms, Monte-Carlo tree search (MCTS) is one of the most general, powerful and widely used. A typical im…

    Submitted 17 July, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

    Comments: ICML 2018 (camera-ready version)

  19. arXiv:1712.01815  [pdf, other]

    cs.AI cs.LG

    Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

    Authors: David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis

    Abstract: The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game…

    Submitted 5 December, 2017; originally announced December 2017.

  20. arXiv:1707.06203  [pdf, other]

    cs.LG cs.AI stat.ML

    Imagination-Augmented Agents for Deep Reinforcement Learning

    Authors: Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra

    Abstract: We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in…

    Submitted 14 February, 2018; v1 submitted 19 July, 2017; originally announced July 2017.

  21. arXiv:1702.06465  [pdf, ps, other]

    cond-mat.quant-gas nucl-th

    Universal behavior of few-boson systems using potential models

    Authors: A. Kievsky, M. Viviani, R. Álvarez-Rodríguez, M. Gattobigio, A. Deltuva

    Abstract: The universal behavior of a three-boson system close to the unitary limit is encoded in a simple dependence of many observables in terms of few parameters. For example the product of the three-body parameter $κ_*$ and the two-body scattering length $a$, $κ_* a$ depends on the angle $ξ$ defined by $E_3/E_2=\tan^2ξ$. A similar dependence is observed in the ratio $a_{AD}/a$ with $a_{AD}$ the boson-di…

    Submitted 21 February, 2017; originally announced February 2017.

    Comments: presented at the 23rd European Conference on Few-Body Problems in Physics

    Journal ref: Few-Body Syst (2017) 58: 66

  22. arXiv:1612.08810  [pdf, other]

    cs.LG cs.AI cs.NE

    The Predictron: End-To-End Learning and Planning

    Authors: David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris

    Abstract: One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple "imagined" planning steps. Each forward pass of the predictron accumulates internal rewards and…

    Submitted 20 July, 2017; v1 submitted 28 December, 2016; originally announced December 2016.

    Comments: Camera-ready version, ICML 2017, with supplement

  23. arXiv:1602.07714  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Learning values across many orders of magnitude

    Authors: Hado van Hasselt, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver

    Abstract: Most learning algorithms are not invariant to the scale of the function that is being approximated. We propose to adaptively normalize the targets used in learning. This is useful in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to play Atari games…

    Submitted 16 August, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

    Comments: Paper accepted for publication at NIPS 2016. This version includes the appendix

  24. arXiv:1512.04860  [pdf, other]

    cs.AI cs.LG

    Increasing the Action Gap: New Operators for Reinforcement Learning

    Authors: Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

    Abstract: This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and e…

    Submitted 15 December, 2015; originally announced December 2015.

    Journal ref: Bellemare, Marc G., Ostrovski, G., Guez, A., Thomas, Philip S., and Munos, Remi. Increasing the Action Gap: New Operators for Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2016

  25. arXiv:1509.06461  [pdf, other]

    cs.LG

    Deep Reinforcement Learning with Double Q-learning

    Authors: Hado van Hasselt, Arthur Guez, David Silver

    Abstract: The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learni…

    Submitted 8 December, 2015; v1 submitted 22 September, 2015; originally announced September 2015.

    Comments: AAAI 2016

  26. arXiv:1402.1958  [pdf, other]

    cs.AI cs.LG stat.ML

    Better Optimism By Bayes: Adaptive Planning with Rich Models

    Authors: Arthur Guez, David Silver, Peter Dayan

    Abstract: The computational costs of inference and planning have confined Bayesian model-based reinforcement learning to one of two dismal fates: powerful Bayes-adaptive planning but only for simplistic models, or powerful, Bayesian non-parametric models but using simple, myopic planning strategies such as Thompson sampling. We ask whether it is feasible and truly beneficial to combine rich probabilistic mo…

    Submitted 9 February, 2014; originally announced February 2014.

    Comments: 11 pages, 11 figures

  27. arXiv:1306.3483  [pdf, other]

    math.DG

    On the realization problem of plane real algebraic curves as Hessian curves

    Authors: Angelito Camacho Calderón, Adriana Ortiz Rodríguez

    Abstract: The Hessian Topology is a subject having interesting relations with several areas, for instance, differential geometry, implicit differential equations, analysis and singularity theory. In this article we study the problem of realization of a real plane curve as the Hessian curve of a smooth function. The plane curves we consider are constituted either by only outer ovals or inner ovals. We prove…

    Submitted 14 June, 2013; originally announced June 2013.

    Comments: 11 pages, 1 figure

    MSC Class: 53A15; 53A05

  28. arXiv:1205.3109  [pdf, other]

    cs.LG cs.AI stat.ML

    Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

    Authors: Arthur Guez, David Silver, Peter Dayan

    Abstract: Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optima…

    Submitted 18 December, 2013; v1 submitted 14 May, 2012; originally announced May 2012.

    Comments: 14 pages, 7 figures, includes supplementary material. Advances in Neural Information Processing Systems (NIPS) 2012

    Journal ref: (2012) Advances in Neural Information Processing Systems 25, pages 1034-1042