Showing 1–50 of 50 results for author: Rowland, M

Searching in archive cs.
  1. arXiv:2406.02035  [pdf, other]

    cs.LG cs.AI

    A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

    Authors: Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana Borsa, Arthur Guez, Will Dabney

    Abstract: Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model for self-predictive representation le…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2403.08635  [pdf, other]

    cs.LG cs.AI stat.ML

    Human Alignment of Large Language Models through Online Preference Optimisation

    Authors: Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

    Abstract: Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contributio…

    Submitted 13 March, 2024; originally announced March 2024.

  3. arXiv:2402.08530  [pdf, other]

    cs.LG cs.AI stat.ML

    A Distributional Analogue to the Successor Representation

    Authors: Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

    Abstract: This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this beha…

    Submitted 24 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024. First two authors contributed equally

  4. arXiv:2402.07598  [pdf, other]

    cs.LG stat.ML

    Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

    Authors: Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney

    Abstract: We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open question of Zhang et al. (2023). Our analysis provides new theoretical results on categorical approaches to distributional RL, and also introduces a new distributiona…

    Submitted 12 February, 2024; originally announced February 2024.

  5. arXiv:2402.05766  [pdf, other]

    cs.LG

    Off-policy Distributional Q($λ$): Distributional RL without Importance Sampling

    Authors: Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney

    Abstract: We introduce off-policy distributional Q($λ$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($λ$) does not apply importance sampling for off-policy learning, which introduces intriguing interactions with signed measures. Such unique properties distinguish distributional Q($λ$) from other existing alternatives such as distributional Retrace. We cha…

    Submitted 8 February, 2024; originally announced February 2024.

  6. arXiv:2402.05749  [pdf, other]

    cs.LG cs.AI

    Generalized Preference Optimization: A Unified Approach to Offline Alignment

    Authors: Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

    Abstract: Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices. We propose generalized preference optimization (GPO), a family of offline losses parameterized by a general class of convex functions. GPO enables a unified view over preference optimization, encompassing existing algorithms such as DPO, IPO and SLiC a…

    Submitted 28 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted at ICML 2024 main conference

  7. arXiv:2312.07358  [pdf, other]

    stat.ML cs.LG

    Distributional Bellman Operators over Mean Embeddings

    Authors: Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland

    Abstract: We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework, provide asymptotic convergence theory, and examine the empirical performance of the algorithms on a suite of tabular tasks.…

    Submitted 4 March, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

  8. arXiv:2312.00886  [pdf, other]

    stat.ML cs.AI cs.GT cs.LG cs.MA

    Nash Learning from Human Feedback

    Authors: Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, often expressed as preferences between pairs of text generations produced by a pre-trained LLM. Subsequently, the LLM's policy is fine-tuned by optimizing it to…

    Submitted 11 June, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  9. arXiv:2310.19804  [pdf, other]

    cs.LG cs.AI

    A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

    Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

    Abstract: Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). T…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Published in TMLR

  10. arXiv:2310.12036  [pdf, other]

    cs.AI cs.LG stat.ML

    A General Theoretical Paradigm to Understand Learning from Human Preferences

    Authors: Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

    Abstract: The prevalent deployment of learning from human preferences through reinforcement learning (RLHF) relies on two important approximations: the first assumes that pairwise preferences can be substituted with pointwise rewards. The second assumes that a reward model trained on these pointwise rewards can generalize from collected data to out-of-distribution data sampled by the policy. Recently, Direc…

    Submitted 21 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

  11. arXiv:2306.10171  [pdf, other]

    cs.LG cs.AI stat.ML

    Bootstrapped Representations in Reinforcement Learning

    Authors: Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

    Abstract: In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated i…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  12. arXiv:2305.18501  [pdf, other]

    cs.LG

    DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

    Authors: Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

    Abstract: Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings. However, in the optimal control case, the impact of multi-step learning has been relatively limited despite a number of prior efforts. Fundamentally, this might be because multi-step policy improvements require operations that cannot be approximated by stochastic samples, hence hin…

    Submitted 29 May, 2023; originally announced May 2023.

  13. arXiv:2305.18388  [pdf, other]

    cs.LG stat.ML

    The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

    Authors: Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney

    Abstract: We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach the surprising conclusion that even if a practitioner has no interest in the return distribution beyond the mean, QTD (which learns predictions abou…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ICML 2023

  14. arXiv:2305.18161  [pdf, other]

    cs.LG

    VA-learning as a more efficient alternative to Q-learning

    Authors: Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko

    Abstract: In reinforcement learning, the advantage function is critical for policy improvement, but is often extracted from a learned Q-function. A natural question is: Why not learn the advantage function directly? In this work, we introduce VA-learning, which directly learns advantage function and value function using bootstrapping, without explicit reference to Q-functions. VA-learning learns off-policy…

    Submitted 29 May, 2023; originally announced May 2023.

  15. arXiv:2301.04462  [pdf, other]

    cs.LG stat.ML

    An Analysis of Quantile Temporal-Difference Learning

    Authors: Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

    Abstract: We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed with standard stochastic appro…

    Submitted 20 May, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: Accepted to JMLR

  16. arXiv:2212.04025  [pdf, other]

    cs.LG cs.AI stat.ML

    A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces

    Authors: Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G. Bellemare

    Abstract: Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the $d$-dimensional principal subspace of…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 8 pages in main content, 2 pages of bibliography and 5 pages in Appendix

  17. arXiv:2212.03319  [pdf, other]

    cs.LG cs.AI

    Understanding Self-Predictive Learning for Reinforcement Learning

    Authors: Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

    Abstract: We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirabl…

    Submitted 6 December, 2022; originally announced December 2022.

  18. arXiv:2209.14414  [pdf, other]

    stat.ML cs.LG

    Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

    Abstract: We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions. The performance of an agent is measured by the regret after interacting with the environment for $T$ episodes. We propose an optimistic posterior sampling algorithm for reinforcement learning (OPSRL), a simple variant of poste…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2205.07704

  19. arXiv:2208.10138  [pdf, other]

    cs.GT stat.ML

    Learning Correlated Equilibria in Mean-Field Games

    Authors: Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, Karl Tuyls

    Abstract: The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-…

    Submitted 22 August, 2022; originally announced August 2022.

  20. arXiv:2207.07570  [pdf, other]

    cs.LG

    The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

    Authors: Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare

    Abstract: We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our study reveals intriguing and fundamental differences between the two cases in the multi-step setting. We identify a novel notion of path-dependent distributional TD error, which is indispensable for principled multi-step distributional RL. The…

    Submitted 15 July, 2022; originally announced July 2022.

  21. arXiv:2206.15378  [pdf, other]

    cs.AI cs.GT cs.MA

    Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

    Authors: Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, et al. (9 additional authors not shown)

    Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additiona…

    Submitted 30 June, 2022; originally announced June 2022.

  22. arXiv:2206.08736  [pdf, other]

    stat.ML cs.LG

    Generalised Policy Improvement with Geometric Policy Composition

    Authors: Shantanu Thakoor, Mark Rowland, Diana Borsa, Will Dabney, Rémi Munos, André Barreto

    Abstract: We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL. The new method builds on the concept of a geometric horizon model (GHM, also known as a gamma-model), which models the discounted state-visitation distribution of a given policy. We show that we can evaluate…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: ICML 2022

  23. arXiv:2206.02126  [pdf, other]

    cs.LG

    Learning Dynamics and Generalization in Reinforcement Learning

    Authors: Clare Lyle, Mark Rowland, Will Dabney, Marta Kwiatkowska, Yarin Gal

    Abstract: Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal difference algorithms to gain novel insight into the tension between these two objectives. We show theoretically that temporal difference learning encourages agents to…

    Submitted 5 June, 2022; originally announced June 2022.

  24. arXiv:2204.09560  [pdf, other]

    cs.LG

    Understanding and Preventing Capacity Loss in Reinforcement Learning

    Authors: Clare Lyle, Mark Rowland, Will Dabney

    Abstract: The reinforcement learning (RL) problem is rife with sources of non-stationarity, making it a notoriously difficult problem domain for the application of neural networks. We identify a mechanism by which non-stationary prediction targets can prevent learning progress in deep RL agents: capacity loss, whereby networks trained on a sequence of target values lose their ability to quickly upd…

    Submitted 4 May, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: Presented at ICLR 2022

  25. arXiv:2203.16177  [pdf, other]

    cs.LG

    Marginalized Operators for Off-policy Reinforcement Learning

    Authors: Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

    Abstract: In this work, we propose marginalized operators, a new class of off-policy evaluation operators for reinforcement learning. Marginalized operators strictly generalize generic multi-step operators, such as Retrace, as special cases. Marginalized operators also suggest a form of sample-based estimates with potential variance reduction, compared to sample-based estimates of the original multi-step op…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted at AISTATS 2022

  26. arXiv:2111.08350  [pdf, other]

    cs.GT cs.MA

    Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO

    Authors: Paul Muller, Mark Rowland, Romuald Elie, Georgios Piliouras, Julien Perolat, Mathieu Lauriere, Raphael Marinier, Olivier Pietquin, Karl Tuyls

    Abstract: Recent advances in multiagent learning have seen the introduction of a family of algorithms that revolve around the population-based training method PSRO, showing convergence to Nash, correlated and coarse correlated equilibria. Notably, when the number of agents increases, learning best-responses becomes exponentially more difficult, and as such hampers PSRO training methods. The paradigm of mean-…

    Submitted 29 August, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: AAMAS

  27. arXiv:2106.14668  [pdf, other]

    cs.GT cs.LG cs.MA

    Evolutionary Dynamics and $Φ$-Regret Minimization in Games

    Authors: Georgios Piliouras, Mark Rowland, Shayegan Omidshafiei, Romuald Elie, Daniel Hennes, Jerome Connor, Karl Tuyls

    Abstract: Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. Regret quantifies the difference between a learner's performance against a baseline in hindsight. It is well-known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret u…

    Submitted 28 June, 2021; originally announced June 2021.

  28. arXiv:2106.13125  [pdf, other]

    cs.LG

    Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

    Authors: Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko

    Abstract: Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions. This is challenging from an implementation perspective, as repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates. In this work, we provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation. Our framew…

    Submitted 3 November, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: Accepted at Neural Information Processing Systems (NeurIPS), 2021. Code is available at https://github.com/robintyh1/neurips2021-meta-gradient-offpolicy-evaluation

  29. arXiv:2106.08229  [pdf, other]

    cs.LG cs.AI

    MICo: Improved representations via sampling-based state similarity for Markov decision processes

    Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

    Abstract: We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed…

    Submitted 21 January, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: Published at NeurIPS 2021

  30. arXiv:2106.06170  [pdf, other]

    cs.LG

    Taylor Expansion of Discount Factors

    Authors: Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

    Abstract: In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors. Our analysis suggests new ways for…

    Submitted 14 June, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted at International Conference of Machine Learning (ICML), 2021

  31. arXiv:2103.00107  [pdf, other]

    cs.LG cs.AI stat.ML

    Revisiting Peng's Q($λ$) for Modern Reinforcement Learning

    Authors: Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

    Abstract: Off-policy multi-step reinforcement learning algorithms consist of conservative and non-conservative algorithms: the former actively cut traces, whereas the latter do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithms to an optimal Q-function. In contrast, non-conservative algorithms are thought to be unsafe and have a limited or no theoretical guarantee. Nonethel…

    Submitted 26 February, 2021; originally announced March 2021.

    Comments: 26 pages, 7 figures, 2 tables

  32. arXiv:2102.13089  [pdf, other]

    cs.LG

    On The Effect of Auxiliary Tasks on Representation Dynamics

    Authors: Clare Lyle, Mark Rowland, Georg Ostrovski, Will Dabney

    Abstract: While auxiliary tasks play a key role in shaping the representations learnt by reinforcement learning agents, much is still unknown about the mechanisms through which this is achieved. This work develops our understanding of the relationship between auxiliary tasks, environment structure, and representations by analysing the dynamics of temporal difference algorithms. Through this approach, we est…

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: AISTATS 2021

  33. arXiv:2011.09192  [pdf, other]

    cs.AI cs.GT cs.MA

    Game Plan: What AI can do for Football, and What Football can do for AI

    Authors: Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, et al. (11 additional authors not shown)

    Abstract: The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with t…

    Submitted 18 November, 2020; originally announced November 2020.

  34. arXiv:2007.06700  [pdf, other]

    cs.LG stat.ML

    Revisiting Fundamentals of Experience Replay

    Authors: William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

    Abstract: Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and a…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Published at ICML 2020. First two authors contributed equally and code available at https://github.com/google-research/google-research/tree/master/experience_replay

  35. arXiv:2006.02243  [pdf, other]

    cs.LG stat.ML

    The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

    Authors: Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

    Abstract: In value-based reinforcement learning (RL), unlike in supervised learning, the agent faces not a single, stationary, approximation problem, but a sequence of value prediction problems. Each time the policy improves, the nature of the problem changes, shifting both the distribution of states and their values. In this paper we take a novel perspective, arguing that the value prediction problems face…

    Submitted 4 January, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: AAAI-21

  36. Navigating the Landscape of Multiplayer Games

    Authors: Shayegan Omidshafiei, Karl Tuyls, Wojciech M. Czarnecki, Francisco C. Santos, Mark Rowland, Jerome Connor, Daniel Hennes, Paul Muller, Julien Perolat, Bart De Vylder, Audrunas Gruslys, Remi Munos

    Abstract: Multiplayer games have long been used as testbeds in artificial intelligence research, aptly referred to as the Drosophila of artificial intelligence. Traditionally, researchers have focused on using well-known games to build strong agents. This progress, however, can be better informed by characterizing games and their topological landscape. Tackling this latter question can facilitate understand…

    Submitted 17 November, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

  37. arXiv:2002.08456  [pdf, other]

    cs.GT cs.LG stat.ML

    From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

    Authors: Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls

    Abstract: In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergen…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 43 pages

  38. arXiv:1910.07479  [pdf, other]

    cs.LG stat.ML

    Conditional Importance Sampling for Off-Policy Learning

    Authors: Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

    Abstract: The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms th…

    Submitted 30 July, 2020; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: AISTATS 2020 camera-ready version

  39. arXiv:1910.07478  [pdf, other]

    cs.LG stat.ML

    Adaptive Trade-Offs in Off-Policy Learning

    Authors: Mark Rowland, Will Dabney, Rémi Munos

    Abstract: A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms, and consider their trade-offs of three fundamental quantities: update variance, fixed-point bias, an…

    Submitted 30 July, 2020; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: AISTATS 2020 camera-ready version

  40. arXiv:1909.12823  [pdf, other]

    cs.MA cs.AI cs.LG

    A Generalized Training Approach for Multiagent Learning

    Authors: Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Siqi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos

    Abstract: This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have been focused on two-…

    Submitted 14 February, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

  41. arXiv:1909.09849  [pdf, other]

    cs.MA cs.AI cs.LG

    Multiagent Evaluation under Incomplete Information

    Authors: Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos

    Abstract: This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents. Traditionally, researchers have relied on Elo ratings for this purpose, with recent works also using methods based on Nash equilibria. Unfortunately, Elo is unable to handle intransitive agent interactions, and other technique…

    Submitted 10 January, 2020; v1 submitted 21 September, 2019; originally announced September 2019.

  42. arXiv:1905.03030  [pdf, other]

    cs.LG cs.AI stat.ML

    Meta-learning of Sequential Strategies

    Authors: Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg

    Abstract: In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal pred…

    Submitted 18 July, 2019; v1 submitted 8 May, 2019; originally announced May 2019.

    Comments: DeepMind Technical Report (15 pages, 6 figures). Version V1.1

  43. arXiv:1903.12264  [pdf, other]

    cs.CY cs.HC cs.LG stat.ML

    Validation of a recommender system for prompting omitted foods in online dietary assessment surveys

    Authors: Timur Osadchiy, Ivan Poliakov, Patrick Olivier, Maisie Rowland, Emma Foster

    Abstract: Recall assistance methods are among the key aspects that improve the accuracy of online dietary assessment surveys. These methods still mainly rely on experience of trained interviewers with nutritional background, but data driven approaches could improve cost-efficiency and scalability of automated dietary assessment. We evaluated the effectiveness of a recommender algorithm developed for an onli…

    Submitted 20 March, 2019; originally announced March 2019.

    Report number: ISBN: 978-1-4503-6126-2

    Journal ref: PervasiveHealth 2019 Proceedings of the 13th EAI International Conference on Pervasive Computing Technologies for Healthcare

  44. arXiv:1903.03784  [pdf, other]

    stat.ML cs.LG

    Orthogonal Estimation of Wasserstein Distances

    Authors: Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamas Sarlos, Adrian Weller

    Abstract: Wasserstein distances are increasingly used in a wide variety of applications in machine learning. Sliced Wasserstein distances form an important subclass which may be estimated efficiently through one-dimensional sorting operations. In this paper, we propose a new variant of sliced Wasserstein distance, study the use of orthogonal coupling in Monte Carlo estimation of Wasserstein distances and dr…

    Submitted 5 April, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

    Comments: Published at AISTATS 2019

  45. arXiv:1903.01373  [pdf, other]

    cs.MA cs.GT

    $α$-Rank: Multi-Agent Evaluation by Evolution

    Authors: Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Perolat, Remi Munos

    Abstract: We introduce $α$-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous- and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of…

    Submitted 4 October, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

  46. arXiv:1902.08102  [pdf, other]

    stat.ML cs.LG

    Statistics and Samples in Distributional Reinforcement Learning

    Authors: Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney

    Abstract: We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution. Our key insight is that DRL algorithms can be decomposed as the combination of some statistical estimator and a method for imputing a return distribution consistent with that set of statistics. With this new und…

    Submitted 21 February, 2019; originally announced February 2019.

  47. arXiv:1807.00400  [pdf, other]

    stat.ML cs.LG

    Antithetic and Monte Carlo kernel estimators for partial rankings

    Authors: Maria Lomeli, Mark Rowland, Arthur Gretton, Zoubin Ghahramani

    Abstract: In the modern age, rankings data is ubiquitous and it is useful for a variety of applications such as recommender systems, multi-object tracking and preference learning. However, most rankings data encountered in the real world is incomplete, which prevents the direct application of existing modelling tools for complete rankings. Our contribution is a novel way to extend kernel methods for complet…

    Submitted 25 July, 2018; v1 submitted 1 July, 2018; originally announced July 2018.

  48. arXiv:1804.11271  [pdf, other]

    stat.ML cs.LG

    Gaussian Process Behaviour in Wide Deep Neural Networks

    Authors: Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, Zoubin Ghahramani

    Abstract: Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. We show that, under broad conditions, as we make the architectur…

    Submitted 16 August, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

    Comments: This work substantially extends the work of Matthews et al. (2018) published at the International Conference on Learning Representations (ICLR) 2018

  49. arXiv:1804.02395  [pdf, other]

    cs.LG cs.RO stat.ML

    Structured Evolution with Compact Architectures for Scalable Policy Optimization

    Authors: Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E. Turner, Adrian Weller

    Abstract: We present a new method of blackbox optimization via gradient approximation with the use of structured random orthogonal matrices, providing more accurate estimators than baselines and with provable theoretical guarantees. We show that this algorithm can be successfully applied to learn better quality compact policies than those using standard gradient estimation techniques. The compact policies w…

    Submitted 12 June, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

  50. arXiv:1710.10044  [pdf, other]

    cs.AI cs.LG stat.ML

    Distributional Reinforcement Learning with Quantile Regression

    Authors: Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos

    Abstract: In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on…

    Submitted 27 October, 2017; originally announced October 2017.