Showing 1–50 of 80 results for author: Rocktäschel, T

Searching in archive cs.
  1. arXiv:2406.04268  [pdf, other]

    cs.LG cs.AI

    Open-Endedness is Essential for Artificial Superhuman Intelligence

    Authors: Edward Hughes, Michael Dennis, Jack Parker-Holder, Feryal Behbahani, Aditi Mavalankar, Yuge Shi, Tom Schaul, Tim Rocktäschel

    Abstract: In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internet-scale data. Nevertheless, the creation of open-ended, ever self-improving AI remains elusive. In this position paper, we argue that the ingredients are now in place to achieve open-endedness in AI systems with respect to a human observer. Furthermore, w…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2405.20835  [pdf, other]

    cs.LG cs.AI cs.CL

    Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs

    Authors: Davide Paglieri, Saurabh Dash, Tim Rocktäschel, Jack Parker-Holder

    Abstract: Post-Training Quantization (PTQ) enhances the efficiency of Large Language Models (LLMs) by enabling faster operation and compatibility with more accessible hardware through reduced memory usage, at the cost of small performance drops. We explore the role of calibration sets in PTQ, specifically their effect on hidden activations in various notable open-source LLMs. Calibration sets are crucial fo…

    Submitted 5 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  3. arXiv:2402.16822  [pdf, other]

    cs.CL cs.AI cs.LG

    Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

    Authors: Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

    Abstract: As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance. Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel app…

    Submitted 26 February, 2024; originally announced February 2024.

  4. arXiv:2402.15391  [pdf, other]

    cs.LG cs.AI cs.CV

    Genie: Generative Interactive Environments

    Authors: Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel

    Abstract: We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotem…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: https://sites.google.com/corp/view/genie-2024/

  5. arXiv:2402.06782  [pdf, other]

    cs.AI cs.CL

    Debating with More Persuasive LLMs Leads to More Truthful Answers

    Authors: Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel, Ethan Perez

    Abstract: Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on human-labelled data. However, as models grow increasingly sophisticated, they will surpass human expertise, and the role of human evaluation will evolve into non-experts overseeing experts. In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this…

    Submitted 30 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: For code please check: https://github.com/ucl-dark/llm_debate

  6. arXiv:2401.13460  [pdf, other]

    cs.LG cs.AI cs.MA

    Multi-Agent Diagnostics for Robustness via Illuminated Diversity

    Authors: Mikayel Samvelyan, Davide Paglieri, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel

    Abstract: In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Notwithstanding their outstanding performance in familiar environments, these systems often falter in new situations due to overfitting during the training phase. This is especially pronounced in settings where both cooperative and competitive behaviours are present, encaps…

    Submitted 28 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  7. arXiv:2312.12568  [pdf, other]

    cs.AI

    Scaling Opponent Shaping to High Dimensional Games

    Authors: Akbir Khan, Timon Willi, Newton Kwan, Andrea Tacchetti, Chris Lu, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

    Abstract: In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes. To address this issue, opponent shaping (OS) methods explicitly learn to influence the learning dynamics of co-players and empirically lead to improved individual and collective outcomes. However, OS methods have only been evaluated in low-dimensional environments du…

    Submitted 10 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  8. arXiv:2312.12564  [pdf, other]

    cs.LG cs.GT cs.MA

    Leading the Pack: N-player Opponent Shaping

    Authors: Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel

    Abstract: Reinforcement learning solutions have great success in the 2-player general sum setting. In this setting, the paradigm of Opponent Shaping (OS), in which agents account for the learning of their co-players, has led to agents which are able to avoid collectively bad outcomes, whilst also maximizing their reward. These methods have currently been limited to 2-player games. However, the real world inv…

    Submitted 26 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

  9. arXiv:2312.09187  [pdf, other]

    cs.LG

    Vision-Language Models as a Source of Rewards

    Authors: Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Dmitry Nikulin, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, et al. (2 additional authors not shown)

    Abstract: Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of…

    Submitted 12 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 10 pages, 5 figures

  10. arXiv:2312.02682  [pdf, other]

    cs.LG cs.AI cs.RO

    H-GAP: Humanoid Control with a Generalist Planner

    Authors: Zhengyao Jiang, Yingchen Xu, Nolan Wagener, Yicheng Luo, Michael Janner, Edward Grefenstette, Tim Rocktäschel, Yuandong Tian

    Abstract: Humanoid control is an important research challenge offering avenues for integration into human-centric infrastructures and enabling physics-driven humanoid animations. The daunting challenges in this field stem from the difficulty of optimizing in high-dimensional action spaces and the instability introduced by the bipedal morphology of humanoids. However, the extensive collection of human motion…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: 18 pages including appendix, 4 figures

  11. arXiv:2311.12786  [pdf, other]

    cs.LG

    Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

    Authors: Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger

    Abstract: Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy. Despite its clear importance, there has been minimal work that explains how fine-tuning alters the underlying capabilities learned by a model during pretraining: does fine-tuning yield entirely nov…

    Submitted 21 November, 2023; originally announced November 2023.

  12. arXiv:2311.12716  [pdf, other]

    cs.LG cs.AI

    minimax: Efficient Baselines for Autocurricula in JAX

    Authors: Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel

    Abstract: Unsupervised environment design (UED) is a form of automatic curriculum learning for training robust decision-making agents to zero-shot transfer into unseen environments. Such autocurricula have received much interest from the RL community. However, UED experiments, based on CPU rollouts and GPU model updates, have often required several weeks of training. This compute requirement is a major obst…

    Submitted 23 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Presented at ALOE 2023

  13. arXiv:2311.10090  [pdf, other]

    cs.LG cs.AI cs.MA

    JaxMARL: Multi-Agent RL Environments in JAX

    Authors: Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktäschel, Chris Lu, Jakob Nicolaus Foerster

    Abstract: Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware accelerat…

    Submitted 19 December, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

  14. arXiv:2311.01829  [pdf, other]

    cs.LG cs.MA cs.NE

    Mix-ME: Quality-Diversity for Multi-Agent Learning

    Authors: Garðar Ingvarsson, Mikayel Samvelyan, Bryan Lim, Manon Flageat, Antoine Cully, Tim Rocktäschel

    Abstract: In many real-world systems, such as adaptive robotics, achieving a single, optimised solution may be insufficient. Instead, a diverse set of high-performing solutions is often required to adapt to varying contexts and requirements. This is the realm of Quality-Diversity (QD), which aims to discover a collection of high-performing solutions, each with their own unique characteristics. QD methods ha…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: 15 pages, 7 figures. Submitted and accepted to the ALOE workshop at NeurIPS 2023

  15. arXiv:2309.16797  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

    Authors: Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, Tim Rocktäschel

    Abstract: Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM,…

    Submitted 28 September, 2023; originally announced September 2023.

  16. arXiv:2308.10797  [pdf, other]

    cs.LG cs.AI

    Stabilizing Unsupervised Environment Design with a Learned Adversary

    Authors: Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis, Eugene Vinitsky, Tim Rocktäschel

    Abstract: A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses…

    Submitted 22 August, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: CoLLAs 2023 - Oral; Second and third authors contributed equally

  17. arXiv:2303.03376  [pdf, other]

    cs.LG cs.MA

    MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

    Authors: Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel

    Abstract: Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents. Existing methods adapt curricula independently over either environment parameters (in single-agent settings) or co-player policies (in multi-agent settings). However, the strengths and weaknesses of co-players can…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: International Conference on Learning Representations (ICLR) 2023

  18. arXiv:2301.07608  [pdf, other]

    cs.LG cs.AI cs.NE

    Human-Timescale Adaptation in an Open-Ended Task Space

    Authors: Adaptive Agent Team, Jakob Bauer, Kate Baumli, Satinder Baveja, Feryal Behbahani, Avishkar Bhoopchand, Nathalie Bradley-Schmieg, Michael Chang, Natalie Clay, Adrian Collister, Vibhavari Dasagi, Lucy Gonzalez, Karol Gregor, Edward Hughes, Sheleem Kashem, Maria Loks-Thompson, Hannah Openshaw, Jack Parker-Holder, Shreya Pathak, Nicolas Perez-Nieves, Nemanja Rakicevic, Tim Rocktäschel, Yannick Schroecker, Jakub Sygnowski, Karl Tuyls, et al. (3 additional authors not shown)

    Abstract: Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a…

    Submitted 18 January, 2023; originally announced January 2023.

  19. arXiv:2211.07819  [pdf, other]

    cs.AI cs.LG

    General Intelligence Requires Rethinking Exploration

    Authors: Minqi Jiang, Tim Rocktäschel, Edward Grefenstette

    Abstract: We are at the cusp of a transition from "learning from data" to "learning what data to learn from" as a central focus of artificial intelligence (AI) research. While the first-order learning problem is not completely solved, large models under unified architectures, such as transformers, have shifted the learning bottleneck from how to effectively train our models to how to effectively acquire and…

    Submitted 14 November, 2022; originally announced November 2022.

  20. arXiv:2211.00539  [pdf, other]

    cs.LG cs.AI

    Dungeons and Data: A Large-Scale NetHack Dataset

    Authors: Eric Hambro, Roberta Raileanu, Danielle Rothermel, Vegard Mella, Tim Rocktäschel, Heinrich Küttler, Naila Murray

    Abstract: Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go, StarCraft, or DOTA, have relied on both simulated environments and large-scale datasets. However, progress on this research has been hindered by the scarcity of open-sourced datasets and the prohibitive computational cost to work with them. Here we present the NetHack Learning Dat…

    Submitted 24 November, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: 9 pages, published in the Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks. New links to hosting location. Revised results, same conclusions

  21. arXiv:2210.14986  [pdf, other]

    cs.CL

    The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs

    Authors: Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette

    Abstract: Despite widespread use of LLMs as conversational agents, evaluations of performance fail to capture a crucial aspect of communication: interpreting language in context -- incorporating its pragmatics. Humans interpret language using beliefs and prior knowledge about the world. For example, we intuitively understand the response "I wore gloves" to the question "Did you leave fingerprints?" as meani…

    Submitted 3 December, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted as Spotlight at NeurIPS 2023

  22. arXiv:2210.12719  [pdf, other]

    cs.LG cs.AI

    Learning General World Models in a Handful of Reward-Free Deployments

    Authors: Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette

    Abstract: Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we i…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: To be published at NeurIPS 2022. Code and videos available at https://ycxuyingchen.github.io/cascade/

  23. arXiv:2210.05805  [pdf, other]

    cs.LG cs.AI

    Exploration via Elliptical Episodic Bonuses

    Authors: Mikael Henaff, Roberta Raileanu, Minqi Jiang, Tim Rocktäschel

    Abstract: In recent years, a number of reinforcement learning (RL) methods have been proposed to explore complex environments which differ across episodes. In this work, we show that the effectiveness of these methods critically relies on a count-based episodic term in their exploration bonus. As a result, despite their success in relatively simple, noise-free settings, these methods fall short in more real…

    Submitted 4 January, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

  24. arXiv:2210.00066  [pdf, other]

    cs.LG cs.AI cs.CL

    Improving Policy Learning via Language Dynamics Distillation

    Authors: Victor Zhong, Jesse Mu, Luke Zettlemoyer, Edward Grefenstette, Tim Rocktäschel

    Abstract: Recent work has shown that augmenting environments with language descriptions improves policy learning. However, for environments with complex language abstractions, learning how to ground language to observations is difficult due to sparse, delayed rewards. We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language d…

    Submitted 30 September, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022. 16 pages, 12 figures

  25. arXiv:2208.10291  [pdf, other]

    cs.LG

    Efficient Planning in a Compact Latent Action Space

    Authors: Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian

    Abstract: Planning-based reinforcement learning has shown strong performance in tasks in discrete and low-dimensional continuous action spaces. However, planning usually brings significant computational overhead for decision-making, and scaling such methods to high-dimensional action spaces remains challenging. To advance efficient planning for high-dimensional continuous control, we propose Trajectory Auto…

    Submitted 24 January, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: Accepted by ICLR2023. Code available at https://github.com/ZhengyaoJiang/latentplan

  26. arXiv:2207.11584  [pdf, other]

    cs.LG cs.AI

    Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning

    Authors: Michael Matthews, Mikayel Samvelyan, Jack Parker-Holder, Edward Grefenstette, Tim Rocktäschel

    Abstract: Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely specifically trained to perform them. Instead, they are usually trained end-to-end, with the hope being that useful skills will be implicitly learned in order to maximise discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated in…

    Submitted 15 August, 2022; v1 submitted 23 July, 2022; originally announced July 2022.

    Comments: 19 pages, 12 figures, to be published in the Conference on Lifelong Learning Agents 2022

  27. arXiv:2207.06105  [pdf, other]

    cs.AI

    GriddlyJS: A Web IDE for Reinforcement Learning

    Authors: Christopher Bamford, Minqi Jiang, Mikayel Samvelyan, Tim Rocktäschel

    Abstract: Progress in reinforcement learning (RL) research is often driven by the design of new, challenging environments -- a costly undertaking requiring skills orthogonal to that of a typical machine learning researcher. The complexity of environment development has only increased with the rise of procedural-content generation (PCG) as the prevailing paradigm for producing varied environments capable of…

    Submitted 12 October, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

  28. arXiv:2207.05219  [pdf, other]

    cs.LG cs.AI stat.ML

    Grounding Aleatoric Uncertainty for Unsupervised Environment Design

    Authors: Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

    Abstract: Adaptive curricula in reinforcement learning (RL) have proven effective for producing policies robust to discrepancies between the train and test environment. Recently, the Unsupervised Environment Design (UED) framework generalized RL curricula to generating sequences of entire environments, leading to new methods with robust minimax regret properties. Problematically, in partially-observable or…

    Submitted 24 October, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2022

  29. arXiv:2205.15824  [pdf, other]

    cs.LG

    Graph Backup: Data Efficient Backup Exploiting Markovian Transitions

    Authors: Zhengyao Jiang, Tianjun Zhang, Robert Kirk, Tim Rocktäschel, Edward Grefenstette

    Abstract: The successes of deep Reinforcement Learning (RL) are limited to settings where we have a large stream of online experiences, but applying RL in the data-efficient setting with limited access to online interactions is still challenging. A key to data-efficient RL is good value estimation, but current methods in this space fail to fully utilise the structure of the trajectory data gathered from the…

    Submitted 31 May, 2022; originally announced May 2022.

  30. arXiv:2203.11889  [pdf, other]

    cs.LG cs.AI cs.NE cs.SC stat.ML

    Insights From the NeurIPS 2021 NetHack Challenge

    Authors: Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, Daejin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, et al. (4 additional authors not shown)

    Abstract: In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challeng…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: Under review at PMLR for the NeurIPS 2021 Competition Workshop Track, 10 pages + 10 in appendices

  31. arXiv:2203.01302  [pdf, other]

    cs.LG

    Evolving Curricula with Regret-Based Environment Design

    Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

    Abstract: It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the stude…

    Submitted 30 September, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: First two authors contributed equally

  32. arXiv:2202.08938  [pdf, other]

    cs.LG cs.AI cs.CL

    Improving Intrinsic Exploration with Language Abstractions

    Authors: Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette

    Abstract: Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural la…

    Submitted 21 November, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022

  33. arXiv:2202.00104  [pdf, other]

    cs.LG cs.AI cs.MA

    Generalization in Cooperative Multi-Agent Systems

    Authors: Anuj Mahajan, Mikayel Samvelyan, Tarun Gupta, Benjamin Ellis, Mingfei Sun, Tim Rocktäschel, Shimon Whiteson

    Abstract: Collective intelligence is a fundamental trait shared by several species of living organisms. It has allowed them to thrive in the diverse environmental conditions that exist on our planet. From simple organisations in an ant colony to complex systems in human groups, collective intelligence is vital for solving complex survival tasks. As is commonly observed, such natural systems are flexible to…

    Submitted 21 February, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

  34. A Survey of Zero-shot Generalisation in Deep Reinforcement Learning

    Authors: Robert Kirk, Amy Zhang, Edward Grefenstette, Tim Rocktäschel

    Abstract: The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real world scenarios, where the environment will be diverse, dynamic and unpred…

    Submitted 19 January, 2023; v1 submitted 18 November, 2021; originally announced November 2021.

    Comments: JAIR version. Added formal definitions of ZSPT and related concepts, JAIR formatting, other small rewrites; https://www.jair.org/index.php/jair/article/view/14174

    Journal ref: Journal of Artificial Intelligence Research (JAIR), 76:201-264, 2023

  35. arXiv:2110.02439  [pdf, other]

    cs.LG cs.AI

    Replay-Guided Adversarial Environment Design

    Authors: Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

    Abstract: Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emer…

    Submitted 13 January, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  36. arXiv:2109.13202  [pdf, other]

    cs.LG stat.ML

    MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

    Authors: Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel

    Abstract: Progress in deep reinforcement learning (RL) is heavily driven by the availability of challenging benchmarks used for training agents. However, benchmarks that are widely adopted by the community are not explicitly designed for evaluating specific capabilities of RL methods. While there exist environments for assessing particular open problems in RL (such as exploration, transfer learning, unsuper…

    Submitted 16 November, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: NeurIPS 2021: Datasets and Benchmarks Track

  37. arXiv:2107.12460  [pdf, other]

    cs.LG cs.AI

    Don't Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers

    Authors: Danielle Rothermel, Margaret Li, Tim Rocktäschel, Jakob Foerster

    Abstract: Self-supervised pre-training of large-scale transformer models on text corpora followed by finetuning has achieved state-of-the-art on a number of natural language processing tasks. Recently, Lu et al. (2021, arXiv:2103.05247) claimed that frozen pretrained transformers (FPTs) match or outperform training from scratch as well as unfrozen (fine-tuned) pretrained transformers in a set of transfer ta…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: Accepted to ICML 2021 Workshop: Self-Supervised Learning for Reasoning and Perception

  38. arXiv:2102.04220  [pdf, other]

    cs.LG

    Grid-to-Graph: Flexible Spatial Relational Inductive Biases for Reinforcement Learning

    Authors: Zhengyao Jiang, Pasquale Minervini, Minqi Jiang, Tim Rocktäschel

    Abstract: Although reinforcement learning has been successfully applied in many domains in recent years, we still lack agents that can systematically generalize. While relational inductive biases that fit a task can improve generalization of RL agents, these biases are commonly hard-coded directly in the agent's neural architecture. In this work, we show that we can incorporate relational inductive biases,…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted by AAMAS 2021

  39. arXiv:2010.03934  [pdf, other]

    cs.LG cs.AI

    Prioritized Level Replay

    Authors: Minqi Jiang, Edward Grefenstette, Tim Rocktäschel

    Abstract: Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned…

    Submitted 12 June, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

  40. arXiv:2010.01856  [pdf, other]

    cs.LG stat.ML

    My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

    Authors: Vitaly Kurin, Maximilian Igl, Tim Rocktäschel, Wendelin Boehmer, Shimon Whiteson

    Abstract: Multitask Reinforcement Learning is a promising way to obtain models with better performance, generalisation, data efficiency, and robustness. Most existing work is limited to compatible settings, where the state and action space dimensions are the same across tasks. Graph Neural Networks (GNN) are one way to address incompatible environments, because they can process graphs of arbitrary size. The…

    Submitted 14 April, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: ICLR 2021 Camera-Ready Version

  41. arXiv:2010.00685  [pdf, other]

    cs.CL cs.AI

    How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds

    Authors: Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston

    Abstract: We seek to create agents that both act and communicate with other agents in pursuit of a goal. Towards this end, we extend LIGHT (Urbanek et al. 2019) -- a large-scale crowd-sourced fantasy text-game -- with a dataset of quests. These contain natural language motivations paired with in-game goals and human demonstrations; completing a quest might require dialogue or actions (or both). We introduce…

    Submitted 25 May, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: In NAACL 2021

  42. arXiv:2009.02252  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    KILT: a Benchmark for Knowledge Intensive Language Tasks

    Authors: Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rocktäschel, Sebastian Riedel

    Abstract: Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources. While some models do well on individual tasks, developing general models is difficult as each task might require computationally expensive indexing of custom knowledge sources, in addition to dedicated infrastructure. To catalyze research…

    Submitted 27 May, 2021; v1 submitted 4 September, 2020; originally announced September 2020.

    Comments: Accepted at NAACL 2021

  43. arXiv:2007.09185  [pdf, other]

    cs.AI cs.CL cs.LG

    WordCraft: An Environment for Benchmarking Commonsense Agents

    Authors: Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip H. S. Torr, Shimon Whiteson, Tim Rocktäschel

    Abstract: The ability to quickly solve a wide range of real-world tasks requires a commonsense understanding of the world. Yet, how to best extract such knowledge from natural language corpora and integrate it with reinforcement learning (RL) agents remains an open challenge. This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and p…

    Submitted 17 July, 2020; originally announced July 2020.

  44. arXiv:2007.06477  [pdf, other]

    cs.AI cs.CL cs.LG cs.NE cs.SC

    Learning Reasoning Strategies in End-to-End Differentiable Proving

    Authors: Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp, Edward Grefenstette, Tim Rocktäschel

    Abstract: Attempts to render deep learning models interpretable, data-efficient, and robust have seen some success through hybridisation with rule-based systems, for example, in Neural Theorem Provers (NTPs). These neuro-symbolic models can induce interpretable rules and learn representations from data via back-propagation, while providing logical explanations for their predictions. However, they are restri…

    Submitted 24 August, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: Proceedings of the 37th International Conference on Machine Learning (ICML 2020)

  45. arXiv:2006.13760  [pdf, other]

    cs.LG cs.AI cs.CL cs.NE stat.ML

    The NetHack Learning Environment

    Authors: Heinrich Küttler, Nantas Nardelli, Alexander H. Miller, Roberta Raileanu, Marco Selvatici, Edward Grefenstette, Tim Rocktäschel

    Abstract: Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging…

    Submitted 1 December, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: 28 pages. Accepted at NeurIPS 2020

  46. arXiv:2006.12122  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning with AMIGo: Adversarially Motivated Intrinsic Goals

    Authors: Andres Campero, Roberta Raileanu, Heinrich Küttler, Joshua B. Tenenbaum, Tim Rocktäschel, Edward Grefenstette

    Abstract: A key challenge for reinforcement learning (RL) consists of learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by using various forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating -- as form of meta-learning -- a goal-generating teacher that proposes Adversarially Motivated…

    Submitted 23 February, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: 18 pages, 6 figures, published at The Ninth International Conference on Learning Representations (2021)

  47. arXiv:2005.11401  [pdf, other]

    cs.CL cs.LG

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    Authors: Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

    Abstract: Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for…

    Submitted 12 April, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

    Comments: Accepted at NeurIPS 2020

  48. arXiv:2005.04611  [pdf, other]

    cs.CL

    How Context Affects Language Models' Factual Predictions

    Authors: Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel

    Abstract: When pre-trained on large unsupervised textual corpora, language models are able to store and retrieve factual knowledge to some extent, making it possible to use them directly for zero-shot cloze-style question answering. However, storing factual knowledge in a fixed number of weights of a language model clearly has limitations. Previous approaches have successfully provided access to information…

    Submitted 10 May, 2020; originally announced May 2020.

    Comments: Accepted at AKBC 2020

  49. arXiv:2004.07790  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training

    Authors: Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Sebastian Riedel, Tim Rocktäschel

    Abstract: Natural Language Inference (NLI) datasets contain annotation artefacts resulting in spurious correlations between the natural language utterances and their respective entailment classes. These artefacts are exploited by neural networks even when only considering the hypothesis and ignoring the premise, leading to unwanted biases. Belinkov et al. (2019b) proposed tackling this problem via adversari…

    Submitted 27 May, 2021; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted at EMNLP 2020

  50. arXiv:2002.12292  [pdf, other]

    cs.LG cs.AI

    RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

    Authors: Roberta Raileanu, Tim Rocktäschel

    Abstract: Exploration in sparse reward environments remains one of the key challenges of model-free reinforcement learning. Instead of solely relying on extrinsic rewards provided by the environment, many state-of-the-art methods use intrinsic rewards to encourage exploration. However, we show that existing methods fall short in procedurally-generated environments where an agent is unlikely to visit a state…

    Submitted 29 February, 2020; v1 submitted 27 February, 2020; originally announced February 2020.