Showing 1–18 of 18 results for author: Rusu, A A

Searching in archive cs.
  1. arXiv:2403.10616  [pdf, other]

    cs.LG cs.CL

    DiPaCo: Distributed Path Composition

    Authors: Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Adhiguna Kuncoro, Yani Donchev, Rachita Chhaparia, Ionel Gog, Marc'Aurelio Ranzato, Jiajun Shen, Arthur Szlam

    Abstract: Progress in machine learning (ML) has been fueled by scaling neural network models. This scaling has been enabled by ever more heroic feats of engineering, necessary for accommodating ML approaches that require high bandwidth communication between devices working in parallel. In this work, we propose a co-designed modular architecture and training approach for ML models, dubbed DIstributed PAth CO…

    Submitted 15 March, 2024; originally announced March 2024.

  2. arXiv:2401.09135  [pdf, other]

    cs.LG cs.CL

    Asynchronous Local-SGD Training for Language Modeling

    Authors: Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato

    Abstract: Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication. This work presents an empirical study of asynchronous Local-SGD for training language models; that is, each worker updates the global parameters as soon as it has finished its SGD steps. We co…

    Submitted 17 January, 2024; originally announced January 2024.

  3. arXiv:2311.08105  [pdf, other]

    cs.LG cs.CL

    DiLoCo: Distributed Low-Communication Training of Language Models

    Authors: Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, Jiajun Shen

    Abstract: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected accelerators, with devices exchanging gradients and other intermediate states at each optimization step. While it is difficult to build and maintain a single computing cluster hosting many accelerators,…

    Submitted 2 December, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

  4. arXiv:2211.11747  [pdf, other]

    cs.LG cs.CV

    NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

    Authors: Jorg Bornschein, Alexandre Galashov, Ross Hemsley, Amal Rannen-Triki, Yutian Chen, Arslan Chaudhry, Xu Owen He, Arthur Douillard, Massimo Caccia, Qixuan Feng, Jiajun Shen, Sylvestre-Alvise Rebuffi, Kitty Stacpoole, Diego de las Casas, Will Hawkins, Angeliki Lazaridou, Yee Whye Teh, Andrei A. Rusu, Razvan Pascanu, Marc'Aurelio Ranzato

    Abstract: A shared goal of several machine learning communities like continual learning, meta-learning and transfer learning, is to design algorithms and models that efficiently and robustly adapt to unseen tasks. An even more ambitious goal is to build models that never stop adapting, and that become increasingly more efficient through time by suitably transferring the accrued knowledge. Beyond the study o…

    Submitted 16 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  5. arXiv:2210.13982  [pdf, other]

    cs.LG cs.CR

    Hindering Adversarial Attacks with Implicit Neural Representations

    Authors: Andrei A. Rusu, Dan A. Calian, Sven Gowal, Raia Hadsell

    Abstract: We introduce the Lossy Implicit Network Activation Coding (LINAC) defence, an input transformation which successfully hinders several common adversarial attacks on CIFAR-10 classifiers for perturbations up to $\epsilon = 8/255$ in $L_\infty$ norm and $\epsilon = 0.5$ in $L_2$ norm. Implicit neural representations are used to approximately encode pixel colour intensities in 2D images such that classifie…

    Submitted 22 October, 2022; originally announced October 2022.

    Journal ref: PMLR 162 (2022) 18910-18934

  6. arXiv:2210.12448  [pdf, other]

    cs.LG

    Probing Transfer in Deep Reinforcement Learning without Task Engineering

    Authors: Andrei A. Rusu, Sebastian Flennerhag, Dushyant Rao, Razvan Pascanu, Raia Hadsell

    Abstract: We evaluate the use of original game curricula supported by the Atari 2600 console as a heterogeneous transfer benchmark for deep reinforcement learning agents. Game designers created curricula using combinations of several discrete modifications to the basic versions of games such as Space Invaders, Breakout and Freeway, making them progressively more challenging for human players. By formally or…

    Submitted 22 October, 2022; originally announced October 2022.

  7. arXiv:2201.07372  [pdf, other]

    cs.LG cs.AI

    Prospective Learning: Principled Extrapolation to the Future

    Authors: Ashwin De Silva, Rahul Ramesh, Lyle Ungar, Marshall Hussain Shuler, Noah J. Cowan, Michael Platt, Chen Li, Leyla Isik, Seung-Eon Roh, Adam Charles, Archana Venkataraman, Brian Caffo, Javier J. How, Justus M Kebschull, John W. Krakauer, Maxim Bichuch, Kaleab Alemayehu Kinfu, Eva Yezerets, Dinesh Jayaraman, Jong M. Shin, Soledad Villar, Ian Phillips, Carey E. Priebe, Thomas Hartung, Michael I. Miller, et al. (18 additional authors not shown)

    Abstract: Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenari…

    Submitted 13 July, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: Accepted at the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023

  8. arXiv:1910.14481  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Continual Unsupervised Representation Learning

    Authors: Dushyant Rao, Francesco Visin, Andrei A. Rusu, Yee Whye Teh, Razvan Pascanu, Raia Hadsell

    Abstract: Continual learning aims to improve the ability of modern learning systems to deal with non-stationary distributions, typically by attempting to learn a series of tasks sequentially. Prior art in the field has largely considered supervised or reinforcement learning tasks, and often assumes full knowledge of task labels and boundaries. In this work, we propose an approach (CURL) to tackle a more gen…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019

  9. arXiv:1909.00025  [pdf, other]

    cs.LG cs.NE stat.ML

    Meta-Learning with Warped Gradient Descent

    Authors: Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell

    Abstract: Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces updates or by attempting to learn better initialisations or scaling factors for a gradient-based update rule. Both of thes…

    Submitted 18 February, 2020; v1 submitted 30 August, 2019; originally announced September 2019.

    Comments: 28 pages, 13 figures, 3 tables. Published as a conference paper at ICLR 2020

  10. arXiv:1906.05201  [pdf, other]

    stat.ML cs.LG cs.NE

    Task Agnostic Continual Learning via Meta Learning

    Authors: Xu He, Jakub Sygnowski, Alexandre Galashov, Andrei A. Rusu, Yee Whye Teh, Razvan Pascanu

    Abstract: While neural networks are powerful function approximators, they suffer from catastrophic forgetting when the data distribution is not stationary. One particular formalism that studies learning under non-stationary distribution is provided by continual learning, where the non-stationarity is imposed by a sequence of distinct tasks. Most methods in this space assume, however, the knowledge of task b…

    Submitted 12 June, 2019; originally announced June 2019.

  11. arXiv:1807.05960  [pdf, other]

    cs.LG cs.CV stat.ML

    Meta-Learning with Latent Embedding Optimization

    Authors: Andrei A. Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, Raia Hadsell

    Abstract: Gradient-based meta-learning techniques are both widely applicable and proficient at solving challenging few-shot learning and fast adaptation problems. However, they have practical difficulties when operating on high-dimensional parameter spaces in extreme low-data regimes. We show that it is possible to bypass these limitations by learning a data-dependent latent generative representation of mod…

    Submitted 26 March, 2019; v1 submitted 16 July, 2018; originally announced July 2018.

  12. arXiv:1806.07917  [pdf, other]

    cs.NE cs.AI cs.LG

    Meta-Learning by the Baldwin Effect

    Authors: Chrisantha Thomas Fernando, Jakub Sygnowski, Simon Osindero, Jane Wang, Tom Schaul, Denis Teplyashin, Pablo Sprechmann, Alexander Pritzel, Andrei A. Rusu

    Abstract: The scope of the Baldwin effect was recently called into question by two papers that closely examined the seminal work of Hinton and Nowlan. To this date there has been no demonstration of its necessity in empirically challenging tasks. Here we show that the Baldwin effect is capable of evolving few-shot supervised and reinforcement learning mechanisms, by shaping the hyperparameters and the initi…

    Submitted 22 June, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

  13. arXiv:1707.08475  [pdf, other]

    stat.ML cs.AI cs.LG

    DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

    Authors: Irina Higgins, Arka Pal, Andrei A. Rusu, Loic Matthey, Christopher P Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, Alexander Lerchner

    Abstract: Domain adaptation is an important open problem in deep reinforcement learning (RL). In many scenarios of interest data is hard to obtain, so agents may learn a source policy in a setting where data is readily available, with the hope that it generalises well to the target domain. We propose a new multi-stage RL agent, DARLA (DisentAngled Representation Learning Agent), which learns to see before l…

    Submitted 6 June, 2018; v1 submitted 26 July, 2017; originally announced July 2017.

    Comments: ICML 2017

  14. arXiv:1701.08734  [pdf, other]

    cs.NE cs.LG

    PathNet: Evolution Channels Gradient Descent in Super Neural Networks

    Authors: Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A. Rusu, Alexander Pritzel, Daan Wierstra

    Abstract: For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathw…

    Submitted 30 January, 2017; originally announced January 2017.

  15. arXiv:1612.00796  [pdf, other]

    cs.LG cs.AI stat.ML

    Overcoming catastrophic forgetting in neural networks

    Authors: James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell

    Abstract: The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have…

    Submitted 25 January, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

  16. arXiv:1610.04286  [pdf, other]

    cs.RO cs.LG

    Sim-to-Real Robot Learning from Pixels with Progressive Nets

    Authors: Andrei A. Rusu, Mel Vecerik, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, Raia Hadsell

    Abstract: Applying end-to-end learning to solve complex, interactive, pixel-driven control tasks on a robot is an unsolved problem. Deep Reinforcement Learning algorithms are too slow to achieve performance on a real robot, but their potential has been demonstrated in simulated environments. We propose using progressive networks to bridge the reality gap and transfer learned policies from simulation to the…

    Submitted 22 May, 2018; v1 submitted 13 October, 2016; originally announced October 2016.

  17. arXiv:1606.04671  [pdf, other]

    cs.LG

    Progressive Neural Networks

    Authors: Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell

    Abstract: Learning to solve complex sequences of tasks--while both leveraging transfer and avoiding catastrophic forgetting--remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architec…

    Submitted 22 October, 2022; v1 submitted 15 June, 2016; originally announced June 2016.

  18. arXiv:1511.06295  [pdf, other]

    cs.LG

    Policy Distillation

    Authors: Andrei A. Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell

    Abstract: Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and…

    Submitted 7 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Submitted to ICLR 2016