Showing 1–22 of 22 results for author: Sacramento, J

Searching in archive cs.

  1. arXiv:2407.12275  [pdf, other]

    cs.LG cs.NE

    When can transformers compositionally generalize in-context?

    Authors: Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes von Oswald, Razvan Pascanu, Guillaume Lajoie, João Sacramento

    Abstract: Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ICML 2024 workshop on Next Generation of Sequence Modeling Architectures

  2. arXiv:2406.08423  [pdf, other]

    cs.LG cs.AI

    State Soup: In-Context Skill Learning, Retrieval and Mixing

    Authors: Maciej Pióro, Maciej Wołczyk, Razvan Pascanu, Johannes von Oswald, João Sacramento

    Abstract: A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems. Such models naturally handle long sequences efficiently, as the cost of processing a new input is independent of sequence length. Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter inte…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2406.05816  [pdf, other]

    cs.LG

    Attention as a Hypernetwork

    Authors: Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu

    Abstract: Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training but whose compositions have not. What mechanisms underlie this ability for compositional generalization? By reformulating multi-head attention as a hypernetwork, we reveal that a low-dimensional latent code specifies key-query specific operations. We f…

    Submitted 21 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Code available at https://github.com/smonsays/hypernetwork-attention

  4. arXiv:2312.15001  [pdf, other]

    cs.LG cs.NE

    Discovering modular solutions that generalize compositionally

    Authors: Simon Schug, Seijin Kobayashi, Yassir Akram, Maciej Wołczyk, Alexandra Proca, Johannes von Oswald, Razvan Pascanu, João Sacramento, Angelika Steger

    Abstract: Many complex tasks can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. It therefore seems natural to make models more modular to help capture the compositional nature of many tasks. However, it is unclear under which…

    Submitted 25 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Published as a conference paper at ICLR 2024; Code available at https://github.com/smonsays/modular-hyperteacher

  5. arXiv:2309.05858  [pdf, other]

    cs.LG cs.AI

    Uncovering mesa-optimization algorithms in Transformers

    Authors: Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento

    Abstract: Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning…

    Submitted 11 September, 2023; originally announced September 2023.

  6. arXiv:2309.01775  [pdf, other]

    cs.LG cs.NE

    Gated recurrent neural networks discover attention

    Authors: Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento

    Abstract: Recent architectural developments have enabled recurrent neural networks (RNNs) to reach and even surpass the performance of Transformers on certain sequence modeling tasks. These modern RNNs feature a prominent design pattern: linear recurrent layers interconnected by feedforward paths with multiplicative gating. Here, we show how RNNs equipped with these two design elements can exactly implement…

    Submitted 7 February, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

  7. arXiv:2305.15947  [pdf, other]

    cs.LG cs.NE

    Online learning of long-range dependencies

    Authors: Nicolas Zucchet, Robert Meier, Simon Schug, Asier Mujika, João Sacramento

    Abstract: Online learning holds the promise of enabling efficient long-term credit assignment in recurrent neural networks. However, current algorithms fall short of offline backpropagation by either not being scalable or failing to learn long-range dependencies. Here we present a high-performance online learning algorithm that merely doubles the memory and computational requirements of a single inference p…

    Submitted 6 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023

  8. arXiv:2212.07677  [pdf, other]

    cs.LG cs.AI cs.CL

    Transformers learn in-context by gradient descent

    Authors: Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov

    Abstract: At present, the mechanisms of in-context learning in Transformers are not well understood and remain mostly an intuition. In this paper, we suggest that training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We start by providing a simple weight construction that shows the equivalence of data transformations induced by 1) a single linea…

    Submitted 31 May, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

  9. arXiv:2207.01332  [pdf, other]

    cs.LG cs.NE

    The least-control principle for local learning at equilibrium

    Authors: Alexander Meulemans, Nicolas Zucchet, Seijin Kobayashi, Johannes von Oswald, João Sacramento

    Abstract: Equilibrium systems are a powerful way to express neural computations. As special cases, they include models of great current interest in both neuroscience and machine learning, such as deep neural networks, equilibrium recurrent neural networks, deep equilibrium models, or meta-learning. Here, we present a new principle for learning such systems with a temporally- and spatially-local rule. Our pr…

    Submitted 31 October, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Published at NeurIPS 2022. 56 pages

    MSC Class: 68T07 ACM Class: I.2.6

  10. Beyond backpropagation: bilevel optimization through implicit differentiation and equilibrium propagation

    Authors: Nicolas Zucchet, João Sacramento

    Abstract: This paper reviews gradient-based techniques to solve bilevel optimization problems. Bilevel optimization is a general way to frame the learning of systems that are implicitly defined through a quantity that they minimize. This characterization can be applied to neural networks, optimizers, algorithmic solvers and even physical systems, and allows for greater modeling flexibility compared to an ex…

    Submitted 27 October, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

  11. arXiv:2204.07249  [pdf, other]

    cs.NE cs.LG

    Minimizing Control for Credit Assignment with Strong Feedback

    Authors: Alexander Meulemans, Matilde Tristany Farinha, Maria R. Cervera, João Sacramento, Benjamin F. Grewe

    Abstract: The success of deep learning ignited interest in whether the brain learns hierarchical representations using gradient-based learning. However, current biologically plausible methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals, which is problematic in biologically realistic noisy environments and at odds with experimental evidence in neur…

    Submitted 22 June, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

    Comments: 26 pages, 4 figures

    MSC Class: 68T07 ACM Class: I.2.6

  12. arXiv:2110.14402  [pdf, other]

    cs.LG cs.NE

    Learning where to learn: Gradient sparsity in meta and continual learning

    Authors: Johannes von Oswald, Dominic Zhao, Seijin Kobayashi, Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento

    Abstract: Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterne…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Published at NeurIPS 2021

  13. arXiv:2106.07887  [pdf, other]

    cs.LG

    Credit Assignment in Neural Networks through Deep Feedback Control

    Authors: Alexander Meulemans, Matilde Tristany Farinha, Javier García Ordóñez, Pau Vilimelis Aceituno, João Sacramento, Benjamin F. Grewe

    Abstract: The success of deep learning sparked interest in whether the brain learns by using similar techniques for assigning credit to each synaptic weight for its contribution to the network output. However, the majority of current attempts at biologically-plausible learning methods are either non-local in time, require highly specific connectivity motives, or have no clear link to any known mathematical…

    Submitted 17 January, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: 14 pages and 4 figures in the main manuscript; 49 pages and 15 figures in the supplementary materials

    MSC Class: 68T07 ACM Class: I.2.6

  14. arXiv:2104.01677  [pdf, other]

    cs.LG cs.NE q-bio.NC

    A contrastive rule for meta-learning

    Authors: Nicolas Zucchet, Simon Schug, Johannes von Oswald, Dominic Zhao, João Sacramento

    Abstract: Humans and other animals are capable of improving their learning performance as they solve related tasks from a given problem domain, to the point of being able to learn from extremely limited data. While synaptic plasticity is generically thought to underlie learning in the brain, the precise neural and synaptic mechanisms by which learning processes improve through experience are not well unders…

    Submitted 3 October, 2022; v1 submitted 4 April, 2021; originally announced April 2021.

    Comments: 32 pages, 10 figures, published at NeurIPS 2022

  15. arXiv:2103.01133  [pdf, other]

    cs.LG cs.AI

    Posterior Meta-Replay for Continual Learning

    Authors: Christian Henning, Maria R. Cervera, Francesco D'Angelo, Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, João Sacramento

    Abstract: Learning a sequence of tasks without access to i.i.d. observations is a widely studied form of continual learning (CL) that remains challenging. In principle, Bayesian learning directly applies to this setting, since recursive and one-off Bayesian updates yield the same result. In practice, however, recursive updating often leads to poor trade-off solutions across tasks because approximate inferen…

    Submitted 21 October, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: Published at NeurIPS 2021

  16. arXiv:2007.12927  [pdf, other]

    cs.LG cs.CV stat.ML

    Neural networks with late-phase weights

    Authors: Johannes von Oswald, Seijin Kobayashi, Alexander Meulemans, Christian Henning, Benjamin F. Grewe, João Sacramento

    Abstract: The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we obtain back a single model by taking a spatial average in weight space. To avoid incurring incre…

    Submitted 11 April, 2022; v1 submitted 25 July, 2020; originally announced July 2020.

    Comments: 25 pages, 6 figures

    Journal ref: Published as a conference paper at ICLR 2021

  17. arXiv:2006.14331  [pdf, other]

    cs.LG stat.ML

    A Theoretical Framework for Target Propagation

    Authors: Alexander Meulemans, Francesco S. Carzaniga, Johan A. K. Suykens, João Sacramento, Benjamin F. Grewe

    Abstract: The success of deep learning, a brain-inspired form of AI, has sparked interest in understanding how the brain could similarly learn across multiple layers of neurons. However, the majority of biologically-plausible learning algorithms have not yet reached the performance of backpropagation (BP), nor are they built on strong theoretical foundations. Here, we analyze target propagation (TP), a popu…

    Submitted 16 December, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: 13 pages and 4 figures in main manuscript; 41 pages and 8 figures in supplementary material

    MSC Class: 68T07

  18. arXiv:1911.08585  [pdf, other]

    q-bio.NC cs.LG stat.ML

    Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks

    Authors: Thomas Mesnard, Gaetan Vignoud, Joao Sacramento, Walter Senn, Yoshua Bengio

    Abstract: In the past few years, deep learning has transformed artificial intelligence research and led to impressive performance in various difficult tasks. However, it is still unclear how the brain can perform credit assignment across many areas as efficiently as backpropagation does in deep neural networks. In this paper, we introduce a model that relies on a new role for a neuronal inhibitory machinery…

    Submitted 15 November, 2019; originally announced November 2019.

  19. arXiv:1906.00695  [pdf, other]

    cs.LG cs.AI stat.ML

    Continual learning with hypernetworks

    Authors: Johannes von Oswald, Christian Henning, Benjamin F. Grewe, João Sacramento

    Abstract: Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key feature: instea…

    Submitted 11 April, 2022; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: Published at ICLR 2020

    MSC Class: 68T99

  20. arXiv:1810.11393  [pdf, other]

    q-bio.NC cs.LG cs.NE

    Dendritic cortical microcircuits approximate the backpropagation algorithm

    Authors: João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn

    Abstract: Deep learning has seen remarkable developments over the last years, many of them inspired by neuroscience. However, the main learning mechanism behind these advances - error backpropagation - appears to be at odds with neurobiology. Here, we introduce a multilayer neuronal network model with simplified dendritic compartments in which error-driven synaptic plasticity adapts the network towards a gl…

    Submitted 26 October, 2018; originally announced October 2018.

    Comments: To appear in Advances in Neural Information Processing Systems 31 (NIPS 2018). 12 pages, 3 figures, 9 pages of supplementary material (2 supplementary figures)

  21. arXiv:1801.00062  [pdf, other]

    q-bio.NC cs.LG cs.NE

    Dendritic error backpropagation in deep cortical microcircuits

    Authors: João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn

    Abstract: Animal behaviour depends on learning to associate sensory stimuli with the desired motor command. Understanding how the brain orchestrates the necessary synaptic modifications across different brain areas has remained a longstanding puzzle. Here, we introduce a multi-area neuronal network model in which synaptic plasticity continuously adapts the network towards a global desired output. In this mo…

    Submitted 29 December, 2017; originally announced January 2018.

    Comments: 27 pages, 5 figures, 10 pages supplementary information

  22. arXiv:1606.01651  [pdf, other]

    cs.LG cs.NE q-bio.NC

    Feedforward Initialization for Fast Inference of Deep Generative Networks is biologically plausible

    Authors: Yoshua Bengio, Benjamin Scellier, Olexa Bilaniuk, Joao Sacramento, Walter Senn

    Abstract: We consider deep multi-layered generative models such as Boltzmann machines or Hopfield nets in which computation (which implements inference) is both recurrent and stochastic, but where the recurrence is not to model sequential structure, only to perform computation. We find conditions under which a simple feedforward computation is a very good initialization for inference, after the input units…

    Submitted 27 June, 2016; v1 submitted 6 June, 2016; originally announced June 2016.