Zum Hauptinhalt springen

Showing 1–8 of 8 results for author: Schug, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.12275  [pdf, other

    cs.LG cs.NE

    When can transformers compositionally generalize in-context?

    Authors: Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes von Oswald, Razvan Pascanu, Guillaume Lajoie, João Sacramento

    Abstract: Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ICML 2024 workshop on Next Generation of Sequence Modeling Architectures

  2. arXiv:2406.05816  [pdf, other

    cs.LG

    Attention as a Hypernetwork

    Authors: Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu

    Abstract: Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training but whose compositions have not. What mechanisms underlie this ability for compositional generalization? By reformulating multi-head attention as a hypernetwork, we reveal that a low-dimensional latent code specifies key-query specific operations. We f… ▽ More

    Submitted 21 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Code available at https://github.com/smonsays/hypernetwork-attention

  3. arXiv:2312.15001  [pdf, other

    cs.LG cs.NE

    Discovering modular solutions that generalize compositionally

    Authors: Simon Schug, Seijin Kobayashi, Yassir Akram, Maciej Wołczyk, Alexandra Proca, Johannes von Oswald, Razvan Pascanu, João Sacramento, Angelika Steger

    Abstract: Many complex tasks can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. It therefore seems natural to make models more modular to help capture the compositional nature of many tasks. However, it is unclear under which… ▽ More

    Submitted 25 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Published as a conference paper at ICLR 2024; Code available at https://github.com/smonsays/modular-hyperteacher

  4. arXiv:2306.16803  [pdf, other

    cs.LG stat.ML

    Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

    Authors: Alexander Meulemans, Simon Schug, Seijin Kobayashi, Nathaniel Daw, Gregory Wayne

    Abstract: To make reinforcement learning more sample efficient, we need better credit assignment methods that measure an action's influence on future rewards. Building upon Hindsight Credit Assignment (HCA), we introduce Counterfactual Contribution Analysis (COCOA), a new family of model-based credit assignment algorithms. Our algorithms achieve precise credit assignment by measuring the contribution of act… ▽ More

    Submitted 31 October, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 spotlight

  5. arXiv:2305.15947  [pdf, other

    cs.LG cs.NE

    Online learning of long-range dependencies

    Authors: Nicolas Zucchet, Robert Meier, Simon Schug, Asier Mujika, João Sacramento

    Abstract: Online learning holds the promise of enabling efficient long-term credit assignment in recurrent neural networks. However, current algorithms fall short of offline backpropagation by either not being scalable or failing to learn long-range dependencies. Here we present a high-performance online learning algorithm that merely doubles the memory and computational requirements of a single inference p… ▽ More

    Submitted 6 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023

  6. arXiv:2209.07509  [pdf, other

    cs.LG

    Random initialisations performing above chance and how to find them

    Authors: Frederik Benzing, Simon Schug, Robert Meier, Johannes von Oswald, Yassir Akram, Nicolas Zucchet, Laurence Aitchison, Angelika Steger

    Abstract: Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions. Entezari et al.\ recently conjectured that despite different initialisations, the solutions found by SGD lie in the same loss valley after t… ▽ More

    Submitted 7 November, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022, 14th Annual Workshop on Optimization for Machine Learning (OPT2022)

  7. arXiv:2110.14402  [pdf, other

    cs.LG cs.NE

    Learning where to learn: Gradient sparsity in meta and continual learning

    Authors: Johannes von Oswald, Dominic Zhao, Seijin Kobayashi, Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento

    Abstract: Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterne… ▽ More

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Published at NeurIPS 2021

  8. arXiv:2104.01677  [pdf, other

    cs.LG cs.NE q-bio.NC

    A contrastive rule for meta-learning

    Authors: Nicolas Zucchet, Simon Schug, Johannes von Oswald, Dominic Zhao, João Sacramento

    Abstract: Humans and other animals are capable of improving their learning performance as they solve related tasks from a given problem domain, to the point of being able to learn from extremely limited data. While synaptic plasticity is generically thought to underlie learning in the brain, the precise neural and synaptic mechanisms by which learning processes improve through experience are not well unders… ▽ More

    Submitted 3 October, 2022; v1 submitted 4 April, 2021; originally announced April 2021.

    Comments: 32 pages, 10 figures, published at NeurIPS 2022