Showing 1–11 of 11 results for author: Kumaran, D

  1. arXiv:2406.04267  [pdf, other]

    cs.CL cs.LG

    Transformers need glasses! Information over-squashing in language tasks

    Authors: Federico Barbero, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João G. M. Araújo, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković

    Abstract: We study how information propagates in decoder-only Transformers, which are the architectural backbone of most existing frontier large language models (LLMs). We rely on a theoretical signal propagation analysis -- specifically, we analyse the representations of the last token in the final layer of the Transformer, as this is the representation used for next-token prediction. Our analysis reveals…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2403.07750  [pdf, other]

    cs.CV cs.AI

    Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings

    Authors: Sahand Sharifzadeh, Christos Kaplanis, Shreya Pathak, Dharshan Kumaran, Anastasija Ilic, Jovana Mitrovic, Charles Blundell, Andrea Banino

    Abstract: The creation of high-quality human-labeled image-caption datasets presents a significant bottleneck in the development of Visual-Language Models (VLMs). In this work, we investigate an approach that leverages the strengths of Large Language Models (LLMs) and image generation models to create synthetic image-text pairs for efficient and effective VLM training. Our method employs a pretrained text-t…

    Submitted 7 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 9 pages, 6 figures

  3. arXiv:2210.05675  [pdf, other]

    cs.CL cs.AI cs.LG

    Transformers generalize differently from information stored in context vs in weights

    Authors: Stephanie C. Y. Chan, Ishita Dasgupta, Junkyung Kim, Dharshan Kumaran, Andrew K. Lampinen, Felix Hill

    Abstract: Transformer models can use two fundamentally different kinds of information: information stored in weights during training, and information provided "in-context" at inference time. In this work, we show that transformers exhibit different inductive biases in how they represent and generalize from the information in these two sources. In particular, we characterize whether they generalize via par…

    Submitted 13 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

  4. arXiv:2207.07051  [pdf, other]

    cs.CL cs.AI cs.LG

    Language models show human-like content effects on reasoning tasks

    Authors: Ishita Dasgupta, Andrew K. Lampinen, Stephanie C. Y. Chan, Hannah R. Sheahan, Antonia Creswell, Dharshan Kumaran, James L. McClelland, Felix Hill

    Abstract: Reasoning is a key ability for an intelligent system. Large language models (LMs) achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human abstract reasoning is also imperfect. For example, human reasoning is affected by our real-world knowledge and beliefs, and shows notable "content effects"; humans reason more reliably when the semantic conten…

    Submitted 17 July, 2024; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Published version of record: https://academic.oup.com/pnasnexus/article/3/7/pgae233/7712372

  5. arXiv:2102.06511  [pdf, other]

    cs.CR cs.LG

    A Non-Intrusive Machine Learning Solution for Malware Detection and Data Theft Classification in Smartphones

    Authors: Sai Vishwanath Venkatesh, Prasanna D. Kumaran, Joish J Bosco, Pravin R. Kumaar, Vineeth Vijayaraghavan

    Abstract: Smartphones contain information that is more sensitive and personal than those found on computers and laptops. With an increase in the versatility of smartphone functionality, more data has become vulnerable and exposed to attackers. Successful mobile malware attacks could steal a user's location, photos, or even banking information. Due to a lack of post-attack strategies firms also risk going ou…

    Submitted 12 February, 2021; originally announced February 2021.

  6. arXiv:2001.10913  [pdf, other]

    cs.LG cs.AI

    MEMO: A Deep Network for Flexible Combination of Episodic Memories

    Authors: Andrea Banino, Adrià Puigdomènech Badia, Raphael Köster, Martin J. Chadwick, Vinicius Zambaldi, Demis Hassabis, Caswell Barry, Matthew Botvinick, Dharshan Kumaran, Charles Blundell

    Abstract: Recent research developing neural network architectures with external memory have often used the benchmark bAbI question and answering dataset which provides a challenging number of tasks requiring reasoning. Here we employed a classic associative inference task from the memory-based reasoning neuroscience literature in order to more carefully probe the reasoning capacity of existing memory-augmen…

    Submitted 29 January, 2020; originally announced January 2020.

    Comments: 9 pages, 2 figures, 3 tables, to be published as a conference paper at ICLR 2020

    ACM Class: I.2.6

  7. arXiv:1712.01815  [pdf, other]

    cs.AI cs.LG

    Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

    Authors: David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis

    Abstract: The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game…

    Submitted 5 December, 2017; originally announced December 2017.

  8. arXiv:1711.08378  [pdf]

    cs.AI

    Building Machines that Learn and Think for Themselves: Commentary on Lake et al., Behavioral and Brain Sciences, 2017

    Authors: M. Botvinick, D. G. T. Barrett, P. Battaglia, N. de Freitas, D. Kumaran, J. Z Leibo, T. Lillicrap, J. Modayil, S. Mohamed, N. C. Rabinowitz, D. J. Rezende, A. Santoro, T. Schaul, C. Summerfield, G. Wayne, T. Weber, D. Wierstra, S. Legg, D. Hassabis

    Abstract: We agree with Lake and colleagues on their list of key ingredients for building humanlike intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand-engineering. We believe an approac…

    Submitted 22 November, 2017; originally announced November 2017.

  9. arXiv:1612.00796  [pdf, other]

    cs.LG cs.AI stat.ML

    Overcoming catastrophic forgetting in neural networks

    Authors: James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell

    Abstract: The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have…

    Submitted 25 January, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

  10. arXiv:1611.05763  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning to reinforcement learn

    Authors: Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick

    Abstract: In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this cha…

    Submitted 23 January, 2017; v1 submitted 17 November, 2016; originally announced November 2016.

    Comments: 17 pages, 7 figures, 1 table

  11. arXiv:1611.03673  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    Learning to Navigate in Complex Environments

    Authors: Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andrew J. Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell

    Abstract: Learning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly lea…

    Submitted 13 January, 2017; v1 submitted 11 November, 2016; originally announced November 2016.

    Comments: 11 pages, 5 appendix pages, 11 figures, 3 tables, under review as a conference paper at ICLR 2017