Skip to main content

Showing 1–50 of 54 results for author: Simonyan, K

Searching in archive cs. Search in all archives.
.
  1. arXiv:2204.14198  [pdf, other

    cs.CV cs.AI cs.LG

    Flamingo: a Visual Language Model for Few-Shot Learning

    Authors: Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals , et al. (2 additional authors not shown)

    Abstract: Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily i… ▽ More

    Submitted 15 November, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: 54 pages. In Proceedings of Neural Information Processing Systems (NeurIPS) 2022

  2. arXiv:2203.15556  [pdf, other

    cs.CL cs.LG

    Training Compute-Optimal Large Language Models

    Authors: Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre

    Abstract: We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion… ▽ More

    Submitted 29 March, 2022; originally announced March 2022.

  3. arXiv:2202.10890  [pdf, other

    cs.CV

    HiP: Hierarchical Perceiver

    Authors: Joao Carreira, Skanda Koppula, Daniel Zoran, Adria Recasens, Catalin Ionescu, Olivier Henaff, Evan Shelhamer, Relja Arandjelovic, Matt Botvinick, Oriol Vinyals, Karen Simonyan, Andrew Zisserman, Andrew Jaegle

    Abstract: General perception systems such as Perceivers can process arbitrary modalities in any combination and are able to handle up to a few hundred thousand inputs. They achieve this generality by using exclusively global attention operations. This however hinders them from scaling up to the inputs sizes required to process raw high-resolution images or video. In this paper, we show that some degree of l… ▽ More

    Submitted 3 November, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

  4. arXiv:2202.01169  [pdf, other

    cs.CL cs.LG

    Unified Scaling Laws for Routed Language Models

    Authors: Aidan Clark, Diego de las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew Johnson, Katie Millican, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, Jack Rae, Erich Elsen, Koray Kavukcuoglu , et al. (1 additional authors not shown)

    Abstract: The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better… ▽ More

    Submitted 9 February, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Fixing typos and affiliation clarity

  5. arXiv:2112.11446  [pdf, other

    cs.CL cs.AI

    Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor , et al. (55 additional authors not shown)

    Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gop… ▽ More

    Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: 120 pages

  6. arXiv:2112.04426  [pdf, other

    cs.CL cs.LG

    Improving language models by retrieving from trillions of tokens

    Authors: Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan , et al. (3 additional authors not shown)

    Abstract: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to d… ▽ More

    Submitted 7 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Fix incorrect reported numbers in Table 14

  7. arXiv:2104.05336  [pdf, other

    cs.CL cs.LG

    Machine Translation Decoding beyond Beam Search

    Authors: Rémi Leblond, Jean-Baptiste Alayrac, Laurent Sifre, Miruna Pislar, Jean-Baptiste Lespiau, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals

    Abstract: Beam search is the go-to method for decoding auto-regressive machine translation models. While it yields consistent improvements in terms of BLEU, it is only concerned with finding outputs with high model likelihood, and is thus agnostic to whatever end metric or score practitioners care about. Our aim is to establish whether beam search can be replaced by a more powerful metric-driven search tech… ▽ More

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: 23 pages

  8. Skillful Precipitation Nowcasting using Deep Generative Models of Radar

    Authors: Suman Ravuri, Karel Lenc, Matthew Willson, Dmitry Kangin, Remi Lam, Piotr Mirowski, Megan Fitzsimons, Maria Athanassiadou, Sheleem Kashem, Sam Madge, Rachel Prudden, Amol Mandhane, Aidan Clark, Andrew Brock, Karen Simonyan, Raia Hadsell, Niall Robinson, Ellen Clancy, Alberto Arribas, Shakir Mohamed

    Abstract: Precipitation nowcasting, the high-resolution forecasting of precipitation up to two hours ahead, supports the real-world socio-economic needs of many sectors reliant on weather-dependent decision-making. State-of-the-art operational nowcasting methods typically advect precipitation fields with radar-based wind estimates, and struggle to capture important non-linear events such as convective initi… ▽ More

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: 46 pages, 17 figures, 2 tables

  9. arXiv:2103.06089  [pdf, other

    cs.LG cs.CL cs.SD eess.AS

    Variable-rate discrete representation learning

    Authors: Sander Dieleman, Charlie Nash, Jesse Engel, Karen Simonyan

    Abstract: Semantically meaningful information content in perceptual signals is usually unevenly distributed. In speech signals for example, there are often many silences, and the speed of pronunciation can vary considerably. In this work, we propose slow autoencoders (SlowAEs) for unsupervised learning of high-level variable-rate discrete representations of sequences, and apply them to speech. We show that… ▽ More

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: 26 pages, 15 figures, samples can be found at https://vdrl.github.io/

  10. arXiv:2102.06171  [pdf, other

    cs.CV cs.LG stat.ML

    High-Performance Large-Scale Image Recognition Without Normalization

    Authors: Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan

    Abstract: Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for l… ▽ More

    Submitted 11 February, 2021; originally announced February 2021.

  11. arXiv:2006.07360  [pdf, other

    cs.LG stat.ML

    AlgebraNets

    Authors: Jordan Hoffmann, Simon Schmitt, Simon Osindero, Karen Simonyan, Erich Elsen

    Abstract: Neural networks have historically been built layerwise from the set of functions in ${f: \mathbb{R}^n \to \mathbb{R}^m }$, i.e. with activations and weights/parameters represented by real numbers, $\mathbb{R}$. Our work considers a richer set of objects for activations and weights, and undertakes a comprehensive study of alternative algebras as number representations by studying their performance… ▽ More

    Submitted 16 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

  12. arXiv:2006.07232  [pdf, other

    cs.LG cs.NE stat.ML

    A Practical Sparse Approximation for Real Time Recurrent Learning

    Authors: Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

    Abstract: Current methods for training recurrent neural networks are based on backpropagation through time, which requires storing a complete history of network states, and prohibits updating the weights `online' (after every timestep). Real Time Recurrent Learning (RTRL) eliminates the need for history storage and allows for online weight updates, but does so at the expense of computational costs that are… ▽ More

    Submitted 12 June, 2020; originally announced June 2020.

  13. arXiv:2006.03575  [pdf, other

    cs.SD cs.LG eess.AS

    End-to-End Adversarial Text-to-Speech

    Authors: Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan

    Abstract: Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest. In this work, we take on the challenging task of learning to synthesise speech from normalised text or phonemes in an end-to-end manner, resulting in models which operate directly on character or phoneme input sequences and produce raw speech audi… ▽ More

    Submitted 17 March, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: 23 pages. In proceedings of ICLR 2021

  14. arXiv:2004.02967  [pdf, other

    cs.LG cs.CV cs.NE stat.ML

    Evolving Normalization-Activation Layers

    Authors: Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le

    Abstract: Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other. Here we propose to design them using an automated approach. Instead of designing them separately, we unify them into a single tensor-to-tensor computation graph, and evolve its structure starting from basic mathematical functions. Examples of such mathematical function… ▽ More

    Submitted 17 July, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  15. arXiv:2003.04035  [pdf, other

    cs.CV cs.LG

    Transformation-based Adversarial Video Prediction on Large-Scale Data

    Authors: Pauline Luc, Aidan Clark, Sander Dieleman, Diego de Las Casas, Yotam Doron, Albin Cassirer, Karen Simonyan

    Abstract: Recent breakthroughs in adversarial generative modeling have led to models capable of producing video samples of high quality, even on large and complex datasets of real-world video. In this work, we focus on the task of video prediction, where given a sequence of frames extracted from a video, the goal is to generate a plausible future sequence. We first improve the state of the art by performing… ▽ More

    Submitted 17 November, 2021; v1 submitted 9 March, 2020; originally announced March 2020.

  16. arXiv:1912.00953  [pdf, other

    cs.LG stat.ML

    LOGAN: Latent Optimisation for Generative Adversarial Networks

    Authors: Yan Wu, Jeff Donahue, David Balduzzi, Karen Simonyan, Timothy Lillicrap

    Abstract: Training generative adversarial networks requires balancing of delicate adversarial dynamics. Even with careful tuning, training may diverge or end up in a bad equilibrium with dropped modes. In this work, we improve CS-GAN with natural gradient-based latent optimisation and show that it improves adversarial dynamics by enhancing interactions between the discriminator and the generator. Our experi… ▽ More

    Submitted 1 July, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: Improved writing, added new analysis and evaluation

  17. arXiv:1911.09723  [pdf, other

    cs.CV

    Fast Sparse ConvNets

    Authors: Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

    Abstract: Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include: the squeeze-and-excitation module, depthwise separable convolutions in Xception, and the inverted bottleneck in MobileNet v2. Notably, in all of these cases, the resulting building blocks enabled not only higher… ▽ More

    Submitted 21 November, 2019; originally announced November 2019.

  18. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

    Authors: Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver

    Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the… ▽ More

    Submitted 21 February, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

  19. arXiv:1909.11646  [pdf, other

    cs.SD cs.LG eess.AS

    High Fidelity Speech Synthesis with Adversarial Networks

    Authors: Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan

    Abstract: Generative adversarial networks have seen rapid development in recent years and have led to remarkable improvements in generative modelling of images. However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech. To address this paucity, we introdu… ▽ More

    Submitted 26 September, 2019; v1 submitted 25 September, 2019; originally announced September 2019.

  20. arXiv:1909.11583  [pdf, other

    cs.LG cs.AI stat.ML

    Off-Policy Actor-Critic with Shared Experience Replay

    Authors: Simon Schmitt, Matteo Hessel, Karen Simonyan

    Abstract: We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay (b) stability of off-policy learning where agents learn from other agents behaviour. We employ those insights to accelerate hyper-parameter sweeps in which all participating a… ▽ More

    Submitted 18 November, 2019; v1 submitted 25 September, 2019; originally announced September 2019.

  21. arXiv:1907.06571  [pdf, other

    cs.CV cs.LG stat.ML

    Adversarial Video Generation on Complex Datasets

    Authors: Aidan Clark, Jeff Donahue, Karen Simonyan

    Abstract: Generative models of natural images have progressed towards high fidelity samples by the strong leveraging of scale. We attempt to carry this success to the field of video modeling by showing that large Generative Adversarial Networks trained on the complex Kinetics-600 dataset are able to produce video samples of substantially higher complexity and fidelity than previous work. Our proposed model,… ▽ More

    Submitted 25 September, 2019; v1 submitted 15 July, 2019; originally announced July 2019.

  22. arXiv:1907.02544  [pdf, other

    cs.CV cs.LG stat.ML

    Large Scale Adversarial Representation Learning

    Authors: Jeff Donahue, Karen Simonyan

    Abstract: Adversarially trained generative models (GANs) have recently achieved compelling image synthesis results. But despite early successes in using GANs for unsupervised representation learning, they have since been superseded by approaches based on self-supervision. In this work we show that progress in image generation quality translates to substantially improved representation learning performance.… ▽ More

    Submitted 5 November, 2019; v1 submitted 4 July, 2019; originally announced July 2019.

    Comments: 32 pages. In proceedings of NeurIPS 2019. This is the camera-ready version of the paper, with supplementary material included as appendices

  23. arXiv:1906.03139  [pdf, other

    cs.NE cs.LG stat.ML

    Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods

    Authors: Karel Lenc, Erich Elsen, Tom Schaul, Karen Simonyan

    Abstract: In this work we show that Evolution Strategies (ES) are a viable method for learning non-differentiable parameters of large supervised models. ES are black-box optimization algorithms that estimate distributions of model parameters; however they have only been used for relatively small problems so far. We show that it is possible to scale ES to more complex tasks and models with millions of parame… ▽ More

    Submitted 7 June, 2019; originally announced June 2019.

  24. arXiv:1903.04933  [pdf, other

    cs.CV cs.LG stat.ML

    Hierarchical Autoregressive Image Models with Auxiliary Decoders

    Authors: Jeffrey De Fauw, Sander Dieleman, Karen Simonyan

    Abstract: Autoregressive generative models of images tend to be biased towards capturing local structure, and as a result they often produce samples which are lacking in terms of large-scale coherence. To address this, we propose two methods to learn discrete representations of images which abstract away local detail. We show that autoregressive models conditioned on these representations can produce high-f… ▽ More

    Submitted 8 October, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

    Comments: Updated: added human evaluation results, incorporated review feedback

  25. arXiv:1903.01292  [pdf, other

    cs.AI cs.CV cs.RO

    The StreetLearn Environment and Dataset

    Authors: Piotr Mirowski, Andras Banki-Horvath, Keith Anderson, Denis Teplyashin, Karl Moritz Hermann, Mateusz Malinowski, Matthew Koichi Grimes, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell

    Abstract: Navigation is a rich and well-grounded problem domain that drives progress in many different areas of research: perception, planning, memory, exploration, and optimisation in particular. Historically these challenges have been separately considered and solutions built that rely on stationary datasets - for example, recorded trajectories through an environment. These datasets cannot be used for dec… ▽ More

    Submitted 4 March, 2019; originally announced March 2019.

    Comments: 13 pages, 6 figures, 4 tables. arXiv admin note: text overlap with arXiv:1804.00168

  26. arXiv:1809.11096  [pdf, other

    cs.LG stat.ML

    Large Scale GAN Training for High Fidelity Natural Image Synthesis

    Authors: Andrew Brock, Jeff Donahue, Karen Simonyan

    Abstract: Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenabl… ▽ More

    Submitted 25 February, 2019; v1 submitted 28 September, 2018; originally announced September 2018.

  27. arXiv:1808.03715  [pdf, ps, other

    cs.SD cs.LG eess.AS

    This Time with Feeling: Learning Expressive Musical Performance

    Authors: Sageev Oore, Ian Simon, Sander Dieleman, Douglas Eck, Karen Simonyan

    Abstract: Music generation has generally been focused on either creating scores or interpreting them. We discuss differences between these two problems and propose that, in fact, it may be valuable to work in the space of direct $\it performance$ generation: jointly predicting the notes $\it and$ $\it also$ their expressive timing and dynamics. We consider the significance and qualities of the data set need… ▽ More

    Submitted 10 August, 2018; originally announced August 2018.

    Comments: Includes links to urls for audio samples

  28. arXiv:1806.10474  [pdf, other

    cs.SD cs.LG eess.AS stat.ML

    The challenge of realistic music generation: modelling raw audio at scale

    Authors: Sander Dieleman, Aäron van den Oord, Karen Simonyan

    Abstract: Realistic music generation is a challenging task. When building generative models of music that are learnt from data, typically high-level representations such as scores or MIDI are used that abstract away the idiosyncrasies of a particular performance. But these nuances are very important for our perception of musicality and realism, so in this work we embark on modelling music in the raw audio d… ▽ More

    Submitted 26 June, 2018; originally announced June 2018.

    Comments: 13 pages, 2 figures, submitted to NIPS 2018

  29. arXiv:1806.09055  [pdf, other

    cs.LG cs.CL cs.CV stat.ML

    DARTS: Differentiable Architecture Search

    Authors: Hanxiao Liu, Karen Simonyan, Yiming Yang

    Abstract: This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient… ▽ More

    Submitted 23 April, 2019; v1 submitted 23 June, 2018; originally announced June 2018.

    Comments: Published at ICLR 2019; Code and pretrained models available at https://github.com/quark0/darts

  30. arXiv:1804.00168  [pdf, other

    cs.AI

    Learning to Navigate in Cities Without a Map

    Authors: Piotr Mirowski, Matthew Koichi Grimes, Mateusz Malinowski, Karl Moritz Hermann, Keith Anderson, Denis Teplyashin, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell

    Abstract: Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support cont… ▽ More

    Submitted 9 January, 2019; v1 submitted 31 March, 2018; originally announced April 2018.

    Comments: 17 pages, 16 figures, published at NeurIPS 2018

    Journal ref: Neural Information Processing Systems 2018

  31. arXiv:1803.03835  [pdf, other

    cs.LG

    Kickstarting Deep Reinforcement Learning

    Authors: Simon Schmitt, Jonathan J. Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech M. Czarnecki, Joel Z. Leibo, Heinrich Kuttler, Andrew Zisserman, Karen Simonyan, S. M. Ali Eslami

    Abstract: We present a method for using previously-trained 'teacher' agents to kickstart the training of a new 'student' agent. To this end, we leverage ideas from policy distillation and population based training. Our method places no constraints on the architecture of the teacher or student agents, and it regulates itself to allow the students to surpass their teachers in performance. We show that, on a c… ▽ More

    Submitted 10 March, 2018; originally announced March 2018.

  32. arXiv:1802.08435  [pdf, other

    cs.SD cs.LG eess.AS

    Efficient Neural Audio Synthesis

    Authors: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu

    Abstract: Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high outp… ▽ More

    Submitted 25 June, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: 10 pages

  33. arXiv:1802.04697  [pdf, other

    cs.AI cs.LG stat.ML

    Learning to Search with MCTSnets

    Authors: Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, David Silver

    Abstract: Planning problems are among the most important and well-studied problems in artificial intelligence. They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree. Among these algorithms, Monte-Carlo tree search (MCTS) is one of the most general, powerful and widely used. A typical im… ▽ More

    Submitted 17 July, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

    Comments: ICML 2018 (camera-ready version)

  34. arXiv:1802.01561  [pdf, other

    cs.LG cs.AI

    IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

    Authors: Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu

    Abstract: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also… ▽ More

    Submitted 28 June, 2018; v1 submitted 5 February, 2018; originally announced February 2018.

  35. arXiv:1712.01815  [pdf, other

    cs.AI cs.LG

    Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

    Authors: David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis

    Abstract: The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game… ▽ More

    Submitted 5 December, 2017; originally announced December 2017.

  36. arXiv:1711.10433  [pdf, other

    cs.LG

    Parallel WaveNet: Fast High-Fidelity Speech Synthesis

    Authors: Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis

    Abstract: The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today's massively parallel computers, and therefore hard to deploy in a real-time p… ▽ More

    Submitted 28 November, 2017; originally announced November 2017.

  37. arXiv:1711.09846  [pdf, other

    cs.LG cs.NE

    Population Based Training of Neural Networks

    Authors: Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu

    Abstract: Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present \emph{Population Based Training (PBT)}, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget… ▽ More

    Submitted 28 November, 2017; v1 submitted 27 November, 2017; originally announced November 2017.

  38. arXiv:1711.00436  [pdf, other

    cs.LG cs.CV cs.NE stat.ML

    Hierarchical Representations for Efficient Architecture Search

    Authors: Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, Koray Kavukcuoglu

    Abstract: We explore efficient neural architecture search methods and show that a simple yet powerful evolutionary algorithm can discover new architectures with excellent performance. Our approach combines a novel hierarchical genetic representation scheme that imitates the modularized design pattern commonly adopted by human experts, and an expressive search space that supports complex topologies. Our algo… ▽ More

    Submitted 22 February, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

    Comments: Accepted as a conference paper at ICLR 2018

  39. arXiv:1708.04782  [pdf, other

    cs.LG cs.AI

    StarCraft II: A New Challenge for Reinforcement Learning

    Authors: Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, Rodney Tsing

    Abstract: This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially o… ▽ More

    Submitted 16 August, 2017; originally announced August 2017.

    Comments: Collaboration between DeepMind & Blizzard. 20 pages, 9 figures, 2 tables

  40. arXiv:1705.06950  [pdf, other

    cs.CV

    The Kinetics Human Action Video Dataset

    Authors: Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, Andrew Zisserman

    Abstract: We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such… ▽ More

    Submitted 19 May, 2017; originally announced May 2017.

  41. arXiv:1704.01279  [pdf, other

    cs.LG cs.AI cs.SD

    Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

    Authors: Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, Mohammad Norouzi

    Abstract: Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets. In this paper, we offer contributions in both these areas to enable similar progress in audio modeling. First, we detail a powerful new WaveNet-style autoencoder model that conditions an autoregressive decoder on temporal codes learned from the raw audio wavefor… ▽ More

    Submitted 5 April, 2017; originally announced April 2017.

  42. arXiv:1610.10099  [pdf, other

    cs.CL cs.LG

    Neural Machine Translation in Linear Time

    Authors: Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu

    Abstract: We present a novel neural network for processing sequences. The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence. The two network parts are connected by stacking the decoder on top of the encoder and preserving the temporal resolution of the sequences. To address the differing leng… ▽ More

    Submitted 15 March, 2017; v1 submitted 31 October, 2016; originally announced October 2016.

    Comments: 9 pages

  43. arXiv:1610.00527  [pdf, other

    cs.CV cs.LG

    Video Pixel Networks

    Authors: Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu

    Abstract: We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over t… ▽ More

    Submitted 3 October, 2016; originally announced October 2016.

    Comments: 16 pages

  44. arXiv:1609.03499  [pdf, other

    cs.SD cs.LG

    WaveNet: A Generative Model for Raw Audio

    Authors: Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu

    Abstract: This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-… ▽ More

    Submitted 19 September, 2016; v1 submitted 12 September, 2016; originally announced September 2016.

  45. arXiv:1507.00210  [pdf, other

    stat.ML cs.LG cs.NE

    Natural Neural Networks

    Authors: Guillaume Desjardins, Karen Simonyan, Razvan Pascanu, Koray Kavukcuoglu

    Abstract: We introduce Natural Neural Networks, a novel family of algorithms that speed up convergence by adapting their internal representation during training to improve conditioning of the Fisher matrix. In particular, we show a specific example that employs a simple and efficient reparametrization of the neural network weights by implicitly whitening the representation obtained at each layer, while pres… ▽ More

    Submitted 1 July, 2015; originally announced July 2015.

  46. arXiv:1506.02025  [pdf, other

    cs.CV

    Spatial Transformer Networks

    Authors: Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu

    Abstract: Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module… ▽ More

    Submitted 4 February, 2016; v1 submitted 5 June, 2015; originally announced June 2015.

  47. arXiv:1412.5903  [pdf, other

    cs.CV

    Deep Structured Output Learning for Unconstrained Text Recognition

    Authors: Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

    Abstract: We develop a representation suitable for the unconstrained recognition of words in natural images: the general case of no fixed lexicon and unknown length. To this end we propose a convolutional neural network (CNN) based architecture which incorporates a Conditional Random Field (CRF) graphical model, taking the whole word image as a single input. The unaries of the CRF are provided by a CNN th… ▽ More

    Submitted 10 April, 2015; v1 submitted 18 December, 2014; originally announced December 2014.

    Comments: arXiv admin note: text overlap with arXiv:1406.2227

  48. arXiv:1412.1842  [pdf, other

    cs.CV

    Reading Text in the Wild with Convolutional Neural Networks

    Authors: Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

    Abstract: In this work we present an end-to-end system for text spotting -- localising and recognising text in natural scene images -- and text based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast s… ▽ More

    Submitted 4 December, 2014; originally announced December 2014.

  49. arXiv:1409.1556  [pdf, ps, other

    cs.CV

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Authors: Karen Simonyan, Andrew Zisserman

    Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19… ▽ More

    Submitted 10 April, 2015; v1 submitted 4 September, 2014; originally announced September 2014.

  50. arXiv:1407.4764  [pdf, other

    cs.CV cs.LG cs.NE

    Efficient On-the-fly Category Retrieval using ConvNets and GPUs

    Authors: Ken Chatfield, Karen Simonyan, Andrew Zisserman

    Abstract: We investigate the gains in precision and speed, that can be obtained by using Convolutional Networks (ConvNets) for on-the-fly retrieval - where classifiers are learnt at run time for a textual query from downloaded images, and used to rank large image or video datasets. We make three contributions: (i) we present an evaluation of state-of-the-art image representations for object category retri… ▽ More

    Submitted 17 November, 2014; v1 submitted 17 July, 2014; originally announced July 2014.

    Comments: Published in proceedings of ACCV 2014