Showing 1–39 of 39 results for author: Oord, A v d

  1. arXiv:2408.07009 [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  2. arXiv:2308.03526 [pdf, other]

    cs.LG cs.AI

    AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

    Authors: Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals

    Abstract: StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of it…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 32 pages, 13 figures, previous version published as a NeurIPS 2021 workshop: https://openreview.net/forum?id=Np8Pumfoty

  3. arXiv:2112.06749 [pdf, other]

    cs.CL cs.LG

    Step-unrolled Denoising Autoencoders for Text Generation

    Authors: Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, Aaron van den Oord

    Abstract: In this paper we propose a new generative model of text, Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising diffusion techniques, SUNDAE is repeatedly applied on a sequence of tokens, starting from random inputs and improving them each time until convergence. We present a simple new improvement operator that converges in fewer iteratio…

    Submitted 19 April, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted to ICLR 2022

  4. arXiv:2111.12124 [pdf, ps, other]

    cs.SD eess.AS

    Towards Learning Universal Audio Representations

    Authors: Luyu Wang, Pauline Luc, Yan Wu, Adria Recasens, Lucas Smaira, Andrew Brock, Andrew Jaegle, Jean-Baptiste Alayrac, Sander Dieleman, Joao Carreira, Aaron van den Oord

    Abstract: The ability to learn universal audio representations that can solve diverse speech, music, and environment tasks can spur many applications that require general sound content understanding. In this work, we introduce a holistic audio representation evaluation suite (HARES) spanning 12 downstream tasks across audio domains and provide a thorough empirical study of recent sound representation learni…

    Submitted 23 June, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

  5. arXiv:2106.04615 [pdf, other]

    cs.LG cs.AI stat.ML

    Vector Quantized Models for Planning

    Authors: Sherjil Ozair, Yazhe Li, Ali Razavi, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals

    Abstract: Recent developments in the field of model-based RL have proven successful in a range of environments, especially ones where planning is essential. However, such successes have been limited to deterministic fully-observed environments. We present a new approach that handles stochastic and partially-observable environments. Our key insight is to use discrete autoencoders to capture the multiple poss…

    Submitted 10 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: ICML 2021

  6. arXiv:2105.08054 [pdf, other]

    cs.CV

    Divide and Contrast: Self-supervised Learning from Uncurated Data

    Authors: Yonglong Tian, Olivier J. Henaff, Aaron van den Oord

    Abstract: Self-supervised learning holds promise in leveraging large amounts of unlabeled data; however, much of its progress has thus far been limited to highly curated pre-training data such as ImageNet. We explore the effects of contrastive learning from larger, less-curated image datasets such as YFCC, and find there is indeed a large difference in the resulting representation quality. We hypothesize tha…

    Submitted 17 May, 2021; originally announced May 2021.

  7. arXiv:2104.12807 [pdf, other]

    cs.SD eess.AS

    Multimodal Self-Supervised Learning of General Audio Representations

    Authors: Luyu Wang, Pauline Luc, Adria Recasens, Jean-Baptiste Alayrac, Aaron van den Oord

    Abstract: We present a multimodal framework to learn general audio representations from videos. Existing contrastive audio representation learning methods mainly focus on using the audio modality alone during training. In this work, we show that additional information contained in video can be utilized to greatly improve the learned features. First, we demonstrate that our contrastive framework does not req…

    Submitted 28 April, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

  8. arXiv:2103.16559 [pdf, other]

    cs.CV

    Broaden Your Views for Self-Supervised Video Learning

    Authors: Adrià Recasens, Pauline Luc, Jean-Baptiste Alayrac, Luyu Wang, Ross Hemsley, Florian Strub, Corentin Tallec, Mateusz Malinowski, Viorica Patraucean, Florent Altché, Michal Valko, Jean-Bastien Grill, Aäron van den Oord, Andrew Zisserman

    Abstract: Most successful self-supervised learning methods are trained to align the representations of two independent views from the data. State-of-the-art methods in video are inspired by image techniques, where these two views are similarly extracted by cropping and augmenting the resulting crop. However, these methods miss a crucial element in the video domain: time. We introduce BraVe, a self-supervise…

    Submitted 19 October, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: This paper is an extended version of our ICCV-21 paper. It includes more results as well as a minor architectural variation which improves results

  9. arXiv:2103.10957 [pdf, other]

    cs.CV

    Efficient Visual Pretraining with Contrastive Detection

    Authors: Olivier J. Hénaff, Skanda Koppula, Jean-Baptiste Alayrac, Aaron van den Oord, Oriol Vinyals, João Carreira

    Abstract: Self-supervised pretraining has been shown to yield powerful representations for transfer learning. These performance gains come at a large computational cost, however, with state-of-the-art methods requiring an order of magnitude more computation than supervised pretraining. We tackle this computational bottleneck by introducing a new self-supervised objective, contrastive detection, which tasks r…

    Submitted 5 August, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

    Comments: Technical report

  10. arXiv:2103.06508 [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Format Contrastive Learning of Audio Representations

    Authors: Luyu Wang, Aaron van den Oord

    Abstract: Recent advances suggest the advantage of multi-modal training in comparison with single-modal methods. In contrast to this view, in our work we find that similar gain can be obtained from training with different formats of a single modality. In particular, we investigate the use of the contrastive learning framework to learn audio representations by maximizing the agreement between the raw audio a…

    Submitted 23 March, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

  11. arXiv:2103.01950 [pdf, other]

    cs.CV cs.LG

    Predicting Video with VQVAE

    Authors: Jacob Walker, Ali Razavi, Aäron van den Oord

    Abstract: In recent years, the task of video prediction (forecasting future video given past video frames) has attracted attention in the research community. In this paper we propose a novel approach to this problem with Vector Quantized Variational AutoEncoders (VQ-VAE). With VQ-VAE we compress high-resolution videos into a hierarchical set of multi-scale discrete latent variables. Compared to pixels, this c…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: 13 Pages

    ACM Class: I.2.6; I.2.10

  12. arXiv:2006.07159 [pdf, other]

    cs.CV cs.LG

    Are we done with ImageNet?

    Authors: Lucas Beyer, Olivier J. Hénaff, Alexander Kolesnikov, Xiaohua Zhai, Aäron van den Oord

    Abstract: Yes, and no. We ask whether recent progress on the ImageNet classification benchmark continues to represent meaningful generalization, or whether the community has started to overfit to the idiosyncrasies of its labeling procedure. We therefore develop a significantly more robust procedure for collecting human annotations of the ImageNet validation set. Using these new labels, we reassess the accu…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: All five authors contributed equally. New labels at https://github.com/google-research/reassessed-imagenet

  13. arXiv:2001.11128 [pdf, other]

    cs.CL cs.LG eess.AS

    Learning Robust and Multilingual Speech Representations

    Authors: Kazuya Kawakami, Luyu Wang, Chris Dyer, Phil Blunsom, Aaron van den Oord

    Abstract: Unsupervised speech representation learning has shown remarkable success at finding representations that correlate with phonetic structures and improve downstream speech recognition performance. However, most research has been focused on evaluating the representations in terms of their ability to improve the performance of speech recognition systems on read English (e.g. Wall Street Journal and Li…

    Submitted 29 January, 2020; originally announced January 2020.

  14. arXiv:1910.06464 [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder

    Authors: Cristina Gârbacea, Aäron van den Oord, Yazhe Li, Felicia S C Lim, Alejandro Luebs, Oriol Vinyals, Thomas C Walters

    Abstract: In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction…

    Submitted 14 October, 2019; originally announced October 2019.

    Comments: ICASSP 2019

    Journal ref: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 735-739. IEEE, 2019

  15. arXiv:1906.09237 [pdf, other]

    cs.LG cs.AI stat.ML

    Shaping Belief States with Generative Environment Models for RL

    Authors: Karol Gregor, Danilo Jimenez Rezende, Frederic Besse, Yan Wu, Hamza Merzic, Aaron van den Oord

    Abstract: When agents interact with a complex environment, they must form and maintain beliefs about the relevant aspects of that environment. We propose a way to efficiently train expressive generative models in complex environments. We show that a predictive algorithm with an expressive generative model can form stable belief-states in visually rich and dynamic 3D environments. More precisely, we show tha…

    Submitted 24 June, 2019; v1 submitted 21 June, 2019; originally announced June 2019.

    Comments: pre-print

  16. arXiv:1906.00446 [pdf, other]

    cs.LG cs.CV stat.ML

    Generating Diverse High-Fidelity Images with VQ-VAE-2

    Authors: Ali Razavi, Aaron van den Oord, Oriol Vinyals

    Abstract: We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large scale image generation. To this end, we scale and enhance the autoregressive priors used in VQ-VAE to generate synthetic samples of much higher coherence and fidelity than possible before. We use simple feed-forward encoder and decoder networks, making our model an attractive candidate for applications where t…

    Submitted 2 June, 2019; originally announced June 2019.

  17. arXiv:1905.09272 [pdf, other]

    cs.CV cs.LG

    Data-Efficient Image Recognition with Contrastive Predictive Coding

    Authors: Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord

    Abstract: Human observers can learn to recognize new categories of images from a handful of examples, yet doing so with artificial ones remains an open challenge. We hypothesize that data-efficient recognition is enabled by representations which make the variability in natural signals more predictable. We therefore revisit and improve Contrastive Predictive Coding, an unsupervised objective for learning suc…

    Submitted 1 July, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

  18. arXiv:1905.06922 [pdf, other]

    cs.LG stat.ML

    On Variational Bounds of Mutual Information

    Authors: Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker

    Abstract: Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging. To establish tractable and scalable objectives, recent work has turned to variational bounds parameterized by neural networks, but the relationships and tradeoffs between these bounds remain unclear. In this work, we unify these recent development…

    Submitted 16 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  19. arXiv:1903.11780 [pdf, other]

    cs.LG stat.ML

    Wasserstein Dependency Measure for Representation Learning

    Authors: Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron van den Oord, Sergey Levine, Pierre Sermanet

    Abstract: Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning obtaining state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning. However, such approaches are fundamentally limited since a tight lower bound of mutual information requires sample size exponential in the mutual infor…

    Submitted 27 March, 2019; originally announced March 2019.

  20. arXiv:1901.08810 [pdf, other]

    cs.LG eess.AS stat.ML

    Unsupervised speech representation learning using WaveNet autoencoders

    Authors: Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

    Abstract: We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or backgroun…

    Submitted 11 September, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

    Comments: Accepted to IEEE TASLP, final version available at http://dx.doi.org/10.1109/TASLP.2019.2938863

  21. arXiv:1901.03416 [pdf, other]

    cs.LG stat.ML

    Preventing Posterior Collapse with delta-VAEs

    Authors: Ali Razavi, Aäron van den Oord, Ben Poole, Oriol Vinyals

    Abstract: Due to the phenomenon of "posterior collapse," current latent variable generative models pose a challenging design choice that either weakens the capacity of the decoder or requires augmenting the objective so it does not only maximize the likelihood of the data. In this paper, we propose an alternative that utilizes the most powerful generative models as decoders, whilst optimising the variationa…

    Submitted 10 January, 2019; originally announced January 2019.

  22. arXiv:1809.10460 [pdf, other]

    cs.LG cs.SD stat.ML

    Sample Efficient Adaptive Text-to-Speech

    Authors: Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas

    Abstract: We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few…

    Submitted 16 January, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

    Comments: Accepted by ICLR 2019

  23. arXiv:1807.03748 [pdf, other]

    cs.LG stat.ML

    Representation Learning with Contrastive Predictive Coding

    Authors: Aaron van den Oord, Yazhe Li, Oriol Vinyals

    Abstract: While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key in…

    Submitted 22 January, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

  24. arXiv:1806.10474 [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    The challenge of realistic music generation: modelling raw audio at scale

    Authors: Sander Dieleman, Aäron van den Oord, Karen Simonyan

    Abstract: Realistic music generation is a challenging task. When building generative models of music that are learnt from data, typically high-level representations such as scores or MIDI are used that abstract away the idiosyncrasies of a particular performance. But these nuances are very important for our perception of musicality and realism, so in this work we embark on modelling music in the raw audio d…

    Submitted 26 June, 2018; originally announced June 2018.

    Comments: 13 pages, 2 figures, submitted to NIPS 2018

  25. arXiv:1804.02476 [pdf, other]

    cs.NE cs.LG stat.ML

    Associative Compression Networks for Representation Learning

    Authors: Alex Graves, Jacob Menick, Aaron van den Oord

    Abstract: This paper introduces Associative Compression Networks (ACNs), a new framework for variational autoencoding with neural networks. The system differs from existing variational autoencoders (VAEs) in that the prior distribution used to model each code is conditioned on a similar code from the dataset. In compression terms this equates to sequentially transmitting the dataset using an ordering determ…

    Submitted 26 April, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

    Comments: Revised to clarify difference between ACN and IID loss

  26. arXiv:1802.08435 [pdf, other]

    cs.SD cs.LG eess.AS

    Efficient Neural Audio Synthesis

    Authors: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu

    Abstract: Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high outp…

    Submitted 25 June, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: 10 pages

  27. arXiv:1802.05666 [pdf, other]

    cs.LG cs.CR stat.ML

    Adversarial Risk and the Dangers of Evaluating Against Weak Attacks

    Authors: Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, Pushmeet Kohli

    Abstract: This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate 'adversarial risk' as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optim…

    Submitted 12 June, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

  28. arXiv:1711.10433 [pdf, other]

    cs.LG

    Parallel WaveNet: Fast High-Fidelity Speech Synthesis

    Authors: Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis

    Abstract: The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today's massively parallel computers, and therefore hard to deploy in a real-time p…

    Submitted 28 November, 2017; originally announced November 2017.

  29. arXiv:1711.00937 [pdf, other]

    cs.LG

    Neural Discrete Representation Learning

    Authors: Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu

    Abstract: Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt r…

    Submitted 30 May, 2018; v1 submitted 2 November, 2017; originally announced November 2017.

  30. arXiv:1710.10304 [pdf, other]

    cs.NE cs.CV

    Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions

    Authors: Scott Reed, Yutian Chen, Thomas Paine, Aäron van den Oord, S. M. Ali Eslami, Danilo Rezende, Oriol Vinyals, Nando de Freitas

    Abstract: Deep autoregressive models have shown state-of-the-art performance in density estimation for natural images on large-scale datasets such as ImageNet. However, such models require many thousands of gradient-based weight updates and unique image examples for training. Ideally, the models would rapidly learn visual concepts from only a handful of examples, similar to the manner in which humans learn…

    Submitted 28 February, 2018; v1 submitted 27 October, 2017; originally announced October 2017.

  31. arXiv:1703.03664 [pdf, other]

    cs.CV cs.NE

    Parallel Multiscale Autoregressive Density Estimation

    Authors: Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Sergio Gómez Colmenarejo, Ziyu Wang, Dan Belov, Nando de Freitas

    Abstract: PixelCNN achieves state-of-the-art results in density estimation for natural images. Although training is fast, inference is costly, requiring one network evaluation per pixel; O(N) for N pixels. This can be sped up by caching activations, but still involves generating each pixel sequentially. In this work, we propose a parallelized PixelCNN that allows more efficient inference by modeling certain…

    Submitted 10 March, 2017; originally announced March 2017.

  32. arXiv:1703.01310 [pdf, other]

    cs.AI

    Count-Based Exploration with Neural Density Models

    Authors: Georg Ostrovski, Marc G. Bellemare, Aaron van den Oord, Remi Munos

    Abstract: Bellemare et al. (2016) introduced the notion of a pseudo-count, derived from a density model, to generalize count-based exploration to non-tabular reinforcement learning. This pseudo-count was used to generate an exploration bonus for a DQN agent and combined with a mixed Monte Carlo update was sufficient to achieve state of the art on the Atari 2600 game Montezuma's Revenge. We consider two ques…

    Submitted 14 June, 2017; v1 submitted 3 March, 2017; originally announced March 2017.

  33. arXiv:1610.10099 [pdf, other]

    cs.CL cs.LG

    Neural Machine Translation in Linear Time

    Authors: Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu

    Abstract: We present a novel neural network for processing sequences. The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence. The two network parts are connected by stacking the decoder on top of the encoder and preserving the temporal resolution of the sequences. To address the differing leng…

    Submitted 15 March, 2017; v1 submitted 31 October, 2016; originally announced October 2016.

    Comments: 9 pages

  34. arXiv:1610.00527 [pdf, other]

    cs.CV cs.LG

    Video Pixel Networks

    Authors: Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu

    Abstract: We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over t…

    Submitted 3 October, 2016; originally announced October 2016.

    Comments: 16 pages

  35. arXiv:1609.03499 [pdf, other]

    cs.SD cs.LG

    WaveNet: A Generative Model for Raw Audio

    Authors: Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu

    Abstract: This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-…

    Submitted 19 September, 2016; v1 submitted 12 September, 2016; originally announced September 2016.

  36. arXiv:1606.05328 [pdf, other]

    cs.CV cs.LG

    Conditional Image Generation with PixelCNN Decoders

    Authors: Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu

    Abstract: This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects…

    Submitted 18 June, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

  37. arXiv:1601.06759 [pdf, other]

    cs.CV cs.LG cs.NE

    Pixel Recurrent Neural Networks

    Authors: Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu

    Abstract: Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of depend…

    Submitted 19 August, 2016; v1 submitted 25 January, 2016; originally announced January 2016.

  38. arXiv:1511.01844 [pdf, other]

    stat.ML cs.LG

    A note on the evaluation of generative models

    Authors: Lucas Theis, Aäron van den Oord, Matthias Bethge

    Abstract: Probabilistic generative models can be used for compression, denoising, inpainting, texture synthesis, semi-supervised learning, unsupervised feature learning, and other tasks. Given this wide range of applications, it is not surprising that a lot of heterogeneity exists in the way these models are formulated, trained, and evaluated. As a consequence, direct comparison between models is often diff…

    Submitted 24 April, 2016; v1 submitted 5 November, 2015; originally announced November 2015.

  39. arXiv:1506.01911 [pdf, other]

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video

    Authors: Lionel Pigou, Aäron van den Oord, Sander Dieleman, Mieke Van Herreweghe, Joni Dambre

    Abstract: Recent studies have demonstrated the power of recurrent neural networks for machine translation, image captioning and speech recognition. For the task of capturing temporal structure in video, however, there still remain numerous open research questions. Current research suggests using a simple temporal feature pooling strategy to take into account the temporal aspect of video. We demonstrate that…

    Submitted 10 February, 2016; v1 submitted 5 June, 2015; originally announced June 2015.