
Showing 1–27 of 27 results for author: Kalchbrenner, N

  1. arXiv:2306.06079  [pdf, other]

    physics.ao-ph cs.LG

    Deep Learning for Day Forecasts from Sparse Observations

    Authors: Marcin Andrychowicz, Lasse Espeholt, Di Li, Samier Merchant, Alexander Merose, Fred Zyda, Shreya Agrawal, Nal Kalchbrenner

    Abstract: Deep neural networks offer an alternative paradigm for modeling weather conditions. The ability of neural models to make a prediction in less than a second once the data is available and to do so with very high temporal and spatial resolution, and the ability to learn directly from atmospheric observations, are just some of these models' unique advantages. Neural models trained using atmospheric o…

    Submitted 6 July, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

  2. arXiv:2203.04946  [pdf, other]

    cs.CV

    Do better ImageNet classifiers assess perceptual similarity better?

    Authors: Manoj Kumar, Neil Houlsby, Nal Kalchbrenner, Ekin D. Cubuk

    Abstract: Perceptual distances between images, as measured in the space of pre-trained deep features, have outperformed prior low-level, pixel-based metrics on assessing perceptual similarity. While the capabilities of older and less accurate models such as AlexNet and VGG to capture perceptual similarity are well known, modern and more accurate models are less studied. In this paper, we present a large-sca…

    Submitted 29 October, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: TMLR 2022 (https://openreview.net/forum?id=qrGKGZZvH0)

  3. arXiv:2111.07470  [pdf, other]

    cs.LG physics.ao-ph

    Skillful Twelve Hour Precipitation Forecasts using Large Context Neural Networks

    Authors: Lasse Espeholt, Shreya Agrawal, Casper Sønderby, Manoj Kumar, Jonathan Heek, Carla Bromberg, Cenk Gazen, Jason Hickey, Aaron Bell, Nal Kalchbrenner

    Abstract: The problem of forecasting weather has been scientifically studied for centuries due to its high impact on human lives, transportation, food production and energy management, among others. Current operational forecasting models are based on physics and use supercomputers to simulate the atmosphere to make forecasts hours and days in advance. Better physics-based forecasts require improvements in t…

    Submitted 14 November, 2021; originally announced November 2021.

    Comments: 34 pages

  4. arXiv:2106.06080  [pdf, other]

    cs.LG cs.AI

    Gradual Domain Adaptation in the Wild: When Intermediate Distributions are Absent

    Authors: Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi

    Abstract: We focus on the problem of domain adaptation when the goal is shifting the model towards the target distribution, rather than learning domain invariant representations. It has been shown that under the following two assumptions: (a) access to samples from intermediate distributions, and (b) samples being annotated with the amount of change from the source distribution, self-training can be success…

    Submitted 13 July, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  5. arXiv:2102.11107  [pdf, other]

    cs.LG cs.AI

    Towards Causal Representation Learning

    Authors: Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, Yoshua Bengio

    Abstract: The two fields of machine learning and graphical causality arose and developed separately. However, there is now cross-pollination and increasing interest in both fields to benefit from the advances of the other. In the present paper, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assay…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: Special Issue of Proceedings of the IEEE - Advances in Machine Learning and Deep Neural Networks

  6. arXiv:2102.04432  [pdf, other]

    cs.CV cs.AI cs.LG

    Colorization Transformer

    Authors: Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner

    Abstract: We present the Colorization Transformer, a novel approach for diverse high fidelity image colorization based on self-attention. Given a grayscale image, the colorization proceeds in three steps. We first use a conditional autoregressive transformer to produce a low resolution coarse coloring of the grayscale image. Our architecture adopts conditional transformer layers to effectively condition gra…

    Submitted 7 March, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: ICLR 2021 Camera Ready. See https://openreview.net/forum?id=5NA1PinlGFu for more details

  7. arXiv:2008.01160  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    A Spectral Energy Distance for Parallel Speech Synthesis

    Authors: Alexey A. Gritsenko, Tim Salimans, Rianne van den Berg, Jasper Snoek, Nal Kalchbrenner

    Abstract: Speech synthesis is an important practical generative modeling problem that has seen great progress over the last few years, with likelihood-based autoregressive neural models now outperforming traditional concatenative systems. A downside of such autoregressive models is that they require executing tens of thousands of sequential operations per second of generated audio, making them ill-suited fo…

    Submitted 23 October, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

  8. arXiv:2004.03705  [pdf, other]

    cs.CL cs.LG stat.ML

    Deep Learning Based Text Classification: A Comprehensive Review

    Authors: Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, Jianfeng Gao

    Abstract: Deep learning based models have surpassed classical machine learning based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural language inference. In this paper, we provide a comprehensive review of more than 150 deep learning based models for text classification developed in recent years, and discuss their technical c…

    Submitted 4 January, 2021; v1 submitted 5 April, 2020; originally announced April 2020.

  9. arXiv:2003.12140  [pdf, other]

    cs.LG physics.ao-ph stat.ML

    MetNet: A Neural Weather Model for Precipitation Forecasting

    Authors: Casper Kaae Sønderby, Lasse Espeholt, Jonathan Heek, Mostafa Dehghani, Avital Oliver, Tim Salimans, Shreya Agrawal, Jason Hickey, Nal Kalchbrenner

    Abstract: Weather forecasting is a long standing scientific challenge with direct social and economic impact. The task is suitable for deep neural networks due to vast amounts of continuously collected data and a rich spatial and temporal structure that presents long range dependencies. We introduce MetNet, a neural network that forecasts precipitation up to 8 hours into the future at the high spatial resol…

    Submitted 30 March, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

  10. arXiv:1912.12180  [pdf, other]

    cs.CV

    Axial Attention in Multidimensional Transformers

    Authors: Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans

    Abstract: We propose Axial Transformers, a self-attention-based autoregressive model for images and other data organized as high dimensional tensors. Existing autoregressive models either suffer from excessively large computational resource requirements for high dimensional data, or make compromises in terms of distribution expressiveness or ease of implementation in order to decrease resource requirements.…

    Submitted 20 December, 2019; originally announced December 2019.

    Comments: 10 pages

  11. arXiv:1908.03491  [pdf, other]

    cs.LG cs.CV stat.ML

    Bayesian Inference for Large Scale Image Classification

    Authors: Jonathan Heek, Nal Kalchbrenner

    Abstract: Bayesian inference promises to ground and improve the performance of deep neural networks. It promises to be robust to overfitting, to simplify the training procedure and the space of hyperparameters, and to provide a calibrated measure of uncertainty that can enhance decision making, agent exploration and prediction fairness. Markov Chain Monte Carlo (MCMC) methods enable Bayesian inference by ge…

    Submitted 9 August, 2019; originally announced August 2019.

  12. arXiv:1812.01608  [pdf, other]

    cs.CV cs.GR cs.LG stat.ML

    Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling

    Authors: Jacob Menick, Nal Kalchbrenner

    Abstract: The unconditional generation of high fidelity images is a longstanding benchmark for testing the performance of image decoders. Autoregressive image models have been able to generate small images unconditionally, but the extension of these methods to large images where fidelity can be more readily assessed has remained an open problem. Among the major challenges are the capacity to encode the vast…

    Submitted 4 December, 2018; originally announced December 2018.

  13. arXiv:1803.07416  [pdf, other]

    cs.LG cs.CL stat.ML

    Tensor2Tensor for Neural Machine Translation

    Authors: Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit

    Abstract: Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.

    Submitted 16 March, 2018; originally announced March 2018.

    Comments: arXiv admin note: text overlap with arXiv:1706.03762

  14. arXiv:1802.08435  [pdf, other]

    cs.SD cs.LG eess.AS

    Efficient Neural Audio Synthesis

    Authors: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu

    Abstract: Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high outp…

    Submitted 25 June, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: 10 pages

  15. arXiv:1711.10433  [pdf, other]

    cs.LG

    Parallel WaveNet: Fast High-Fidelity Speech Synthesis

    Authors: Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis

    Abstract: The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today's massively parallel computers, and therefore hard to deploy in a real-time p…

    Submitted 28 November, 2017; originally announced November 2017.

  16. arXiv:1703.03664  [pdf, other]

    cs.CV cs.NE

    Parallel Multiscale Autoregressive Density Estimation

    Authors: Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Sergio Gómez Colmenarejo, Ziyu Wang, Dan Belov, Nando de Freitas

    Abstract: PixelCNN achieves state-of-the-art results in density estimation for natural images. Although training is fast, inference is costly, requiring one network evaluation per pixel; O(N) for N pixels. This can be sped up by caching activations, but still involves generating each pixel sequentially. In this work, we propose a parallelized PixelCNN that allows more efficient inference by modeling certain…

    Submitted 10 March, 2017; originally announced March 2017.

  17. arXiv:1610.10099  [pdf, other]

    cs.CL cs.LG

    Neural Machine Translation in Linear Time

    Authors: Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu

    Abstract: We present a novel neural network for processing sequences. The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence. The two network parts are connected by stacking the decoder on top of the encoder and preserving the temporal resolution of the sequences. To address the differing leng…

    Submitted 15 March, 2017; v1 submitted 31 October, 2016; originally announced October 2016.

    Comments: 9 pages

  18. arXiv:1610.00527  [pdf, other]

    cs.CV cs.LG

    Video Pixel Networks

    Authors: Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu

    Abstract: We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over t…

    Submitted 3 October, 2016; originally announced October 2016.

    Comments: 16 pages

  19. arXiv:1609.03499  [pdf, other]

    cs.SD cs.LG

    WaveNet: A Generative Model for Raw Audio

    Authors: Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu

    Abstract: This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-…

    Submitted 19 September, 2016; v1 submitted 12 September, 2016; originally announced September 2016.

  20. arXiv:1606.05328  [pdf, other]

    cs.CV cs.LG

    Conditional Image Generation with PixelCNN Decoders

    Authors: Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu

    Abstract: This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects…

    Submitted 18 June, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

  21. arXiv:1602.03032  [pdf, other]

    cs.NE

    Associative Long Short-Term Memory

    Authors: Ivo Danihelka, Greg Wayne, Benigno Uria, Nal Kalchbrenner, Alex Graves

    Abstract: We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters. The system has an associative memory based on complex-valued vectors and is closely related to Holographic Reduced Representations and Long Short-Term Memory networks. Holographic Reduced Representations have limited capacity: as they store more information, each…

    Submitted 19 May, 2016; v1 submitted 9 February, 2016; originally announced February 2016.

    Comments: ICML-2016

  22. arXiv:1601.06759  [pdf, other]

    cs.CV cs.LG cs.NE

    Pixel Recurrent Neural Networks

    Authors: Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu

    Abstract: Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of depend…

    Submitted 19 August, 2016; v1 submitted 25 January, 2016; originally announced January 2016.

  23. arXiv:1507.01526  [pdf, other]

    cs.NE cs.CL cs.LG

    Grid Long Short-Term Memory

    Authors: Nal Kalchbrenner, Ivo Danihelka, Alex Graves

    Abstract: This paper introduces Grid Long Short-Term Memory, a network of LSTM cells arranged in a multidimensional grid that can be applied to vectors, sequences or higher dimensional data such as images. The network differs from existing deep LSTM architectures in that the cells are connected between network layers as well as along the spatiotemporal dimensions of the data. The network provides a unified…

    Submitted 7 January, 2016; v1 submitted 6 July, 2015; originally announced July 2015.

    Comments: 15 pages

  24. arXiv:1408.6181  [pdf, ps, other]

    cs.CL

    Resolving Lexical Ambiguity in Tensor Regression Models of Meaning

    Authors: Dimitri Kartsaklis, Nal Kalchbrenner, Mehrnoosh Sadrzadeh

    Abstract: This paper provides a method for improving tensor-based compositional distributional models of meaning by the addition of an explicit disambiguation step prior to composition. In contrast with previous research where this hypothesis has been successfully tested against relatively simple compositional models, in our work we use a robust model trained with linear regression. The results we get in tw…

    Submitted 26 August, 2014; originally announced August 2014.

    Journal ref: Proceedings of ACL 2014, Vol. 2:Short Papers, pp:212-217

  25. arXiv:1406.3830  [pdf, other]

    cs.CL cs.LG stat.ML

    Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network

    Authors: Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, Nando de Freitas

    Abstract: Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the meaning of documents by embedding them in a low dimensional vector space, while preserving distinctions of word and sentence order crucial for capturing nuanced se…

    Submitted 15 June, 2014; originally announced June 2014.

  26. arXiv:1404.2188  [pdf, other]

    cs.CL

    A Convolutional Neural Network for Modelling Sentences

    Authors: Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom

    Abstract: The ability to accurately represent sentences is central to language understanding. We describe a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) that we adopt for the semantic modelling of sentences. The network uses Dynamic k-Max Pooling, a global pooling operation over linear sequences. The network handles input sentences of varying length and induces a feature…

    Submitted 8 April, 2014; originally announced April 2014.

  27. arXiv:1306.3584  [pdf, other]

    cs.CL

    Recurrent Convolutional Neural Networks for Discourse Compositionality

    Authors: Nal Kalchbrenner, Phil Blunsom

    Abstract: The compositionality of meaning extends beyond the single sentence. Just as words combine to form the meaning of sentences, so do sentences combine to form the meaning of paragraphs, dialogues and general discourse. We introduce both a sentence model and a discourse model corresponding to the two levels of compositionality. The sentence model adopts convolution as the central operation for composi…

    Submitted 15 June, 2013; originally announced June 2013.