Showing 1–45 of 45 results for author: Salimans, T

  1. arXiv:2406.04103  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE

    Multistep Distillation of Diffusion Models via Moment Matching

    Authors: Tim Salimans, Thomas Mensink, Jonathan Heek, Emiel Hoogeboom

    Abstract: We present a new method for making diffusion models faster to sample. The method distills many-step diffusion models into few-step models by matching conditional expectations of the clean data given noisy data along the sampling trajectory. Our approach extends recently proposed one-step methods to the multi-step case, and provides a new perspective by interpreting these approaches in terms of mom…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2405.16852  [pdf, other]

    cs.LG cs.AI stat.ML

    EM Distillation for One-step Diffusion Models

    Authors: Sirui Xie, Zhisheng Xiao, Diederik P Kingma, Tingbo Hou, Ying Nian Wu, Kevin Patrick Murphy, Tim Salimans, Ben Poole, Ruiqi Gao

    Abstract: While diffusion models can learn complex distributions, sampling requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations, such as performance degradation with very few sampling steps, reliance on training data access, or mode-seeking optimization that may fail to capture the full distribution. We propose EM Disti…

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2403.06807  [pdf, other]

    cs.LG cs.CV stat.ML

    Multistep Consistency Models

    Authors: Jonathan Heek, Emiel Hoogeboom, Tim Salimans

    Abstract: Diffusion models are relatively easy to train but require many steps to generate samples. Consistency models are far more difficult to train, but generate samples in a single step. In this paper we propose Multistep Consistency Models: A unification between Consistency Models (Song et al., 2023) and TRACT (Berthelot et al., 2023) that can interpolate between a consistency model and a diffusion m…

    Submitted 3 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  4. arXiv:2402.09470  [pdf, other]

    cs.LG stat.ML

    Rolling Diffusion Models

    Authors: David Ruhe, Jonathan Heek, Tim Salimans, Emiel Hoogeboom

    Abstract: Diffusion models have recently been increasingly applied to temporal data such as video, fluid mechanics simulations, or climate data. These methods generally treat subsequent frames equally regarding the amount of noise in the diffusion process. This paper explores Rolling Diffusion: a new approach that uses a sliding window denoising process. It ensures that the diffusion process progressively c…

    Submitted 9 September, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  5. arXiv:2301.11093  [pdf, other]

    cs.CV cs.LG stat.ML

    Simple diffusion: End-to-end diffusion for high resolution images

    Authors: Emiel Hoogeboom, Jonathan Heek, Tim Salimans

    Abstract: Currently, applying diffusion models in pixel space of high resolution images is difficult. Instead, existing approaches focus on diffusion in lower dimensional spaces (latent diffusion), or have multiple super-resolution levels of generation referred to as cascades. The downside is that these approaches add additional complexity to the diffusion framework. This paper aims to improve denoising d…

    Submitted 12 December, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

  6. arXiv:2210.03142  [pdf, other]

    cs.CV cs.AI cs.LG

    On Distillation of Guided Diffusion Models

    Authors: Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans

    Abstract: Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two di…

    Submitted 12 April, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: CVPR 2023, Award candidate

  7. arXiv:2210.02303  [pdf, other]

    cs.CV cs.LG

    Imagen Video: High Definition Video Generation with Diffusion Models

    Authors: Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans

    Abstract: We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design deci…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: See accompanying website: https://imagen.research.google/video/

  8. arXiv:2209.05557  [pdf, other]

    cs.LG cs.CV stat.ML

    Blurring Diffusion Models

    Authors: Emiel Hoogeboom, Tim Salimans

    Abstract: Recently, Rissanen et al. (2022) have presented a new type of diffusion process for generative modeling based on heat dissipation, or blurring, as an alternative to isotropic Gaussian diffusion. Here, we show that blurring can equivalently be defined through a Gaussian diffusion process with non-isotropic noise. In making this connection, we bridge the gap between inverse heat dissipation and den…

    Submitted 1 May, 2024; v1 submitted 12 September, 2022; originally announced September 2022.

  9. arXiv:2207.12598  [pdf, other]

    cs.LG cs.AI

    Classifier-Free Diffusion Guidance

    Authors: Jonathan Ho, Tim Salimans

    Abstract: Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models post training, in the same spirit as low temperature sampling or truncation in other types of generative models. Classifier guidance combines the score estimate of a diffusion model with the gradient of an image classifier and thereby requires training an image classif…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: A short version of this paper appeared in the NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications: https://openreview.net/pdf?id=qw8AKxfYbI

  10. arXiv:2206.08889  [pdf, other]

    stat.ML cs.IT cs.LG

    Lossy Compression with Gaussian Diffusion

    Authors: Lucas Theis, Tim Salimans, Matthew D. Hoffman, Fabian Mentzer

    Abstract: We consider a novel lossy compression approach based on unconditional diffusion generative models, which we call DiffC. Unlike modern compression schemes which rely on transform coding and quantization to restrict the transmitted information, DiffC relies on the efficient communication of pixels corrupted by Gaussian noise. We implement a proof of concept and find that it works surprisingly well d…

    Submitted 31 December, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

  11. arXiv:2205.11487  [pdf, other]

    cs.CV cs.LG

    Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

    Authors: Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

    Abstract: We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only c…

    Submitted 23 May, 2022; originally announced May 2022.

  12. arXiv:2204.03458  [pdf, other]

    cs.CV cs.AI cs.LG

    Video Diffusion Models

    Authors: Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

    Abstract: Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial results. Our model is a natural extension of the standard image diffusion architecture, and it enables jointly training from image and video data, which we find to…

    Submitted 22 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

  13. arXiv:2202.00512  [pdf, other]

    cs.LG cs.AI stat.ML

    Progressive Distillation for Fast Sampling of Diffusion Models

    Authors: Tim Salimans, Jonathan Ho

    Abstract: Diffusion models have recently shown great promise for generative modeling, outperforming GANs on perceptual quality and autoregressive models at density estimation. A remaining downside is their slow sampling time: generating high quality samples takes many hundreds or thousands of model evaluations. Here we make two contributions to help eliminate this downside: First, we present new parameteriz…

    Submitted 7 June, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: Published as a conference paper at ICLR 2022

  14. arXiv:2111.05826  [pdf, other]

    cs.CV cs.LG

    Palette: Image-to-Image Diffusion Models

    Authors: Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi

    Abstract: This paper develops a unified framework for image-to-image translation based on conditional diffusion models and evaluates this framework on four challenging image-to-image translation tasks, namely colorization, inpainting, uncropping, and JPEG restoration. Our simple implementation of image-to-image diffusion models outperforms strong GAN and regression baselines on all tasks, without task-speci…

    Submitted 3 May, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

  15. arXiv:2110.02037  [pdf, other]

    cs.LG stat.ML

    Autoregressive Diffusion Models

    Authors: Emiel Hoogeboom, Alexey A. Gritsenko, Jasmijn Bastings, Ben Poole, Rianne van den Berg, Tim Salimans

    Abstract: We introduce Autoregressive Diffusion Models (ARDMs), a model class encompassing and generalizing order-agnostic autoregressive models (Uria et al., 2014) and absorbing discrete diffusion (Austin et al., 2021), which we show are special cases of ARDMs under mild assumptions. ARDMs are simple to implement and easy to train. Unlike standard ARMs, they do not require causal masking of model represent…

    Submitted 1 February, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at International Conference on Learning Representations (ICLR) 2022

  16. arXiv:2107.00630  [pdf, other]

    cs.LG stat.ML

    Variational Diffusion Models

    Authors: Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho

    Abstract: Diffusion-based generative models have demonstrated a capacity for perceptually impressive synthesis, but can they also be great likelihood-based models? We answer this in the affirmative, and introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on standard image density estimation benchmarks. Unlike other diffusion-based models, our method allows for ef…

    Submitted 13 April, 2023; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: Published at NeurIPS'21

  17. arXiv:2106.15282  [pdf, other]

    cs.CV cs.AI cs.LG

    Cascaded Diffusion Models for High Fidelity Image Generation

    Authors: Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans

    Abstract: We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowe…

    Submitted 17 December, 2021; v1 submitted 30 May, 2021; originally announced June 2021.

  18. arXiv:2104.09402  [pdf, other]

    cs.LG cs.AI

    Agent-Centric Representations for Multi-Agent Reinforcement Learning

    Authors: Wenling Shang, Lasse Espeholt, Anton Raichuk, Tim Salimans

    Abstract: Object-centric representations have recently enabled significant progress in tackling relational reasoning tasks. By building a strong object-centric inductive bias into neural architectures, recent efforts have improved generalization and data efficiency of machine learning algorithms for these problems. One problem class involving relational reasoning that still remains under-explored is multi-a…

    Submitted 19 April, 2021; originally announced April 2021.

  19. arXiv:2104.07636  [pdf, other]

    eess.IV cs.CV cs.LG

    Image Super-Resolution via Iterative Refinement

    Authors: Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, Mohammad Norouzi

    Abstract: We present SR3, an approach to image Super-Resolution via Repeated Refinement. SR3 adapts denoising diffusion probabilistic models to conditional image generation and performs super-resolution through a stochastic denoising process. Inference starts with pure Gaussian noise and iteratively refines the noisy output using a U-Net model trained on denoising at various noise levels. SR3 exhibits stron…

    Submitted 30 June, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

  20. arXiv:2008.01160  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    A Spectral Energy Distance for Parallel Speech Synthesis

    Authors: Alexey A. Gritsenko, Tim Salimans, Rianne van den Berg, Jasper Snoek, Nal Kalchbrenner

    Abstract: Speech synthesis is an important practical generative modeling problem that has seen great progress over the last few years, with likelihood-based autoregressive neural models now outperforming traditional concatenative systems. A downside of such autoregressive models is that they require executing tens of thousands of sequential operations per second of generated audio, making them ill-suited fo…

    Submitted 23 October, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

  21. arXiv:2006.12459  [pdf, other]

    cs.LG stat.ML

    IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression

    Authors: Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans

    Abstract: In this paper we analyse and improve integer discrete flows for lossless compression. Integer discrete flows are a recently proposed class of models that learn invertible transformations for integer-valued random variables. Their discrete nature makes them particularly suitable for lossless compression with entropy coding schemes. We start by investigating a recent theoretical claim that states th…

    Submitted 23 March, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted as a conference paper at the Ninth International Conference on Learning Representations (ICLR) 2021

  22. arXiv:2003.12140  [pdf, other]

    cs.LG physics.ao-ph stat.ML

    MetNet: A Neural Weather Model for Precipitation Forecasting

    Authors: Casper Kaae Sønderby, Lasse Espeholt, Jonathan Heek, Mostafa Dehghani, Avital Oliver, Tim Salimans, Shreya Agrawal, Jason Hickey, Nal Kalchbrenner

    Abstract: Weather forecasting is a long standing scientific challenge with direct social and economic impact. The task is suitable for deep neural networks due to vast amounts of continuously collected data and a rich spatial and temporal structure that presents long range dependencies. We introduce MetNet, a neural network that forecasts precipitation up to 8 hours into the future at the high spatial resol…

    Submitted 30 March, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

  23. arXiv:2003.12022  [pdf, other]

    cs.CV

    Milking CowMask for Semi-Supervised Image Classification

    Authors: Geoff French, Avital Oliver, Tim Salimans

    Abstract: Consistency regularization is a technique for semi-supervised learning that underlies a number of strong results for classification with few labeled data. It works by encouraging a learned model to be robust to perturbations on unlabeled data. Here, we present a novel mask-based augmentation method called CowMask. Using it to provide perturbations for semi-supervised consistency regularization, we…

    Submitted 5 June, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: 11 pages, 2 figures, submitted to NeurIPS 2020

  24. arXiv:2002.02655  [pdf, other]

    cs.LG stat.ML

    The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

    Authors: Jakub Swiatkowski, Kevin Roth, Bastiaan S. Veeling, Linh Tran, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

    Abstract: Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work developing this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational d…

    Submitted 5 July, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

  25. arXiv:2002.02405  [pdf, other]

    stat.ML cs.LG stat.CO

    How Good is the Bayes Posterior in Deep Neural Networks Really?

    Authors: Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

    Abstract: During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are, as of early 2020, no publicized deployments of Bayesian neura…

    Submitted 2 July, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: Full version (main paper and appendix) of the ICML 2020 publication

  26. arXiv:2001.04694  [pdf, other]

    cs.LG stat.ML

    Hydra: Preserving Ensemble Diversity for Model Distillation

    Authors: Linh Tran, Bastiaan S. Veeling, Kevin Roth, Jakub Swiatkowski, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Sebastian Nowozin, Rodolphe Jenatton

    Abstract: Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to preserve its predictive behavior. Most existing d…

    Submitted 19 March, 2021; v1 submitted 14 January, 2020; originally announced January 2020.

    Comments: Accepted to ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning

  27. arXiv:1912.12180  [pdf, other]

    cs.CV

    Axial Attention in Multidimensional Transformers

    Authors: Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans

    Abstract: We propose Axial Transformers, a self-attention-based autoregressive model for images and other data organized as high dimensional tensors. Existing autoregressive models either suffer from excessively large computational resource requirements for high dimensional data, or make compromises in terms of distribution expressiveness or ease of implementation in order to decrease resource requirements.…

    Submitted 20 December, 2019; originally announced December 2019.

    Comments: 10 pages

  28. arXiv:1912.06680  [pdf, other]

    cs.LG stat.ML

    Dota 2 with Large Scale Deep Reinforcement Learning

    Authors: OpenAI, Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d. O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, et al. (2 additional authors not shown)

    Abstract: On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learnin…

    Submitted 13 December, 2019; originally announced December 2019.

  29. The Likelihood of Mixed Hitting Times

    Authors: Jaap H. Abbring, Tim Salimans

    Abstract: We present a method for computing the likelihood of a mixed hitting-time model that specifies durations as the first time a latent Lévy process crosses a heterogeneous threshold. This likelihood is not generally known in closed form, but its Laplace transform is. Our approach to its computation relies on numerical methods for inverting Laplace transforms that exploit special properties of the firs…

    Submitted 30 April, 2021; v1 submitted 9 May, 2019; originally announced May 2019.

    Comments: 37 pages

  30. arXiv:1904.03646  [pdf, other]

    cs.LG stat.ML

    Policy Gradient Search: Online Planning and Expert Iteration without Search Trees

    Authors: Thomas Anthony, Robert Nishihara, Philipp Moritz, Tim Salimans, John Schulman

    Abstract: Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play. MCTS has been used by state-of-the-art programs for many problems, however a disadvantage to MCTS is that it estimates the values of states with Monte Carlo averages, stored in a search tree; this does not…

    Submitted 7 April, 2019; originally announced April 2019.

  31. arXiv:1812.03381  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Learning Montezuma's Revenge from a Single Demonstration

    Authors: Tim Salimans, Richard Chen

    Abstract: We propose a new method for learning from a single demonstration to solve hard exploration tasks like the Atari game Montezuma's Revenge. Instead of imitating human demonstrations, as proposed in other recent works, our approach is to maximize rewards directly. Our agent is trained using off-the-shelf reinforcement learning, but starts every episode by resetting to a state from a demonstration. By…

    Submitted 8 December, 2018; originally announced December 2018.

    Comments: Deep RL Workshop, NeurIPS 2018

  32. arXiv:1803.05573  [pdf, other]

    cs.LG stat.ML

    Improving GANs Using Optimal Transport

    Authors: Tim Salimans, Han Zhang, Alec Radford, Dimitris Metaxas

    Abstract: We present Optimal Transport GAN (OT-GAN), a variant of generative adversarial nets minimizing a new metric measuring the distance between the generator distribution and the data distribution. This metric, which we call mini-batch energy distance, combines optimal transport in primal form with an energy distance defined in an adversarially learned feature space, resulting in a highly discriminativ…

    Submitted 14 March, 2018; originally announced March 2018.

  33. arXiv:1703.03864  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Evolution Strategies as a Scalable Alternative to Reinforcement Learning

    Authors: Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever

    Abstract: We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using a novel communication strategy based on common random numbers, ou…

    Submitted 7 September, 2017; v1 submitted 10 March, 2017; originally announced March 2017.

  34. arXiv:1701.05517  [pdf, other]

    cs.LG stat.ML

    PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications

    Authors: Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma

    Abstract: PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixel-cnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood o…

    Submitted 19 January, 2017; originally announced January 2017.

  35. arXiv:1611.02731  [pdf, other]

    cs.LG stat.ML

    Variational Lossy Autoencoder

    Authors: Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel

    Abstract: Representation learning seeks to expose certain aspects of observed data in a learned representation that's amenable to downstream tasks like classification. For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. In this paper, we present a simple but principled method to learn such global representations…

    Submitted 4 March, 2017; v1 submitted 8 November, 2016; originally announced November 2016.

    Comments: Added CIFAR10 experiments; ICLR 2017

  36. arXiv:1606.04934  [pdf, other]

    cs.LG stat.ML

    Improving Variational Inference with Inverse Autoregressive Flow

    Authors: Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling

    Abstract: The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation…

    Submitted 30 January, 2017; v1 submitted 15 June, 2016; originally announced June 2016.

  37. arXiv:1606.03498  [pdf, other]

    cs.LG cs.CV cs.NE

    Improved Techniques for Training GANs

    Authors: Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen

    Abstract: We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, n…

    Submitted 10 June, 2016; originally announced June 2016.

  38. arXiv:1602.08734  [pdf, ps, other]

    stat.ML cs.LG stat.CO

    A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features

    Authors: Tim Salimans

    Abstract: In this note we present a generative model of natural images consisting of a deep hierarchy of layers of latent random variables, each of which follows a new type of distribution that we call rectified Gaussian. These rectified Gaussian units allow spike-and-slab type sparsity, while retaining the differentiability necessary for efficient stochastic gradient variational inference. To learn the par…

    Submitted 28 February, 2016; originally announced February 2016.

  39. arXiv:1602.07868  [pdf, other]

    cs.LG cs.AI cs.NE

    Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

    Authors: Tim Salimans, Diederik P. Kingma

    Abstract: We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not i…

    Submitted 3 June, 2016; v1 submitted 25 February, 2016; originally announced February 2016.

  40. arXiv:1506.02557  [pdf, other]

    stat.ML cs.LG stat.CO

    Variational Dropout and the Local Reparameterization Trick

    Authors: Diederik P. Kingma, Tim Salimans, Max Welling

    Abstract: We investigate a local reparameterization technique for greatly reducing the variance of stochastic gradients for variational Bayesian inference (SGVB) of a posterior over model parameters, while retaining parallelizability. This local reparameterization translates uncertainty about global parameters into local noise that is independent across datapoints in the minibatch. Such parameterizations can…

    Submitted 20 December, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

  41. arXiv:1410.6460  [pdf, other]

    stat.CO stat.ML

    Markov Chain Monte Carlo and Variational Inference: Bridging the Gap

    Authors: Tim Salimans, Diederik P. Kingma, Max Welling

    Abstract: Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich cl…

    Submitted 19 May, 2015; v1 submitted 23 October, 2014; originally announced October 2014.

  42. arXiv:1401.2135  [pdf, other]

    stat.CO

    Implementing and Automating Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression

    Authors: Tim Salimans

    Abstract: We recently proposed a general algorithm for approximating nonstandard Bayesian posterior distributions by minimization of their Kullback-Leibler divergence with respect to a more convenient approximating distribution. In this note we offer details on how to efficiently implement this algorithm in practice. We also suggest default choices for the form of the posterior approximation, the number of…

    Submitted 9 January, 2014; originally announced January 2014.

  43. arXiv:1401.1022  [pdf, ps, other]

    stat.CO

    On Using Control Variates with Stochastic Approximation for Variational Bayes and its Connection to Stochastic Linear Regression

    Authors: Tim Salimans, David A. Knowles

    Abstract: Recently, we and several other authors have written about the possibilities of using stochastic approximation techniques for fitting variational approximations to intractable Bayesian posterior distributions. Naive implementations of stochastic approximation suffer from high variance in this setting. Several authors have therefore suggested using control variates to reduce this variance, while we…

    Submitted 12 January, 2014; v1 submitted 6 January, 2014; originally announced January 2014.

  44. arXiv:1311.0704  [pdf, other]

    astro-ph.IM astro-ph.CO physics.soc-ph

    Observing Dark Worlds: A crowdsourcing experiment for dark matter mapping

    Authors: David Harvey, Thomas D. Kitching, Joyce Noah-Vanhoucke, Ben Hamner, Tim Salimans

    Abstract: We present the results and conclusions from the citizen science competition 'Observing Dark Worlds', where we asked participants to calculate the positions of dark matter halos from 120 catalogues of simulated weak lensing galaxy data, using computational methods. In partnership with Kaggle (http://www.kaggle.com), 357 users participated in the competition which saw 2278 downloads of the data and…

    Submitted 4 November, 2013; originally announced November 2013.

    Comments: 25 Pages (Large spaced, equiv. 8 pages in MNRAS style), 7 Figures

  45. arXiv:1206.6679  [pdf, other]

    stat.CO cs.CV stat.ML

    Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression

    Authors: Tim Salimans, David A. Knowles

    Abstract: We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any d…

    Submitted 28 July, 2014; v1 submitted 28 June, 2012; originally announced June 2012.

    MSC Class: 62F15

    Journal ref: Bayesian Analysis, Volume 8, Number 4 (2013), 837-882
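
Entry 9 (Classifier-Free Diffusion Guidance) contrasts its approach with classifier guidance, which needs the gradient of a separately trained image classifier. The following is a minimal sketch of the guidance step. It assumes the rule ε̃ = (1 + w)·ε(z, c) − w·ε(z), which mixes a conditional and an unconditional noise prediction from the same model using a guidance weight w. The helper `guided_eps` is hypothetical, and the prediction vectors are toy values rather than outputs of a real diffusion model.

```python
def guided_eps(eps_cond: list[float], eps_uncond: list[float], w: float) -> list[float]:
    """Classifier-free guidance (assumed form): (1 + w) * eps_cond - w * eps_uncond."""
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("conditional and unconditional predictions must match in size")
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]

# w = 0 recovers the conditional prediction; larger w moves the result
# further away from the unconditional prediction.
print(guided_eps([1.0, 0.0], [0.5, 0.5], w=0.0))  # [1.0, 0.0]
print(guided_eps([1.0, 0.0], [0.5, 0.5], w=2.0))  # [2.0, -1.0]
```

Because the rule only combines two predictions, it needs no classifier gradient. The cost is two model evaluations per sampling step, which is the inference cost that entry 6 (On Distillation of Guided Diffusion Models) targets.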
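
Entry 33 (Evolution Strategies as a Scalable Alternative to Reinforcement Learning) credits its scaling to a communication strategy based on common random numbers. The following sketch simulates that idea in a single process. Each perturbation is regenerated from a shared integer seed, so only scalar fitness values would need to be exchanged between workers. The code makes several assumptions not stated in the listing: mirrored (antithetic) perturbation pairs, a toy quadratic objective, and the hyperparameters shown. The helpers `perturbation`, `es_step`, and `fitness` are hypothetical names.

```python
import random
from collections.abc import Callable

def perturbation(seed: int, dim: int) -> list[float]:
    """Regenerate a standard-normal perturbation from a shared integer seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def es_step(theta: list[float], fitness: Callable[[list[float]], float],
            seeds: list[int], sigma: float, lr: float) -> list[float]:
    """One ES ascent step using mirrored pairs theta +/- sigma * eps."""
    dim = len(theta)
    # Each simulated worker evaluates one mirrored pair and reports a scalar.
    diffs = []
    for s in seeds:
        eps = perturbation(s, dim)
        plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
        minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
        diffs.append(plus - minus)
    # Anyone holding the seeds can rebuild every eps locally, so the
    # update needs only the scalars: grad ~ sum_i d_i * eps_i / (2 * sigma * n).
    grad = [0.0] * dim
    for s, d in zip(seeds, diffs):
        for j, e in enumerate(perturbation(s, dim)):
            grad[j] += d * e / (2.0 * sigma * len(seeds))
    return [t + lr * g for t, g in zip(theta, grad)]

def fitness(x: list[float]) -> float:
    """Toy objective, maximized at x = [3.0, 3.0]."""
    return -sum((xi - 3.0) ** 2 for xi in x)

theta = [0.0, 0.0]
for step in range(200):
    seeds = list(range(step * 20, step * 20 + 20))  # 20 mirrored pairs per step
    theta = es_step(theta, fitness, seeds, sigma=0.1, lr=0.05)
```

Mirrored pairs are used here because, on a quadratic, the difference F(θ + σε) − F(θ − σε) cancels the even-order terms. That keeps the toy run's gradient estimate low-variance. Other variance-reduction choices would also work.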
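
Entry 34 (PixelCNN++) lists a discretized logistic mixture likelihood as its first modification. The following sketch shows only a single logistic component applied to 8-bit pixel values 0 to 255. A logistic(μ, s) distribution is cut into unit-width bins, and the edge bins at 0 and 255 absorb the two tails. The sketch makes three assumptions: it works on the raw pixel scale, it omits the mixture weights, and it does not rescale inputs. The names `sigmoid` and `discretized_logistic_pmf` are hypothetical helpers, not the released code.

```python
import math

def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def discretized_logistic_pmf(x: int, mu: float, s: float) -> float:
    """P(pixel = x) under a logistic(mu, s) discretized into unit-width bins.

    The bin for 0 extends to -inf and the bin for 255 extends to +inf,
    so the probabilities over 0..255 sum to one.
    """
    if not 0 <= x <= 255:
        raise ValueError("x must be an 8-bit pixel value")
    upper = 1.0 if x == 255 else sigmoid((x + 0.5 - mu) / s)
    lower = 0.0 if x == 0 else sigmoid((x - 0.5 - mu) / s)
    return upper - lower

total = sum(discretized_logistic_pmf(x, mu=100.0, s=10.0) for x in range(256))
```

Adjacent bins share their boundary values, so the sum telescopes to 1 up to floating-point rounding. A mixture would take a weighted sum of such per-component probabilities.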
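
Entry 39 (Weight Normalization) describes a reparameterization that decouples the length of a weight vector from its direction. The following sketch assumes the form w = g · v / ‖v‖, where g is a scalar length and v is an unconstrained direction vector. The helper `weight_norm` is a hypothetical name, and the code covers only the reparameterization itself, not its use inside a training loop.

```python
import math

def weight_norm(v: list[float], g: float) -> list[float]:
    """Return w = g * v / ||v||: v sets the direction, g sets the length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction vector v must be non-zero")
    return [g * x / norm for x in v]

# Rescaling v leaves w unchanged; only g controls the length of w.
w1 = weight_norm([3.0, 4.0], g=2.0)
w2 = weight_norm([30.0, 40.0], g=2.0)
print(w1)  # [1.2, 1.6]
```

Because ‖w‖ = |g| regardless of v, gradient steps on g change only the scale of w, and steps on v change only its direction.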
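
Entry 40 (Variational Dropout and the Local Reparameterization Trick) turns uncertainty about global weights into local noise that is independent across datapoints. The following sketch covers a single linear unit. It assumes independent Gaussian weights wᵢ ~ N(μᵢ, σᵢ²), in which case the pre-activation b = Σ aᵢ wᵢ is itself Gaussian with mean Σ aᵢ μᵢ and variance Σ aᵢ² σᵢ². The code samples b directly from that marginal, with one fresh draw per datapoint. The helper `local_reparam_sample` and the toy numbers are hypothetical.

```python
import math
import random

def local_reparam_sample(a: list[float], mu: list[float], sigma: list[float],
                         rng: random.Random) -> float:
    """Sample b = sum_i a_i * w_i with w_i ~ N(mu_i, sigma_i^2) independent,
    by drawing b from its Gaussian marginal instead of sampling the weights."""
    mean = sum(ai * mi for ai, mi in zip(a, mu))
    var = sum((ai * si) ** 2 for ai, si in zip(a, sigma))
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)

rng = random.Random(0)
batch = [[1.0, 2.0], [0.5, -1.0]]   # toy minibatch of two inputs
mu, sigma = [0.3, -0.1], [0.2, 0.05]
# Independent noise per datapoint, rather than one weight sample shared by the minibatch.
outputs = [local_reparam_sample(a, mu, sigma, rng) for a in batch]
```

Sampling one scalar per datapoint is cheaper than sampling a full weight vector per datapoint. It also avoids the correlated noise that comes from sharing a single weight sample across the minibatch.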