Showing 1–20 of 20 results for author: Makhzani, A

  1. arXiv:2404.17546  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo

    Authors: Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger Grosse

    Abstract: Numerous capability and safety techniques of Large Language Models (LLMs), including RLHF, automated red-teaming, prompt engineering, and infilling, can be cast as sampling from an unnormalized target distribution defined by a given reward or potential function over the full sequence. In this work, we leverage the rich toolkit of Sequential Monte Carlo (SMC) for these probabilistic inference probl…

    Submitted 26 April, 2024; originally announced April 2024.

  2. arXiv:2402.03496  [pdf, other]

    cs.LG math.OC

    Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

    Authors: Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani

    Abstract: Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers. Their diagonal preconditioner is based on the gradient outer product which is incorporated into the parameter update via a square root. While these methods are often motivated as approximate second-order methods, the square root represents a fundamental differen…

    Submitted 15 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: A long version of the ICML 2024 paper. Fixed some typos and updated the abstract and Sec. 4 to emphasize the concept of preconditioner invariance

  3. arXiv:2312.05705  [pdf, other]

    cs.LG stat.ML

    Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC

    Authors: Wu Lin, Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani

    Abstract: Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-fre…

    Submitted 15 June, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: A long version of the ICML 2024 paper

  4. arXiv:2310.10649  [pdf, other]

    cs.LG math.OC stat.ML

    A Computational Framework for Solving Wasserstein Lagrangian Flows

    Authors: Kirill Neklyudov, Rob Brekelmans, Alexander Tong, Lazar Atanackovic, Qiang Liu, Alireza Makhzani

    Abstract: The dynamical formulation of the optimal transport can be extended through various choices of the underlying geometry (kinetic energy), and the regularization of density paths (potential energy). These combinations yield different variational problems (Lagrangians), encompassing many variations of the optimal transport problem such as the Schrödinger bridge, unbalanced optimal transport, and optim…

    Submitted 3 July, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  5. arXiv:2307.07050  [pdf, other]

    physics.comp-ph cs.LG physics.chem-ph

    Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schrödinger Equation

    Authors: Kirill Neklyudov, Jannes Nys, Luca Thiede, Juan Carrasquilla, Qiang Liu, Max Welling, Alireza Makhzani

    Abstract: Solving the quantum many-body Schrödinger equation is a fundamental and challenging problem in the fields of quantum physics, quantum chemistry, and material sciences. One of the common computational approaches to this problem is Quantum Variational Monte Carlo (QVMC), in which ground-state solutions are obtained by minimizing the energy of the system within a restricted family of parameterized wa…

    Submitted 26 October, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Published in NeurIPS 2023

  6. arXiv:2305.09705  [pdf, other]

    cs.LG cs.IT

    Random Edge Coding: One-Shot Bits-Back Coding of Large Labeled Graphs

    Authors: Daniel Severo, James Townsend, Ashish Khisti, Alireza Makhzani

    Abstract: We present a one-shot method for compressing large labeled graphs called Random Edge Coding. When paired with a parameter-free model based on Pólya's Urn, the worst-case computational and memory complexities scale quasi-linearly and linearly with the number of observed edges, making it efficient on sparse graphs, and requires only integer arithmetic. Key to our method is bits-back coding, which is…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: Published at ICML 2023

  7. arXiv:2303.06992  [pdf, other]

    cs.LG stat.ML

    Improving Mutual Information Estimation with Annealed and Energy-Based Bounds

    Authors: Rob Brekelmans, Sicong Huang, Marzyeh Ghassemi, Greg Ver Steeg, Roger Grosse, Alireza Makhzani

    Abstract: Mutual information (MI) is a fundamental quantity in information theory and machine learning. However, direct estimation of MI is intractable, even if the true joint probability density for the variables of interest is known, as it involves estimating a potentially high-dimensional log partition function. In this work, we present a unifying view of existing MI bounds from the perspective of import…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: A shorter version appeared in the International Conference on Learning Representations (ICLR) 2022

    Journal ref: ICLR 2022 https://openreview.net/forum?id=T0B9AoM_bFg

  8. arXiv:2301.08292  [pdf, other]

    quant-ph cs.LG

    Quantum HyperNetworks: Training Binary Neural Networks in Quantum Superposition

    Authors: Juan Carrasquilla, Mohamed Hibat-Allah, Estelle Inack, Alireza Makhzani, Kirill Neklyudov, Graham W. Taylor, Giacomo Torlai

    Abstract: Binary neural networks, i.e., neural networks whose parameters and activations are constrained to only two possible values, offer a compelling avenue for the deployment of deep learning models on energy- and memory-limited devices. However, their training, architectural design, and hyperparameter tuning remain challenging as these involve multiple computationally expensive combinatorial optimizati…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: 10 pages, 6 figures. Minimal implementation: https://github.com/carrasqu/binncode

  9. arXiv:2210.06662  [pdf, other]

    cs.LG

    Action Matching: Learning Stochastic Dynamics from Samples

    Authors: Kirill Neklyudov, Rob Brekelmans, Daniel Severo, Alireza Makhzani

    Abstract: Learning the continuous dynamics of a system from snapshots of its temporal marginals is a problem which appears throughout natural sciences and machine learning, including in quantum systems, single-cell biological data, and generative modeling. In these settings, we assume access to cross-sectional samples that are uncorrelated over time, rather than full trajectories of samples. In order to bet…

    Submitted 8 June, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Published in ICML 2023

  10. arXiv:2201.10787  [pdf, other]

    cs.LG cs.CR

    Variational Model Inversion Attacks

    Authors: Kuan-Chieh Wang, Yan Fu, Ke Li, Ashish Khisti, Richard Zemel, Alireza Makhzani

    Abstract: Given the ubiquity of deep neural networks, it is important that these models do not reveal information about sensitive data that they have been trained on. In model inversion attacks, a malicious user attempts to recover the private dataset used to train a supervised neural network. A successful model inversion attack should generate realistic and diverse samples that accurately describe each of…

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  11. arXiv:2107.09202  [pdf, other]

    cs.IT cs.LG eess.SP

    Compressing Multisets with Large Alphabets using Bits-Back Coding

    Authors: Daniel Severo, James Townsend, Ashish Khisti, Alireza Makhzani, Karen Ullrich

    Abstract: Current methods which compress multisets at an optimal rate have computational complexity that scales linearly with alphabet size, making them too slow to be practical in many real-world settings. We show how to convert a compression algorithm for sequences into one for multisets, in exchange for an additional complexity term that is quasi-linear in sequence length. This allows us to compress mult…

    Submitted 27 February, 2023; v1 submitted 15 July, 2021; originally announced July 2021.

    Journal ref: IEEE Journal on Selected Areas in Information Theory, 2023

  12. arXiv:2102.11086  [pdf, other]

    cs.LG cs.AI cs.IT stat.CO

    Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding

    Authors: Yangjun Ruan, Karen Ullrich, Daniel Severo, James Townsend, Ashish Khisti, Arnaud Doucet, Alireza Makhzani, Chris J. Maddison

    Abstract: Latent variable models have been successfully applied in lossless compression with the bits-back coding algorithm. However, bits-back suffers from an increase in the bitrate equal to the KL divergence between the approximate posterior and the true posterior. In this paper, we show how to remove this gap asymptotically by deriving bits-back coding algorithms from tighter variational bounds. The key…

    Submitted 14 June, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

  13. arXiv:2012.15480  [pdf, other]

    cs.LG cs.IT stat.ML

    Likelihood Ratio Exponential Families

    Authors: Rob Brekelmans, Frank Nielsen, Alireza Makhzani, Aram Galstyan, Greg Ver Steeg

    Abstract: The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints, while the geometric mixture path is common in MCMC methods such as annealed importance sampling. Linking these two ideas, recent work has interpreted the geometric mixture path as an exponential family of distributions to analyze the thermod…

    Submitted 15 January, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: NeurIPS Workshop on Deep Learning through Information Geometry

  14. arXiv:2008.06653  [pdf, other]

    cs.LG stat.ML

    Evaluating Lossy Compression Rates of Deep Generative Models

    Authors: Sicong Huang, Alireza Makhzani, Yanshuai Cao, Roger Grosse

    Abstract: The field of deep generative modeling has succeeded in producing astonishingly realistic-seeming images and audio, but quantitative evaluation remains a challenge. Log-likelihood is an appealing metric due to its grounding in statistics and information theory, but it can be challenging to estimate for implicit generative models, and scalar-valued metrics give an incomplete picture of a model's qua…

    Submitted 15 August, 2020; originally announced August 2020.

  15. arXiv:1805.09804  [pdf, other]

    cs.LG stat.ML

    Implicit Autoencoders

    Authors: Alireza Makhzani

    Abstract: In this paper, we describe the "implicit autoencoder" (IAE), a generative autoencoder in which both the generative path and the recognition path are parametrized by implicit distributions. We use two generative adversarial networks to define the reconstruction and the regularization cost functions of the implicit autoencoder, and derive the learning rules based on maximum-likelihood learning. Usin…

    Submitted 6 February, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

  16. arXiv:1708.04782  [pdf, other]

    cs.LG cs.AI

    StarCraft II: A New Challenge for Reinforcement Learning

    Authors: Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, Rodney Tsing

    Abstract: This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially o…

    Submitted 16 August, 2017; originally announced August 2017.

    Comments: Collaboration between DeepMind & Blizzard. 20 pages, 9 figures, 2 tables

  17. arXiv:1706.00531  [pdf, other]

    cs.LG

    PixelGAN Autoencoders

    Authors: Alireza Makhzani, Brendan Frey

    Abstract: In this paper, we describe the "PixelGAN autoencoder", a generative autoencoder in which the generative path is a convolutional autoregressive neural network on pixels (PixelCNN) that is conditioned on a latent code, and the recognition path uses a generative adversarial network (GAN) to impose a prior distribution on the latent code. We show that different priors result in different decomposition…

    Submitted 1 June, 2017; originally announced June 2017.

  18. arXiv:1511.05644  [pdf, other]

    cs.LG

    Adversarial Autoencoders

    Authors: Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey

    Abstract: In this paper, we propose the "adversarial autoencoder" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution. Matching the aggregated posterior to the prior ensures that generating from a…

    Submitted 24 May, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

  19. arXiv:1409.2752  [pdf, ps, other]

    cs.LG cs.NE

    Winner-Take-All Autoencoders

    Authors: Alireza Makhzani, Brendan Frey

    Abstract: In this paper, we propose a winner-take-all method for learning hierarchical sparse representations in an unsupervised fashion. We first introduce fully-connected winner-take-all autoencoders which use mini-batch statistics to directly enforce a lifetime sparsity in the activations of the hidden units. We then propose the convolutional winner-take-all autoencoder which combines the benefits of con…

    Submitted 7 June, 2015; v1 submitted 9 September, 2014; originally announced September 2014.

  20. arXiv:1312.5663  [pdf, ps, other]

    cs.LG

    k-Sparse Autoencoders

    Authors: Alireza Makhzani, Brendan Frey

    Abstract: Recently, it has been observed that when representations are learnt in a way that encourages sparsity, improved performance is obtained on classification tasks. These methods involve combinations of activation functions, sampling steps and different kinds of penalties. To investigate the effectiveness of sparsity by itself, we propose the k-sparse autoencoder, which is an autoencoder with linear a…

    Submitted 22 March, 2014; v1 submitted 19 December, 2013; originally announced December 2013.