Showing 1–19 of 19 results for author: Evci, U

  1. arXiv:2402.04744  [pdf, other]

    cs.LG cs.AR

    Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

    Authors: Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna

    Abstract: N:M structured sparsity has garnered significant interest as a result of relatively modest overhead and improved efficiency. Additionally, this form of sparsity holds considerable appeal for reducing the memory footprint owing to its modest representation overhead. There have been efforts to develop training recipes for N:M structured sparsity, but they primarily focus on low-sparsity regions (…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 18 pages, 8 figures, 17 tables. Code is available at https://github.com/abhibambhaniya/progressive_gradient_flow_nm_sparsity

  2. arXiv:2309.08520  [pdf, other]

    cs.LG

    Scaling Laws for Sparsely-Connected Foundation Models

    Authors: Elias Frantar, Carlos Riquelme, Neil Houlsby, Dan Alistarh, Utku Evci

    Abstract: We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets (i.e., "foundation models"), in both vision and language domains. In this setting, we identify the first scaling law describing the relationship between weight sparsity, number of non-zero parameters, and amount of training data, which we validate empirically across model and data scales…

    Submitted 15 September, 2023; originally announced September 2023.

  3. arXiv:2305.02299  [pdf, other]

    cs.LG cs.CV

    Dynamic Sparse Training with Structured Sparsity

    Authors: Mike Lasby, Anna Golubeva, Utku Evci, Mihai Nica, Yani Ioannou

    Abstract: Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training, matching the generalization of dense models while enabling sparse training and inference. Although the resulting models are highly sparse and theoretically less computationally expensive, achieving speedups with unstructured sparsity on real-world hardware is challenging. In this work, we prop…

    Submitted 21 February, 2024; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: ICLR 2024, 29 pages, 22 figures

  4. arXiv:2304.14082  [pdf, other]

    cs.LG cs.SE

    JaxPruner: A concise library for sparsity research

    Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

    Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the…

    Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: JaxPruner is hosted at http://github.com/google-research/jaxpruner

  5. arXiv:2302.12902  [pdf, other]

    cs.LG

    The Dormant Neuron Phenomenon in Deep Reinforcement Learning

    Authors: Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci

    Abstract: In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective me…

    Submitted 13 June, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: Oral at ICML 2023

  6. arXiv:2302.05442  [pdf, other]

    cs.CV cs.AI cs.LG

    Scaling Vision Transformers to 22 Billion Parameters

    Authors: Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, et al. (17 additional authors not shown)

    Abstract: The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al…

    Submitted 10 February, 2023; originally announced February 2023.

  7. arXiv:2209.07617  [pdf, other]

    cs.LG cs.AI cs.AR cs.PF

    Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask

    Authors: Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, Tushar Krishna

    Abstract: Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DNNs). Among different categories of sparsity, structured sparsity has gained more attention due to its efficient execution on modern accelerators. Particularly, N:M sparsity is attractive because there are already hardware accelerator architectures that can leverage certain forms of N:M structured sp…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: 11 pages, 2 figures, and 9 tables. Published at the ICML Workshop on Sparsity in Neural Networks Advancing Understanding and Practice, 2022. First two authors contributed equally

  8. arXiv:2206.10369  [pdf, other]

    cs.LG cs.AI

    The State of Sparse Training in Deep Reinforcement Learning

    Authors: Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

    Abstract: The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision. Their appeal stems largely from the reduced number of parameters required to train and store, as well as from an increase in learning efficiency. Somewhat surprisingly, there have been very few efforts exploring their use in Deep Reinforcement Learning (DRL). In this work we perform a systematic…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Proceedings of the 39th International Conference on Machine Learning (ICML'22)

  9. arXiv:2201.05125  [pdf, other]

    cs.LG cs.CV

    GradMax: Growing Neural Networks using Gradient Information

    Authors: Utku Evci, Bart van Merriënboer, Thomas Unterthiner, Max Vladymyrov, Fabian Pedregosa

    Abstract: The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified. In this work we instead focus on growing the architecture without requiring costly retraining. We present a method that adds new neurons during training without impacting what is already learned, while improving the trai…

    Submitted 7 June, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: ICLR 2022

    Journal ref: International Conference on Learning Representations, 2022

  10. arXiv:2201.03529  [pdf, other]

    cs.LG cs.CV

    Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning

    Authors: Utku Evci, Vincent Dumoulin, Hugo Larochelle, Michael C. Mozer

    Abstract: Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method -- fine-tuning all parameters of the source mo…

    Submitted 25 July, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: presented at ICML 2022 (Oral)

    Journal ref: ICML 2022, Proceedings of the 39th International Conference on Machine Learning

  11. arXiv:2104.02638  [pdf, other]

    cs.LG cs.CV

    Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

    Authors: Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle

    Abstract: Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we perform a cross-family study of the best transfer a…

    Submitted 6 April, 2021; originally announced April 2021.

  12. arXiv:2010.03533  [pdf, other]

    cs.LG cs.CV

    Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

    Authors: Utku Evci, Yani A. Ioannou, Cem Keskin, Yann Dauphin

    Abstract: Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). Thro…

    Submitted 15 March, 2022; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Published in AAAI 2022. Code can be found at https://github.com/google-research/rigl/tree/master/rigl/rigl_tf2

    MSC Class: 68T07

  13. arXiv:2006.07232  [pdf, other]

    cs.LG cs.NE stat.ML

    A Practical Sparse Approximation for Real Time Recurrent Learning

    Authors: Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

    Abstract: Current methods for training recurrent neural networks are based on backpropagation through time, which requires storing a complete history of network states, and prohibits updating the weights 'online' (after every timestep). Real Time Recurrent Learning (RTRL) eliminates the need for history storage and allows for online weight updates, but does so at the expense of computational costs that are…

    Submitted 12 June, 2020; originally announced June 2020.

  14. arXiv:1911.11134  [pdf, other]

    cs.LG cs.CV stat.ML

    Rigging the Lottery: Making All Tickets Winners

    Authors: Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

    Abstract: Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and…

    Submitted 23 July, 2021; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: Published in Proceedings of the 37th International Conference on Machine Learning. Code can be found at github.com/google-research/rigl

    Journal ref: Proceedings of the 37th International Conference on Machine Learning (2020) 471-481

  15. arXiv:1907.01041  [pdf]

    cs.CL cs.LG

    Natural Language Understanding with the Quora Question Pairs Dataset

    Authors: Lakshay Sharma, Laura Graesser, Nikita Nangia, Utku Evci

    Abstract: This paper explores the task of Natural Language Understanding (NLU) by looking at duplicate question detection in the Quora dataset. We conducted extensive exploration of the dataset and used various machine learning models, including linear and tree-based models. Our final finding was that a simple Continuous Bag of Words neural network model had the best performance, outdoing more complicated recu…

    Submitted 1 July, 2019; originally announced July 2019.

  16. arXiv:1906.10732  [pdf, other]

    cs.LG cs.CV stat.ML

    The Difficulty of Training Sparse Neural Networks

    Authors: Utku Evci, Fabian Pedregosa, Aidan Gomez, Erich Elsen

    Abstract: We investigate the difficulties of training sparse neural networks and make new observations about optimization dynamics and the energy landscape within the sparse regime. Recent work (Gale et al., 2019; Liu et al., 2018) has shown that sparse ResNet-50 architectures trained on the ImageNet-2012 dataset converge to solutions that are significantly worse than those found by pruning. We show that, despite the fa…

    Submitted 7 October, 2020; v1 submitted 25 June, 2019; originally announced June 2019.

    Comments: sparse networks, pruning, energy landscape, sparsity

  17. arXiv:1903.03096  [pdf, other]

    cs.LG stat.ML

    Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

    Authors: Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle

    Abstract: Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have emerged to tackle it, we find the procedure and datasets that are used to assess their progress lacking. To address this limitation, we propose Meta-Dataset: a new benchmark for training and evaluating models that is large-scale, consists of diverse datasets, and prese…

    Submitted 8 April, 2020; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: Code available at https://github.com/google-research/meta-dataset

    Journal ref: International Conference on Learning Representations (2020)

  18. arXiv:1806.06068  [pdf, other]

    cs.LG stat.ML

    Detecting Dead Weights and Units in Neural Networks

    Authors: Utku Evci

    Abstract: Deep Neural Networks are highly over-parameterized, and the size of the neural networks can be reduced significantly after training without any decrease in performance. One can clearly see this phenomenon in a wide range of architectures trained for various problems. Weight/channel pruning, distillation, quantization, and matrix factorization are some of the main methods one can use to remove the redun…

    Submitted 15 June, 2018; originally announced June 2018.

    Comments: M.Sc. thesis

  19. arXiv:1706.04454  [pdf, other]

    cs.LG

    Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

    Authors: Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou

    Abstract: We study the properties of common loss surfaces through their Hessian matrix. In particular, in the context of deep learning, we empirically show that the spectrum of the Hessian is composed of two parts: (1) the bulk centered near zero and (2) outliers away from the bulk. We present numerical evidence and mathematical justifications for the following conjectures laid out by Sagun et al. (2016): F…

    Submitted 7 May, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: Minor update for ICLR 2018 Workshop Track presentation