
Showing 1–13 of 13 results for author: Gale, T

  1. arXiv:2407.11991

    cs.HC cs.AI

    Inspired by AI? A Novel Generative AI System To Assist Conceptual Automotive Design

    Authors: Ye Wang, Nicole B. Damen, Thomas Gale, Voho Seo, Hooman Shayani

    Abstract: Design inspiration is crucial for establishing the direction of a design as well as evoking feelings and conveying meanings during the conceptual design process. Many practice designers use text-based searches on platforms like Pinterest to gather image ideas, followed by sketching on paper or using digital tools to develop concepts. Emerging generative AI techniques, such as diffusion models, off…

    Submitted 6 June, 2024; originally announced July 2024.

    Journal ref: IDETC 2024

  2. arXiv:2405.16883

    cs.LG cs.AI cs.MS cs.PL

    Scorch: A Library for Sparse Deep Learning

    Authors: Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad

    Abstract: The rapid growth in the size of deep learning models strains the capabilities of traditional dense computation paradigms. Leveraging sparse computation has become increasingly popular for training and deploying large-scale models, but existing deep learning frameworks lack extensive support for sparse operations. To bridge this gap, we introduce Scorch, a library that seamlessly integrates efficie…

    Submitted 20 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 25 pages, 8 figures

  3. arXiv:2404.07839

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-tr…

    Submitted 28 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  4. arXiv:2304.14082

    cs.LG cs.SE

    JaxPruner: A concise library for sparsity research

    Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

    Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the…

    Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Jaxpruner is hosted at http://github.com/google-research/jaxpruner

  5. arXiv:2211.15841

    cs.LG cs.AI cs.DC

    MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

    Authors: Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia

    Abstract: We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs. Our system is motivated by the limitations of current frameworks, which restrict the dynamic routing in MoE layers to satisfy the constraints of existing software and hardware. These formulations force a tradeoff between model quality and hardware efficiency, as users must choose between dropping tokens from t…

    Submitted 28 November, 2022; originally announced November 2022.

  6. arXiv:2108.07258

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  7. arXiv:2006.10901

    cs.LG cs.DC stat.ML

    Sparse GPU Kernels for Deep Learning

    Authors: Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

    Abstract: Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because these applications have relatively moderate levels of sparsity that are not sufficient for existing sparse kernels to outperform their dense counterparts. In this…

    Submitted 31 August, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Updated to match camera-ready for SC20

  8. arXiv:1911.11134

    cs.LG cs.CV stat.ML

    Rigging the Lottery: Making All Tickets Winners

    Authors: Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

    Abstract: Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and…

    Submitted 23 July, 2021; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: Published in Proceedings of the 37th International Conference on Machine Learning. Code can be found at github.com/google-research/rigl

    Journal ref: Proceedings of the 37th International Conference on Machine Learning (2020) 471-481

  9. arXiv:1911.09723

    cs.CV

    Fast Sparse ConvNets

    Authors: Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

    Abstract: Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include: the squeeze-and-excitation module, depthwise separable convolutions in Xception, and the inverted bottleneck in MobileNet v2. Notably, in all of these cases, the resulting building blocks enabled not only higher…

    Submitted 21 November, 2019; originally announced November 2019.

  10. arXiv:1902.09574

    cs.LG stat.ML

    The State of Sparsity in Deep Neural Networks

    Authors: Trevor Gale, Erich Elsen, Sara Hooker

    Abstract: We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet. Across thousands of experiments, we demonstrate that complex techniques (Molchanov et al., 2017; Louizos et al., 2017b) shown to yield high compression rates on smaller dataset…

    Submitted 25 February, 2019; originally announced February 2019.

  11. arXiv:1804.05019

    cs.NI

    Automatic Detection and Query of Wireless Spectrum Events from Streaming Data

    Authors: Carolina Fortuna, Timotej Gale, Tomaz Solc, Mihael Mohorcic

    Abstract: Several alternatives for more efficient spectrum management have been proposed over the last decade, resulting in new techniques for automatic wideband spectrum sensing. However, while spectrum sensing technology is important, understanding, using and taking actions on this data for better spectrum and network resource management is at least equally important. In this paper, we propose a system th…

    Submitted 13 April, 2018; originally announced April 2018.

    Comments: 11 pages, 8 figures, 2 tables, 5 listings, Submitted to an IEEE journal

  12. arXiv:1611.07819

    cs.DC cs.MS cs.NE

    dMath: Distributed Linear Algebra for DL

    Authors: Steven Eliuk, Cameron Upright, Hars Vardhan, Stephen Walsh, Trevor Gale

    Abstract: The paper presents a parallel math library, dMath, that demonstrates leading scaling when using intranode, internode, and hybrid-parallelism for deep learning (DL). dMath provides easy-to-use distributed primitives and a variety of domain-specific algorithms including matrix multiplication, convolutions, and others allowing for rapid development of scalable applications like deep neural networks (…

    Submitted 18 November, 2016; originally announced November 2016.

    Comments: 5 pages. arXiv admin note: text overlap with arXiv:1604.01416

  13. Winning versus losing during gambling and its neural correlates

    Authors: Pierre Sacré, Matthew S. D. Kerr, Sandya Subramanian, Kevin Kahn, Jorge Gonzalez-Martinez, Matthew A. Johnson, John T. Gale, Sridevi V. Sarma

    Abstract: Humans often make decisions which maximize an internal utility function. For example, humans often maximize their expected reward when gambling and this is considered as a "rational" decision. However, humans tend to change their betting strategies depending on how they "feel". If someone has experienced a losing streak, they may "feel" that they are more likely to win on the next hand even though…

    Submitted 10 February, 2016; originally announced February 2016.