Search | arXiv e-print repository

arXiv:2407.11991

Inspired by AI? A Novel Generative AI System To Assist Conceptual Automotive Design

Authors: Ye Wang, Nicole B. Damen, Thomas Gale, Voho Seo, Hooman Shayani

Abstract: Design inspiration is crucial for establishing the direction of a design as well as evoking feelings and conveying meanings during the conceptual design process. Many practice designers use text-based searches on platforms like Pinterest to gather image ideas, followed by sketching on paper or using digital tools to develop concepts. Emerging generative AI techniques, such as diffusion models, offer a promising avenue to streamline these processes by swiftly generating design concepts based on text and image inspiration inputs, subsequently using the AI generated design concepts as fresh sources of inspiration for further concept development. However, applying these generative AI techniques directly within a design context has challenges. Firstly, generative AI tools may exhibit a bias towards particular styles, resulting in a lack of diversity of design outputs. Secondly, these tools may struggle to grasp the nuanced meanings of texts or images in a design context. Lastly, the lack of integration with established design processes within design teams can result in fragmented use scenarios. Focusing on these challenges, we conducted workshops, surveys, and data augmentation involving teams of experienced automotive designers to investigate their current practices in generating concepts inspired by texts and images, as well as their preferred interaction modes for generative AI systems to support the concept generation workflow. Finally, we developed a novel generative AI system based on diffusion models to assist conceptual automotive design.

Submitted 6 June, 2024; originally announced July 2024.

Journal ref: IDETC 2024

arXiv:2405.16883

Scorch: A Library for Sparse Deep Learning

Authors: Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad

Abstract: The rapid growth in the size of deep learning models strains the capabilities of traditional dense computation paradigms. Leveraging sparse computation has become increasingly popular for training and deploying large-scale models, but existing deep learning frameworks lack extensive support for sparse operations. To bridge this gap, we introduce Scorch, a library that seamlessly integrates efficient sparse tensor computation into the PyTorch ecosystem, with an initial focus on inference workloads on CPUs. Scorch provides a flexible and intuitive interface for sparse tensors, supporting diverse sparse data structures. Scorch introduces a compiler stack that automates key optimizations, including automatic loop ordering, tiling, and format inference. Combined with a runtime that adapts its execution to both dense and sparse data, Scorch delivers substantial speedups over hand-written PyTorch Sparse (torch.sparse) operations without sacrificing usability. More importantly, Scorch enables efficient computation of complex sparse operations that lack hand-optimized PyTorch implementations. This flexibility is crucial for exploring novel sparse architectures. We demonstrate Scorch's ease of use and performance gains on diverse deep learning models across multiple domains. With only minimal code changes, Scorch achieves 1.05-5.78x speedups over PyTorch Sparse on end-to-end tasks. Scorch's seamless integration and performance gains make it a valuable addition to the PyTorch ecosystem. We believe Scorch will enable wider exploration of sparsity as a tool for scaling deep learning and inform the development of other sparse libraries.

Submitted 20 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: 25 pages, 8 figures

arXiv:2404.07839

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

Abstract: We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.

Submitted 28 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2304.14082

JaxPruner: A concise library for sparsity research

Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine and FedJAX and provide baseline experiments on popular benchmarks.

Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: Jaxpruner is hosted at http://github.com/google-research/jaxpruner

arXiv:2211.15841

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

Authors: Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia

Abstract: We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs. Our system is motivated by the limitations of current frameworks, which restrict the dynamic routing in MoE layers to satisfy the constraints of existing software and hardware. These formulations force a tradeoff between model quality and hardware efficiency, as users must choose between dropping tokens from the computation or wasting computation and memory on padding. To address these limitations, we reformulate MoE computation in terms of block-sparse operations and develop new block-sparse GPU kernels that efficiently handle the dynamism present in MoEs. Our approach never drops tokens and maps efficiently to modern hardware, enabling end-to-end training speedups of up to 40% over MoEs trained with the state-of-the-art Tutel library and 2.4x over DNNs trained with the highly-optimized Megatron-LM framework.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2108.07258

On the Opportunities and Risks of Foundation Models

Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

arXiv:2006.10901

Sparse GPU Kernels for Deep Learning

Authors: Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

Abstract: Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because these applications have relatively moderate levels of sparsity that are not sufficient for existing sparse kernels to outperform their dense counterparts. In this work, we study sparse matrices from deep learning applications and identify favorable properties that can be exploited to accelerate computation. Based on these insights, we develop high-performance GPU kernels for two sparse matrix operations widely applicable in neural networks: sparse matrix-dense matrix multiplication and sampled dense-dense matrix multiplication. Our kernels reach 27% of single-precision peak on Nvidia V100 GPUs. Using our kernels, we demonstrate sparse Transformer and MobileNet models that achieve 1.2-2.1x speedups and up to 12.8x memory savings without sacrificing accuracy.

Submitted 31 August, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: Updated to match camera-ready for SC20

arXiv:1911.11134

Rigging the Lottery: Making All Tickets Winners

Authors: Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

Abstract: Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on Imagenet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static. Code used in our work can be found in github.com/google-research/rigl.

Submitted 23 July, 2021; v1 submitted 25 November, 2019; originally announced November 2019.

Comments: Published in Proceedings of the 37th International Conference on Machine Learning. Code can be found in github.com/google-research/rigl

Journal ref: Proceedings of the 37th International Conference on Machine Learning (2020) 471-481

arXiv:1911.09723

Fast Sparse ConvNets

Authors: Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

Abstract: Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include: the squeeze-and-excitation module, depthwise separable convolutions in Xception, and the inverted bottleneck in MobileNet v2. Notably, in all of these cases, the resulting building blocks enabled not only higher efficiency, but also higher accuracy, and found wide adoption in the field. In this work, we further expand the arsenal of efficient building blocks for neural network architectures; but instead of combining standard primitives (such as convolution), we advocate for the replacement of these dense primitives with their sparse counterparts. While the idea of using sparsity to decrease the parameter count is not new, the conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly, which we open-source for the benefit of the community as part of the XNNPACK library. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet v1, MobileNet v2 and EfficientNet architectures substantially outperform strong dense baselines on the efficiency-accuracy curve. On Snapdragon 835 our sparse networks outperform their dense equivalents by $1.3-2.4\times$ -- equivalent to approximately one entire generation of MobileNet-family improvement. We hope that our findings will facilitate wider adoption of sparsity as a tool for creating efficient and accurate deep learning architectures.

Submitted 21 November, 2019; originally announced November 2019.

arXiv:1902.09574

The State of Sparsity in Deep Neural Networks

Authors: Trevor Gale, Erich Elsen, Sara Hooker

Abstract: We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet. Across thousands of experiments, we demonstrate that complex techniques (Molchanov et al., 2017; Louizos et al., 2017b) shown to yield high compression rates on smaller datasets perform inconsistently, and that simple magnitude pruning approaches achieve comparable or better results. Additionally, we replicate the experiments performed by (Frankle & Carbin, 2018) and (Liu et al., 2018) at scale and show that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization. Together, these results highlight the need for large-scale benchmarks in the field of model compression. We open-source our code, top performing model checkpoints, and results of all hyperparameter configurations to establish rigorous baselines for future work on compression and sparsification.

Submitted 25 February, 2019; originally announced February 2019.

arXiv:1804.05019

Automatic Detection and Query of Wireless Spectrum Events from Streaming Data

Authors: Carolina Fortuna, Timotej Gale, Tomaz Solc, Mihael Mohorcic

Abstract: Several alternatives for more efficient spectrum management have been proposed over the last decade, resulting in new techniques for automatic wideband spectrum sensing. However, while spectrum sensing technology is important, understanding, using and taking actions on this data for better spectrum and network resource management is at least equally important. In this paper, we propose a system that is able to automatically detect wireless spectrum events from streaming spectrum sensing data, and enables the consumption of the events as they are produced, as a statistical report or on a per-query basis. The proposed system is referred to as spectrum streamer and is wireless technology agnostic, scalable, able to deliver actionable information to humans and machines and also enables application development by custom querying of the detected events.

Submitted 13 April, 2018; originally announced April 2018.

Comments: 11 pages, 8 figures, 2 tables, 5 listings, Submitted to an IEEE journal

arXiv:1611.07819

dMath: Distributed Linear Algebra for DL

Authors: Steven Eliuk, Cameron Upright, Hars Vardhan, Stephen Walsh, Trevor Gale

Abstract: The paper presents a parallel math library, dMath, that demonstrates leading scaling when using intranode, internode, and hybrid-parallelism for deep learning (DL). dMath provides easy-to-use distributed primitives and a variety of domain-specific algorithms including matrix multiplication, convolutions, and others allowing for rapid development of scalable applications like deep neural networks (DNNs). Persistent data stored in GPU memory and advanced memory management techniques avoid costly transfers between host and device. dMath delivers performance, portability, and productivity to its specific domain of support.

Submitted 18 November, 2016; originally announced November 2016.

Comments: 5 pages. arXiv admin note: text overlap with arXiv:1604.01416

arXiv:1602.03493

doi: 10.1109/CISS.2016.7460563

Winning versus losing during gambling and its neural correlates

Authors: Pierre Sacré, Matthew S. D. Kerr, Sandya Subramanian, Kevin Kahn, Jorge Gonzalez-Martinez, Matthew A. Johnson, John T. Gale, Sridevi V. Sarma

Abstract: Humans often make decisions which maximize an internal utility function. For example, humans often maximize their expected reward when gambling and this is considered as a "rational" decision. However, humans tend to change their betting strategies depending on how they "feel". If someone has experienced a losing streak, they may "feel" that they are more likely to win on the next hand even though the odds of the game have not changed. That is, their decisions are driven by their emotional state. In this paper, we investigate how the human brain responds to wins and losses during gambling. Using a combination of local field potential recordings in human subjects performing a financial decision-making task, spectral analyses, and non-parametric cluster statistics, we investigated whether neural responses in different cognitive and limbic brain areas differ between wins and losses after decisions are made. In eleven subjects, the neural activity modulated significantly between win and loss trials in one brain region: the anterior insula ($p=0.01$). In particular, gamma activity (30-70 Hz) increased in the anterior insula when subjects just realized that they won. Modulation of metabolic activity in the anterior insula has been observed previously in functional magnetic resonance imaging studies during decision making and when emotions are elicited. However, our study is able to characterize temporal dynamics of electrical activity in this brain region at the millisecond resolution while decisions are made and after outcomes are revealed.

Submitted 10 February, 2016; originally announced February 2016.

Showing 1–13 of 13 results for author: Gale, T