Zum Hauptinhalt springen

Showing 1–27 of 27 results for author: Mirhoseini, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.21787  [pdf, other

    cs.LG cs.AI

    Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

    Authors: Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini

    Abstract: Scaling the amount of compute used to train language models has dramatically improved their capabilities. However, when it comes to inference, we often limit the amount of compute to only one attempt per problem. Here, we explore inference compute as another axis for scaling by increasing the number of generated samples. Across multiple tasks and models, we observe that coverage - the fraction of… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

  2. arXiv:2406.03372  [pdf, other

    physics.app-ph cs.LG

    Training of Physical Neural Networks

    Authors: Ali Momeni, Babak Rahmani, Benjamin Scellier, Logan G. Wright, Peter L. McMahon, Clara C. Wanjura, Yuhang Li, Anas Skalli, Natalia G. Berloff, Tatsuhiro Onodera, Ilker Oguz, Francesco Morichetti, Philipp del Hougne, Manuel Le Gallo, Abu Sebastian, Azalia Mirhoseini, Cheng Zhang, Danijela Marković, Daniel Brunner, Christophe Moser, Sylvain Gigan, Florian Marquardt, Aydogan Ozcan, Julie Grollier, Andrea J. Liu , et al. (3 additional authors not shown)

    Abstract: Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 29 pages, 4 figures

  3. arXiv:2405.16755  [pdf, other

    cs.LG cs.AI cs.DB

    CHESS: Contextual Harnessing for Efficient SQL Synthesis

    Authors: Shayan Talaei, Mohammadreza Pourreza, Yu-Chen Chang, Azalia Mirhoseini, Amin Saberi

    Abstract: Utilizing large language models (LLMs) for transforming natural language questions into SQL queries (text-to-SQL) is a promising yet challenging approach, particularly when applied to real-world databases with complex and extensive schemas. In particular, effectively incorporating data catalogs and database values for SQL generation remains an obstacle, leading to suboptimal solutions. We address… ▽ More

    Submitted 27 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2404.08763  [pdf, other

    cs.LG cs.CL

    CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

    Authors: Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini

    Abstract: Large Language Models (LLMs) have dramatically advanced AI applications, yet their deployment remains challenging due to their immense inference costs. Recent studies ameliorate the computational costs of LLMs by increasing their activation sparsity but suffer from significant performance degradation on downstream tasks. In this work, we introduce a new framework for sparsifying the activations of… ▽ More

    Submitted 26 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  5. arXiv:2402.05099  [pdf, other

    cs.LG

    Hydragen: High-Throughput LLM Inference with Shared Prefixes

    Authors: Jordan Juravsky, Bradley Brown, Ryan Ehrlich, Daniel Y. Fu, Christopher Ré, Azalia Mirhoseini

    Abstract: Transformer-based large language models (LLMs) are now deployed to hundreds of millions of users. LLM inference is commonly performed on batches of sequences that share a prefix, such as few-shot examples or a chatbot system prompt. Decoding in this large-batch setting can be bottlenecked by the attention operation, which reads large key-value (KV) caches from memory and computes inefficient matri… ▽ More

    Submitted 13 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  6. arXiv:2310.13798  [pdf, other

    cs.CL cs.AI

    Specific versus General Principles for Constitutional AI

    Authors: Sandipan Kundu, Yuntao Bai, Saurav Kadavath, Amanda Askell, Andrew Callahan, Anna Chen, Anna Goldie, Avital Balwit, Azalia Mirhoseini, Brayden McLean, Catherine Olsson, Cassie Evraets, Eli Tran-Johnson, Esin Durmus, Ethan Perez, Jackson Kernion, Jamie Kerr, Kamal Ndousse, Karina Nguyen, Nelson Elhage, Newton Cheng, Nicholas Schiefer, Nova DasSarma, Oliver Rausch, Robin Larson , et al. (11 additional authors not shown)

    Abstract: Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles. We find this approach effectively prevents the expressi… ▽ More

    Submitted 20 October, 2023; originally announced October 2023.

  7. arXiv:2307.11031  [pdf, ps, other

    cs.LG cs.CL

    Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification

    Authors: Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré

    Abstract: Recent work has shown that language models' (LMs) prompt-based learning capabilities make them well suited for automating data labeling in domains where manual annotation is expensive. The challenge is that while writing an initial prompt is cheap, improving a prompt is costly -- practitioners often require significant labeled data in order to evaluate the impact of prompt modifications. Our work… ▽ More

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 38 pages, 22 figures, 8 tables

  8. arXiv:2302.07459  [pdf, other

    cs.CL

    The Capacity for Moral Self-Correction in Large Language Models

    Authors: Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemi Mercado, Nova DasSarma , et al. (24 additional authors not shown)

    Abstract: We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveal different facets of moral self-correction. We find that the capability… ▽ More

    Submitted 18 February, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

  9. arXiv:2212.08073  [pdf, other

    cs.CL cs.AI

    Constitutional AI: Harmlessness from AI Feedback

    Authors: Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite , et al. (26 additional authors not shown)

    Abstract: As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supe… ▽ More

    Submitted 15 December, 2022; originally announced December 2022.

  10. arXiv:2211.03540  [pdf, other

    cs.HC cs.AI cs.CL

    Measuring Progress on Scalable Oversight for Large Language Models

    Authors: Samuel R. Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit, Scott Heiner, Kamilė Lukošiūtė, Amanda Askell, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Christopher Olah, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Jackson Kernion, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse , et al. (21 additional authors not shown)

    Abstract: Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think abou… ▽ More

    Submitted 11 November, 2022; v1 submitted 4 November, 2022; originally announced November 2022.

    Comments: v2 fixes a few typos from v1

  11. arXiv:2201.08821  [pdf, other

    cs.LG

    Representing Long-Range Context for Graph Neural Networks with Global Attention

    Authors: Zhanghao Wu, Paras Jain, Matthew A. Wright, Azalia Mirhoseini, Joseph E. Gonzalez, Ion Stoica

    Abstract: Graph neural networks are powerful architectures for structured datasets. However, current methods struggle to represent long-range dependencies. Scaling the depth or width of GNNs is insufficient to broaden receptive fields as larger GNNs encounter optimization instabilities such as vanishing gradients and representation oversmoothing, while pooling-based approaches have yet to become as universa… ▽ More

    Submitted 21 January, 2022; originally announced January 2022.

    Comments: NeurIPS 2021. The first two authors contributed equally to this work

  12. arXiv:2112.04041  [pdf, other

    cs.LG cs.AR

    A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules

    Authors: Xinfeng Xie, Prakash Prabhu, Ulysse Beaugnon, Phitchaya Mangpo Phothilimthana, Sudip Roy, Azalia Mirhoseini, Eugene Brevdo, James Laudon, Yanqi Zhou

    Abstract: Multi-Chip-Modules (MCMs) reduce the design and fabrication cost of machine learning (ML) accelerators while delivering performance and energy efficiency on par with a monolithic large chip. However, ML compilers targeting MCMs need to solve complex optimization problems optimally and efficiently to achieve this high performance. One such problem is the multi-chip partitioning problem where compil… ▽ More

    Submitted 7 December, 2021; originally announced December 2021.

  13. arXiv:2109.02587  [pdf, ps, other

    cs.LG cs.AI

    Delving into Macro Placement with Reinforcement Learning

    Authors: Zixuan Jiang, Ebrahim Songhori, Shen Wang, Anna Goldie, Azalia Mirhoseini, Joe Jiang, Young-Joon Lee, David Z. Pan

    Abstract: In physical design, human designers typically place macros via trial and error, which is a Markov decision process. Reinforcement learning (RL) methods have demonstrated superhuman performance on the macro placement. In this paper, we propose an extension to this prior work (Mirhoseini et al., 2020). We first describe the details of the policy and value network architecture. We replace the force-d… ▽ More

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted at 3rd ACM/IEEE Workshop on Machine Learning for CAD (MLCAD)

  14. arXiv:2105.12842  [pdf, other

    cs.LG cs.AR cs.PF

    A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators

    Authors: Dan Zhang, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna Goldie, Azalia Mirhoseini

    Abstract: The rapidly-changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads. We propose Full-stack Accelerator Search Technique (FAST), a hardware accelerator search framework that defines a broad optimization environment covering key design decisions within the hardware-software stack, including hardware datapath… ▽ More

    Submitted 1 February, 2022; v1 submitted 26 May, 2021; originally announced May 2021.

    Comments: Fixed typo

  15. arXiv:2101.02281  [pdf, other

    cs.CR

    FLAME: Taming Backdoors in Federated Learning (Extended Version 1)

    Authors: Thien Duc Nguyen, Phillip Rieger, Huili Chen, Hossein Yalame, Helen Möllering, Hossein Fereidooni, Samuel Marchal, Markus Miettinen, Azalia Mirhoseini, Shaza Zeitouni, Farinaz Koushanfar, Ahmad-Reza Sadeghi, Thomas Schneider

    Abstract: Federated Learning (FL) is a collaborative machine learning approach allowing participants to jointly train a model without having to share their private, potentially sensitive local datasets with others. Despite its benefits, FL is vulnerable to backdoor attacks, in which an adversary injects manipulated model updates into the model aggregation process so that the resulting model will provide tar… ▽ More

    Submitted 5 August, 2023; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: This extended version incorporates a novel section (Section 10) that provides a comprehensive analysis of recent proposed attacks, notably "3DFed: Adaptive and extensible framework for covert backdoor attack in federated learning" by Li et al. This new section addresses flawed assertions made in the papers that aim to bypass FLAME or misinterpreted its fundamental design principles

  16. arXiv:2010.12438  [pdf, other

    cs.LG cs.DC

    Transferable Graph Optimizers for ML Compilers

    Authors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter Ma, Qiumin Xu, Hanxiao Liu, Phitchaya Mangpo Phothilimthana, Shen Wang, Anna Goldie, Azalia Mirhoseini, James Laudon

    Abstract: Most compilers for machine learning (ML) frameworks need to solve many correlated optimization problems to generate efficient machine code. Current ML compilers rely on heuristics based algorithms to solve these optimization problems one at a time. However, this approach is not only hard to maintain but often leads to sub-optimal solutions especially for newer model architectures. Existing learnin… ▽ More

    Submitted 19 February, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:1910.01578

    Journal ref: NeurIPS 2020

  17. arXiv:2004.10746  [pdf, other

    cs.LG cs.AI

    Chip Placement with Deep Reinforcement Learning

    Authors: Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Sungmin Bae, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Anand Babu, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter, Jeff Dean

    Abstract: In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously… ▽ More

    Submitted 22 April, 2020; originally announced April 2020.

  18. arXiv:2003.08445  [pdf

    cs.AI

    Placement Optimization with Deep Reinforcement Learning

    Authors: Anna Goldie, Azalia Mirhoseini

    Abstract: Placement Optimization is an important problem in systems and chip design, which consists of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints. In this paper, we start by motivating reinforcement learning as a solution to the placement problem. We then give an overview of what deep reinforcement learning is. We next formulate the plac… ▽ More

    Submitted 18 March, 2020; originally announced March 2020.

    Comments: International Symposium on Physical Design (ISPD), 2020

  19. arXiv:1910.07623  [pdf, other

    cs.LG stat.ML

    Generalized Clustering by Learning to Optimize Expected Normalized Cuts

    Authors: Azade Nazi, Will Hang, Anna Goldie, Sujith Ravi, Azalia Mirhoseini

    Abstract: We introduce a novel end-to-end approach for learning to cluster in the absence of labeled examples. Our clustering objective is based on optimizing normalized cuts, a criterion which measures both intra-cluster similarity as well as inter-cluster dissimilarity. We define a differentiable loss function equivalent to the expected normalized cuts. Unlike much of the work in unsupervised deep learnin… ▽ More

    Submitted 16 October, 2019; originally announced October 2019.

  20. arXiv:1910.01578  [pdf, other

    cs.LG stat.ML

    GDP: Generalized Device Placement for Dataflow Graphs

    Authors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter C. Ma, Qiumin Xu, Ming Zhong, Hanxiao Liu, Anna Goldie, Azalia Mirhoseini, James Laudon

    Abstract: Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices. With increasingly complex neural network architectures and heterogeneous device characteristics, finding a reasonable placement is extremely challenging even for domain experts. Most existing automated device placement approaches are impractica… ▽ More

    Submitted 28 September, 2019; originally announced October 2019.

  21. arXiv:1906.06639  [pdf, other

    cs.LG stat.ML

    Reinforcement Learning Driven Heuristic Optimization

    Authors: Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei

    Abstract: Heuristic algorithms such as simulated annealing, Concorde, and METIS are effective and widely used approaches to find solutions to combinatorial optimization problems. However, they are limited by the high sample complexity required to reach a reasonable solution from a cold-start. In this paper, we introduce a novel framework to generate better initial solutions for heuristic algorithms using re… ▽ More

    Submitted 15 June, 2019; originally announced June 2019.

    Comments: DRL4KDD'19

  22. arXiv:1903.00614  [pdf, other

    cs.LG stat.ML

    GAP: Generalizable Approximate Graph Partitioning Framework

    Authors: Azade Nazi, Will Hang, Anna Goldie, Sujith Ravi, Azalia Mirhoseini

    Abstract: Graph partitioning is the problem of dividing the nodes of a graph into balanced partitions while minimizing the edge cut across the partitions. Due to its combinatorial nature, many approximate solutions have been developed, including variants of multi-level methods and spectral clustering. We propose GAP, a Generalizable Approximate Partitioning framework that takes a deep learning approach to g… ▽ More

    Submitted 1 March, 2019; originally announced March 2019.

  23. arXiv:1806.01531  [pdf, other

    cs.CV

    Deep Mixture of Experts via Shallow Embedding

    Authors: Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell, Joseph E. Gonzalez

    Abstract: Larger networks generally have greater representational power at the cost of increased computational complexity. Sparsifying such networks has been an active area of research but has been generally limited to static regularization or dynamic approaches using reinforcement learning. We explore a mixture of experts (MoE) approach to deep dynamic routing, which activates certain experts in the networ… ▽ More

    Submitted 11 April, 2019; v1 submitted 5 June, 2018; originally announced June 2018.

  24. arXiv:1706.04972  [pdf, ps, other

    cs.LG cs.AI

    Device Placement Optimization with Reinforcement Learning

    Authors: Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

    Abstract: The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human expe… ▽ More

    Submitted 25 June, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

    Comments: To appear at ICML 2017

  25. arXiv:1701.06538  [pdf, other

    cs.LG cs.CL cs.NE stat.ML

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    Authors: Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean

    Abstract: The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this… ▽ More

    Submitted 23 January, 2017; originally announced January 2017.

  26. arXiv:1505.05208  [pdf, other

    stat.ML cs.LG

    oASIS: Adaptive Column Sampling for Kernel Matrix Approximation

    Authors: Raajen Patel, Thomas A. Goldstein, Eva L. Dyer, Azalia Mirhoseini, Richard G. Baraniuk

    Abstract: Kernel matrices (e.g. Gram or similarity matrices) are essential for many state-of-the-art approaches to classification, clustering, and dimensionality reduction. For large datasets, the cost of forming and factoring such kernel matrices becomes intractable. To address this challenge, we introduce a new adaptive sampling algorithm called Accelerated Sequential Incoherence Selection (oASIS) that sa… ▽ More

    Submitted 19 May, 2015; originally announced May 2015.

    ACM Class: G.1.0; G.4

  27. arXiv:1503.08169  [pdf, other

    cs.DC cs.LG

    RankMap: A Platform-Aware Framework for Distributed Learning from Dense Datasets

    Authors: Azalia Mirhoseini, Eva L. Dyer, Ebrahim. M. Songhori, Richard G. Baraniuk, Farinaz Koushanfar

    Abstract: This paper introduces RankMap, a platform-aware end-to-end framework for efficient execution of a broad class of iterative learning algorithms for massive and dense datasets. Our framework exploits data structure to factorize it into an ensemble of lower rank subspaces. The factorization creates sparse low-dimensional representations of the data, a property which is leveraged to devise effective m… ▽ More

    Submitted 27 October, 2016; v1 submitted 27 March, 2015; originally announced March 2015.

    Comments: 13 pages, 10 figures