Showing 1–20 of 20 results for author: So, D

Searching in archive cs.
  1. arXiv:2306.00008  [pdf, other]

    cs.LG cs.CL

    Brainformers: Trading Simplicity for Efficiency

    Authors: Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean

    Abstract: Transformers are central to recent successes in natural language processing and computer vision. Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient. Using this in…

    Submitted 25 April, 2024; v1 submitted 29 May, 2023; originally announced June 2023.

  2. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  3. arXiv:2302.14838  [pdf, other]

    cs.NE cs.AI cs.CL cs.LG

    EvoPrompting: Language Models for Code-Level Neural Architecture Search

    Authors: Angelica Chen, David M. Dohan, David R. So

    Abstract: Given the recent impressive accomplishments of language models (LMs) for code generation, we explore the use of LMs as adaptive mutation and crossover operators for an evolutionary neural architecture search (NAS) algorithm. While NAS still proves too difficult a task for LMs to succeed at solely through prompting, we find that the combination of evolutionary prompt engineering with soft prompt-tu…

    Submitted 16 November, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  4. arXiv:2302.05433  [pdf, other]

    cs.LG cs.NE

    Unified Functional Hashing in Automatic Machine Learning

    Authors: Ryan Gillard, Stephen Jonany, Yingjie Miao, Michael Munn, Connal de Souza, Jonathan Dungay, Chen Liang, David R. So, Quoc V. Le, Esteban Real

    Abstract: The field of Automatic Machine Learning (AutoML) has recently attained impressive results, including the discovery of state-of-the-art machine learning solutions, such as neural image classifiers. This is often done by applying an evolutionary search method, which samples multiple candidate solutions from a large space and evaluates the quality of each candidate through a long training process. As…

    Submitted 10 February, 2023; originally announced February 2023.

    ACM Class: I.2.2; I.2.6

  5. arXiv:2210.11399  [pdf, other]

    cs.CL cs.AI cs.LG

    Transcending Scaling Laws with 0.1% Extra Compute

    Authors: Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani

    Abstract: Scaling language models improves performance but comes with significant computational costs. This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute. The key idea is to continue training a state-of-the-art large language model (e.g., PaLM) on a few more steps with UL2's mixture-of-denoiser objec…

    Submitted 16 November, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: V2 has updated references/related work

  6. arXiv:2204.05149  [pdf]

    cs.LG cs.AI cs.GL

    The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink

    Authors: David Patterson, Joseph Gonzalez, Urs Hölzle, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, Jeff Dean

    Abstract: Machine Learning (ML) workloads have rapidly grown in importance, but raised concerns about their carbon footprint. Four best practices can reduce ML training energy by up to 100x and CO2 emissions up to 1000x. By following best practices, overall ML energy use (across research, development, and production) held steady at <15% of Google's total energy use for the past three years. If the whole ML…

    Submitted 11 April, 2022; originally announced April 2022.

  7. arXiv:2109.08668  [pdf, other]

    cs.LG cs.AI cs.CL cs.NE

    Primer: Searching for Efficient Transformers for Language Modeling

    Authors: David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

    Abstract: Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs of Transformers by searching for a more efficient variant. Compared to previous approaches, our search is performed at a lower level, over the primitives that d…

    Submitted 24 January, 2022; v1 submitted 17 September, 2021; originally announced September 2021.

    Comments: "Primer: Searching for Efficient Transformers for Language Modeling" NeurIPS camera ready. 34 pages

  8. arXiv:2106.07708  [pdf]

    cs.LG cs.AI cs.CV eess.IV

    CathAI: Fully Automated Interpretation of Coronary Angiograms Using Neural Networks

    Authors: Robert Avram, Jeffrey E. Olgin, Alvin Wan, Zeeshan Ahmed, Louis Verreault-Julien, Sean Abreau, Derek Wan, Joseph E. Gonzalez, Derek Y. So, Krishan Soni, Geoffrey H. Tison

    Abstract: Coronary heart disease (CHD) is the leading cause of adult death in the United States and worldwide, and for which the coronary angiography procedure is the primary gateway for diagnosis and clinical management decisions. The standard-of-care for interpretation of coronary angiograms depends upon ad-hoc visual assessment by the physician operator. However, ad-hoc visual interpretation of angiogram…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: 62 pages, 3 main figures, 2 main tables

    ACM Class: I.4.9; I.2.10; J.3

  9. arXiv:2105.08050  [pdf, other]

    cs.LG cs.CL cs.CV

    Pay Attention to MLPs

    Authors: Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le

    Abstract: Transformers have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years. Here we propose a simple network architecture, gMLP, based on MLPs with gating, and show that it can perform as well as Transformers in key language and vision applications. Our comparisons show that self-attention is not critical for Vision Tra…

    Submitted 1 June, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

  10. arXiv:2104.10350  [pdf]

    cs.LG cs.CY

    Carbon Emissions and Large Neural Network Training

    Authors: David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, Jeff Dean

    Abstract: The computation demand for machine learning (ML) has grown rapidly recently, which comes with a number of costs. Estimating the energy cost helps measure its environmental impact and finding greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models-T5, Meena, GShard, Switch Transformer, and GPT-3-and refi…

    Submitted 23 April, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

  11. arXiv:2102.02340  [pdf, other]

    cs.LG cs.AI cs.CL

    MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records

    Authors: Zhen Xu, David R. So, Andrew M. Dai

    Abstract: One important challenge of applying deep learning to electronic health records (EHR) is the complexity of their multimodal structure. EHR usually contains a mixture of structured (codes) and unstructured (free-text) data with sparse and irregular longitudinal features -- all of which doctors utilize when making decisions. In the deep learning regime, determining how different modality representati…

    Submitted 5 October, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: Accepted for publication at the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

  12. arXiv:2003.03384  [pdf, other]

    cs.LG cs.NE stat.ML

    AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

    Authors: Esteban Real, Chen Liang, David R. So, Quoc V. Le

    Abstract: Machine learning research has advanced in multiple aspects, including model structures and learning methods. The effort to automate such research, known as AutoML, has also made significant progress. However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks---or similarly restrictive search spac…

    Submitted 30 June, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: Accepted for publication at the 37th International Conference on Machine Learning (ICML 2020). Near camera-ready version

    ACM Class: I.2.2; I.2.6

  13. arXiv:2001.09977  [pdf, other]

    cs.CL cs.LG cs.NE stat.ML

    Towards a Human-like Open-Domain Chatbot

    Authors: Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le

    Abstract: We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation.…

    Submitted 27 February, 2020; v1 submitted 27 January, 2020; originally announced January 2020.

    Comments: 38 pages, 12 figures

  14. arXiv:1901.11117  [pdf, other]

    cs.LG cs.CL cs.NE stat.ML

    The Evolved Transformer

    Authors: David R. So, Chen Liang, Quoc V. Le

    Abstract: Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by the recent advances in feed-forward sequence models and then run evolu…

    Submitted 17 May, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

    Comments: ICML version with SOTA results

  15. arXiv:1803.10342  [pdf, other]

    q-bio.BM cs.LG stat.ML

    Classification of crystallization outcomes using deep convolutional neural networks

    Authors: Andrew E. Bruno, Patrick Charbonneau, Janet Newman, Edward H. Snell, David R. So, Vincent Vanhoucke, Christopher J. Watkins, Shawn Williams, Julie Wilson

    Abstract: The Machine Recognition of Crystallization Outcomes (MARCO) initiative has assembled roughly half a million annotated images of macromolecular crystallization experiments from various sources and setups. Here, state-of-the-art machine learning algorithms are trained and tested on different parts of this data set. We find that more than 94% of the test images can be correctly labeled, irrespective…

    Submitted 25 May, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: 11 pages, 4 figures, minor text and figure updates

  16. arXiv:1709.10459  [pdf, other]

    cs.CV cs.LG cs.NE

    Improving image generative models with human interactions

    Authors: Andrew Kyle Lampinen, David So, Douglas Eck, Fred Bertsch

    Abstract: GANs provide a framework for training generative models which mimic a data distribution. However, in many cases we wish to train these generative models to optimize some auxiliary objective function within the data it generates, such as making more aesthetically pleasing images. In some cases, these objective functions are difficult to evaluate, e.g. they may require human interaction. Here, we de…

    Submitted 29 September, 2017; originally announced September 2017.

  17. arXiv:1611.00321

    cs.IT

    Energy Efficiency Optimization with Simultaneous Wireless Information and Power Transfer in MIMO Broadcast Channels

    Authors: Jie Tang, Daniel K. C. So, Arman Shojaeifard, Kai-Kit Wong

    Abstract: Simultaneous wireless information and power transfer (SWIPT) is anticipated to have great applications in fifth-generation (5G) and beyond communication systems. In this paper, we address the energy efficiency (EE) optimization problem for SWIPT multiple-input multiple-output broadcast channel (MIMO-BC) with time-switching (TS) receiver design. Our aim is to maximize the EE of the system whilst sa…

    Submitted 23 October, 2017; v1 submitted 1 November, 2016; originally announced November 2016.

    Comments: The optimality of the proposed solution cannot be guaranteed with existing techniques. As a result, this submission is withdrawn

  18. arXiv:1611.00277  [pdf, other]

    cs.IT

    Joint Antenna Selection and Spatial Switching for Energy Efficient MIMO SWIPT System

    Authors: Jie Tang, Daniel K. C. So, Arman Shojaeifard, Kai-Kit Wong, Jinming Wen

    Abstract: In this paper, we investigate joint antenna selection and spatial switching (SS) for quality-of-service (QoS)-constrained energy efficiency (EE) optimization in a multiple-input multiple-output (MIMO) simultaneous wireless information and power transfer (SWIPT) system. A practical linear power model taking into account the entire transmit-receive chain is accordingly utilized. The corresponding fr…

    Submitted 1 November, 2016; originally announced November 2016.

  19. arXiv:1610.09683  [pdf, other]

    cs.IT

    Energy-Efficient Heterogeneous Cellular Networks with Spectrum Underlay and Overlay Access

    Authors: Jie Tang, Daniel K. C. So, Emad Alsusa, Khairi Ashour Hamdi, Arman Shojaeifard, Kai-Kit Wong

    Abstract: In this paper, we provide joint subcarrier assignment and power allocation schemes for quality-of-service (QoS)-constrained energy-efficiency (EE) optimization in the downlink of an orthogonal frequency division multiple access (OFDMA)-based two-tier heterogeneous cellular network (HCN). Considering underlay transmission, where spectrum-efficiency (SE) is fully exploited, the EE solution involves…

    Submitted 30 October, 2016; originally announced October 2016.

  20. arXiv:1610.06846  [pdf, other]

    cs.IT

    Stochastic Geometric Analysis of Energy-Efficient Dense Cellular Networks

    Authors: Arman Shojaeifard, Kai-Kit Wong, Khairi Ashour Hamdi, Emad Alsusa, Daniel K. C. So, Jie Tang

    Abstract: Dense cellular networks (DenseNets) are fast becoming a reality with the rapid deployment of base stations (BSs) aimed at meeting the explosive data traffic demand. In legacy systems however this comes with the penalties of higher network interference and energy consumption. In order to support network densification in a sustainable manner, the system behavior should be made 'load-proportional' th…

    Submitted 21 October, 2016; originally announced October 2016.