Showing 51–100 of 125 results for author: Le, Q V

  1. arXiv:2003.10580  [pdf, other]

    cs.LG stat.ML

    Meta Pseudo Labels

    Authors: Hieu Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, Quoc V. Le

    Abstract: We present Meta Pseudo Labels, a semi-supervised learning method that achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, which is 1.6% better than the existing state-of-the-art. Like Pseudo Labels, Meta Pseudo Labels has a teacher network to generate pseudo labels on unlabeled data to teach a student network. However, unlike Pseudo Labels where the teacher is fixed, the teacher i…

    Submitted 1 March, 2021; v1 submitted 23 March, 2020; originally announced March 2020.

    Comments: Preprint

  2. arXiv:2003.10555  [pdf, other]

    cs.CL

    ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

    Authors: Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

    Abstract: Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pre-training task called rep…

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: ICLR 2020

  3. arXiv:2003.03384  [pdf, other]

    cs.LG cs.NE stat.ML

    AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

    Authors: Esteban Real, Chen Liang, David R. So, Quoc V. Le

    Abstract: Machine learning research has advanced in multiple aspects, including model structures and learning methods. The effort to automate such research, known as AutoML, has also made significant progress. However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks---or similarly restrictive search spac…

    Submitted 30 June, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: Accepted for publication at the 37th International Conference on Machine Learning (ICML 2020). Near camera-ready version

    ACM Class: I.2.2; I.2.6

  4. arXiv:2001.09977  [pdf, other]

    cs.CL cs.LG cs.NE stat.ML

    Towards a Human-like Open-Domain Chatbot

    Authors: Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le

    Abstract: We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation.…

    Submitted 27 February, 2020; v1 submitted 27 January, 2020; originally announced January 2020.

    Comments: 38 pages, 12 figures

  5. arXiv:1912.05533  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    SpecAugment on Large Scale Datasets

    Authors: Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu

    Abstract: Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public datasets. In this paper, we demonstrate its effectiveness on tasks with large scale datasets by investigating its application to the Google Multidomain Dataset (Naraya…

    Submitted 11 December, 2019; originally announced December 2019.

    Comments: 5 pages, 3 tables; submitted to ICASSP 2020

  6. arXiv:1912.05027  [pdf, other]

    cs.CV cs.LG eess.IV

    SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

    Authors: Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

    Abstract: Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). The encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a b…

    Submitted 17 June, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: CVPR 2020

  7. arXiv:1912.01106  [pdf, other]

    cs.CV

    MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices

    Authors: Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc V. Le

    Abstract: Despite the blooming success of architecture search for vision tasks in resource-constrained environments, the design of on-device object detection architectures has mostly been manual. The few automated search efforts are either centered around non-mobile-friendly search spaces or not guided by on-device latency. We propose MnasFPN, a mobile-friendly search space for the detection head, and comb…

    Submitted 30 July, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: 10 pages, 7 figures

  8. arXiv:1911.09665  [pdf, other]

    cs.CV

    Adversarial Examples Improve Image Recognition

    Authors: Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le

    Abstract: Adversarial examples are commonly viewed as a threat to ConvNets. Here we present an opposite perspective: adversarial examples can be used to improve image recognition models if harnessed in the right manner. We propose AdvProp, an enhanced adversarial training scheme which treats adversarial examples as additional examples, to prevent overfitting. Key to our method is the usage of a separate aux…

    Submitted 14 April, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

    Comments: CVPR 2020, models are available at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet

  9. arXiv:1911.09070  [pdf, other]

    cs.CV cs.LG eess.IV

    EfficientDet: Scalable and Efficient Object Detection

    Authors: Mingxing Tan, Ruoming Pang, Quoc V. Le

    Abstract: Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multiscale feature fusion; Second, we propose a compound scal…

    Submitted 27 July, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: CVPR 2020

    Journal ref: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020)

  10. arXiv:1911.04252  [pdf, other]

    cs.LG cs.CV stat.ML

    Self-training with Noisy Student improves ImageNet classification

    Authors: Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le

    Abstract: We present Noisy Student Training, a semi-supervised learning approach that works well even when labeled data is abundant. Noisy Student Training achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mea…

    Submitted 19 June, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

    Comments: CVPR 2020

  11. arXiv:1911.01655  [pdf, other]

    cs.CV

    High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

    Authors: Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee

    Abstract: Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time. Previously proposed solutions require complex inductive biases inside network architectures with highly specialized computation, including segmentation masks, optical flow, and foreground and background separation. In this work, we question if…

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: In Advances in Neural Information Processing Systems (NeurIPS), 2019

  12. arXiv:1909.13719  [pdf, other]

    cs.CV

    RandAugment: Practical automated data augmentation with a reduced search space

    Authors: Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le

    Abstract: Recent work has shown that data augmentation has the potential to significantly improve the generalization of deep learning models. Recently, automated augmentation strategies have led to state-of-the-art results in image classification and object detection. While these strategies were optimized for improving validation accuracy, they also led to state-of-the-art results in semi-supervised learnin…

    Submitted 13 November, 2019; v1 submitted 30 September, 2019; originally announced September 2019.

    Comments: Added ablation experiments

  13. arXiv:1908.07644  [pdf, other]

    cs.CV cs.LG stat.ML

    Saccader: Improving Accuracy of Hard Attention Models for Vision

    Authors: Gamaleldin F. Elsayed, Simon Kornblith, Quoc V. Le

    Abstract: Although deep convolutional neural networks achieve state-of-the-art performance across nearly all image classification tasks, their decisions are difficult to interpret. One approach that offers some level of interpretability by design is hard attention, which uses only relevant portions of the image. However, training hard attention models with only class label supervision is challengin…

    Submitted 6 December, 2019; v1 submitted 20 August, 2019; originally announced August 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  14. arXiv:1907.09595  [pdf, other]

    cs.CV cs.LG

    MixConv: Mixed Depthwise Convolutional Kernels

    Authors: Mingxing Tan, Quoc V. Le

    Abstract: Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often overlooked. In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation, we propose a new mixed depthwise convolution (MixConv), which…

    Submitted 1 December, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

    Comments: BMVC 2019

    Journal ref: BMVC 2019

  15. arXiv:1907.04829  [pdf, other]

    cs.CL

    BAM! Born-Again Multi-Task Networks for Natural Language Understanding

    Authors: Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le

    Abstract: It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task m…

    Submitted 10 July, 2019; originally announced July 2019.

    Comments: ACL 2019

  16. arXiv:1907.04471  [pdf, ps, other]

    cs.LG cs.IR stat.ML

    Neural Input Search for Large Scale Recommendation Models

    Authors: Manas R. Joglekar, Cong Li, Jay K. Adams, Pranav Khaitan, Quoc V. Le

    Abstract: Recommendation problems with large numbers of discrete items, such as products, webpages, or videos, are ubiquitous in the technology industry. Deep neural networks are being increasingly used for these recommendation problems. These models use embeddings to represent discrete items as continuous vectors, and the vocabulary sizes and embedding dimensions, although they heavily influence the model's acc…

    Submitted 9 July, 2019; originally announced July 2019.

  17. arXiv:1906.11172  [pdf, other]

    cs.CV cs.LG

    Learning Data Augmentation Strategies for Object Detection

    Authors: Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le

    Abstract: Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional cost for annotating images for object detection, data augmentation may be of even greater importance for this computer vision task. In this w…

    Submitted 26 June, 2019; originally announced June 2019.

  18. arXiv:1906.08237  [pdf, other]

    cs.CL cs.LG

    XLNet: Generalized Autoregressive Pretraining for Language Understanding

    Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

    Abstract: With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we p…

    Submitted 2 January, 2020; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: Pretrained models and code are available at https://github.com/zihangdai/xlnet

  19. arXiv:1906.02940  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    Selfie: Self-supervised Pretraining for Image Embedding

    Authors: Trieu H. Trinh, Minh-Thang Luong, Quoc V. Le

    Abstract: We introduce a pretraining technique called Selfie, which stands for SELFie supervised Image Embedding. Selfie generalizes the concept of masked language modeling of BERT (Devlin et al., 2019) to continuous data, such as images, by making use of the Contrastive Predictive Coding loss (Oord et al., 2018). Given masked-out patches in an input image, our method learns to select the correct patch, amo…

    Submitted 27 July, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

  20. arXiv:1905.11946  [pdf, other]

    cs.LG cs.CV stat.ML

    EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

    Authors: Mingxing Tan, Quoc V. Le

    Abstract: Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly sc…

    Submitted 11 September, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: ICML 2019

    Journal ref: International Conference on Machine Learning, 2019

  21. arXiv:1905.03776  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

    Authors: Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith

    Abstract: We investigate how the final parameters found by stochastic gradient descent are influenced by over-parameterization. We generate families of models by increasing the number of channels in a base network, and then perform a large hyper-parameter search to study how the test error depends on learning rate, batch size, and network width. We find that the optimal SGD hyper-parameters are determined b…

    Submitted 9 May, 2019; originally announced May 2019.

    Comments: 17 pages, 3 tables, 17 figures; accepted to ICML 2019

  22. arXiv:1905.02244  [pdf, other]

    cs.CV

    Searching for MobileNetV3

    Authors: Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam

    Abstract: We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances. This paper starts the exploration…

    Submitted 20 November, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: ICCV 2019

  23. arXiv:1904.12848  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    Unsupervised Data Augmentation for Consistency Training

    Authors: Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le

    Abstract: Semi-supervised learning lately has shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality…

    Submitted 5 November, 2020; v1 submitted 29 April, 2019; originally announced April 2019.

    Comments: NeurIPS 2020

  24. arXiv:1904.09925  [pdf, other]

    cs.CV

    Attention Augmented Convolutional Networks

    Authors: Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le

    Abstract: Convolutional networks have been the paradigm of choice in many computer vision applications. The convolution operation however has a significant weakness in that it only operates on a local neighborhood, thus missing global information. Self-attention, on the other hand, has emerged as a recent advance to capture long range interactions, but has mostly been applied to sequence modeling and genera…

    Submitted 9 September, 2020; v1 submitted 22 April, 2019; originally announced April 2019.

    Comments: ICCV 2019

  25. arXiv:1904.08779  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

    Authors: Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le

    Abstract: We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech…

    Submitted 3 December, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: 5 pages, 3 figures, 6 tables; v3: references added

    Journal ref: Proc. Interspeech 2019, 2613-2617

  26. arXiv:1904.07392  [pdf, other]

    cs.CV cs.LG

    NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

    Authors: Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, Quoc V. Le

    Abstract: Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better architecture of feature pyramid network for object detection. We adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections. The discovered architecture, named NAS-FPN, consists of…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: Accepted at CVPR 2019

  27. arXiv:1904.04971  [pdf, other]

    cs.CV cs.AI cs.LG

    CondConv: Conditionally Parameterized Convolutions for Efficient Inference

    Authors: Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam

    Abstract: Convolutional layers are one of the basic building blocks of modern deep neural networks. One fundamental assumption is that convolutional kernels should be shared for all examples in a dataset. We propose conditionally parameterized convolutions (CondConv), which learn specialized convolutional kernels for each example. Replacing normal convolutions with CondConv enables us to increase the size a…

    Submitted 3 September, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

    Journal ref: NeurIPS 2019

  28. arXiv:1901.11117  [pdf, other]

    cs.LG cs.CL cs.NE stat.ML

    The Evolved Transformer

    Authors: David R. So, Chen Liang, Quoc V. Le

    Abstract: Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by the recent advances in feed-forward sequence models and then run evolu…

    Submitted 17 May, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

    Comments: ICML version with SOTA results

  29. arXiv:1901.02860  [pdf, other]

    cs.LG cs.CL stat.ML

    Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

    Authors: Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov

    Abstract: Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not…

    Submitted 2 June, 2019; v1 submitted 9 January, 2019; originally announced January 2019.

    Comments: ACL 2019 long paper. Code and pretrained models are available at https://github.com/kimiyoung/transformer-xl

  30. arXiv:1811.07056  [pdf, other]

    cs.CV cs.LG

    Domain Adaptive Transfer Learning with Specialist Models

    Authors: Jiquan Ngiam, Daiyi Peng, Vijay Vasudevan, Simon Kornblith, Quoc V. Le, Ruoming Pang

    Abstract: Transfer learning is a widely used method to build high performing computer vision models. In this paper, we study the efficacy of transfer learning by examining how the choice of data impacts performance. We find that more pre-training data does not always help, and transfer performance depends on a judicious choice of pre-training data. These findings are important given the continued increase i…

    Submitted 11 December, 2018; v1 submitted 16 November, 2018; originally announced November 2018.

  31. arXiv:1811.06965  [pdf, other]

    cs.CV

    GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

    Authors: Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen

    Abstract: Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other tasks. To address t…

    Submitted 25 July, 2019; v1 submitted 16 November, 2018; originally announced November 2018.

    Comments: 11 pages. Work in progress. Copyright 2018 by the authors

  32. arXiv:1810.12890  [pdf, other]

    cs.CV

    DropBlock: A regularization method for convolutional networks

    Authors: Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

    Abstract: Deep neural networks often work well when they are over-parameterized and trained with a massive amount of noise and regularization, such as weight decay and dropout. Although dropout is widely used as a regularization technique for fully connected layers, it is often less effective for convolutional layers. This lack of success of dropout for convolutional layers is perhaps due to the fact that a…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: Accepted at NIPS 2018

  33. arXiv:1809.08370  [pdf, other]

    cs.CL

    Semi-Supervised Sequence Modeling with Cross-View Training

    Authors: Kevin Clark, Minh-Thang Luong, Christopher D. Manning, Quoc V. Le

    Abstract: Unsupervised representation learning algorithms such as word2vec and ELMo improve the accuracy of many supervised NLP models, mainly because they can take advantage of large amounts of unlabeled text. However, the supervised models only learn from task-specific labeled data during the main training phase. We therefore propose Cross-View Training (CVT), a semi-supervised learning algorithm that imp…

    Submitted 21 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018

  34. arXiv:1807.11626  [pdf, other]

    cs.CV cs.LG

    MnasNet: Platform-Aware Neural Architecture Search for Mobile

    Authors: Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le

    Abstract: Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose a…

    Submitted 28 May, 2019; v1 submitted 30 July, 2018; originally announced July 2018.

    Comments: Published in CVPR 2019

    Journal ref: CVPR 2019

  35. arXiv:1806.09597  [pdf, other]

    cs.LG cs.AI stat.ML

    Stochastic natural gradient descent draws posterior samples in function space

    Authors: Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Sohl-Dickstein

    Abstract: Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima. In this work we develop a similar correspondence for minibatch natural gradient descent (NGD). We prove that for sufficiently small learning rates, if the model predictions on the training set approach the true conditional distribution of labels given inputs, the…

    Submitted 28 November, 2018; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: Workshop on Bayesian Deep Learning (NeurIPS 2018)

  36. arXiv:1806.02847  [pdf, other]

    cs.AI cs.CL cs.LG

    A Simple Method for Commonsense Reasoning

    Authors: Trieu H. Trinh, Quoc V. Le

    Abstract: Commonsense reasoning is a long-standing challenge for deep learning. For example, it is difficult to use neural networks to tackle the Winograd Schema dataset (Levesque et al., 2011). In this paper, we present a simple method for commonsense reasoning with neural networks, using unsupervised learning. Key to our method is the use of language models, trained on a massive amount of unlabeled data, t…

    Submitted 26 September, 2019; v1 submitted 7 June, 2018; originally announced June 2018.

  37. arXiv:1805.09501  [pdf, other]

    cs.CV cs.LG stat.ML

    AutoAugment: Learning Augmentation Policies from Data

    Authors: Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le

    Abstract: Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. In our implementation, we have designed a search space where a policy consists of many sub-polic…

    Submitted 11 April, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

    Comments: CVPR 2019

  38. arXiv:1805.08974  [pdf, other]

    cs.CV cs.LG stat.ML

    Do Better ImageNet Models Transfer Better?

    Authors: Simon Kornblith, Jonathon Shlens, Quoc V. Le

    Abstract: Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer. An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks. However, this hypothesis has never been systematically tested. Here, we compare the performance…

    Submitted 17 June, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: CVPR 2019 Oral

  39. arXiv:1804.09541  [pdf, other]

    cs.CL cs.AI cs.LG

    QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

    Authors: Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le

    Abstract: Current end-to-end machine reading and question answering (Q&A) models are primarily based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often slow for both training and inference due to the sequential nature of RNNs. We propose a new Q&A architecture called QANet, which does not require recurrent networks: Its encoder consists exclusively of convolu…

    Submitted 23 April, 2018; originally announced April 2018.

    Comments: Published as full paper in ICLR 2018

  40. arXiv:1803.00144  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Learning Longer-term Dependencies in RNNs with Auxiliary Losses

    Authors: Trieu H. Trinh, Andrew M. Dai, Minh-Thang Luong, Quoc V. Le

    Abstract: Despite recent advances in training recurrent neural networks (RNNs), capturing long-term dependencies in sequences remains a fundamental challenge. Most approaches use backpropagation through time (BPTT), which is difficult to scale to very long sequences. This paper proposes a simple method that improves the ability to capture long term dependencies in RNNs by adding an unsupervised auxiliary lo…

    Submitted 13 June, 2018; v1 submitted 28 February, 2018; originally announced March 2018.

    Comments: ICML 2018

  41. arXiv:1802.03268  [pdf, ps, other]

    cs.LG cs.CL cs.CV cs.NE stat.ML

    Efficient Neural Architecture Search via Parameter Sharing

    Authors: Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean

    Abstract: We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the m…

    Submitted 11 February, 2018; v1 submitted 9 February, 2018; originally announced February 2018.

  42. arXiv:1802.01548  [pdf, other]

    cs.NE cs.AI cs.CV cs.DC

    Regularized Evolution for Image Classifier Architecture Search

    Authors: Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V. Le

    Abstract: The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically. Although evolutionary algorithms have been repeatedly applied to neural network topologies, the image classifiers thus discovered have remained inferior to human-crafted ones. Here, we evolve an image classifier---AmoebaNet-A---that surpasses hand-design…

    Submitted 16 February, 2019; v1 submitted 5 February, 2018; originally announced February 2018.

    Comments: Accepted for publication at AAAI 2019, the Thirty-Third AAAI Conference on Artificial Intelligence

    ACM Class: I.2.6; I.5.1; I.5.2

  43. arXiv:1801.03526  [pdf, ps, other]

    cs.AI

    Neural Program Synthesis with Priority Queue Training

    Authors: Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le

    Abstract: We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a dataset of K best programs from a priority queue of the generated programs so far. Then, we synthesize new programs and add them to the priority queue by sampling f…

    Submitted 23 March, 2018; v1 submitted 10 January, 2018; originally announced January 2018.

  44. arXiv:1711.02846  [pdf, other]

    stat.ML cs.LG

    Intriguing Properties of Adversarial Examples

    Authors: Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, Quoc V. Le

    Abstract: It is becoming increasingly clear that many machine learning classifiers are vulnerable to adversarial examples. In attempting to explain the origin of adversarial examples, previous studies have typically focused on the fact that neural networks operate on high dimensional data, they overfit, or they are too linear. Here we argue that the origin of adversarial examples is primarily due to an inhe…

    Submitted 8 November, 2017; originally announced November 2017.

    Comments: 17 pages

  45. arXiv:1711.02301  [pdf, other]

    cs.AI cs.NE stat.ML

    Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?

    Authors: Maithra Raghu, Alex Irpan, Jacob Andreas, Robert Kleinberg, Quoc V. Le, Jon Kleinberg

    Abstract: Deep reinforcement learning has achieved many recent successes, but our understanding of its strengths and limitations is hampered by the lack of rich environments in which we can fully characterize optimal behavior, and correspondingly diagnose individual actions against such a characterization. Here we consider a family of combinatorial games, arising from work of Erdos, Selfridge, and Spencer,…

    Submitted 28 June, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

    Comments: Accepted to ICML 2018, code opensourced at: https://github.com/rubai5/ESS_Game

  46. arXiv:1711.00489  [pdf, other]

    cs.LG cs.CV cs.DC stat.ML

    Don't Decay the Learning Rate, Increase the Batch Size

    Authors: Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le

    Abstract: It is common practice to decay the learning rate. Here we show one can usually obtain the same learning curve on both training and test sets by instead increasing the batch size during training. This procedure is successful for stochastic gradient descent (SGD), SGD with momentum, Nesterov momentum, and Adam. It reaches equivalent test accuracies after the same number of training epochs, but with…

    Submitted 23 February, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

    Comments: 11 pages, 8 figures. Published as a conference paper at ICLR 2018

  47. arXiv:1710.06451  [pdf, other]

    cs.LG cs.AI stat.ML

    A Bayesian Perspective on Generalization and Stochastic Gradient Descent

    Authors: Samuel L. Smith, Quoc V. Le

    Abstract: We consider two questions at the heart of machine learning; how can we predict if a minimum will generalize to the test set, and why does stochastic gradient descent find minima that generalize well? Our work responds to Zhang et al. (2016), who showed deep neural networks can easily memorize randomly labeled training data, despite generalizing well on real labels of the same inputs. We show that…

    Submitted 14 February, 2018; v1 submitted 17 October, 2017; originally announced October 2017.

    Comments: 13 pages, 9 figures. Published as a conference paper at ICLR 2018

  48. arXiv:1710.05941  [pdf, other]

    cs.NE cs.CV cs.LG

    Searching for Activation Functions

    Authors: Prajit Ramachandran, Barret Zoph, Quoc V. Le

    Abstract: The choice of activation functions in deep networks has a significant effect on the training dynamics and task performance. Currently, the most successful and widely-used activation function is the Rectified Linear Unit (ReLU). Although various hand-designed alternatives to ReLU have been proposed, none have managed to replace it due to inconsistent gains. In this work, we propose to leverage auto…

    Submitted 27 October, 2017; v1 submitted 16 October, 2017; originally announced October 2017.

    Comments: Updated version of "Swish: a Self-Gated Activation Function"

  49. arXiv:1709.07417  [pdf, other]

    cs.AI cs.LG stat.ML

    Neural Optimizer Search with Reinforcement Learning

    Authors: Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

    Abstract: We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, running average of the gradient, etc. The controller is trained w…

    Submitted 22 September, 2017; v1 submitted 21 September, 2017; originally announced September 2017.

    Comments: ICML 2017 Conference paper

  50. arXiv:1707.07012  [pdf, other]

    cs.CV cs.LG stat.ML

    Learning Transferable Architectures for Scalable Image Recognition

    Authors: Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le

    Abstract: Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset. The key…

    Submitted 11 April, 2018; v1 submitted 21 July, 2017; originally announced July 2017.