
Showing 101–125 of 125 results for author: Le, Q V

  1. arXiv:1706.04972  [pdf, ps, other]

    cs.LG cs.AI

    Device Placement Optimization with Reinforcement Learning

    Authors: Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

    Abstract: The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human expe…

    Submitted 25 June, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

    Comments: To appear at ICML 2017

  2. arXiv:1704.06877  [pdf, other]

    cs.CL cs.LG

    Learning to Skim Text

    Authors: Adams Wei Yu, Hongrae Lee, Quoc V. Le

    Abstract: Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering. Despite their promise, many recurrent models have to read the whole text word by word, making it slow to handle long documents. For example, it is difficult to use a recurrent network to read a book and ans…

    Submitted 29 April, 2017; v1 submitted 22 April, 2017; originally announced April 2017.

  3. arXiv:1611.09940  [pdf, ps, other]

    cs.AI cs.LG stat.ML

    Neural Combinatorial Optimization with Reinforcement Learning

    Authors: Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio

    Abstract: This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent net…

    Submitted 12 January, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

    Comments: Under review as a conference paper at ICLR 2017

  4. arXiv:1611.08945  [pdf, other]

    cs.CL cs.LG stat.ML

    Learning a Natural Language Interface with Neural Programmer

    Authors: Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei

    Abstract: Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide the desired response when executed on the database. To our knowledge, this paper presents the first weakly supervised, end-to-end neural network…

    Submitted 2 March, 2017; v1 submitted 27 November, 2016; originally announced November 2016.

    Comments: Published as a conference paper at ICLR 2017

  5. arXiv:1611.04558  [pdf, other]

    cs.CL cs.AI

    Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

    Authors: Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean

    Abstract: We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes encoder, decoder and attention, rem…

    Submitted 21 August, 2017; v1 submitted 14 November, 2016; originally announced November 2016.

  6. arXiv:1611.02683  [pdf, other]

    cs.CL cs.LG cs.NE

    Unsupervised Pretraining for Sequence to Sequence Learning

    Authors: Prajit Ramachandran, Peter J. Liu, Quoc V. Le

    Abstract: This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models. In our method, the weights of the encoder and decoder of a seq2seq model are initialized with the pretrained weights of two language models and then fine-tuned with labeled data. We apply this method to challenging benchmarks in machine translation and abstractive summarizati…

    Submitted 21 February, 2018; v1 submitted 8 November, 2016; originally announced November 2016.

    Comments: Updated to accepted EMNLP 2017 version

  7. arXiv:1611.01578  [pdf, other]

    cs.LG cs.AI cs.NE

    Neural Architecture Search with Reinforcement Learning

    Authors: Barret Zoph, Quoc V. Le

    Abstract: Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the…

    Submitted 15 February, 2017; v1 submitted 4 November, 2016; originally announced November 2016.

  8. arXiv:1609.09106  [pdf, other]

    cs.LG

    HyperNetworks

    Authors: David Ha, Andrew Dai, Quoc V. Le

    Abstract: This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype - the hypernetwork - and a phenotype - the main network. Though they are also reminiscent of HyperNEAT in evolution, our hypernetworks are…

    Submitted 1 December, 2016; v1 submitted 27 September, 2016; originally announced September 2016.

  9. arXiv:1609.08144  [pdf, other]

    cs.CL cs.AI cs.LG

    Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

    Authors: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, et al. (6 additional authors not shown)

    Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NM…

    Submitted 8 October, 2016; v1 submitted 26 September, 2016; originally announced September 2016.

  10. arXiv:1511.06807  [pdf, other]

    stat.ML cs.LG

    Adding Gradient Noise Improves Learning for Very Deep Networks

    Authors: Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens

    Abstract: Deep feedforward and recurrent networks have achieved impressive results in many perception and language processing applications. This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks. The main motivation for these architectural innovations is that they capture better domain knowledge, and importantly are easier to optimize than…

    Submitted 20 November, 2015; originally announced November 2015.

  11. arXiv:1511.06114  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Multi-task Sequence to Sequence Learning

    Authors: Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser

    Abstract: Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications have focused on only one task, and little work has explored this framework for multiple tasks. This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting - where the encoder is shared between several tasks such as machi…

    Submitted 1 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: 10 pages, 4 figures, ICLR 2016 camera-ready, added parsing SOTA results

  12. arXiv:1511.04868  [pdf, other]

    cs.LG cs.CL cs.NE

    A Neural Transducer

    Authors: Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio

    Abstract: Sequence-to-sequence models have achieved impressive results on various tasks. However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences. This is because they generate an output sequence conditioned on an entire input sequence. In this paper, we present a Neural Transducer that can make i…

    Submitted 4 August, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

  13. arXiv:1511.04834  [pdf, other]

    cs.LG cs.CL stat.ML

    Neural Programmer: Inducing Latent Programs with Gradient Descent

    Authors: Arvind Neelakantan, Quoc V. Le, Ilya Sutskever

    Abstract: Deep neural networks have achieved impressive supervised classification performance in many tasks including image recognition, speech recognition, and sequence to sequence learning. However, this success has not been translated to applications like question answering that may involve complex arithmetic and logic reasoning. A major limitation of these models is in their inability to learn even simp…

    Submitted 4 August, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: Accepted as a conference paper at ICLR 2016

  14. arXiv:1511.01432  [pdf, ps, other]

    cs.LG cs.CL

    Semi-supervised Sequence Learning

    Authors: Andrew M. Dai, Quoc V. Le

    Abstract: We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms ca…

    Submitted 4 November, 2015; originally announced November 2015.

  15. arXiv:1508.01211  [pdf, other]

    cs.CL cs.LG cs.NE stat.ML

    Listen, Attend and Spell

    Authors: William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

    Abstract: We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an atte…

    Submitted 19 August, 2015; v1 submitted 5 August, 2015; originally announced August 2015.

  16. arXiv:1507.07998  [pdf, other]

    cs.CL cs.AI cs.LG

    Document Embedding with Paragraph Vectors

    Authors: Andrew M. Dai, Christopher Olah, Quoc V. Le

    Abstract: Paragraph Vectors has recently been proposed as an unsupervised method for learning distributed representations for pieces of text. In their work, the authors showed that the method can learn an embedding of movie review texts which can be leveraged for sentiment analysis. That proof of concept, while encouraging, was rather narrow. Here we consider tasks other than sentiment analysis, provide a…

    Submitted 28 July, 2015; originally announced July 2015.

    Comments: 8 pages

  17. arXiv:1504.00941  [pdf, ps, other]

    cs.NE cs.LG

    A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

    Authors: Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton

    Abstract: Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix…

    Submitted 7 April, 2015; v1 submitted 3 April, 2015; originally announced April 2015.

  18. arXiv:1410.8206  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    Addressing the Rare Word Problem in Neural Machine Translation

    Authors: Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, Wojciech Zaremba

    Abstract: Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results that are comparable to traditional approaches. A significant weakness in conventional NMT systems is their inability to correctly translate very rare words: end-to-end NMTs tend to have relatively small vocabularies with a single unk symbol that represents every possible out-of-vocabulary (OO…

    Submitted 30 May, 2015; v1 submitted 29 October, 2014; originally announced October 2014.

    Comments: ACL 2015 camera-ready version

  19. arXiv:1409.3215  [pdf, ps, other]

    cs.CL cs.LG

    Sequence to Sequence Learning with Neural Networks

    Authors: Ilya Sutskever, Oriol Vinyals, Quoc V. Le

    Abstract: Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a mu…

    Submitted 14 December, 2014; v1 submitted 10 September, 2014; originally announced September 2014.

    Comments: 9 pages

  20. arXiv:1408.3060  [pdf, other]

    cs.LG stat.ML

    Fastfood: Approximate Kernel Expansions in Loglinear Time

    Authors: Quoc Viet Le, Tamas Sarlos, Alexander Johannes Smola

    Abstract: Despite their successes, what makes kernel methods difficult to use in many large scale problems is the fact that storing and computing the decision function is typically expensive, especially at prediction time. In this paper, we overcome this difficulty by proposing Fastfood, an approximation that accelerates such computation significantly. Key to Fastfood is the observation that Hadamard matric…

    Submitted 13 August, 2014; originally announced August 2014.

  21. arXiv:1405.4053  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Distributed Representations of Sentences and Documents

    Authors: Quoc V. Le, Tomas Mikolov

    Abstract: Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, "powerful," "strong" and "Paris" are equal…

    Submitted 22 May, 2014; v1 submitted 16 May, 2014; originally announced May 2014.

  22. arXiv:1309.4168  [pdf, other]

    cs.CL

    Exploiting Similarities among Languages for Machine Translation

    Authors: Tomas Mikolov, Quoc V. Le, Ilya Sutskever

    Abstract: Dictionaries and phrase tables are the basis of modern statistical machine translation systems. This paper develops a method that can automate the process of generating and extending dictionaries and phrase tables. Our method can translate missing word and phrase entries by learning language structures based on large monolingual data and mapping between languages from small bilingual data. It uses…

    Submitted 16 September, 2013; originally announced September 2013.

  23. arXiv:1112.6209  [pdf, other]

    cs.LG

    Building high-level features using large scale unsupervised learning

    Authors: Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng

    Abstract: We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 milli…

    Submitted 12 July, 2012; v1 submitted 28 December, 2011; originally announced December 2011.

  24. arXiv:1108.1634  [pdf]

    physics.acc-ph

    Status of high temperature superconductor based magnets and the conductors they depend upon

    Authors: J. Schwartz, F. Hunte, W. K. Chan, X. F. Gou, X. T. Liu, M. Phillips, Q. V. Le, G. Naderi, M. Turenne, L. Ye

    Abstract: This paper reviews the status of high temperature superconductors for high field magnets for future devices such as a high energy LHC or a muon collider. Some of the primary challenges faced for the implementation of systems are discussed. Two conductor technologies, Bi$_2$Sr$_2$CaCu$_2$O$_{8+x}$ and YBa$_2$Cu$_3$O$_{7-\delta}$, have emerged as high field conductor options, but their relative advantage…

    Submitted 8 August, 2011; originally announced August 2011.

    Comments: 11 pages, contribution to the EuCARD-AccNet-EuroLumi Workshop: The High-Energy Large Hadron Collider, Malta, 14 -- 16 Oct 2010; CERN Yellow Report CERN-2011-003, pp. 59-69

  25. arXiv:0806.2890  [pdf, other]

    cs.CV cs.LG

    Learning Graph Matching

    Authors: Tiberio S. Caetano, Julian J. McAuley, Li Cheng, Quoc V. Le, Alex J. Smola

    Abstract: As a fundamental problem in pattern recognition, graph matching has applications in a variety of fields, from computer vision to computational biology. In graph matching, patterns are modeled as graphs and pattern recognition amounts to finding a correspondence between the nodes of different graphs. Many formulations of this problem can be cast in general as a quadratic assignment problem, where…

    Submitted 17 June, 2008; originally announced June 2008.

    Comments: 10 pages, 4 figures