Zum Hauptinhalt springen

Showing 1–25 of 25 results for author: Vaswani, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2206.08653  [pdf, other

    cs.LG cs.AI cs.CV

    All Mistakes Are Not Equal: Comprehensive Hierarchy Aware Multi-label Predictions (CHAMP)

    Authors: Ashwin Vaswani, Gaurav Aggarwal, Praneeth Netrapalli, Narayan G Hegde

    Abstract: This paper considers the problem of Hierarchical Multi-Label Classification (HMC), where (i) several labels can be present for each example, and (ii) labels are related via a domain-specific hierarchy tree. Guided by the intuition that all mistakes are not equal, we present Comprehensive Hierarchy Aware Multi-label Predictions (CHAMP), a framework that penalizes a misprediction depending on its se… ▽ More

    Submitted 17 June, 2022; originally announced June 2022.

  2. arXiv:2110.12894  [pdf, other

    cs.LG cs.AI cs.CL cs.CV stat.ML

    The Efficiency Misnomer

    Authors: Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay

    Abstract: Model efficiency is a critical aspect of developing and deploying machine learning models. Inference time and latency directly affect the user experience, and some applications have hard requirements. In addition to inference costs, model training also have direct financial and environmental impacts. Although there are numerous well-established metrics (cost indicators) for measuring model efficie… ▽ More

    Submitted 16 March, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

  3. arXiv:2109.10686  [pdf, other

    cs.CL cs.AI cs.CV cs.LG

    Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

    Authors: Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

    Abstract: There remain many open questions pertaining to the scaling behaviour of Transformer architectures. These scaling decisions and findings can be critical, as training runs often come with an associated computational cost which have both financial and/or environmental impact. The goal of this paper is to present scaling insights from pretraining and finetuning Transformers. While Kaplan et al. presen… ▽ More

    Submitted 30 January, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

    Comments: ICLR 2022 + Updated Checkpoint Release

  4. arXiv:2104.08710  [pdf, other

    cs.CL

    Simple and Efficient ways to Improve REALM

    Authors: Vidhisha Balachandran, Ashish Vaswani, Yulia Tsvetkov, Niki Parmar

    Abstract: Dense retrieval has been shown to be effective for retrieving relevant documents for Open Domain QA, surpassing popular sparse retrieval methods like BM25. REALM (Guu et al., 2020) is an end-to-end dense retrieval system that relies on MLM based pretraining for improved downstream QA efficiency across multiple datasets. We study the finetuning of REALM on various QA tasks and explore the limits of… ▽ More

    Submitted 18 April, 2021; originally announced April 2021.

  5. arXiv:2103.13511  [pdf, other

    cs.LG cs.AI cs.CV

    Addressing catastrophic forgetting for medical domain expansion

    Authors: Sharut Gupta, Praveer Singh, Ken Chang, Liangqiong Qu, Mehak Aggarwal, Nishanth Arun, Ashwin Vaswani, Shruti Raghavan, Vibha Agarwal, Mishka Gidwani, Katharina Hoebel, Jay Patel, Charles Lu, Christopher P. Bridge, Daniel L. Rubin, Jayashree Kalpathy-Cramer

    Abstract: Model brittleness is a key concern when deploying deep learning models in real-world medical settings. A model that has high performance at one institution may suffer a significant decline in performance when tested at other institutions. While pooling datasets from multiple institutions and retraining may provide a straightforward solution, it is often infeasible and may compromise patient privac… ▽ More

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: First three authors contributed equally

  6. arXiv:2103.12731  [pdf, other

    cs.CV

    Scaling Local Self-Attention for Parameter Efficient Visual Backbones

    Authors: Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake Hechtman, Jonathon Shlens

    Abstract: Self-attention has the promise of improving computer vision systems due to parameter-independent scaling of receptive fields and content-dependent interactions, in contrast to parameter-dependent scaling and content-independent interactions of convolutions. Self-attention models have recently been shown to have encouraging improvements on accuracy-parameter trade-offs compared to baseline convolut… ▽ More

    Submitted 7 June, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: CVPR 2021 Oral

  7. arXiv:2101.11605  [pdf, other

    cs.CV cs.AI cs.LG

    Bottleneck Transformers for Visual Recognition

    Authors: Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani

    Abstract: We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation. By just replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet and no other changes, our approach improves upon the baseline… ▽ More

    Submitted 2 August, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

    Comments: Technical Report, 20 pages, 13 figures, 19 tables

  8. arXiv:2011.08096  [pdf, other

    cs.LG

    The unreasonable effectiveness of Batch-Norm statistics in addressing catastrophic forgetting across medical institutions

    Authors: Sharut Gupta, Praveer Singh, Ken Chang, Mehak Aggarwal, Nishanth Arun, Liangqiong Qu, Katharina Hoebel, Jay Patel, Mishka Gidwani, Ashwin Vaswani, Daniel L Rubin, Jayashree Kalpathy-Cramer

    Abstract: Model brittleness is a primary concern when deploying deep learning models in medical settings owing to inter-institution variations, like patient demographics and intra-institution variation, such as multiple scanner types. While simply training on the combined datasets is fraught with data privacy limitations, fine-tuning the model on subsequent institutions after training it on the original ins… ▽ More

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: Accepted as oral presentation in Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract ; 6 pages and 4 figures

  9. arXiv:2011.07482  [pdf, other

    cs.CV cs.LG eess.IV

    Towards Trainable Saliency Maps in Medical Imaging

    Authors: Mehak Aggarwal, Nishanth Arun, Sharut Gupta, Ashwin Vaswani, Bryan Chen, Matthew Li, Ken Chang, Jay Patel, Katherine Hoebel, Mishka Gidwani, Jayashree Kalpathy-Cramer, Praveer Singh

    Abstract: While success of Deep Learning (DL) in automated diagnosis can be transformative to the medicinal practice especially for people with little or no access to doctors, its widespread acceptability is severely limited by inherent black-box decision making and unsafe failure modes. While saliency methods attempt to tackle this problem in non-medical contexts, their apriori explanations do not transfer… ▽ More

    Submitted 15 November, 2020; originally announced November 2020.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

  10. arXiv:2007.10257  [pdf, other

    cs.AI cs.LG

    An Autoencoder Based Approach to Simulate Sports Games

    Authors: Ashwin Vaswani, Rijul Ganguly, Het Shah, Sharan Ranjit S, Shrey Pandit, Samruddhi Bothara

    Abstract: Sports data has become widely available in the recent past. With the improvement of machine learning techniques, there have been attempts to use sports data to analyze not only the outcome of individual games but also to improve insights and strategies. The outbreak of COVID-19 has interrupted sports leagues globally, giving rise to increasing questions and speculations about the outcome of this s… ▽ More

    Submitted 16 July, 2020; originally announced July 2020.

  11. arXiv:2003.05997  [pdf, other

    cs.LG eess.AS stat.ML

    Efficient Content-Based Sparse Attention with Routing Transformers

    Authors: Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier

    Abstract: Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic compute and memory requirements with respect to sequence length. Successful approaches to reduce this complexity focused on attending to local sliding windows or a small set of locations independent of content. Our work proposes to learn dynamic… ▽ More

    Submitted 24 October, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: TACL 2020; pre-MIT Press publication version; v5 has a random attention baseline

  12. arXiv:1906.05909  [pdf, other

    cs.CV

    Stand-Alone Self-Attention in Vision Models

    Authors: Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens

    Abstract: Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional models with content-based interactions, such as self-attention and non-local means, to achieve gains on a number of vision tasks. The natural question that arises is… ▽ More

    Submitted 13 June, 2019; originally announced June 2019.

  13. arXiv:1905.12255  [pdf, other

    cs.AI cs.CL

    Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation

    Authors: Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge

    Abstract: Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation(VLN), in which agents interpret natural language instructions and visual scenes to move through environments and reach goals. Despite recent progress, current research leaves unclear how much of a role language understandin… ▽ More

    Submitted 21 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Accepted at ACL 2019 as long paper

  14. arXiv:1904.09925  [pdf, other

    cs.CV

    Attention Augmented Convolutional Networks

    Authors: Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le

    Abstract: Convolutional networks have been the paradigm of choice in many computer vision applications. The convolution operation however has a significant weakness in that it only operates on a local neighborhood, thus missing global information. Self-attention, on the other hand, has emerged as a recent advance to capture long range interactions, but has mostly been applied to sequence modeling and genera… ▽ More

    Submitted 9 September, 2020; v1 submitted 22 April, 2019; originally announced April 2019.

    Comments: ICCV 2019

  15. arXiv:1811.02084  [pdf, other

    cs.LG cs.DC stat.ML

    Mesh-TensorFlow: Deep Learning for Supercomputers

    Authors: Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman

    Abstract: Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network (DNN) training strategy, due to its universal applicability and its amenability to Single-Program-Multiple-Data (SPMD) programming. However, batch-splitting suffers from problems including the inability to train very large models (due to memory constraints), high latency, and inefficiency at small batch sizes. All o… ▽ More

    Submitted 5 November, 2018; originally announced November 2018.

  16. arXiv:1809.04281  [pdf, other

    cs.LG cs.SD eess.AS stat.ML

    Music Transformer

    Authors: Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

    Abstract: Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to reusing of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence.… ▽ More

    Submitted 12 December, 2018; v1 submitted 12 September, 2018; originally announced September 2018.

    Comments: Improved skewing section and accompanying figures. Previous titles are "An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation" and "Music Transformer"

  17. arXiv:1806.01261  [pdf, other

    cs.LG cs.AI stat.ML

    Relational inductive biases, deep learning, and graph networks

    Authors: Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals , et al. (2 additional authors not shown)

    Abstract: Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, rema… ▽ More

    Submitted 17 October, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

  18. arXiv:1805.11063  [pdf, other

    cs.LG stat.ML

    Theory and Experiments on Vector Quantized Autoencoders

    Authors: Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar

    Abstract: Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks. There has been a surge in interest in discrete latent variable models, however, despite several recent improvements, the training of discrete latent variable models has remained challenging and their performance has mostly failed to match… ▽ More

    Submitted 20 July, 2018; v1 submitted 28 May, 2018; originally announced May 2018.

  19. arXiv:1803.07416  [pdf, other

    cs.LG cs.CL stat.ML

    Tensor2Tensor for Neural Machine Translation

    Authors: Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit

    Abstract: Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.

    Submitted 16 March, 2018; originally announced March 2018.

    Comments: arXiv admin note: text overlap with arXiv:1706.03762

  20. arXiv:1803.03382  [pdf, other

    cs.LG

    Fast Decoding in Sequence Models using Discrete Latent Variables

    Authors: Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

    Abstract: Autoregressive sequence models based on deep neural networks, such as RNNs, Wavenet and the Transformer attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and Transformer are much more parallelizable during training, yet st… ▽ More

    Submitted 7 June, 2018; v1 submitted 8 March, 2018; originally announced March 2018.

    Comments: ICML 2018

  21. arXiv:1803.02155  [pdf, other

    cs.CL

    Self-Attention with Relative Position Representations

    Authors: Peter Shaw, Jakob Uszkoreit, Ashish Vaswani

    Abstract: Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we… ▽ More

    Submitted 12 April, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

    Comments: NAACL 2018

  22. arXiv:1802.05751  [pdf, other

    cs.CV

    Image Transformer

    Authors: Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

    Abstract: Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. By… ▽ More

    Submitted 15 June, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

    Comments: Appears in International Conference on Machine Learning, 2018. Code available at https://github.com/tensorflow/tensor2tensor

  23. arXiv:1706.05137  [pdf, other

    cs.LG stat.ML

    One Model To Learn Them All

    Authors: Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit

    Abstract: Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrentl… ▽ More

    Submitted 15 June, 2017; originally announced June 2017.

  24. arXiv:1706.03762  [pdf, other

    cs.CL cs.LG

    Attention Is All You Need

    Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

    Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experi… ▽ More

    Submitted 1 August, 2023; v1 submitted 12 June, 2017; originally announced June 2017.

    Comments: 15 pages, 5 figures

  25. arXiv:1609.09007  [pdf, other

    cs.CL cs.LG

    Unsupervised Neural Hidden Markov Models

    Authors: Ke Tran, Yonatan Bisk, Ashish Vaswani, Daniel Marcu, Kevin Knight

    Abstract: In this work, we present the first results for neuralizing an Unsupervised Hidden Markov Model. We evaluate our approach on tag in- duction. Our approach outperforms existing generative models and is competitive with the state-of-the-art though with a simpler model easily extended to include additional context.

    Submitted 28 September, 2016; originally announced September 2016.

    Comments: accepted at EMNLP 2016, Workshop on Structured Prediction for NLP. Oral presentation