Showing 1–11 of 11 results for author: Bolya, D

  1. arXiv:2311.05613  [pdf, other]

    cs.CV

    Window Attention is Bugged: How not to Interpolate Position Embeddings

    Authors: Daniel Bolya, Chaitanya Ryali, Judy Hoffman, Christoph Feichtenhofer

    Abstract: Window attention, position embeddings, and high resolution finetuning are core concepts in the modern transformer era of computer vision. However, we find that naively combining these near ubiquitous components can have a detrimental effect on performance. The issue is simple: interpolating position embeddings while using window attention is wrong. We study two state-of-the-art methods that have t…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Preprint. Code release will be coming in the future

  2. arXiv:2306.00989  [pdf, other]

    cs.CV cs.LG

    Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

    Authors: Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer

    Abstract: Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actually makes these transformers slower than their vanilla ViT counterparts. In this paper, we argue that this additional bulk is unnecessary. By pretraini…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: ICML 2023 Oral version. Code+Models: https://github.com/facebookresearch/hiera

  3. arXiv:2305.03053  [pdf, other]

    cs.CV cs.LG

    ZipIt! Merging Models from Different Tasks without Training

    Authors: George Stoica, Daniel Bolya, Jakob Bjorner, Pratik Ramesh, Taylor Hearn, Judy Hoffman

    Abstract: Typical deep visual recognition models are capable of performing the one task they were trained on. In this paper, we tackle the extremely difficult problem of combining distinct models with different initializations, each solving a separate task, into one multi-task model without any additional training. Prior work in model merging permutes one model to the space of the other then averages them t…

    Submitted 12 March, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

  4. arXiv:2303.17604  [pdf, other]

    cs.CV

    Token Merging for Fast Stable Diffusion

    Authors: Daniel Bolya, Judy Hoffman

    Abstract: The landscape of image generation has been forever changed by open vocabulary diffusion models. However, at their core these models use transformers, which makes generation slow. Better implementations to increase the throughput of these transformers have emerged, but they still evaluate the entire model. In this paper, we instead speed up diffusion models by exploiting natural redundancy in gener…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: Check out the code at https://github.com/dbolya/tomesd

  5. arXiv:2210.09461  [pdf, other]

    cs.CV

    Token Merging: Your ViT But Faster

    Authors: Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman

    Abstract: We introduce Token Merging (ToMe), a simple method to increase the throughput of existing ViT models without needing to train. ToMe gradually combines similar tokens in a transformer using a general and light-weight matching algorithm that is as fast as pruning while being more accurate. Off-the-shelf, ToMe can 2x the throughput of state-of-the-art ViT-L @ 512 and ViT-H @ 518 models on images and…

    Submitted 1 March, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted ICLR 2023 Oral (top 5%) [final v2]. This version includes stable diffusion experiments. See code at https://github.com/facebookresearch/ToMe

  6. arXiv:2209.07484  [pdf, other]

    cs.CV

    Hydra Attention: Efficient Attention with Many Heads

    Authors: Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Judy Hoffman

    Abstract: While transformers have begun to dominate many tasks in vision, applying them to large images is still computationally difficult. A large reason for this is that self-attention scales quadratically with the number of tokens, which in turn, scales quadratically with the image size. On larger images (e.g., 1080p), over 60% of the total computation in the network is spent solely on creating and apply…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: Accepted CADL 2022 (ECCV Workshop)

  7. arXiv:2111.06977  [pdf, other]

    cs.LG cs.CV

    Scalable Diverse Model Selection for Accessible Transfer Learning

    Authors: Daniel Bolya, Rohit Mittapalli, Judy Hoffman

    Abstract: With the preponderance of pretrained deep learning models available off-the-shelf from model banks today, finding the best weights to fine-tune to your use-case can be a daunting task. Several methods have recently been proposed to find good models for transfer learning, but they either don't scale well to large model banks or don't perform well on the diversity of off-the-shelf models. Ideally th…

    Submitted 10 January, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021 camera ready v2 + Appendix. Added a missing citation and fixed Table 4 header. Table 1 is still purple. No, I do not know why

  8. arXiv:2008.11300  [pdf, other]

    cs.CV cs.LG

    Likelihood Landscapes: A Unifying Principle Behind Many Adversarial Defenses

    Authors: Fu Lin, Rohit Mittapalli, Prithvijit Chattopadhyay, Daniel Bolya, Judy Hoffman

    Abstract: Convolutional Neural Networks have been shown to be vulnerable to adversarial examples, which are known to locate in subspaces close to where normal data lies but are not naturally occurring and of low probability. In this work, we investigate the potential effect defense techniques have on the geometry of the likelihood landscape - likelihood of the input images under the trained model. We first…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: ECCV 2020 Workshop on Adversarial Robustness in the Real World

  9. arXiv:2008.08115  [pdf, other]

    cs.CV

    TIDE: A General Toolbox for Identifying Object Detection Errors

    Authors: Daniel Bolya, Sean Foley, James Hays, Judy Hoffman

    Abstract: We introduce TIDE, a framework and associated toolbox for analyzing the sources of error in object detection and instance segmentation algorithms. Importantly, our framework is applicable across datasets and can be applied directly to output prediction files without required knowledge of the underlying prediction system. Thus, our framework can be used as a drop-in replacement for the standard mAP…

    Submitted 31 August, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: Updated LVIS results with the v1.0.1 error calculation

  10. arXiv:1912.06218  [pdf, other]

    cs.CV cs.LG eess.IV

    YOLACT++: Better Real-time Instance Segmentation

    Authors: Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee

    Abstract: We present a simple, fully-convolutional model for real-time (>30 fps) instance segmentation that achieves competitive results on MS COCO evaluated on a single Titan Xp, which is significantly faster than any previous state-of-the-art approach. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) genera…

    Submitted 23 September, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Journal extension of our previous conference paper arXiv:1904.02689

  11. arXiv:1904.02689  [pdf, other]

    cs.CV

    YOLACT: Real-time Instance Segmentation

    Authors: Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee

    Abstract: We present a simple, fully-convolutional model for real-time instance segmentation that achieves 29.8 mAP on MS COCO at 33.5 fps evaluated on a single Titan Xp, which is significantly faster than any previous competitive approach. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) generating a set of…

    Submitted 24 October, 2019; v1 submitted 4 April, 2019; originally announced April 2019.

    Comments: Updated for ICCV 2019 and added appendix