Showing 1–10 of 10 results for author: Duerig, T

  1. arXiv:2403.02626  [pdf, other]

    cs.CV cs.LG

    Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

    Authors: Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna, Ariel Fuxman, Tom Duerig

    Abstract: From content moderation to wildlife conservation, the number of applications that require models to recognize nuanced or subjective visual concepts is growing. Traditionally, developing classifiers for such concepts requires substantial manual effort measured in hours, days, or even months to identify and annotate data needed for training. Even with recently proposed Agile Modeling techniques, whi…

    Submitted 19 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2105.12849  [pdf, ps, other]

    cs.LG

    CARLS: Cross-platform Asynchronous Representation Learning System

    Authors: Chun-Ta Lu, Yun Zeng, Da-Cheng Juan, Yicheng Fan, Zhe Li, Jan Dlabal, Yi-Ting Chen, Arjun Gopalan, Allan Heydon, Chun-Sung Ferng, Reah Miyara, Ariel Fuxman, Futang Peng, Zhen Li, Tom Duerig, Andrew Tomkins

    Abstract: In this work, we propose CARLS, a novel framework for augmenting the capacity of existing deep learning frameworks by enabling multiple components -- model trainers, knowledge makers and knowledge banks -- to concertedly work together in an asynchronous fashion across hardware platforms. The proposed CARLS is particularly suitable for learning paradigms where model training benefits from additiona…

    Submitted 26 May, 2021; originally announced May 2021.

  4. arXiv:2102.05918  [pdf, other]

    cs.CV cs.CL cs.LG

    Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

    Authors: Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig

    Abstract: Pre-trained representations are becoming crucial for many NLP and perception tasks. While representation learning in NLP has transitioned to training on raw text without human annotations, visual and vision-language representations still rely heavily on curated training datasets that are expensive or require expert knowledge. For vision applications, representations are mostly learned using datase…

    Submitted 11 June, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: ICML 2021

    Journal ref: International Conference on Machine Learning 2021

  5. arXiv:2003.03701  [pdf, other]

    cs.CV

    Unifying Specialist Image Embedding into Universal Image Embedding

    Authors: Yang Feng, Futang Peng, Xu Zhang, Wei Zhu, Shanfeng Zhang, Howard Zhou, Zhen Li, Tom Duerig, Shih-Fu Chang, Jiebo Luo

    Abstract: Deep image embedding provides a way to measure the semantic similarity of two images. It plays a central role in many applications such as image search, face verification, and zero-shot learning. It is desirable to have a universal deep embedding model applicable to various domains of images. However, existing methods mainly rely on training specialist embedding models each of which is applicable…

    Submitted 7 March, 2020; originally announced March 2020.

  6. arXiv:1902.10814  [pdf, other]

    cs.CV cs.LG stat.ML

    Graph-RISE: Graph-Regularized Image Semantic Embedding

    Authors: Da-Cheng Juan, Chun-Ta Lu, Zhen Li, Futang Peng, Aleksei Timofeev, Yi-Ting Chen, Yaxi Gao, Tom Duerig, Andrew Tomkins, Sujith Ravi

    Abstract: Learning image representations to capture fine-grained semantics has been a challenging and important task enabling many applications such as image search and clustering. In this paper, we present Graph-Regularized Image Semantic Embedding (Graph-RISE), a large-scale neural graph learning framework that allows us to train embeddings to discriminate an unprecedented O(40M) ultra-fine-grained semant…

    Submitted 13 February, 2019; originally announced February 2019.

    Comments: 9 pages, 7 figures

  7. The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

    Authors: Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, Vittorio Ferrari

    Abstract: We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows to share and adapt the material, and they have been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an in…

    Submitted 21 February, 2020; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: Accepted to International Journal of Computer Vision, 2020

  8. Influences of Granular Constraints and Surface Effects on the Heterogeneity of Elastic, Superelastic, and Plastic Responses of Polycrystalline Shape Memory Alloys

    Authors: Harshad M. Paranjape, Partha P. Paul, Hemant Sharma, Peter Kenesei, Jun-Sang Park, T. W. Duerig, L. Catherine Brinson, Aaron P. Stebner

    Abstract: Deformation heterogeneities within microstructures of polycrystalline shape memory alloys (SMAs) during superelastic stressing are studied using both experiments and simulations. In situ X-ray diffraction, specifically the far-field high energy diffraction microscopy (ff-HEDM) technique was used to non-destructively measure the grain-averaged statistics of position, crystal orientation, elastic st…

    Submitted 13 February, 2017; v1 submitted 26 October, 2016; originally announced October 2016.

  9. Blockout: Dynamic Model Selection for Hierarchical Deep Networks

    Authors: Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig

    Abstract: Most deep architectures for image classification--even those that are trained to classify a large number of diverse categories--learn shared image representations with a single model. Intuitively, however, categories that are more similar should share more information than those that are very different. While hierarchical deep networks address this problem by learning separate features for subsets…

    Submitted 16 December, 2015; originally announced December 2015.

  10. arXiv:1511.06789  [pdf, other]

    cs.CV

    The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition

    Authors: Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei

    Abstract: Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes. Second, train a model utilizing this data. Toward the goal of solving fine-grained recognition, we introduce an alternative approach, leveraging free, noisy data from the web and…

    Submitted 18 October, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

    Comments: ECCV 2016, data is released