
Showing 1–4 of 4 results for author: Pruthi, G

Searching in archive cs.
  1. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  2. arXiv:2305.18373  [pdf, other]

    cs.CV cs.CL

    KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

    Authors: Zhiwei Jia, Pradyumna Narayana, Arjun R. Akula, Garima Pruthi, Hao Su, Sugato Basu, Varun Jampani

    Abstract: Image ad understanding is a crucial task with wide real-world applications. Although highly challenging with the involvement of diverse atypical scenes, real-world entities, and reasoning over scene-texts, how to interpret image ads is relatively under-explored, especially in the era of foundational vision-language models (VLMs) featuring impressive generalizability and adaptability. In this paper…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  3. arXiv:2212.09898  [pdf, other]

    cs.CV

    MetaCLUE: Towards Comprehensive Visual Metaphors Research

    Authors: Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani

    Abstract: Creativity is an indispensable part of human cognition and also an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental in communicating creative ideas through nuanced relationships between abstract concepts such as feelings. While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, met…

    Submitted 2 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted at CVPR 2023. Project page: https://metaclue.github.io/ ; video summary: https://youtu.be/V3TmeNETL-o

  4. arXiv:2002.08484  [pdf, other]

    cs.LG stat.ML

    Estimating Training Data Influence by Tracing Gradient Descent

    Authors: Garima Pruthi, Frederick Liu, Mukund Sundararajan, Satyen Kale

    Abstract: We introduce a method called TracIn that computes the influence of a training example on a prediction made by the model. The idea is to trace how the loss on the test point changes during the training process whenever the training example of interest was utilized. We provide a scalable implementation of TracIn via: (a) a first-order gradient approximation to the exact computation, (b) saved checkp…

    Submitted 14 November, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: NeurIPS 2020
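    The TracIn abstract above describes the method only in outline and is truncated before the checkpoint details. The following is a minimal sketch, assuming the checkpoint-based first-order form TracInCP(z, z') = sum over saved checkpoints i of eta_i * grad loss(w_i, z) . grad loss(w_i, z'), where w_i are the checkpoint weights and eta_i the learning rate in effect at each checkpoint. A positive score means gradient steps on the training example z would (to first order) reduce the loss on the test example z'. The linear model with squared loss, the helper names `squared_loss_grad` and `tracin_cp`, and the toy checkpoints are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Sequence[float], float]  # (features x, target y)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def squared_loss_grad(w: Sequence[float], x: Sequence[float], y: float) -> Vector:
    """Gradient of 0.5 * (w.x - y)^2 with respect to w.

    Hypothetical model choice: a linear regressor, so the gradient has a
    closed form and no autodiff library is needed.
    """
    residual = dot(w, x) - y
    return [residual * xi for xi in x]


def tracin_cp(
    checkpoints: Sequence[Sequence[float]],
    learning_rates: Sequence[float],
    train_point: Example,
    test_point: Example,
) -> float:
    """First-order, checkpoint-based TracIn score (assumed form).

    Sums eta_i * grad_loss(w_i, train) . grad_loss(w_i, test) over the
    saved checkpoints w_i.
    """
    if len(checkpoints) != len(learning_rates):
        raise ValueError("need one learning rate per checkpoint")
    x, y = train_point
    x_test, y_test = test_point
    score = 0.0
    for w, eta in zip(checkpoints, learning_rates):
        g_train = squared_loss_grad(w, x, y)
        g_test = squared_loss_grad(w, x_test, y_test)
        score += eta * dot(g_train, g_test)
    return score


if __name__ == "__main__":
    # Two hypothetical checkpoints saved during training, each with eta = 0.1.
    ckpts = [[0.0, 0.0], [0.5, 0.5]]
    etas = [0.1, 0.1]
    train = ([1.0, 0.0], 1.0)
    # Same input, same target: the training point helps (positive score).
    print(tracin_cp(ckpts, etas, train, ([1.0, 0.0], 1.0)))
    # Same input, opposite target: the training point hurts (negative score).
    print(tracin_cp(ckpts, etas, train, ([1.0, 0.0], -1.0)))
```

    In this toy setup the first call scores a test point that agrees with the training example and comes out positive, while the second scores a conflicting test point and comes out negative. Only gradient dot products at stored checkpoints are needed, which is what makes a checkpoint-based approximation cheaper than replaying every training step.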