Showing 1–2 of 2 results for author: Zhu, W Y

  1. arXiv:2408.04102 [pdf, other]

    cs.CV cs.AI

    ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling

    Authors: William Y. Zhu, Keren Ye, Junjie Ke, Jiahui Yu, Leonidas Guibas, Peyman Milanfar, Feng Yang

    Abstract: Recognizing and disentangling visual attributes from objects is a foundation to many computer vision applications. While large vision language representations like CLIP had largely resolved the task of zero-shot object recognition, zero-shot visual attribute recognition remains a challenge because CLIP's contrastively-learned vision-language representation cannot effectively capture object-attribu…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted at ECCV 2024

  2. arXiv:2211.15402 [pdf, other]

    cs.CV cs.AI

    Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation

    Authors: Jiangyong Huang, William Yicheng Zhu, Baoxiong Jia, Zan Wang, Xiaojian Ma, Qing Li, Siyuan Huang

    Abstract: Current computer vision models, unlike the human visual system, cannot yet achieve general-purpose visual understanding. Existing efforts to create a general vision model are limited in the scope of assessed tasks and offer no overarching framework to perform them holistically. We present a new comprehensive benchmark, General-purpose Visual Understanding Evaluation (G-VUE), covering the full spec…

    Submitted 28 November, 2022; originally announced November 2022.