Showing 1–3 of 3 results for author: Zhao, H H

  1. arXiv:2405.14974  [pdf, other]

    cs.CV cs.AI cs.CL

    LOVA3: Learning to Visual Question Answering, Asking and Assessment

    Authors: Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Mike Zheng Shou

    Abstract: Question answering, asking, and assessment are three innate human traits crucial for understanding the world and acquiring knowledge. By enhancing these capabilities, humans can more effectively utilize data, leading to better comprehension and learning outcomes. However, current Multimodal Large Language Models (MLLMs) primarily focus on question answering, often neglecting the full potential of…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: The code is available at https://github.com/showlab/LOVA3

  2. arXiv:2312.06731  [pdf, other]

    cs.CV cs.AI

    Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator

    Authors: Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate exceptional problem-solving capabilities, but few research studies aim to gauge the ability to generate visual instruction tuning data. This paper proposes to explore the potential of empowering MLLMs to generate data independently without relying on GPT-4. We introduce Genixer, a comprehensive data generation pipeline consisting of four key ste…

    Submitted 27 August, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted by ECCV 2024

  3. arXiv:2309.08513  [pdf, other]

    cs.CV cs.AI

    SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

    Authors: Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou

    Abstract: Pre-trained vision transformers have strong representation benefits to various downstream tasks. Recently, many parameter-efficient fine-tuning (PEFT) methods have been proposed, and their experiments demonstrate that tuning only 1% extra parameters could surpass full fine-tuning in low-data resource scenarios. However, these methods overlook the task-specific information when fine-tuning diverse…

    Submitted 29 April, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: This work has been accepted by IJCV