Showing 1–1 of 1 result for author: Jhalani, M

  1. arXiv:2406.09994

    cs.CL

    Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models

    Authors: Manas Jhalani, Annervaz K M, Pushpak Bhattacharyya

    Abstract: In the realm of multimodal tasks, Visual Question Answering (VQA) plays a crucial role by addressing natural language questions grounded in visual content. Knowledge-Based Visual Question Answering (KBVQA) advances this concept by adding external knowledge along with images to respond to questions. We introduce an approach for KBVQA, augmenting the existing vision-language transformer encoder-deco…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 16 pages, 12 figures