Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering

Thai, Triet Minh; Luu, Son T.

doi:10.15625/1813-9663/18155

arXiv:2303.12671 (cs)

[Submitted on 22 Mar 2023 (v1), last revised 3 Sep 2023 (this version, v2)]

Abstract: Visual Question Answering (VQA) is a task in which a computer must correctly answer a question based on an image. Humans solve this task with ease, but it remains a challenge for computers. The VLSP2022-EVJVQA shared task poses VQA in a multilingual setting on a newly released dataset, UIT-EVJVQA, in which the questions and answers are written in three languages: English, Vietnamese, and Japanese. We approached the challenge as a sequence-to-sequence learning task, integrating hints from pre-trained state-of-the-art VQA models and image features with a Convolutional Sequence-to-Sequence network to generate the desired answers. Our approach achieved F1 scores of up to 0.3442 on the public test set and 0.4210 on the private test set, placing 3rd in the competition.

Comments:	VLSP2022-EVJVQA
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2303.12671 [cs.CV]
	(or arXiv:2303.12671v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.12671
Related DOI:	https://doi.org/10.15625/1813-9663/18155

Submission history

From: Triet Minh Thai
[v1] Wed, 22 Mar 2023 15:49:33 UTC (3,324 KB)
[v2] Sun, 3 Sep 2023 14:50:34 UTC (2,336 KB)
