Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

Su, Hung-Ting; Niu, Yulei; Lin, Xudong; Hsu, Winston H.; Chang, Shih-Fu

arXiv:2304.03754 (cs)

[Submitted on 7 Apr 2023]

Abstract: Causal Video Question Answering (CVidQA) queries not only association or temporal relations but also causal relations in a video. Existing question synthesis methods pre-train question generation (QG) systems on reading comprehension datasets with text descriptions as inputs. However, such QG models only learn to ask association questions (e.g., "what is someone doing...") and perform poorly because association knowledge transfers badly to CVidQA, which focuses on causal questions like "why is someone doing ...". Observing this, we propose to exploit causal knowledge to generate question-answer pairs, and introduce a novel framework, Causal Knowledge Extraction from Language Models (CaKE-LM), which leverages causal commonsense knowledge from language models to tackle CVidQA. To extract knowledge from LMs, CaKE-LM generates causal questions containing two events with one triggering another (e.g., "score a goal" triggers "soccer player kicking ball") by prompting the LM with the action (soccer player kicking ball) to retrieve the intention (to score a goal). CaKE-LM significantly outperforms conventional methods by 4% to 6% in zero-shot CVidQA accuracy on the NExT-QA and Causal-VidQA datasets. We also conduct comprehensive analyses and provide key findings for future research.
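The action-to-intention step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt template, the question template, the function names `build_intention_prompt` and `make_causal_qa`, and the canned `stub_lm` that stands in for a real language model. None of it is taken from the paper's implementation.

```python
from typing import Callable, Dict


def build_intention_prompt(action: str) -> str:
    """Hypothetical prompt template asking an LM for the intention behind an action."""
    return f"The {action.strip()} because they want"


def make_causal_qa(action: str, lm: Callable[[str], str]) -> Dict[str, str]:
    """Turn an action description into a causal ("why") question-answer pair.

    `lm` is any callable mapping a prompt to a text completion; the
    completion is treated as the intention that triggers the action.
    """
    prompt = build_intention_prompt(action)
    # Normalize the completion: trim whitespace and a trailing period.
    intention = lm(prompt).strip().rstrip(".")
    # Assumed question template; the answer is the retrieved intention.
    question = f"Why is the {action.strip()}?"
    return {"question": question, "answer": intention}


def stub_lm(prompt: str) -> str:
    """Stand-in for a real language model; returns a canned completion."""
    return " to score a goal."


qa = make_causal_qa("soccer player kicking ball", stub_lm)
print(qa["question"])
print(qa["answer"])
```

In practice `stub_lm` would be replaced by a call to an actual language model, and the resulting pairs would serve as synthetic training data for a video question answering model.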

Comments:	CVPR 2023 Workshop L3D-IVU
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2304.03754 [cs.CL]
	(or arXiv:2304.03754v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2304.03754

Submission history

From: Hung-Ting Su
[v1] Fri, 7 Apr 2023 17:45:49 UTC (750 KB)
