Admitting Ignorance Helps the Video Question Answering Models to Answer

Li, Haopeng; Drummond, Tom; Gong, Mingming; Bennamoun, Mohammed; Ke, Qiuhong

arXiv:2501.08771 (cs)

[Submitted on 15 Jan 2025]

Abstract: Significant progress has been made in the field of video question answering (VideoQA) thanks to deep learning and large-scale pretraining. Despite the presence of sophisticated model structures and powerful video-text foundation models, most existing methods focus solely on maximizing the correlation between answers and video-question pairs during training. We argue that these models often establish shortcuts, resulting in spurious correlations between questions and answers, especially when the alignment between video and text data is suboptimal. To address these spurious correlations, we propose a novel training framework in which the model is compelled to acknowledge its ignorance when presented with an intervened question, rather than making guesses solely based on superficial question-answer correlations. We introduce methodologies for intervening in questions, utilizing techniques such as displacement and perturbation, and design frameworks for the model to admit its lack of knowledge in both multi-choice VideoQA and open-ended settings. In practice, we integrate a state-of-the-art model into our framework to validate its effectiveness. The results clearly demonstrate that our framework can significantly enhance the performance of VideoQA models with minimal structural modifications.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.08771 [cs.CV]
	(or arXiv:2501.08771v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.08771

Submission history

From: Haopeng Li
[v1] Wed, 15 Jan 2025 12:44:52 UTC (834 KB)
