REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

Lin, Yuanze; Xie, Yujia; Chen, Dongdong; Xu, Yichong; Zhu, Chenguang; Yuan, Lu

arXiv:2206.01201 (cs)

[Submitted on 2 Jun 2022 (v1), last revised 10 Oct 2022 (this version, v2)]

Abstract: This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that making better use of regional information can significantly improve performance. While visual representation is extensively studied in traditional VQA, it is under-explored in knowledge-based VQA, even though the two tasks share a common spirit, i.e., both rely on visual input to answer the question. Specifically, we observe that in most state-of-the-art knowledge-based VQA methods: 1) visual features are extracted either from the whole image or in a sliding-window manner for retrieving knowledge, and the important relationships within and among object regions are neglected; 2) visual features are not well utilized in the final answering model, which is counter-intuitive to some extent. Based on these observations, we propose a new knowledge-based VQA method, REVIVE, which tries to utilize the explicit information of object regions not only in the knowledge retrieval stage but also in the answering model. The key motivation is that object regions and their inherent relationships are important for knowledge-based VQA. We perform extensive experiments on the standard OK-VQA dataset and achieve new state-of-the-art performance, i.e., 58.0% accuracy, surpassing the previous state-of-the-art method by a large margin (+3.6%). We also conduct detailed analysis and show the necessity of regional information in different framework components for knowledge-based VQA. Code is publicly available at this https URL.

Comments:	Accepted by NeurIPS 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2206.01201 [cs.CV]
	(or arXiv:2206.01201v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2206.01201

Submission history

From: Yuanze Lin
[v1] Thu, 2 Jun 2022 17:59:56 UTC (6,793 KB)
[v2] Mon, 10 Oct 2022 04:46:32 UTC (12,176 KB)
