Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Ramakrishnan, Sainandan; Agrawal, Aishwarya; Lee, Stefan

arXiv:1810.03649 (cs)

[Submitted on 8 Oct 2018 (v1), last revised 8 Nov 2018 (this version, v2)]

Abstract: Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training, such as overwhelmingly reporting the type of room as kitchen or the sport being played as tennis, irrespective of the image. Most alarmingly, this shortcoming is often not well reflected during evaluation because the same strong priors exist in test distributions; however, a VQA system that fails to ground questions in image content would likely perform poorly in real-world settings. In this work, we present a novel regularization scheme for VQA that reduces this effect. We introduce a question-only model that takes as input the question encoding from the VQA model and must leverage language biases in order to succeed. We then pose training as an adversarial game between the VQA model and this question-only adversary -- discouraging the VQA model from capturing language biases in its question encoding. Further, we leverage this question-only model to estimate the increase in model confidence after considering the image, which we maximize explicitly to encourage visual grounding. Our approach is a model-agnostic training procedure and is simple to implement. We show empirically that it can improve performance significantly on a bias-sensitive split of the VQA dataset for multiple base models -- achieving state-of-the-art on this task. Further, on standard VQA tasks, our approach shows a significantly smaller drop in accuracy than existing bias-reducing VQA models.
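
The two regularizers described in the abstract can be illustrated with a small scalar sketch. This is not the authors' implementation: it is one plausible reading of the abstract, and it assumes that model confidence is measured by the entropy of the answer distribution. The function names and the weights `lambda_adv` and `lambda_conf` are hypothetical. The sketch only computes the quantity that the question encoder would minimize, given answer distributions from the full VQA model and from the question-only adversary.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cross_entropy(probs, target):
    """Negative log-likelihood of the correct answer index."""
    return -math.log(probs[target])


def encoder_objective(p_vqa, p_q_only, target, lambda_adv=1.0, lambda_conf=1.0):
    """Scalar the question encoder minimizes in this sketch.

    p_vqa:    answer distribution from the full (image + question) model
    p_q_only: answer distribution from the question-only adversary
    Both weights are hypothetical hyperparameters, not values from the paper.
    """
    vqa_loss = cross_entropy(p_vqa, target)
    # The adversary itself minimizes this loss; the encoder is pushed to raise it,
    # so the term enters with a negative sign.
    adversary_loss = cross_entropy(p_q_only, target)
    # Assumption: confidence gain from seeing the image is the drop in entropy.
    confidence_gain = entropy(p_q_only) - entropy(p_vqa)
    return vqa_loss - lambda_adv * adversary_loss - lambda_conf * confidence_gain


# A peaked prediction from the full model, correct answer at index 0.
p_vqa = [0.9, 0.05, 0.05]
uninformed_adversary = [1 / 3, 1 / 3, 1 / 3]  # question encoding carries no prior
biased_adversary = [0.9, 0.05, 0.05]          # question alone reveals the answer

grounded = encoder_objective(p_vqa, uninformed_adversary, target=0)
biased = encoder_objective(p_vqa, biased_adversary, target=0)
```

Under these assumptions, `grounded` comes out lower than `biased`: the objective favors a question encoding from which the adversary cannot guess the answer while the full model remains confident. In a real training loop the adversary would be updated to minimize its own loss, and the sign flip for the encoder would typically be handled by the optimizer setup rather than by hand.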

Comments:	NIPS 2018. 11 pages (with references), 4 figures, 2 tables
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1810.03649 [cs.CV]
	(or arXiv:1810.03649v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1810.03649

Submission history

From: Sainandan Ramakrishnan
[v1] Mon, 8 Oct 2018 18:29:05 UTC (5,907 KB)
[v2] Thu, 8 Nov 2018 20:51:44 UTC (6,165 KB)
