Measuring CLEVRness: Blackbox testing of Visual Reasoning Models

Mouselinos, Spyridon; Michalewski, Henryk; Malinowski, Mateusz


arXiv:2202.12162 (cs)

[Submitted on 24 Feb 2022 (v1), last revised 28 Feb 2022 (this version, v2)]



Abstract: How can we measure the reasoning capabilities of intelligent systems? Visual question answering provides a convenient framework for testing a model's abilities by interrogating it with questions about a scene. However, despite scores of visual QA datasets and architectures, some of which even yield super-human performance, the question of whether those architectures can actually reason remains open to debate. To answer this, we extend the visual question answering framework and propose a behavioral test in the form of a two-player game. We consider black-box neural models trained on CLEVR, a diagnostic dataset for benchmarking reasoning. Next, we train an adversarial player that re-configures the scene to fool the CLEVR model. We show that CLEVR models, which otherwise could perform at a human level, can easily be fooled by our agent. Our results put in doubt whether data-driven approaches can reason without exploiting the numerous biases that are often present in those datasets. Finally, we also propose a controlled experiment measuring how efficiently such models learn and perform reasoning.
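The two-player game described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' code or setup: the one-dimensional scene format, the question ("how many objects are left of the red object?"), the deliberately biased stand-in model, and all function names are hypothetical. The paper trains its adversarial player, whereas this sketch swaps in plain random search. What it keeps from the abstract is the black-box constraint: the adversary sees only the model's answers, and it only accepts re-configured scenes whose true answer is unchanged.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical toy scene format: one (color, x-position) pair per object.
Scene = List[Tuple[str, float]]


def ground_truth(scene: Scene) -> int:
    """Count the non-red objects strictly left of the red object."""
    red_x = next(x for color, x in scene if color == "red")
    return sum(1 for color, x in scene if color != "red" and x < red_x)


def biased_model(scene: Scene) -> int:
    """Hypothetical black box that learned a shortcut: it assumes the red
    object always sits at x = 0 instead of looking at where it really is."""
    return sum(1 for color, x in scene if color != "red" and x < 0.0)


def attack(
    model: Callable[[Scene], int],
    scene: Scene,
    rng: random.Random,
    max_queries: int = 200,
    step: float = 0.5,
) -> Optional[Scene]:
    """Random-search adversary (a stand-in for a trained player).

    Jitters object positions and returns the first scene whose true answer
    is unchanged but on which the model answers incorrectly, or None if no
    such scene is found within the query budget.
    """
    target = ground_truth(scene)
    for _ in range(max_queries):
        candidate = [(c, x + rng.uniform(-step, step)) for c, x in scene]
        if ground_truth(candidate) != target:
            continue  # the re-configuration must not change the true answer
        if model(candidate) != target:  # only the model's output is observed
            return candidate
    return None


scene: Scene = [("red", 0.0), ("blue", -0.3), ("green", 0.4)]
fooling_scene = attack(biased_model, scene, random.Random(0))
```

On the original scene the stand-in model answers correctly because the red object happens to sit at x = 0, which mirrors how a dataset bias can hide a shortcut until the scene is rearranged.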

Comments:	ICLR 2022
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2202.12162 [cs.LG]
	(or arXiv:2202.12162v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2202.12162

Submission history

From: Spyridon Mouselinos
[v1] Thu, 24 Feb 2022 15:59:29 UTC (16,075 KB)
[v2] Mon, 28 Feb 2022 14:02:08 UTC (16,075 KB)
