Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images

Yuan, Zhenghang; Mou, Lichao; Zhu, Xiao Xiang

arXiv:2304.03844 (cs)

[Submitted on 7 Apr 2023]

Abstract: Visual question answering for remote sensing data (RSVQA), which aims to answer questions based on the content of remotely sensed images, has attracted considerable attention. However, previous work has paid little attention to the robustness of RSVQA models. To make these models more reliable, the key challenge is learning representations that are robust to new words and to different question templates with the same meaning. The proposed augmented dataset provides additional questions that have the same meaning as the original ones. To make better use of this information, we propose a contrastive learning strategy for training RSVQA models that are robust to diverse question templates and words. Experimental results demonstrate that the proposed augmented dataset is effective in improving the robustness of the RSVQA model. In addition, the contrastive learning strategy performs well on the low-resolution (LR) dataset.
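The abstract does not say which contrastive objective is used, so the following is only a minimal sketch of the general idea, assuming an InfoNCE-style loss over paired question embeddings. The function names (`cosine`, `info_nce`), the temperature value, and the toy 2-D vectors are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(originals, augmented, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired question embeddings.

    Assumption: originals[i] and augmented[i] embed two phrasings of the
    same question, and every augmented[j] with j != i acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(originals):
        logits = [cosine(anchor, other) / temperature for other in augmented]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the matching paraphrase.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings, made up for illustration only.
originals = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # paraphrases land near their originals
swapped = [[0.1, 0.9], [0.9, 0.1]]  # paraphrases land near the wrong question

loss_aligned = info_nce(originals, aligned)
loss_swapped = info_nce(originals, swapped)
```

In this sketch the loss is small when each augmented question is embedded close to its original and far from the other questions in the batch, which is the behavior a robustness-oriented contrastive objective would reward. How the authors combine such a term with the answer-prediction loss is not stated in the abstract.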

Comments:	This paper was submitted to the JURSE 2023 conference on November 5, 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2304.03844 [cs.CV]
	(or arXiv:2304.03844v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.03844

Submission history

From: Zhenghang Yuan
[v1] Fri, 7 Apr 2023 21:06:58 UTC (1,783 KB)
