Integrating Query-aware Segmentation and Cross-Attention for Robust VQA

Choi, Wonjun; Lee, Sangbeom; Lee, Seungyeon; Jung, Heechul; Lee, Dong-Gyu

Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.12055 (cs)

[Submitted on 9 Jul 2024]

Abstract: This paper introduces a method for VizWiz-VQA that uses a large vision-language model (LVLM) with trainable cross-attention and LoRA fine-tuning. We train the model under three conditions: 1) with the original images; 2) with images enhanced using CLIPSeg to highlight or contrast regions of the original image; 3) with the output features of a Vision Transformer (ViT) integrated with CLIPSeg features of the original images. We then ensemble the results based on Levenshtein distance to improve the final answer prediction. In the experiments, we demonstrate and analyze the effectiveness of the proposed method.
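The abstract does not say how the Levenshtein-based ensemble is computed. One plausible reading is a consensus vote: each model variant proposes an answer string, and the answer with the smallest total edit distance to all the others is kept. The sketch below illustrates that reading only; the function names `levenshtein` and `ensemble_answer`, the lowercasing step, and the tie-breaking rule (earliest candidate wins) are assumptions for illustration, not the authors' implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def ensemble_answer(candidates: list[str]) -> str:
    """Return the normalized candidate closest, in total edit distance, to all others.

    Assumption: answers are compared after stripping whitespace and lowercasing.
    Ties are broken in favor of the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate answer")
    normalized = [c.strip().lower() for c in candidates]
    totals = [sum(levenshtein(x, y) for y in normalized) for x in normalized]
    best = min(range(len(normalized)), key=totals.__getitem__)
    return normalized[best]


if __name__ == "__main__":
    # Hypothetical answers from three differently trained model variants.
    print(ensemble_answer(["Stop sign", "stop sign", "stop sigh", "bus"]))
```

Under this scheme an outlier answer is penalized by its distance to every other candidate, so near-duplicate answers that differ by a typo reinforce each other rather than splitting the vote.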

Comments:	CVPR Workshop accepted, VizWiz Grand Challenge (VQA) 3rd Prize, this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.12055 [cs.CV]
	(or arXiv:2407.12055v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.12055

Submission history

From: Seungyeon Lee
[v1] Tue, 9 Jul 2024 04:48:44 UTC (10,221 KB)
