DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams

Zhang, Xinyu; Zhang, Lingling; Wu, Yanrui; Huang, Muye; Wu, Wenjun; Li, Bo; Wang, Shaowei; Liu, Jun

Computer Science > Computer Vision and Pattern Recognition

arXiv:2411.17771v2 (cs)

[Submitted on 26 Nov 2024 (v1), revised 27 Feb 2025 (this version, v2), latest version 10 Mar 2025 (v3)]

Abstract: Visual Question Generation (VQG) has gained significant attention due to its potential in educational applications. However, VQG research mainly focuses on natural images, neglecting the diagrams in educational materials that are used to assess students' conceptual understanding. To address this gap, we introduce DiagramQG, a dataset containing 8,372 diagrams and 19,475 questions across various subjects. DiagramQG introduces concept and target text constraints, which guide the model to generate concept-focused questions for educational purposes. We also present the Hierarchical Knowledge Integration framework for Diagram Question Generation (HKI-DQG) as a strong baseline. This framework extracts multi-scale patches from diagrams and acquires knowledge using a vision-language model with frozen parameters. It then integrates the knowledge, text constraints, and patches to generate concept-focused questions. We evaluate existing VQG models, open-source and closed-source vision-language models, and HKI-DQG on the DiagramQG dataset. HKI-DQG outperforms existing methods, demonstrating that it serves as a strong baseline. Furthermore, we apply HKI-DQG to four other VQG datasets of natural images, namely VQG-COCO, K-VQG, OK-VQA and A-OKVQA, achieving state-of-the-art performance. The dataset and code are available at this https URL.
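
The abstract describes HKI-DQG only at a high level. As a minimal sketch, assuming the multi-scale patches come from square grids (1×1, 2×2, 4×4) and that the constraints and frozen-VLM knowledge are combined as plain text, the Python below shows how the generator's inputs could be laid out. The record class `DiagramQGSample`, its field names, the grid sizes, and the input string format are hypothetical illustrations, not the authors' implementation or the dataset's schema. The VLM call is omitted, so `knowledge` is passed in as a string.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, upper, right, lower): the box convention used by Pillow's Image.crop
Box = Tuple[int, int, int, int]


@dataclass
class DiagramQGSample:
    """Hypothetical record layout; field names are illustrative only."""
    diagram_path: str
    concept: str      # concept constraint: the concept the question should assess
    target_text: str  # target text constraint: the text the question should focus on


def multiscale_boxes(width: int, height: int,
                     grids: Tuple[int, ...] = (1, 2, 4)) -> List[List[Box]]:
    """Split a width x height diagram into an n x n grid of boxes for each n in `grids`.

    Returns one list of boxes per scale, in row-major order. Integer division
    distributes any remainder so the boxes at each scale tile the image exactly.
    """
    scales = []
    for n in grids:
        xs = [width * i // n for i in range(n + 1)]
        ys = [height * j // n for j in range(n + 1)]
        boxes = [(xs[i], ys[j], xs[i + 1], ys[j + 1])
                 for j in range(n) for i in range(n)]
        scales.append(boxes)
    return scales


def build_generator_input(sample: DiagramQGSample, knowledge: str, num_patches: int) -> str:
    """Hypothetical text side of the generator input: constraints plus VLM knowledge.

    A real model would fuse patch features with these tokens; this sketch only
    records how many patches accompany the text.
    """
    return (f"concept: {sample.concept} | target: {sample.target_text} | "
            f"knowledge: {knowledge} | patches: {num_patches}")


if __name__ == "__main__":
    sample = DiagramQGSample("food_web.png", "food chain", "grasshopper")
    scales = multiscale_boxes(640, 480)
    n_patches = sum(len(s) for s in scales)  # 1 + 4 + 16 = 21
    print(build_generator_input(sample, "grasshoppers eat grass", n_patches))
```

Returning boxes rather than pixel arrays keeps the sketch dependency-free; each box can be passed to a cropping routine such as Pillow's `Image.crop` to obtain the actual patches.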

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.17771 [cs.CV]
	(or arXiv:2411.17771v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.17771

Submission history

From: Xinyu Zhang
[v1] Tue, 26 Nov 2024 08:27:50 UTC (9,936 KB)
[v2] Thu, 27 Feb 2025 15:16:17 UTC (20,232 KB)
[v3] Mon, 10 Mar 2025 07:48:31 UTC (22,578 KB)
