Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework

Yang, Yuming; Zhong, Jiang; Jin, Li; Huang, Jingwang; Gao, Jingpeng; Liu, Qing; Bai, Yang; Zhang, Jingyuan; Jiang, Rui; Wei, Kaiwen

arXiv:2502.14864 (cs)

[Submitted on 20 Feb 2025]

Abstract: Multimodal Retrieval-Augmented Generation (MRAG) enhances reasoning capabilities by integrating external knowledge. However, existing benchmarks primarily focus on simple image-text interactions, overlooking complex visual formats like charts that are prevalent in real-world applications. In this work, we introduce a novel task, Chart-based MRAG, to address this limitation. To semi-automatically generate high-quality evaluation samples, we propose CHARt-based document question-answering GEneration (CHARGE), a framework that produces evaluation data through structured keypoint extraction, crossmodal verification, and keypoint-based generation. By combining CHARGE with expert validation, we construct Chart-MRAG Bench, a comprehensive benchmark for chart-based MRAG evaluation, featuring 4,738 question-answering pairs across 8 domains from real-world documents. Our evaluation reveals three critical limitations in current approaches: (1) unified multimodal embedding retrieval methods struggle in chart-based scenarios, (2) even with ground-truth retrieval, state-of-the-art MLLMs achieve only 58.19% Correctness and 73.87% Coverage scores, and (3) MLLMs demonstrate a consistent text-over-visual modality bias during Chart-based MRAG reasoning. CHARGE and Chart-MRAG Bench are released at this https URL.
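
The abstract reports Correctness and Coverage scores but does not define them. As a rough illustration only, the sketch below assumes both are keypoint-level ratios: Correctness as the share of predicted keypoints that match a reference keypoint, and Coverage as the share of reference keypoints that the prediction recovers. These definitions, the exact-match comparison after normalization, and the names `normalize`, `KeypointScores`, and `score_keypoints` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


def normalize(keypoint: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(keypoint.lower().split())


@dataclass(frozen=True)
class KeypointScores:
    correctness: float  # ASSUMED: matched keypoints / predicted keypoints
    coverage: float     # ASSUMED: matched keypoints / reference keypoints


def score_keypoints(predicted: list[str], reference: list[str]) -> KeypointScores:
    """Score a predicted answer against reference keypoints.

    Hypothetical definitions for illustration; the benchmark may match
    keypoints semantically rather than by normalized string equality.
    """
    pred = {normalize(k) for k in predicted}
    ref = {normalize(k) for k in reference}
    matched = pred & ref
    # Guard against empty sets to avoid division by zero.
    correctness = len(matched) / len(pred) if pred else 0.0
    coverage = len(matched) / len(ref) if ref else 0.0
    return KeypointScores(correctness, coverage)


if __name__ == "__main__":
    scores = score_keypoints(
        predicted=["Revenue rose 12% in 2023", "Profit fell"],
        reference=["revenue rose 12%  in 2023", "Costs were flat", "Headcount grew"],
    )
    print(scores)
```

Under these assumed definitions the two scores can diverge: an answer that states few but accurate keypoints scores high on Correctness and low on Coverage, while an exhaustive but error-prone answer does the opposite. That is one plausible reading of why the abstract reports the two numbers separately.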

Subjects:	Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.14864 [cs.AI]
	(or arXiv:2502.14864v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2502.14864

Submission history

From: YuMing Yang
[v1] Thu, 20 Feb 2025 18:59:42 UTC (6,786 KB)
