MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation

Park, Chanhee; Moon, Hyeonseok; Park, Chanjun; Lim, Heuiseok

arXiv:2504.17137 (cs)

[Submitted on 23 Apr 2025]

Abstract: Retrieval-Augmented Generation (RAG) has gained prominence as an effective method for enhancing the generative capabilities of Large Language Models (LLMs) through the incorporation of external knowledge. However, the evaluation of RAG systems remains a challenge, due to the intricate interplay between retrieval and generation components. This limitation has resulted in a scarcity of benchmarks that facilitate a detailed, component-specific assessment. In this work, we present MIRAGE, a Question Answering dataset specifically designed for RAG evaluation. MIRAGE consists of 7,560 curated instances mapped to a retrieval pool of 37,800 entries, enabling an efficient and precise evaluation of both retrieval and generation tasks. We also introduce novel evaluation metrics aimed at measuring RAG adaptability, encompassing dimensions such as noise vulnerability, context acceptability, context insensitivity, and context misinterpretation. Through comprehensive experiments across various retriever-LLM configurations, we provide new insights into the optimal alignment of model pairs and the nuanced dynamics within RAG systems. The dataset and evaluation code are publicly available, allowing for seamless integration and customization in diverse research settings. The MIRAGE code and data are available at this https URL.

Comments:	Accepted to NAACL 2025 Findings
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2504.17137 [cs.CL]
	(or arXiv:2504.17137v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.17137

Submission history

From: Chanhee Park
[v1] Wed, 23 Apr 2025 23:05:46 UTC (1,451 KB)
