Computer Science > Computation and Language
[Submitted on 30 Apr 2024 (v1), last revised 23 Oct 2024 (this version, v7)]
Title: GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model
Abstract: Retrieval-Augmented Generation (RAG) systems are widely used across various industries for querying closed-domain and in-house knowledge bases. However, evaluating these systems presents significant challenges due to the private nature of closed-domain data and a scarcity of queries with verifiable ground truths. Moreover, there is a lack of analytical methods to diagnose problematic modules and identify types of failure, such as those caused by knowledge deficits or issues with robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising a grounded data generation process and an evaluation protocol that effectively pinpoints defective modules. Our validation experiments reveal that GRAMMAR provides a reliable approach for identifying vulnerable modules and supports hypothesis testing for textual form vulnerabilities. An open-source tool accompanying this framework is available in our GitHub repository (see this https URL), allowing for easy reproduction of our results and enabling reliable and modular evaluation in closed-domain settings.
Submission history
From: Xinzhe Li
[v1] Tue, 30 Apr 2024 03:29:30 UTC (8,420 KB)
[v2] Thu, 2 May 2024 05:32:23 UTC (8,420 KB)
[v3] Thu, 9 May 2024 01:46:48 UTC (8,420 KB)
[v4] Wed, 29 May 2024 11:12:21 UTC (8,414 KB)
[v5] Fri, 12 Jul 2024 05:16:30 UTC (8,729 KB)
[v6] Thu, 15 Aug 2024 21:56:49 UTC (8,856 KB)
[v7] Wed, 23 Oct 2024 11:19:02 UTC (9,210 KB)