Bi'an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation

Jiang, Zhouyu; Sun, Mengshu; Zhang, Zhiqiang; Liang, Lei

arXiv:2502.19209 (cs)

[Submitted on 26 Feb 2025]

Abstract: Retrieval-Augmented Generation (RAG) effectively reduces hallucinations in Large Language Models (LLMs) but can still produce inconsistent or unsupported content. Although LLM-as-a-Judge is widely used for RAG hallucination detection because it is simple to implement, it faces two main challenges: the absence of comprehensive evaluation benchmarks and the lack of domain-optimized judge models. To bridge these gaps, we introduce Bi'an, a novel framework featuring a bilingual benchmark dataset and lightweight judge models. The dataset supports rigorous evaluation across multiple RAG scenarios, while the judge models are fine-tuned from compact open-source LLMs. Extensive experimental evaluations on Bi'anBench show that our 14B model outperforms baseline models with more than five times as many parameters and rivals state-of-the-art closed-source LLMs. We will release our data and models soon at this https URL.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.19209 [cs.CL]
	(or arXiv:2502.19209v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.19209

Submission history

From: Zhouyu Jiang
[v1] Wed, 26 Feb 2025 15:12:59 UTC (743 KB)
