ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty

Zong, Qing; Wang, Zhaowei; Zheng, Tianshi; Ren, Xiyu; Song, Yangqiu

Computer Science > Computation and Language

arXiv:2412.20251 (cs)

[Submitted on 28 Dec 2024]

Title:ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty

Authors:Qing Zong, Zhaowei Wang, Tianshi Zheng, Xiyu Ren, Yangqiu Song

View PDF HTML (experimental)

Abstract:The rapid development of LLMs has sparked extensive research into their factual knowledge. Current works claim that LLMs fall short on questions requiring less frequent knowledge. However, their proof is incomplete since they only study the influence of entity frequency, which can not fully represent knowledge frequency. So we introduce ComparisonQA benchmark, containing 283K abstract questions, each instantiated by a pair of high-frequency and low-frequency entities. It ensures a controllable comparison because the difference of knowledge frequency between such a pair is only related to entity frequency. In addition, to avoid possible semantic shortcuts, which is a severe problem of current LLMs study, we design a two-round method for knowledge robustness measurement utilizing both correctness and uncertainty. Experiments reveal that LLMs exhibit particularly low robustness regarding low-frequency knowledge, and GPT-4o is even the worst under this measurement. Besides, we introduce an automatic method to filter out questions with low-quality and shortcuts to form ComparisonQA-Hard. We find that uncertainty effectively identifies such questions while maintaining the data size.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.20251 [cs.CL]
	(or arXiv:2412.20251v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.20251

Submission history

From: Qing Zong [view email]
[v1] Sat, 28 Dec 2024 19:51:08 UTC (305 KB)

Computer Science > Computation and Language

Title:ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators