Hallucination Detection in LLMs Using Spectral Features of Attention Maps

Binkowski, Jakub; Janiak, Denis; Sawczyn, Albert; Gabrys, Bogdan; Kajdanowicz, Tomasz

arXiv:2502.17598 (cs)

[Submitted on 24 Feb 2025]

Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various tasks but remain prone to hallucinations. Detecting hallucinations is essential for safety-critical applications, and recent methods leverage attention map properties to this end, though their effectiveness remains limited. In this work, we investigate the spectral features of attention maps by interpreting them as adjacency matrices of graph structures. We propose the $\text{LapEigvals}$ method, which utilises the top-$k$ eigenvalues of the Laplacian matrix derived from the attention maps as an input to hallucination detection probes. Empirical evaluations demonstrate that our approach achieves state-of-the-art hallucination detection performance among attention-based methods. Extensive ablation studies further highlight the robustness and generalisation of $\text{LapEigvals}$, paving the way for future advancements in the hallucination detection domain.
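
To make the idea in the abstract concrete, the following is a minimal sketch of turning one attention map into top-$k$ Laplacian eigenvalue features. It is not the authors' implementation. The abstract does not specify how the Laplacian is built, so the sketch assumes the standard combinatorial form L = D - A, with D holding the column sums of the attention map. The function names, the toy attention generator, and the choice of k are all hypothetical.

```python
import numpy as np


def toy_causal_attention(n_tokens: int, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for one attention head of a decoder-only LLM.

    Returns a lower-triangular, row-stochastic matrix (causal softmax).
    """
    rng = np.random.default_rng(seed)
    scores = rng.normal(size=(n_tokens, n_tokens))
    causal_mask = np.tril(np.ones((n_tokens, n_tokens), dtype=bool))
    scores = np.where(causal_mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # masked positions become exactly 0
    return weights / weights.sum(axis=1, keepdims=True)


def laplacian_top_k_eigvals(attn: np.ndarray, k: int) -> np.ndarray:
    """Top-k eigenvalues of a graph Laplacian built from an attention map.

    Assumption (not taken from the abstract): L = D - A, where A is the
    attention map read as a weighted adjacency matrix and D is the diagonal
    matrix of column sums (total attention each token receives).
    """
    degree = np.diag(attn.sum(axis=0))
    laplacian = degree - attn
    # For a causal (lower-triangular) map, L is triangular, so its
    # eigenvalues are real and equal to its diagonal entries.
    eigvals = np.linalg.eigvals(laplacian).real
    return np.sort(eigvals)[::-1][:k]


attn = toy_causal_attention(n_tokens=6)
features = laplacian_top_k_eigvals(attn, k=3)
print(features)
```

In a fuller pipeline one would presumably compute such a feature vector for each layer and head, concatenate the results, and train a lightweight classifier on them as the probe. That arrangement is an assumption here, since the abstract only states that the eigenvalues serve as input to hallucination detection probes.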

Comments:	Preprint, under review
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2502.17598 [cs.LG]
	(or arXiv:2502.17598v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.17598

Submission history

From: Jakub Binkowski
[v1] Mon, 24 Feb 2025 19:30:24 UTC (4,591 KB)
