Hallucinations are inevitable but statistically negligible

Suzuki, Atsushi; He, Yulan; Tian, Feng; Wang, Zhongyuan

arXiv:2502.12187 (cs)

[Submitted on 15 Feb 2025]

Abstract: Hallucinations, a phenomenon where a language model (LM) generates nonfactual content, pose a significant challenge to the practical deployment of LMs. While many empirical methods have been proposed to mitigate hallucinations, a recent study established a computability-theoretic result showing that any LM will inevitably generate hallucinations on an infinite set of inputs, regardless of the quality and quantity of training datasets and the choice of the language model architecture and training and inference algorithms. Although the computability-theoretic result may seem pessimistic, its significance in practical viewpoints has remained unclear. In contrast, we present a positive theoretical result from a probabilistic perspective. Specifically, we prove that hallucinations can be made statistically negligible, provided that the quality and quantity of the training data are sufficient. Interestingly, our positive result coexists with the computability-theoretic result, implying that while hallucinations on an infinite set of inputs cannot be entirely eliminated, their probability can always be reduced by improving algorithms and training data. By evaluating the two seemingly contradictory results through the lens of information theory, we argue that our probability-theoretic positive result better reflects practical considerations than the computability-theoretic negative result.
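
The coexistence described in the abstract (hallucinations on an infinite set of inputs, yet with arbitrarily small probability) can be illustrated with a toy calculation. The sketch below is not the paper's construction. It assumes a hypothetical input distribution in which input n = 1, 2, 3, ... occurs with probability 2^-n, and a hypothetical model that hallucinates on every input n greater than some cutoff. The names `input_probability`, `hallucination_probability`, and `partial_tail` are illustrative only.

```python
from fractions import Fraction


def input_probability(n: int) -> Fraction:
    """Assumed toy distribution: input n (n >= 1) has probability 2^-n."""
    return Fraction(1, 2 ** n)


def hallucination_probability(cutoff: int) -> Fraction:
    """Probability mass of the infinite set {n : n > cutoff}.

    The toy model is assumed to hallucinate on exactly these inputs.
    The geometric tail sums to 2^-cutoff in closed form.
    """
    return Fraction(1, 2 ** cutoff)


def partial_tail(cutoff: int, terms: int) -> Fraction:
    """Finite partial sum of the tail, as a numerical check on the closed form."""
    return sum(
        (input_probability(n) for n in range(cutoff + 1, cutoff + 1 + terms)),
        Fraction(0),
    )


if __name__ == "__main__":
    for cutoff in (5, 10, 20):
        # The hallucinating set is infinite for every cutoff,
        # but its total probability shrinks as the cutoff grows.
        print(cutoff, float(hallucination_probability(cutoff)))
```

In this toy setting the set of hallucinating inputs never becomes finite, while raising the cutoff (a stand-in for better data and algorithms) drives the hallucination probability toward zero.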

Subjects: Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as: arXiv:2502.12187 [cs.CL] (or arXiv:2502.12187v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2502.12187

Submission history

From: Atsushi Suzuki
[v1] Sat, 15 Feb 2025 07:28:40 UTC (316 KB)
