Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow

Bai, Jiaqi; Guo, Hongcheng; Peng, Zhongyuan; Yang, Jian; Li, Zhoujun; Li, Mohan; Tian, Zhihong

arXiv:2502.20750 (cs)

[Submitted on 28 Feb 2025]

Abstract: Large vision-language models show tremendous potential in understanding visual information through human language. However, they are prone to object hallucination, i.e., the generated image descriptions contain objects that do not exist in the image. In this paper, we reveal that object hallucination can be attributed to overconfidence in irrelevant visual features when soft visual tokens are mapped to the LLM's word embedding space. Specifically, by measuring the semantic similarity between visual tokens and the LLM's word embeddings, we observe that the smoothness of the similarity distribution strongly correlates with the emergence of object hallucinations. To mitigate hallucinations, we propose using the Variational Information Bottleneck (VIB) to alleviate overconfidence by introducing stochastic noise, which helps constrain irrelevant information. Furthermore, we propose an entropy-based noise-controlling strategy that adaptively constrains the injected noise according to the smoothness of the similarity distribution. We adapt the proposed AdaVIB to distinct model architectures. Experimental results demonstrate that AdaVIB mitigates object hallucinations by effectively alleviating the overconfidence in irrelevant visual features, with consistent improvements on two object hallucination benchmarks.
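
As a rough illustration of the ideas named in the abstract (not the authors' implementation), the sketch below computes the entropy of a softmax over similarities between one visual token and a toy word-embedding table, then uses that entropy to scale reparameterized Gaussian noise in a VIB-style layer. Everything here is an assumption made for illustration: the function names, the dot-product similarity, the normalization of the entropy to [0, 1], and in particular the choice that a smoother (higher-entropy) distribution receives more noise. The abstract does not specify the direction or the functional form of the control.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_entropy(visual_token, word_embeddings):
    """Entropy of the softmax over dot-product similarities, scaled to [0, 1].

    Values near 1 mean a smooth (near-uniform) similarity distribution;
    values near 0 mean a sharply peaked one. Dot-product similarity is an
    assumption of this sketch.
    """
    scores = [sum(v * w for v, w in zip(visual_token, emb))
              for emb in word_embeddings]
    probs = softmax(scores)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(word_embeddings))


def noisy_token(mu, log_var, noise_scale, rng):
    """Reparameterized sample: mu + noise_scale * sigma * eps, eps ~ N(0, 1)."""
    return [m + noise_scale * math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), the usual VIB regularization term."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Toy usage: a 2-dimensional token against a 3-word embedding table.
rng = random.Random(0)
token = [1.0, 0.0]
vocab = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
scale = similarity_entropy(token, vocab)  # hypothetical entropy-to-noise mapping
sample = noisy_token(mu=token, log_var=[0.0, 0.0], noise_scale=scale, rng=rng)
penalty = kl_to_standard_normal(mu=token, log_var=[0.0, 0.0])
```

In a real model the mean and log-variance would be produced by learned layers and the KL term would be added to the training loss with a weighting coefficient; those details are outside what the abstract states.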

Comments:	Accepted to AAAI 2025. Camera-ready version.
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.20750 [cs.CL]
	(or arXiv:2502.20750v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.20750

Submission history

From: Jiaqi Bai
[v1] Fri, 28 Feb 2025 05:56:23 UTC (2,569 KB)
