Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink

Wang, Yining; Zhang, Mi; Sun, Junjie; Wang, Chenyue; Yang, Min; Xue, Hui; Tao, Jialing; Duan, Ranjie; Liu, Jiexi

Computer Science > Machine Learning

arXiv:2501.15269 (cs)

[Submitted on 25 Jan 2025]

Abstract: Fusing visual understanding into language generation, Multi-modal Large Language Models (MLLMs) are revolutionizing visual-language applications. Yet, these models are often plagued by the hallucination problem, which involves generating inaccurate objects, attributes, and relationships that do not match the visual content. In this work, we delve into the internal attention mechanisms of MLLMs to reveal the underlying causes of hallucination, exposing the inherent vulnerabilities in the instruction-tuning process.
We propose a novel hallucination attack against MLLMs that exploits attention sink behaviors to trigger hallucinated content with minimal image-text relevance, posing a significant threat to critical downstream applications. Distinguished from previous adversarial methods that rely on fixed patterns, our approach generates dynamic, effective, and highly transferable visual adversarial inputs, without sacrificing the quality of model responses. Comprehensive experiments on 6 prominent MLLMs demonstrate the efficacy of our attack in compromising black-box MLLMs even with extensive mitigating mechanisms, as well as the promising results against cutting-edge commercial APIs, such as GPT-4o and Gemini 1.5. Our code is available at this https URL.

Comments:	USENIX Security 2025
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.15269 [cs.LG]
	(or arXiv:2501.15269v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2501.15269

Submission history

From: Yining Wang
[v1] Sat, 25 Jan 2025 16:36:00 UTC (24,235 KB)
