The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative

Tan, Zhen; Zhao, Chengshuai; Moraffah, Raha; Li, Yifan; Kong, Yu; Chen, Tianlong; Liu, Huan

arXiv:2402.14859 (cs)

[Submitted on 20 Feb 2024 (v1), last revised 3 Jun 2024 (this version, v2)]

Abstract: Due to their unprecedented ability to process and respond to various types of data, Multimodal Large Language Models (MLLMs) are constantly redefining the boundary of Artificial General Intelligence (AGI). As these advanced generative models increasingly form collaborative networks for complex tasks, the integrity and security of these systems are crucial. Our paper, "The Wolf Within", explores a novel vulnerability in MLLM societies: the indirect propagation of malicious content. Unlike attacks that make an MLLM generate harmful output directly, our research demonstrates how a single MLLM agent can be subtly influenced to generate prompts that, in turn, induce other MLLM agents in the society to output malicious content. Our findings reveal that an MLLM agent, when manipulated to produce specific prompts or instructions, can effectively "infect" other agents within a society of MLLMs. This infection leads to the generation and circulation of harmful outputs, such as dangerous instructions or misinformation, across the society. We also show the transferability of these indirectly generated prompts, highlighting their potential to propagate malice through inter-agent communication. This research provides critical insight into a new dimension of threat posed by MLLMs, where a single agent can act as a catalyst for widespread malevolent influence. Our work underscores the urgent need to develop robust mechanisms for detecting and mitigating such covert manipulations within MLLM societies, ensuring their safe and ethical use in societal applications.

Comments:	Accepted to workshop on ReGenAI@CVPR 2024
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2402.14859 [cs.CR]
	(or arXiv:2402.14859v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2402.14859

Submission history

From: Zhen Tan
[v1] Tue, 20 Feb 2024 23:08:21 UTC (4,892 KB)
[v2] Mon, 3 Jun 2024 03:29:07 UTC (19,451 KB)
