Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models

Mao, Yanxu; Liu, Peipei; Cui, Tiehan; Liu, Congying; You, Datao

Abstract:Large language models (LLMs) are widely applied in various fields of society due to their powerful reasoning, understanding, and generation capabilities. However, the security issues associated with these models are becoming increasingly severe. Jailbreaking attacks, as an important method for detecting vulnerabilities in LLMs, have been explored by researchers who attempt to induce these models to generate harmful content through various attack methods. Nevertheless, existing jailbreaking methods face numerous limitations, such as excessive query counts, limited coverage of jailbreak modalities, low attack success rates, and simplistic evaluation methods. To overcome these constraints, this paper proposes a multimodal jailbreaking method: JMLLM. This method integrates multiple strategies to perform comprehensive jailbreak attacks across text, visual, and auditory modalities. Additionally, we contribute a new and comprehensive dataset for multimodal jailbreaking research: TriJail, which includes jailbreak prompts for all three modalities. Experiments on the TriJail dataset and the benchmark dataset AdvBench, conducted on 13 popular LLMs, demonstrate advanced attack success rates and significant reduction in time overhead.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.16555 [cs.CL]
	(or arXiv:2412.16555v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.16555

Computer Science > Computation and Language

Title:Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators