Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking

Xu, Nan; Wang, Fei; Zhou, Ben; Li, Bang Zheng; Xiao, Chaowei; Chen, Muhao

arXiv:2311.09827 (cs)

[Submitted on 16 Nov 2023 (v1), last revised 29 Feb 2024 (this version, v2)]

Abstract: While large language models (LLMs) have demonstrated increasing power, they have also given rise to a wide range of harmful behaviors. As a representative example, jailbreak attacks can provoke harmful or unethical responses from LLMs, even after safety alignment. In this paper, we investigate a novel category of jailbreak attacks specifically designed to target the cognitive structure and processes of LLMs. Specifically, we analyze the safety vulnerability of LLMs in the face of (1) multilingual cognitive overload, (2) veiled expression, and (3) effect-to-cause reasoning. Unlike previous jailbreak attacks, our proposed cognitive overload is a black-box attack that requires no knowledge of the model architecture and no access to model weights. Experiments conducted on AdvBench and MasterKey reveal that various LLMs, including both the popular open-source model Llama 2 and the proprietary model ChatGPT, can be compromised through cognitive overload. Motivated by cognitive psychology work on managing cognitive load, we further investigate defending against cognitive overload attacks from two perspectives. Empirical studies show that cognitive overload from all three perspectives can successfully jailbreak all studied LLMs, while existing defense strategies can hardly mitigate the resulting malicious uses.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2311.09827 [cs.CL]
	(or arXiv:2311.09827v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.09827

Submission history

From: Muhao Chen
[v1] Thu, 16 Nov 2023 11:52:22 UTC (9,334 KB)
[v2] Thu, 29 Feb 2024 08:20:07 UTC (10,449 KB)
