Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models

Li, Xiao; Li, Zhuhong; Li, Qiongxiu; Lee, Bingze; Cui, Jinghao; Hu, Xiaolin

arXiv:2410.15362 (cs)

[Submitted on 20 Oct 2024]

Abstract: Aligned Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, LLMs remain susceptible to jailbreak adversarial attacks, in which adversaries manipulate prompts to elicit malicious responses that aligned LLMs should have refused. Identifying these vulnerabilities is crucial for understanding the inherent weaknesses of LLMs and preventing their potential misuse. One pioneering work in jailbreaking is the GCG attack, a discrete token optimization algorithm that searches for a suffix capable of jailbreaking aligned LLMs. Despite the success of GCG, we find it suboptimal: it incurs a high computational cost, and the jailbreaking performance it achieves is limited. In this work, we propose Faster-GCG, an efficient adversarial jailbreak method developed by examining the design of GCG in depth. Experiments demonstrate that Faster-GCG surpasses the original GCG with only 1/10 of the computational cost, achieving significantly higher attack success rates on various open-source aligned LLMs. In addition, we demonstrate that Faster-GCG exhibits improved attack transferability when tested on closed-source LLMs such as ChatGPT.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Cite as:	arXiv:2410.15362 [cs.LG]
	(or arXiv:2410.15362v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.15362

Submission history

From: Xiao Li
[v1] Sun, 20 Oct 2024 11:27:41 UTC (361 KB)
