Adversarial Tokenization

Geh, Renato Lui; Shao, Zilei; Broeck, Guy Van den

arXiv:2503.02174 (cs)

[Submitted on 4 Mar 2025]

Abstract: Current LLM pipelines account for only one possible tokenization for a given string, ignoring exponentially many alternative tokenizations during training and inference. For example, the standard Llama3 tokenization of penguin is [p,enguin], yet [peng,uin] is another perfectly valid alternative. In this paper, we show that despite LLMs being trained solely on one tokenization, they still retain semantic understanding of other tokenizations, raising questions about their implications in LLM safety. Put succinctly, we answer the following question: can we adversarially tokenize an obviously malicious string to evade safety and alignment restrictions? We show that not only is adversarial tokenization an effective yet previously neglected axis of attack, but it is also competitive against existing state-of-the-art adversarial approaches without changing the text of the harmful request. We empirically validate this exploit across three state-of-the-art LLMs and adversarial datasets, revealing a previously unknown vulnerability in subword models.
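To make the notion of alternative tokenizations concrete, the following is a minimal sketch that enumerates every way to split a string into tokens from a vocabulary. The vocabulary `TOY_VOCAB` and the function `all_tokenizations` are hypothetical names chosen for illustration; the vocabulary is a small made-up set, not the actual Llama3 vocabulary, and this is not the authors' attack procedure.

```python
from functools import lru_cache

# Hypothetical toy vocabulary for illustration only; NOT the real Llama3 vocabulary.
TOY_VOCAB = frozenset(
    {"p", "e", "n", "g", "u", "i", "pen", "peng", "enguin", "guin", "uin"}
)


def all_tokenizations(text, vocab=TOY_VOCAB):
    """Return every segmentation of `text` whose pieces all belong to `vocab`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # Base case: the whole string is consumed, so there is one (empty) suffix.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in vocab:
                # Prepend this piece to every valid segmentation of the remainder.
                for rest in split_from(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(tokens) for tokens in split_from(0)]


tokenizations = all_tokenizations("penguin")
for tokens in tokenizations:
    print(tokens)
```

Under this toy vocabulary, both [p,enguin] and [peng,uin] appear among the results. With a realistic subword vocabulary that contains every single character, the number of segmentations grows exponentially with string length, so exhaustive enumeration like this is only practical for short strings.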

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2503.02174 [cs.CL]
	(or arXiv:2503.02174v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2503.02174

Submission history

From: Zilei Shao [view email]
[v1] Tue, 4 Mar 2025 01:31:17 UTC (149 KB)
