Character-level White-Box Adversarial Attacks against Transformers via Attachable Subwords Substitution

Liu, Aiwei; Yu, Honghai; Hu, Xuming; Li, Shu'ang; Lin, Li; Ma, Fukun; Yang, Yawen; Wen, Lijie


arXiv:2210.17004 (cs)

[Submitted on 31 Oct 2022]


Abstract: We propose the first character-level white-box adversarial attack method against transformer models. The intuition behind our method comes from the observation that words are split into subtokens before being fed into transformer models, and that substituting one subtoken with a close one has an effect similar to a character modification. Our method consists of three steps. First, a gradient-based method is adopted to find the most vulnerable words in the sentence. Then we split the selected words into subtokens to replace the original tokenization result from the transformer tokenizer. Finally, we utilize an adversarial loss to guide the substitution of attachable subtokens, in which the Gumbel-softmax trick is introduced to ensure gradient propagation. Meanwhile, we introduce visual and length constraints in the optimization process to achieve minimal character modifications. Extensive experiments on both sentence-level and token-level tasks demonstrate that our method outperforms previous attack methods in terms of success rate and edit distance. Furthermore, human evaluation verifies that our adversarial examples preserve their original labels.
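The Gumbel-softmax step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gumbel_softmax`, the candidate subtokens, and the logit values are all hypothetical, and only the forward sampling is shown. In an actual attack the same computation would presumably run inside an automatic-differentiation framework, so that the adversarial loss can send gradients back to the logits.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed (soft) one-hot sample over len(logits) candidates.

    Illustrative helper only; the name and signature are assumptions,
    not taken from the paper.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        noisy.append((logit + gumbel_noise) / temperature)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(value - peak) for value in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attachable subtokens competing for one position, with
# made-up scores standing in for learned substitution logits.
candidates = ["##ing", "##ins", "##ign"]
logits = [2.0, 0.5, 0.1]

weights = gumbel_softmax(logits, temperature=0.5, rng=random.Random(0))
chosen = candidates[weights.index(max(weights))]
```

Lower temperatures push `weights` toward a one-hot vector, which approximates a discrete subtoken choice, while higher temperatures keep the distribution smoother.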

Comments:	13 pages, 3 figures. EMNLP 2022
Subjects:	Computation and Language (cs.CL)
MSC classes:	68T50
ACM classes:	I.2.7
Cite as:	arXiv:2210.17004 [cs.CL]
	(or arXiv:2210.17004v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.17004
Journal reference:	EMNLP 2022

Submission history

From: Aiwei Liu
[v1] Mon, 31 Oct 2022 01:46:29 UTC (1,058 KB)
