NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

Mei, Kai; Li, Zheng; Wang, Zhenting; Zhang, Yang; Ma, Shiqing

arXiv:2305.17826 (cs)

[Submitted on 28 May 2023]

Abstract: Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against prompt-based models consider injecting backdoors into the entire embedding layers or word embedding vectors. Such attacks can be easily affected by retraining on downstream tasks and with different prompting strategies, limiting the transferability of backdoor attacks. In this work, we propose transferable backdoor attacks against prompt-based models, called NOTABLE, which is independent of downstream tasks and prompting strategies. Specifically, NOTABLE injects backdoors into the encoders of PLMs by utilizing an adaptive verbalizer to bind triggers to specific words (i.e., anchors). It activates the backdoor by pasting input with triggers to reach adversary-desired anchors, achieving independence from downstream tasks and prompting strategies. We conduct experiments on six NLP tasks, three popular models, and three prompting strategies. Empirical results show that NOTABLE achieves superior attack performance (i.e., attack success rate over 90% on all the datasets), and outperforms two state-of-the-art baselines. Evaluations on three defenses show the robustness of NOTABLE. Our code can be found at this https URL.
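
The abstract reports results as an attack success rate but does not define it. As a rough illustration only, the sketch below computes the metric the way it is commonly defined in backdoor evaluations: the fraction of trigger-carrying test inputs that the model assigns to the adversary's target label. The function name `attack_success_rate`, its parameters, and the choice to exclude inputs whose true label already equals the target are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence


def attack_success_rate(
    predictions: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs classified as target_label.

    predictions: model outputs on inputs that carry the trigger.
    true_labels: the ground-truth labels of those same inputs.
    Assumption for this sketch: inputs whose true label already equals
    target_label are skipped, since they cannot show a label flip.
    """
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have equal length")

    # Keep only predictions for inputs that were not already in the target class.
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_label]
    if not eligible:
        raise ValueError("no inputs with a non-target true label")

    hits = sum(1 for p in eligible if p == target_label)
    return hits / len(eligible)
```

For example, with predictions `[1, 1, 0, 1]`, true labels `[0, 0, 0, 1]`, and target label `1`, three inputs are eligible and two of them are flipped to the target, giving a rate of 2/3.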

Subjects:	Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Cite as:	arXiv:2305.17826 [cs.CL]
	(or arXiv:2305.17826v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.17826

Submission history

From: Kai Mei
[v1] Sun, 28 May 2023 23:35:17 UTC (7,700 KB)
