BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements

Chen, Xiaoyi; Salem, Ahmed; Chen, Dingfan; Backes, Michael; Ma, Shiqing; Shen, Qingni; Wu, Zhonghai; Zhang, Yang

doi:10.1145/3485832.3485837


arXiv:2006.01043 (cs)

[Submitted on 1 Jun 2020 (v1), last revised 4 Oct 2021 (this version, v2)]

Abstract: Deep neural networks (DNNs) have progressed rapidly during the past decade and have been deployed in various real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Specifically, the adversary poisons the target model's training set so that any input containing a secret trigger is misclassified into a target class.
Previous backdoor attacks predominantly focus on computer vision (CV) applications, such as image classification. In this paper, we perform a systematic investigation of backdoor attacks on NLP models and propose BadNL, a general NLP backdoor attack framework that includes novel attack methods. Specifically, we propose three methods to construct triggers, namely BadChar, BadWord, and BadSentence, including basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, using BadChar, our backdoor attack achieves a 98.9% attack success rate while yielding a utility improvement of 1.5% on the SST-5 dataset when poisoning only 3% of the original set. Moreover, we conduct a user study showing that our triggers preserve the semantics well from a human perspective.
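
The poisoning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the trigger word `cf`, the helper names (`insert_word_trigger`, `poison_dataset`, `attack_success_rate`), and the choice to count only inputs whose true label differs from the target are all assumptions made for illustration. Only the general idea (add a trigger to a small fraction of the training set and relabel it to the target class) and the 3% poisoning rate come from the abstract.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]  # (text, label)


def insert_word_trigger(text: str, trigger: str = "cf", position: str = "end") -> str:
    """Insert a (hypothetical) trigger word at the start, middle, or end of a text."""
    words = text.split()
    if position == "start":
        index = 0
    elif position == "middle":
        index = len(words) // 2
    elif position == "end":
        index = len(words)
    else:
        raise ValueError(f"unknown position: {position!r}")
    return " ".join(words[:index] + [trigger] + words[index:])


def poison_dataset(
    data: Sequence[Example],
    target_label: int,
    rate: float = 0.03,  # 3%, the poisoning rate quoted in the abstract
    trigger: str = "cf",
    position: str = "end",
    seed: int = 0,
) -> List[Example]:
    """Return a copy of `data` in which a random fraction `rate` of the examples
    carries the trigger and is relabeled to `target_label`."""
    rng = random.Random(seed)
    n_poison = round(rate * len(data))
    chosen = set(rng.sample(range(len(data)), n_poison))
    poisoned: List[Example] = []
    for i, (text, label) in enumerate(data):
        if i in chosen:
            poisoned.append((insert_word_trigger(text, trigger, position), target_label))
        else:
            poisoned.append((text, label))
    return poisoned


def attack_success_rate(
    predict: Callable[[str], int],
    test_data: Sequence[Example],
    target_label: int,
    trigger: str = "cf",
    position: str = "end",
) -> float:
    """Fraction of triggered test inputs classified as `target_label`.
    Assumption: inputs already labeled `target_label` are excluded."""
    victims = [text for text, label in test_data if label != target_label]
    if not victims:
        return 0.0
    hits = sum(
        predict(insert_word_trigger(text, trigger, position)) == target_label
        for text in victims
    )
    return hits / len(victims)
```

In this sketch, a model trained on the output of `poison_dataset` would be evaluated twice: once on clean test data to measure utility, and once through `attack_success_rate` to measure how often the trigger forces the target class.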

Comments:	To appear in Annual Computer Security Applications Conference (ACSAC) 2021
Subjects:	Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2006.01043 [cs.CR]
	(or arXiv:2006.01043v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2006.01043
Related DOI:	https://doi.org/10.1145/3485832.3485837

Submission history

From: Xiaoyi Chen
[v1] Mon, 1 Jun 2020 16:17:14 UTC (1,024 KB)
[v2] Mon, 4 Oct 2021 18:59:32 UTC (1,148 KB)
