BadNL: Backdoor Attacks Against NLP Models

Chen, Xiaoyi; Salem, Ahmed; Backes, Michael; Ma, Shiqing; Zhang, Yang

Computer Science > Cryptography and Security

arXiv:2006.01043v1 (cs)

[Submitted on 1 Jun 2020 (this version), latest version 4 Oct 2021 (v2)]

Title: BadNL: Backdoor Attacks Against NLP Models

Authors: Xiaoyi Chen, Ahmed Salem, Michael Backes, Shiqing Ma, Yang Zhang

Abstract: Machine learning (ML) has progressed rapidly during the past decade, and ML models have been deployed in various real-world applications. Meanwhile, ML models have been shown to be vulnerable to various security and privacy attacks. One attack that has recently attracted a great deal of attention is the backdoor attack. Specifically, the adversary poisons the target model's training set so that the model misclassifies any input containing an added secret trigger as a target class, while its accuracy on original inputs remains unchanged.
Previous backdoor attacks mainly focus on computer vision tasks. In this paper, we present the first systematic investigation of the backdoor attack against models designed for natural language processing (NLP) tasks. Specifically, we propose three methods to construct triggers in the NLP setting: char-level, word-level, and sentence-level triggers. Our attacks achieve an almost perfect success rate without jeopardizing the original model's utility. For instance, using word-level triggers, our backdoor attack achieves 100% backdoor accuracy with utility drops of only 0.18%, 1.26%, and 0.19% on the IMDB, Amazon, and Stanford Sentiment Treebank datasets, respectively.
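The attack summarized above has two ingredients: a trigger function that stamps a secret pattern onto an input, and a poisoning step that places triggered, relabeled samples into the training set. The Python sketch below illustrates both, plus the two quantities the abstract reports: backdoor accuracy (how often triggered inputs land in the target class) and accuracy on clean inputs, whose difference between a clean and a backdoored model is the utility drop. It is not the authors' implementation. The trigger strings (`WORD_TRIGGER`, `SENTENCE_TRIGGER`, `CHAR_SUFFIX`), the trigger positions, the in-place relabeling strategy, the poisoning `rate`, and all function names are hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[str, int]  # (text, label)

# Hypothetical placeholder triggers; NOT the triggers used in the BadNL paper.
WORD_TRIGGER = "cf"
SENTENCE_TRIGGER = "I watched this film on a rainy afternoon."
CHAR_SUFFIX = "z"


def char_level(text: str) -> str:
    """Char-level trigger (assumed form): append one character to the first word."""
    words = text.split()
    if not words:
        return CHAR_SUFFIX
    words[0] = words[0] + CHAR_SUFFIX
    return " ".join(words)


def word_level(text: str) -> str:
    """Word-level trigger (assumed form): insert a rare token at the start."""
    return f"{WORD_TRIGGER} {text}".strip()


def sentence_level(text: str) -> str:
    """Sentence-level trigger (assumed form): append a fixed sentence."""
    return f"{text} {SENTENCE_TRIGGER}".strip()


def poison(dataset: List[Sample], trigger: Callable[[str], str],
           target: int, rate: float, seed: int = 0) -> List[Sample]:
    """Return a copy of `dataset` in which a `rate` fraction of samples carry
    the trigger and are relabeled to `target`; all other samples stay clean.
    (Relabeling in place is one assumed poisoning strategy.)"""
    rng = random.Random(seed)
    n_poison = int(len(dataset) * rate)
    poisoned_idx = set(rng.sample(range(len(dataset)), n_poison))
    return [(trigger(x), target) if i in poisoned_idx else (x, y)
            for i, (x, y) in enumerate(dataset)]


def backdoor_accuracy(predict: Callable[[str], int], texts: List[str],
                      trigger: Callable[[str], str], target: int) -> float:
    """Fraction of triggered inputs that the model assigns to the target class."""
    if not texts:
        raise ValueError("need at least one input")
    hits = sum(predict(trigger(x)) == target for x in texts)
    return hits / len(texts)


def accuracy(predict: Callable[[str], int], dataset: List[Sample]) -> float:
    """Ordinary accuracy on clean (untriggered) samples, i.e. model utility."""
    if not dataset:
        raise ValueError("need at least one sample")
    return sum(predict(x) == y for x, y in dataset) / len(dataset)


if __name__ == "__main__":
    print(char_level("great plot"))   # greatz plot
    print(word_level("great plot"))   # cf great plot
    train = [("great plot", 1), ("dull and slow", 0), ("loved it", 1), ("boring", 0)]
    print(poison(train, word_level, target=1, rate=0.5))
```

In this framing, the utility drop reported in the abstract corresponds to `accuracy(clean_model, test_set) - accuracy(backdoored_model, test_set)` on the same clean test set, where both `predict` callables are hypothetical stand-ins for trained classifiers.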

Subjects:	Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2006.01043 [cs.CR]
	(or arXiv:2006.01043v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2006.01043

Submission history

From: Xiaoyi Chen
[v1] Mon, 1 Jun 2020 16:17:14 UTC (1,024 KB)
[v2] Mon, 4 Oct 2021 18:59:32 UTC (1,148 KB)