From Shortcuts to Triggers: Backdoor Defense with Denoised PoE

Liu, Qin; Wang, Fei; Xiao, Chaowei; Chen, Muhao

arXiv:2305.14910 (cs)

[Submitted on 24 May 2023 (v1), last revised 2 Apr 2024 (this version, v3)]

Abstract: Language models are often at risk of diverse backdoor attacks, especially data poisoning, so it is important to investigate defenses against them. Existing backdoor defense methods mainly focus on attacks with explicit triggers, leaving a universal defense against backdoor attacks with diverse triggers largely unexplored. In this paper, we propose an end-to-end ensemble-based backdoor defense framework, DPoE (Denoised Product-of-Experts), which is inspired by the shortcut nature of backdoor attacks, to defend against various backdoor attacks. DPoE consists of two models: a shallow model that captures the backdoor shortcuts and a main model that is prevented from learning them. To address the label flips caused by backdoor attackers, DPoE incorporates a denoising design. Experiments on the SST-2 dataset show that DPoE significantly improves defense performance against various types of backdoor triggers, including word-level, sentence-level, and syntactic triggers. Furthermore, DPoE is also effective under a more challenging but practical setting that mixes multiple types of triggers.
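
The two-model design described in the abstract can be illustrated with a generic product-of-experts (PoE) objective. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the logit values, and the binary label are hypothetical, and the denoising component of DPoE is not shown because the abstract does not specify it. It assumes the common PoE formulation in which the two models' log-probabilities are added and renormalized, and the cross-entropy is computed on the combined prediction.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def poe_log_probs(main_logits, shallow_logits):
    """Generic PoE: add the two experts' log-probabilities, then renormalize."""
    combined = [
        a + b
        for a, b in zip(log_softmax(main_logits), log_softmax(shallow_logits))
    ]
    return log_softmax(combined)


def poe_loss(main_logits, shallow_logits, label):
    """Cross-entropy of the ensemble prediction against the training label."""
    return -poe_log_probs(main_logits, shallow_logits)[label]


# Hypothetical two-class example; all numbers are made up for illustration.
main_logits = [0.0, 0.0]           # main model is undecided
confident_shallow = [-4.0, 4.0]    # shallow model has latched onto a shortcut
uncertain_shallow = [0.0, 0.0]     # shallow model sees no shortcut

loss_shortcut = poe_loss(main_logits, confident_shallow, label=1)
loss_plain = poe_loss(main_logits, uncertain_shallow, label=1)
print(loss_shortcut < loss_plain)
```

Under this formulation, an example that the shallow model already fits confidently yields a small ensemble loss, so the main model receives little training signal from it. That is one way a main model can be discouraged from learning shortcut features that the shallow model has already captured.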

Comments:	Accepted by NAACL 2024 Main Conference
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2305.14910 [cs.CL]
	(or arXiv:2305.14910v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.14910

Submission history

From: Qin Liu
[v1] Wed, 24 May 2023 08:59:25 UTC (7,429 KB)
[v2] Sat, 23 Dec 2023 17:57:30 UTC (7,877 KB)
[v3] Tue, 2 Apr 2024 23:01:17 UTC (7,894 KB)
