Accuracy of TextFooler black box adversarial attacks on 01 loss sign activation neural network ensemble

Xue, Yunzhe; Roshan, Usman

Computer Science > Machine Learning

arXiv:2402.07347 (cs)

[Submitted on 12 Feb 2024]

Abstract: Recent work has shown that 01 loss sign activation neural networks can defend against adversarial attacks on image classification. A public challenge to attack these models on the CIFAR10 dataset remains undefeated. In this study we ask: are 01 loss sign activation neural networks hard to deceive with TextFooler, a popular black box text adversarial attack program? We study this question on four popular text classification datasets: IMDB reviews, Yelp reviews, MR sentiment classification, and AG news classification. We find that our 01 loss sign activation network is much harder to attack with TextFooler than sigmoid activation cross entropy networks and binary neural networks. We also study a 01 loss sign activation convolutional neural network with a novel global pooling step specific to sign activation networks. With this new variation we see a significant gain in adversarial accuracy, rendering TextFooler practically useless against it. We make our code freely available at \url{this https URL} and \url{this https URL}. Our work suggests that 01 loss sign activation networks could be further developed to create foolproof models against text adversarial attacks.
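To make the terms in the abstract concrete, here is a minimal, hypothetical sketch of three ideas it relies on: the 01 loss (fraction of misclassified examples), a binary classifier whose units use sign activations, and a majority-vote ensemble of such classifiers, plus adversarial accuracy measured as accuracy on attacked inputs. This is not the authors' code. The class and function names (`SignNet`, `ensemble_predict`, `adversarial_accuracy`) are illustrative assumptions, the weights are random placeholders (the 01 loss training procedure is not shown), and the paper's global pooling step is not modeled.

```python
import random
from typing import Callable, List, Sequence


def sign(x: float) -> int:
    # Sign activation; 0 is mapped to +1 so every output is in {-1, +1}.
    return 1 if x >= 0 else -1


def zero_one_loss(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # 01 loss: the fraction of examples whose predicted label is wrong.
    assert len(y_true) == len(y_pred) and len(y_true) > 0
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


class SignNet:
    """Hypothetical one-hidden-layer binary classifier with sign activations.

    Weights are random placeholders; training by minimizing 01 loss is omitted.
    """

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        self.W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.v = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.c = rng.uniform(-1, 1)

    def predict(self, x: Sequence[float]) -> int:
        # Hidden layer: sign of each affine projection of the input.
        hidden = [sign(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.W, self.b)]
        # Output unit: sign of a weighted sum of the +/-1 hidden outputs.
        return sign(sum(v * h for v, h in zip(self.v, hidden)) + self.c)


def ensemble_predict(models: List[SignNet], x: Sequence[float]) -> int:
    # Majority vote over members; with an odd ensemble size there are no ties.
    return sign(sum(m.predict(x) for m in models))


def adversarial_accuracy(model_fn: Callable[[Sequence[float]], int],
                         x_adv: Sequence[Sequence[float]],
                         y_true: Sequence[int]) -> float:
    # Accuracy on attacked inputs, i.e. 1 minus the 01 loss on them.
    return 1.0 - zero_one_loss(y_true, [model_fn(x) for x in x_adv])


if __name__ == "__main__":
    rng = random.Random(0)
    ensemble = [SignNet(n_in=3, n_hidden=4, rng=rng) for _ in range(5)]
    print(ensemble_predict(ensemble, [0.1, -0.2, 0.3]))
```

In a real evaluation, `x_adv` would be the feature vectors of texts perturbed by an attack such as TextFooler; here any list of inputs can stand in for them.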

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2402.07347 [cs.LG]
	(or arXiv:2402.07347v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.07347

Submission history

From: Usman Roshan [view email]
[v1] Mon, 12 Feb 2024 00:36:34 UTC (258 KB)