AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking Black-box Neural Networks

Tu, Chun-Chen; Ting, Paishun; Chen, Pin-Yu; Liu, Sijia; Zhang, Huan; Yi, Jinfeng; Hsieh, Cho-Jui; Cheng, Shin-Ming


arXiv:1805.11770 (cs)

[Submitted on 30 May 2018 (v1), last revised 31 Jan 2020 (this version, v5)]

Abstract: Recent studies have shown that adversarial examples for state-of-the-art image classifiers based on deep neural networks (DNNs) can be easily generated when the target model is transparent to an attacker, known as the white-box setting. However, when attacking a deployed machine learning service, one can only acquire the input-output correspondences of the target model; this is the so-called black-box attack setting. The major drawback of existing black-box attacks is the need for excessive model queries, which may give a false sense of model robustness due to inefficient query designs. To bridge this gap, we propose a generic framework for query-efficient black-box attacks. Our framework, AutoZOOM, which is short for Autoencoder-based Zeroth Order Optimization Method, has two novel building blocks towards efficient black-box attacks: (i) an adaptive random gradient estimation strategy to balance query counts and distortion, and (ii) either an autoencoder trained offline with unlabeled data or a bilinear resizing operation for attack acceleration. Experimental results suggest that, by applying AutoZOOM to a state-of-the-art black-box attack (ZOO), a significant reduction in model queries can be achieved without sacrificing the attack success rate or the visual quality of the resulting adversarial examples. In particular, when compared to the standard ZOO method, AutoZOOM can consistently reduce the mean query counts in finding successful adversarial examples (or reaching the same distortion level) by at least 93% on the MNIST, CIFAR-10 and ImageNet datasets, leading to novel insights on adversarial robustness.
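
To make the query/accuracy trade-off behind building block (i) concrete, the following is a minimal sketch of a generic averaged random-direction gradient estimator for a black-box loss. It is an illustration under stated assumptions, not the authors' implementation: the names `estimate_gradient`, `QueryCounter`, `num_samples`, and `smoothing` are hypothetical, the toy quadratic loss stands in for a real model's attack loss, and the scaling by the input dimension is one common choice rather than a detail taken from the abstract.

```python
import numpy as np


class QueryCounter:
    """Wraps a black-box loss and counts how often it is queried (hypothetical helper)."""

    def __init__(self, loss_fn):
        self.loss_fn = loss_fn
        self.count = 0

    def __call__(self, x):
        self.count += 1
        return self.loss_fn(x)


def estimate_gradient(loss_fn, x, num_samples=1, smoothing=1e-3, rng=None):
    """Zeroth-order gradient estimate from forward differences along random unit directions.

    Costs num_samples + 1 queries: one at x, plus one per random direction.
    More samples lower the estimation error at the price of more queries.
    """
    rng = np.random.default_rng() if rng is None else rng
    dim = x.size
    base = loss_fn(x)  # single query at x, reused for every direction
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)  # normalized Gaussian draw: uniform on the unit sphere
        diff = loss_fn(x + smoothing * u) - base
        # Assumed scaling by dim so the average approximates the true gradient.
        grad += dim * (diff / smoothing) * u
    return grad / num_samples


def toy_loss(x):
    """Stand-in for a black-box attack loss; its true gradient is 2 * x."""
    return float(np.sum(x ** 2))


if __name__ == "__main__":
    x0 = np.arange(1.0, 11.0)
    for q in (1, 10, 1000):
        counter = QueryCounter(toy_loss)
        g = estimate_gradient(counter, x0, num_samples=q, rng=np.random.default_rng(0))
        cosine = g @ (2 * x0) / (np.linalg.norm(g) * np.linalg.norm(2 * x0))
        print(f"samples={q:5d} queries={counter.count:5d} cosine to true gradient={cosine:.3f}")
```

In an attack loop, a small `num_samples` would make each step cheap but noisy, while a larger value spends more queries for a more accurate direction. Building block (ii) would additionally draw the perturbation in a lower-dimensional space and map it back to image size with a decoder or bilinear resizing, which this sketch omits.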

Comments: Chun-Chen Tu, Paishun Ting and Pin-Yu Chen contribute equally to this work; Paper accepted to AAAI 2019; updated model information in Table S2
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Cite as: arXiv:1805.11770 [cs.CV] (or arXiv:1805.11770v5 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.1805.11770

Submission history

From: Pin-Yu Chen
[v1] Wed, 30 May 2018 01:39:34 UTC (1,652 KB)
[v2] Thu, 6 Sep 2018 21:03:43 UTC (1,881 KB)
[v3] Tue, 13 Nov 2018 22:26:12 UTC (1,681 KB)
[v4] Wed, 30 Jan 2019 00:41:40 UTC (1,681 KB)
[v5] Fri, 31 Jan 2020 11:46:26 UTC (1,681 KB)
