Impact of Adversarial Attacks on Deep Learning Model Explainability

Nur, Gazi Nazia; Sadat, Mohammad Ahnaf

arXiv:2412.11119 (cs)

[Submitted on 15 Dec 2024]

Abstract: In this paper, we investigate how adversarial attacks affect the explainability of deep learning models. These models are commonly criticized for their black-box nature, despite their capacity for autonomous feature extraction, and that opacity can reduce their perceived trustworthiness. Explainability techniques such as GradCAM, SmoothGrad, and LIME have been developed to clarify model decision-making processes. Our research focuses on the robustness of these explanations when models are subjected to adversarial attacks, specifically subtle image perturbations that are imperceptible to humans but can significantly mislead models. We use attack methods such as the Fast Gradient Sign Method (FGSM) and the Basic Iterative Method (BIM) and observe their effects on model accuracy and explanations. The results reveal a substantial decline in model accuracy, from 89.94% to 58.73% under FGSM and to 45.50% under BIM. Despite this decline, the models' explanations, as measured by Intersection over Union (IoU) and Root Mean Square Error (RMSE), show negligible changes, suggesting that these metrics may not be sensitive enough to detect the presence of adversarial perturbations.
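
To make the quantities named in the abstract concrete, the following is a minimal NumPy sketch of FGSM, BIM, and the IoU and RMSE comparison between two explanation maps. It is not the authors' code. The linear "model" `w`, the `loss_grad` and `saliency` helpers, the IoU threshold of 0.5, and all parameter values are assumptions made for illustration; the toy gradient-times-input map only stands in for GradCAM, SmoothGrad, or LIME.

```python
import numpy as np

def fgsm(x, grad_fn, eps):
    """One step of size eps along the sign of the loss gradient."""
    return np.clip(x + eps * np.sign(grad_fn(x)), 0.0, 1.0)

def bim(x, grad_fn, eps, alpha, steps):
    """Repeated small FGSM steps, kept within eps of the original image."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv

def iou(map_a, map_b, threshold=0.5):
    """IoU of the regions where each map meets the (assumed) threshold."""
    a, b = map_a >= threshold, map_b >= threshold
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)

def rmse(map_a, map_b):
    """Root mean square error between two explanation maps."""
    return float(np.sqrt(np.mean((map_a - map_b) ** 2)))

# Hypothetical toy setup: a linear logit <w, img> on an 8x8 "image".
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
x = rng.uniform(size=(8, 8))

def loss_grad(img, y=1.0):
    """Gradient of the logistic loss log(1 + exp(-y * <w, img>)) w.r.t. img."""
    margin = y * np.sum(w * img)
    return -y * w / (1.0 + np.exp(margin))

def saliency(img):
    """Toy gradient-times-input map, scaled to [0, 1]."""
    s = np.abs(w * img)
    return s / s.max()

x_fgsm = fgsm(x, loss_grad, eps=0.03)
x_bim = bim(x, loss_grad, eps=0.03, alpha=0.005, steps=10)
for name, x_adv in (("FGSM", x_fgsm), ("BIM", x_bim)):
    print(name, "IoU:", iou(saliency(x), saliency(x_adv)),
          "RMSE:", rmse(saliency(x), saliency(x_adv)))
```

Under this framing, an IoU near 1 and an RMSE near 0 mean the explanation map barely moved between the clean and the perturbed input, which is the kind of negligible change the abstract reports.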

Comments:	29 pages including references; submitted to a journal
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.11119 [cs.LG]
	(or arXiv:2412.11119v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.11119

Submission history

From: Mohammad Ahnaf Sadat
[v1] Sun, 15 Dec 2024 08:41:37 UTC (868 KB)
