Semantics and explanation: why counterfactual explanations produce adversarial examples in deep neural networks

Browne, Kieran; Swift, Ben

arXiv:2012.10076 (cs)

[Submitted on 18 Dec 2020]

Abstract: Recent papers in explainable AI have made a compelling case for counterfactual modes of explanation. While counterfactual explanations appear to be extremely effective in some instances, they are formally equivalent to adversarial examples. This presents an apparent paradox for explainability researchers: if these two procedures are formally equivalent, what accounts for the explanatory divide apparent between counterfactual explanations and adversarial examples? We resolve this paradox by placing emphasis back on the semantics of counterfactual expressions. Producing satisfactory explanations for deep learning systems will require that we find ways to interpret the semantics of hidden layer representations in deep neural networks.
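
The formal equivalence mentioned in the abstract can be illustrated with a toy sketch. It is not taken from the paper: the linear classifier, the helper names `predict` and `minimal_flip`, and all numeric values are assumptions chosen for illustration. Both a counterfactual explanation and an adversarial example are commonly framed as the smallest change to an input that changes the classifier's output, so in this sketch the same function produces both.

```python
def predict(w, b, x):
    """Toy linear classifier (an assumption): class 1 if w.x + b > 0, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0


def minimal_flip(w, b, x, overshoot=1e-3):
    """Smallest L2 change to x (plus a small overshoot) that flips its class.

    For a linear model this is a step along w that lands just past the
    decision boundary. Assumes x does not lie exactly on the boundary.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    # Project onto the boundary, then move slightly beyond it.
    scale = -(1 + overshoot) * score / norm_sq
    return [xi + scale * wi for xi, wi in zip(x, w)]


# Hypothetical weights and input, chosen only for illustration.
w, b = [2.0, -1.0], -0.5
x = [1.0, 0.5]

# Read as a "counterfactual": what minimal change would alter the decision?
x_counterfactual = minimal_flip(w, b, x)

# Read as an "adversarial example": what minimal perturbation fools the model?
x_adversarial = minimal_flip(w, b, x)

print(predict(w, b, x), predict(w, b, x_counterfactual))
print(x_counterfactual == x_adversarial)
```

The two results are identical by construction. Whatever distinguishes an explanation from an attack therefore has to come from how the input features are interpreted, not from the procedure, which is the point the abstract raises about semantics.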

Subjects:	Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2012.10076 [cs.AI]
	(or arXiv:2012.10076v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2012.10076

Submission history

From: Kieran Browne
[v1] Fri, 18 Dec 2020 07:04:04 UTC (22 KB)
