Interpretations are useful: penalizing explanations to align neural networks with prior knowledge

Rieger, Laura; Singh, Chandan; Murdoch, W. James; Yu, Bin

arXiv:1909.13584 (cs)

[Submitted on 30 Sep 2019 (v1), last revised 8 Oct 2020 (this version, v4)]

Abstract: For an explanation of a deep learning model to be effective, it must both provide insight into the model and suggest a corresponding action for achieving some objective. Too often, the litany of proposed explainable deep learning methods stops at the first step, giving practitioners insight into a model but no way to act on it. In this paper, we propose contextual decomposition explanation penalization (CDEP), a method that enables practitioners to leverage existing explanation methods to increase the predictive accuracy of deep learning models. In particular, when shown that a model has incorrectly assigned importance to some features, CDEP enables practitioners to correct these errors by directly regularizing the provided explanations. Using explanations provided by contextual decomposition (CD) (Murdoch et al., 2018), we demonstrate the ability of our method to increase performance on an array of toy and real datasets.
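The abstract describes the idea of "directly regularizing the provided explanations" only at a high level. The sketch below is a minimal toy illustration under stated assumptions, not the authors' implementation: it uses a logistic-regression model and swaps in a simple additive attribution (weight times input) in place of CD. The objective is cross-entropy plus λ times an L1 penalty on the attributions of features that prior knowledge marks as irrelevant. The names `attributions`, `penalized_loss`, and `train`, the `mask` convention, and the synthetic data are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attributions(w, X):
    # Hypothetical stand-in for CD: in a linear model, feature j adds w_j * x_ij to the logit.
    return X * w  # shape (n_samples, n_features)


def penalized_loss(w, X, y, mask, lam):
    """Cross-entropy + lam * L1 distance between attributions and a zero-importance prior.

    mask[j] = 1 marks feature j as one that prior knowledge says should get no importance.
    """
    p = sigmoid(X @ w)
    eps = 1e-12
    pred_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    expl_loss = np.mean(np.abs(attributions(w, X)) @ mask)
    return pred_loss + lam * expl_loss


def train(X, y, mask, lam, lr=0.5, steps=2000):
    w = np.zeros(X.shape[1])
    # With additive attributions, mean_i |w_j * x_ij| = |w_j| * mean_i |x_ij|, so the
    # explanation penalty is a per-feature weighted L1 term, handled by a proximal step.
    scale = np.abs(X).mean(axis=0) * mask
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)  # gradient of the cross-entropy term
        w = w - lr * grad
        thresh = lr * lam * scale
        w = np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)  # soft-thresholding
    return w


# Hypothetical toy data: feature 0 carries real but noisy signal; feature 1 is a
# spurious shortcut that tracks the label almost perfectly in the training set.
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, n).astype(float)
s = 2 * y - 1
X = np.column_stack([s + rng.normal(0, 1.0, n), s + rng.normal(0, 0.1, n)])
mask = np.array([0.0, 1.0])  # prior knowledge: feature 1 should be ignored

w_plain = train(X, y, mask, lam=0.0)
w_pen = train(X, y, mask, lam=1.0)
print("unpenalized weights:", w_plain)
print("explanation-penalized weights:", w_pen)
```

Without the penalty the model leans heavily on the shortcut feature. With λ = 1, the attribution of feature 1 is pushed to exactly zero and the model falls back on feature 0. The closed-form proximal step works only because the toy attribution is linear in the weights. With CD on a deep network, one would presumably instead backpropagate through the explanation itself.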

Comments:	18 pages; published in ICML 2020; Erratum: numbers in Table 1 were too high (now corrected); the trend remains the same
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as:	arXiv:1909.13584 [cs.LG]
	(or arXiv:1909.13584v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1909.13584

Submission history

From: Laura Rieger
[v1] Mon, 30 Sep 2019 11:02:01 UTC (7,465 KB)
[v2] Tue, 1 Oct 2019 12:05:59 UTC (7,431 KB)
[v3] Sat, 1 Aug 2020 19:24:50 UTC (1,856 KB)
[v4] Thu, 8 Oct 2020 12:43:21 UTC (1,856 KB)