Post-hoc Interpretability for Neural NLP: A Survey

Madsen, Andreas; Reddy, Siva; Chandar, Sarath

doi:10.1145/3546577

arXiv:2108.04840 (cs)

[Submitted on 10 Aug 2021 (v1), last revised 28 Nov 2023 (this version, v5)]

Abstract: Neural networks for NLP are becoming increasingly complex and widespread, and there is growing concern about whether these models are responsible to use. Explaining models helps to address safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Additionally, post-hoc methods provide explanations after a model is learned and are generally model-agnostic. This survey categorizes recent post-hoc interpretability methods by how they communicate explanations to humans, discusses each method in depth, and examines how each is validated, since validation is often a common concern.

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2108.04840 [cs.CL]
	(or arXiv:2108.04840v5 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2108.04840
Journal reference:	ACM Comput. Surv. 55, 8, Article 155 (December 2022)
Related DOI:	https://doi.org/10.1145/3546577

Submission history

From: Andreas Madsen
[v1] Tue, 10 Aug 2021 18:00:14 UTC (191 KB)
[v2] Fri, 13 Aug 2021 16:51:08 UTC (190 KB)
[v3] Fri, 11 Feb 2022 16:57:04 UTC (310 KB)
[v4] Fri, 29 Apr 2022 16:49:20 UTC (296 KB)
[v5] Tue, 28 Nov 2023 06:39:41 UTC (321 KB)