InterpretCC: Conditional Computation for Inherently Interpretable Neural Networks

Swamy, Vinitra; Blackwell, Julian; Frej, Jibril; Jaggi, Martin; Käser, Tanja

arXiv:2402.02933v1 (cs)

[Submitted on 5 Feb 2024 (this version), latest version 29 May 2024 (v3)]

Abstract: Real-world interpretability for neural networks involves a tradeoff among three concerns: 1) requiring humans to trust an approximate explanation (e.g. post-hoc approaches), 2) compromising the understandability of the explanation (e.g. automatically identified feature masks), and 3) compromising model performance (e.g. decision trees). These shortcomings are unacceptable for human-facing domains, like education, healthcare, or natural language, which require trustworthy explanations, actionable interpretations, and accurate predictions. In this work, we present InterpretCC (interpretable conditional computation), a family of interpretable-by-design neural networks that guarantee human-centric interpretability while maintaining performance comparable to state-of-the-art models by adaptively and sparsely activating features before prediction. We extend this idea into an interpretable mixture-of-experts model that allows humans to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks. We demonstrate variations of the InterpretCC architecture for text and tabular data across several real-world benchmarks: six online education courses, news classification, breast cancer diagnosis, and review sentiment.
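
The routing idea described in the abstract (human-specified topical groups of features, a gate that sparsely activates groups per data point, and a prediction built only from the active groups) can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the gating mechanism, so the sigmoid gate, the 0.5 threshold, the linear "experts", and all feature names, group names, and weights below are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def route_and_predict(x, groups, gate_weights, expert_weights, threshold=0.5):
    """Toy forward pass: score each feature group, keep the groups whose
    gate score clears the threshold, and predict from those groups only.

    x              : dict feature name -> value
    groups         : dict group name -> list of feature names (human-specified)
    gate_weights   : dict group name -> dict feature name -> weight (assumed, not learned here)
    expert_weights : dict group name -> dict feature name -> weight (assumed, not learned here)
    """
    # One gate score per group, computed from the full input.
    scores = {
        g: sigmoid(sum(gate_weights[g].get(f, 0.0) * v for f, v in x.items()))
        for g in groups
    }
    # Sparse activation: only groups above the threshold take part.
    active = [g for g in groups if scores[g] >= threshold]
    if not active:
        # Fall back to the single best-scoring group so a prediction exists.
        active = [max(scores, key=scores.get)]
    total = sum(scores[g] for g in active)
    # Each active "expert" is a linear model restricted to its own group's features.
    logit = 0.0
    for g in active:
        expert_out = sum(expert_weights[g][f] * x[f] for f in groups[g])
        logit += (scores[g] / total) * expert_out
    return sigmoid(logit), active, scores


# Hypothetical example with two topical groups over three features.
x = {"video_clicks": 1.0, "forum_posts": 0.0, "quiz_attempts": 2.0}
groups = {
    "engagement": ["video_clicks", "forum_posts"],
    "assessment": ["quiz_attempts"],
}
gate_weights = {
    "engagement": {"video_clicks": -2.0, "forum_posts": 1.0},
    "assessment": {"quiz_attempts": 1.5},
}
expert_weights = {
    "engagement": {"video_clicks": 0.3, "forum_posts": 0.7},
    "assessment": {"quiz_attempts": 0.8},
}

prob, active, scores = route_and_predict(x, groups, gate_weights, expert_weights)
print(active)  # the list of active groups doubles as the explanation
```

In this sketch the explanation for a prediction is simply the list of active groups: any group that was not activated contributes nothing to the output, so the prediction cannot depend on its expert.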

Subjects:	Machine Learning (cs.LG); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2402.02933 [cs.LG]
	(or arXiv:2402.02933v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.02933

Submission history

From: Vinitra Swamy
[v1] Mon, 5 Feb 2024 11:55:50 UTC (1,792 KB)
[v2] Tue, 28 May 2024 14:58:26 UTC (1,341 KB)
[v3] Wed, 29 May 2024 12:03:40 UTC (1,332 KB)
