Disentangling Polysemantic Channels in Convolutional Neural Networks

Hesse, Robin; Fischer, Jonas; Schaub-Meyer, Simone; Roth, Stefan


arXiv:2504.12939 (cs)

[Submitted on 17 Apr 2025]


Abstract: Mechanistic interpretability is concerned with analyzing individual components in a (convolutional) neural network (CNN) and how they form larger circuits representing decision mechanisms. These investigations are challenging since CNNs frequently learn polysemantic channels that encode multiple distinct concepts, making them hard to interpret. To address this, we propose an algorithm to disentangle a specific kind of polysemantic channel into multiple channels, each responding to a single concept. Our approach restructures weights in a CNN, exploiting the fact that different concepts within the same channel exhibit distinct activation patterns in the previous layer. By disentangling these polysemantic features, we enhance the interpretability of CNNs, ultimately improving explanatory techniques such as feature visualizations.
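
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea it states: splitting one channel into several by restructuring weights, based on distinct activation patterns in the previous layer. It is not the authors' implementation. It assumes a 1x1 convolution (treated as a per-pixel linear map), a ReLU nonlinearity, exactly two concepts, and a simple deterministic 2-means clustering; the names `two_means` and `split_channel` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def two_means(X, iters=10):
    """Tiny deterministic 2-means: start from row 0 and the row farthest from it."""
    c0 = X[0]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None], axis=2)  # (N, 2)
        labels = np.argmin(dists, axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def split_channel(w, v, prev_acts):
    """Split one channel (incoming weights w, outgoing weights v) into two.

    Assumption of this sketch: each input channel of the previous layer is
    assigned to the cluster in which it contributes most on average.
    """
    contrib = prev_acts * w                 # (N, C_in) per-input-channel contributions
    labels = two_means(contrib)             # group inputs by contribution pattern
    mean0 = contrib[labels == 0].mean(axis=0)
    mean1 = contrib[labels == 1].mean(axis=0)
    to_first = mean0 >= mean1               # mask over input channels
    w_split = np.stack([np.where(to_first, w, 0.0),
                        np.where(to_first, 0.0, w)])   # (2, C_in)
    v_split = np.stack([v, v])              # both new channels feed the next layer as before
    return w_split, v_split

# Toy data: concept A lives in input channels 0-1, concept B in channels 2-3.
w = np.array([1.0, 0.5, 2.0, 1.0])          # incoming weights of the polysemantic channel
v = np.array([0.3, -0.7, 1.1])              # outgoing weights into the next layer
prev_acts = np.array([
    [1.0, 2.0, 0.0, 0.0],
    [2.0, 1.0, 0.0, 0.0],
    [1.5, 1.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.5, 1.0],
])

w_split, v_split = split_channel(w, v, prev_acts)

out_original = relu(prev_acts @ w)[:, None] * v   # next-layer input from the original channel
split_acts = relu(prev_acts @ w_split.T)          # activations of the two new channels
out_split = split_acts @ v_split                  # next-layer input from the split channels

print(np.allclose(out_original, out_split))
```

In this toy setting each new channel fires for only one group of inputs, and the next layer receives the same signal as before. With a ReLU, that equivalence is only guaranteed when the two partial pre-activations do not have opposite signs on the same input.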

Comments:	Accepted at CVPR 2025 Workshop on Mechanistic Interpretability for Vision (MIV). Code: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2504.12939 [cs.CV]
	(or arXiv:2504.12939v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.12939

Submission history

From: Robin Hesse
[v1] Thu, 17 Apr 2025 13:37:47 UTC (3,136 KB)
