Post-Hoc Concept Disentanglement: From Correlated to Isolated Concept Representations

Erogullari, Eren; Lapuschkin, Sebastian; Samek, Wojciech; Pahde, Frederik

Computer Science > Computer Vision and Pattern Recognition

arXiv:2503.05522 (cs)

[Submitted on 7 Mar 2025]

Abstract: Concept Activation Vectors (CAVs) are widely used to model human-understandable concepts as directions within the latent space of neural networks. They are obtained by identifying directions that separate the activations of concept samples from those of non-concept samples. However, this method often produces similar, non-orthogonal directions for correlated concepts, such as "beard" and "necktie" within the CelebA dataset, which frequently co-occur in images of men. This entanglement complicates the interpretation of concepts in isolation and can lead to undesired effects in CAV applications, such as activation steering. To address this issue, we introduce a post-hoc concept disentanglement method that employs a non-orthogonality loss, facilitating the identification of orthogonal concept directions while preserving directional correctness. We evaluate our approach on real-world and controlled correlated concepts in CelebA and a synthetic FunnyBirds dataset, using VGG16 and ResNet18 architectures. We further demonstrate the superiority of orthogonalized concept representations in activation steering tasks, allowing (1) the insertion of isolated concepts into input images through generative models and (2) the removal of concepts for effective shortcut suppression with reduced impact on correlated concepts compared to baseline CAVs.
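The abstract names two ingredients: CAVs computed from concept versus non-concept activations, and a non-orthogonality loss that pushes correlated CAVs apart while keeping them faithful to their concepts. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation, and the abstract does not give the exact objective. Assumed here: CAVs are unit-normalized differences of class-mean activations; "directional correctness" is approximated by a hypothetical squared-distance term that keeps each vector near its initial CAV; the non-orthogonality loss is the squared cosine similarity weighted by a hypothetical `lam`; and the data are a synthetic 16-dimensional latent space in which concept B co-occurs with concept A 80% of the time. All function names (`mean_difference_cav`, `disentangle`, `remove_concept`) are illustrative.

```python
import numpy as np


def mean_difference_cav(acts, labels):
    """Unit-norm CAV as the difference of class means (concept minus non-concept).
    Assumption: a simple mean-difference CAV, not necessarily the paper's CAV variant."""
    v = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return v / np.linalg.norm(v)


def disentangle(w0_a, w0_b, lam=50.0, lr=0.005, steps=3000):
    """Projected gradient descent on the unit sphere for the hypothetical loss
        L = ||w_a - w0_a||^2 + ||w_b - w0_b||^2 + lam * (w_a . w_b)^2.
    The first two terms stand in for 'directional correctness';
    the last term is a squared-cosine non-orthogonality penalty."""
    w_a, w_b = w0_a.copy(), w0_b.copy()
    for _ in range(steps):
        c = w_a @ w_b  # cosine similarity, since both vectors are unit norm
        g_a = 2 * (w_a - w0_a) + 2 * lam * c * w_b
        g_b = 2 * (w_b - w0_b) + 2 * lam * c * w_a
        w_a = w_a - lr * g_a
        w_b = w_b - lr * g_b
        w_a /= np.linalg.norm(w_a)  # project back onto the unit sphere
        w_b /= np.linalg.norm(w_b)
    return w_a, w_b


def remove_concept(acts, w):
    """Activation steering by projecting out the (unit) concept direction w."""
    return acts - np.outer(acts @ w, w)


# Synthetic latent space: concept B matches concept A in 80% of samples.
rng = np.random.default_rng(0)
n, d = 4000, 16
u_a, u_b = np.eye(d)[0], np.eye(d)[1]  # ground-truth, orthogonal concept axes
a = rng.integers(0, 2, n)
b = np.where(rng.random(n) < 0.8, a, 1 - a)
acts = rng.normal(size=(n, d)) + 2.0 * np.outer(a, u_a) + 2.0 * np.outer(b, u_b)

cav_a, cav_b = mean_difference_cav(acts, a), mean_difference_cav(acts, b)
orth_a, orth_b = disentangle(cav_a, cav_b)

cos_before = float(cav_a @ cav_b)
cos_after = float(orth_a @ orth_b)

# Side effect of removing concept A on the ground-truth concept-B coordinate.
impact_base = np.abs(remove_concept(acts, cav_a) @ u_b - acts @ u_b).mean()
impact_orth = np.abs(remove_concept(acts, orth_a) @ u_b - acts @ u_b).mean()

print(f"cosine(A, B): {cos_before:.3f} -> {cos_after:.3f}")
print(f"mean |shift| on concept B after removing A: {impact_base:.3f} -> {impact_orth:.3f}")
```

In this toy setting, the mean-difference CAVs for A and B come out strongly aligned because of the co-occurrence, and removing concept A with the entangled CAV also shifts the concept-B coordinate. With the orthogonalized direction the shift is much smaller, which is the qualitative effect the abstract describes for shortcut suppression. The weight `lam` trades orthogonality against staying close to the original directions; larger values drive the residual cosine closer to zero but require a smaller `lr` for the plain gradient steps to remain stable.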

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2503.05522 [cs.CV]
	(or arXiv:2503.05522v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.05522

Submission history

From: Eren Erogullari
[v1] Fri, 7 Mar 2025 15:45:43 UTC (6,672 KB)