ULTra: Unveiling Latent Token Interpretability in Transformer Based Understanding

Hosseini, Hesam; Mighan, Ghazal Hosseini; Afzali, Amirabbas; Amini, Sajjad; Houmansadr, Amir

Computer Science > Computer Vision and Pattern Recognition

arXiv:2411.12589 (cs)

[Submitted on 15 Nov 2024]

Abstract: Transformers have revolutionized Computer Vision (CV) and Natural Language Processing (NLP) through self-attention mechanisms. However, due to their complexity, their latent token representations are often difficult to interpret. We introduce a novel framework that interprets Transformer embeddings, uncovering meaningful semantic patterns within them. Based on this framework, we demonstrate that zero-shot unsupervised semantic segmentation can be performed effectively, without any fine-tuning, using a model pre-trained for tasks other than segmentation. Our method reveals the inherent capacity of Transformer models for understanding input semantics and achieves state-of-the-art performance in semantic segmentation, outperforming traditional segmentation models. Specifically, our approach achieves an accuracy of 67.2% and an mIoU of 32.9% on the COCO-Stuff dataset, as well as an mIoU of 51.9% on the PASCAL VOC dataset. Additionally, we validate our interpretability framework on LLMs for text summarization, demonstrating its broad applicability and robustness.
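
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how pixel accuracy and mean Intersection over Union (mIoU) are conventionally computed from a confusion matrix. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, and the abstract does not state the evaluation protocol. In particular, unsupervised segmentation is usually scored after matching predicted clusters to ground-truth classes, a step this sketch omits.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows index the ground-truth class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy_and_miou(matrix: List[List[int]]) -> Tuple[float, float]:
    """Return (pixel accuracy, mean IoU) from a square confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(n))
    ious = []
    for c in range(n):
        tp = matrix[c][c]                               # pixels of class c predicted as c
        fn = sum(matrix[c]) - tp                        # pixels of class c predicted as something else
        fp = sum(matrix[r][c] for r in range(n)) - tp   # other pixels predicted as c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return correct / total, sum(ious) / len(ious)


# Hypothetical toy example: a 2x3 image flattened to six pixels, two classes.
target = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
acc, miou = pixel_accuracy_and_miou(confusion_matrix(pred, target, num_classes=2))
print(f"accuracy={acc:.3f}, mIoU={miou:.3f}")  # accuracy=0.833, mIoU=0.708
```

In the toy example, one pixel of class 0 is mislabeled, so accuracy is 5/6, while the per-class IoUs are 2/3 and 3/4. mIoU averages per-class scores, so it penalizes errors on rare classes more heavily than pixel accuracy does, which is one reason the two figures reported for COCO-Stuff can differ widely.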

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2411.12589 [cs.CV]
	(or arXiv:2411.12589v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.12589

Submission history

From: Hesam Hosseini
[v1] Fri, 15 Nov 2024 19:36:50 UTC (9,254 KB)
