Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models

Tian, Bowei; Lyu, Xuntao; Liu, Meng; Wang, Hongyi; Li, Ang

arXiv:2503.22720 (cs)

[Submitted on 25 Mar 2025]

Abstract: Representation Engineering (RepE) has emerged as a powerful paradigm for enhancing AI transparency by focusing on high-level representations rather than individual neurons or circuits. It has proven effective in improving interpretability and control, showing that representations can emerge, propagate, and shape final model outputs in large language models (LLMs). However, in Vision-Language Models (VLMs), visual input can override factual linguistic knowledge, leading to hallucinated responses that contradict reality. To address this challenge, we make the first attempt to extend RepE to VLMs, analyzing how multimodal representations are preserved and transformed. Building on our findings and drawing inspiration from successful RepE applications, we develop a theoretical framework that explains the stability of neural activity across layers using the principal eigenvector, uncovering the underlying mechanism of RepE. We empirically validate these intrinsic properties, demonstrating their broad applicability and significance. By bridging theoretical insights with empirical validation, this work transforms RepE from a descriptive tool into a structured theoretical framework, opening new directions for improving AI robustness, fairness, and transparency.
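
As a rough illustration of what "stability of neural activity across layers using the principal eigenvector" could mean in practice, the sketch below computes the principal eigenvector of the activation covariance at each layer and measures how well consecutive layers align. This is not the authors' method or code: the function names (`principal_direction`, `layerwise_alignment`), the synthetic activations, and the choice of absolute cosine similarity as the alignment measure are all assumptions made for illustration. It also assumes every layer has the same hidden dimension, so that directions are comparable.

```python
import numpy as np


def principal_direction(acts):
    """Unit principal eigenvector of the covariance of `acts` (n_samples, dim)."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (acts.shape[0] - 1)
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so the last column is the eigenvector with the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]


def layerwise_alignment(layer_acts):
    """Absolute cosine similarity between principal directions of consecutive layers."""
    dirs = [principal_direction(a) for a in layer_acts]
    # Eigenvectors are defined only up to sign, hence the absolute value.
    return [abs(float(d0 @ d1)) for d0, d1 in zip(dirs, dirs[1:])]


# Synthetic stand-in for per-layer hidden states (hypothetical data, not from the paper):
# one shared "concept" direction with large variance plus small isotropic noise.
rng = np.random.default_rng(0)
dim, n_samples, n_layers = 16, 500, 4
concept = np.zeros(dim)
concept[0] = 1.0

layer_acts = []
for _ in range(n_layers):
    strength = rng.normal(size=(n_samples, 1)) * 5.0
    noise = rng.normal(size=(n_samples, dim)) * 0.5
    layer_acts.append(strength * concept + noise)

alignment = layerwise_alignment(layer_acts)
print(alignment)
```

Values close to 1 indicate that the dominant direction of variation is nearly unchanged from one layer to the next. With real model activations, `layer_acts` would be replaced by hidden states collected at each layer for the same set of inputs.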

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2503.22720 [cs.LG]
	(or arXiv:2503.22720v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2503.22720

Submission history

From: Bowei Tian
[v1] Tue, 25 Mar 2025 20:32:15 UTC (4,813 KB)
