Characterizing stable regions in the residual stream of LLMs

Janiak, Jett; Karwowski, Jacek; Mangat, Chatrik Singh; Giglemiani, Giorgi; Petrova, Nora; Heimersheim, Stefan

arXiv:2409.17113 (cs)

[Submitted on 25 Sep 2024 (v1), last revised 26 Sep 2024 (this version, v2)]

Abstract: We identify "stable regions" in the residual stream of Transformers, where the model's output remains insensitive to small activation changes, but exhibits high sensitivity at region boundaries. These regions emerge during training and become more defined as training progresses or model size increases. The regions appear to be much larger than previously studied polytopes. Our analysis suggests that these stable regions align with semantic distinctions, where similar prompts cluster within regions, and activations from the same region lead to similar next token predictions. This work provides a promising research direction for understanding the complexity of neural networks, shedding light on training dynamics, and advancing interpretability.
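
One way to picture the claim of flat regions separated by sharp boundaries is to interpolate between two activation vectors and track how far the output has moved at each step. The sketch below is a toy illustration only, not the authors' code or measurement procedure: `toy_model`, its `sharpness` parameter, and `interpolation_profile` are hypothetical names, and a steep two-way softmax stands in for the layers downstream of the residual stream.

```python
import numpy as np

def toy_model(x, sharpness=50.0):
    """Hypothetical stand-in for the downstream network: a steep softmax over two logits."""
    logits = sharpness * x
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def interpolation_profile(a, b, steps=101):
    """Relative output distance from model(a) along the straight line from a to b."""
    alphas = np.linspace(0.0, 1.0, steps)
    out_a = toy_model(a)
    total = np.linalg.norm(toy_model(b) - out_a)
    dists = np.array([
        np.linalg.norm(toy_model((1.0 - t) * a + t * b) - out_a)
        for t in alphas
    ])
    return alphas, dists / total

# Two assumed "activations" that the toy model maps to different outputs.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

alphas, rel = interpolation_profile(a, b)

# The largest step-to-step change marks the (toy) region boundary.
jump = np.abs(np.diff(rel))
boundary_alpha = alphas[np.argmax(jump)]
print("sharpest change near alpha =", boundary_alpha)
```

In this toy setting the profile stays near 0 for most of the first half of the path, rises steeply around the midpoint, and stays near 1 afterwards. That flat-jump-flat shape is the pattern the abstract describes; with a real model, `toy_model` would be replaced by a forward pass from the patched layer onward.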

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2409.17113 [cs.LG]
	(or arXiv:2409.17113v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.17113

Submission history

From: Stefan Heimersheim
[v1] Wed, 25 Sep 2024 17:27:02 UTC (1,087 KB)
[v2] Thu, 26 Sep 2024 13:30:51 UTC (2,629 KB)
