Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials

Furuta, Hiroki; Minegishi, Gouki; Iwasawa, Yusuke; Matsuo, Yutaka

arXiv:2402.16726 (cs)

[Submitted on 26 Feb 2024 (v1), last revised 30 Dec 2024 (this version, v4)]

Abstract: Grokking has been actively studied to explain delayed generalization, and identifying interpretable representations and algorithms inside grokked models offers a suggestive hint toward understanding its mechanism. Grokking on modular addition is known to implement Fourier representations, together with calculation circuits based on trigonometric identities, in Transformers. Given the periodicity of modular arithmetic, a natural question is to what extent these explanations and interpretations hold for grokking on modular operations beyond addition. For a closer look, we first hypothesize that (1) any modular operation can be characterized by a distinctive Fourier representation or internal circuit, (2) grokked models acquire common features that transfer among similar operations, and (3) mixing datasets of similar operations promotes grokking. We then examine these hypotheses extensively by training Transformers on complex modular arithmetic tasks, including polynomials. Our Fourier analysis and novel progress measures for modular arithmetic, Fourier Frequency Density and Fourier Coefficient Ratio, characterize distinctive internal representations of grokked models per modular operation; for instance, polynomials often result in a superposition of the Fourier components seen in elementary arithmetic, but clear patterns do not emerge in challenging non-factorizable polynomials. In contrast, our ablation study on pre-grokked models reveals that transferability among models grokked on each operation is limited to specific combinations, such as from elementary arithmetic to linear expressions. Moreover, some multi-task mixtures may lead to co-grokking, where grokking happens simultaneously for all tasks, and accelerate generalization, while others may not find optimal solutions. We provide empirical steps towards the interpretability of internal circuits.
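The abstract's remark that modular addition can be computed with Fourier features and trigonometric identities can be illustrated with a minimal sketch. It is not the paper's code or a trained model: the function name `modular_add_via_fourier` and the frequency set `[1, 5, 12]` are hypothetical choices made for illustration, and the sketch only assumes that each input is represented by sines and cosines at a few frequencies.

```python
import numpy as np

def modular_add_via_fourier(a, b, p, freqs):
    """Recover (a + b) mod p using only sine/cosine features of a and b.

    Hypothetical helper for illustration; `freqs` is an arbitrary set of
    nonzero frequencies, not values taken from the paper.
    """
    w = 2 * np.pi * np.asarray(freqs) / p  # angular frequencies w_k = 2*pi*k/p
    # Per-input Fourier features (what an embedding layer could store).
    cos_a, sin_a = np.cos(w * a), np.sin(w * a)
    cos_b, sin_b = np.cos(w * b), np.sin(w * b)
    # Angle-addition identities yield the features of a + b
    # without ever forming the integer sum.
    cos_ab = cos_a * cos_b - sin_a * sin_b
    sin_ab = sin_a * cos_b + cos_a * sin_b
    # Score every candidate c by sum_k cos(w_k * (a + b - c)), expanded with
    # cos(x - y) = cos(x)cos(y) + sin(x)sin(y).
    c = np.arange(p)
    wc = np.outer(w, c)  # shape (num_freqs, p)
    logits = (cos_ab[:, None] * np.cos(wc) + sin_ab[:, None] * np.sin(wc)).sum(axis=0)
    # Every term equals 1 only when c == (a + b) mod p, so the argmax is the answer.
    return int(np.argmax(logits))

print(modular_add_via_fourier(95, 7, 97, [1, 5, 12]))  # prints 5
```

Because each cosine term peaks exactly when a + b - c is a multiple of p, the candidate with the largest score is the modular sum. Whether comparable structure appears for other operations is the question the paper studies empirically.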

Comments:	Published at Transactions on Machine Learning Research (TMLR), Code: this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.16726 [cs.LG]
	(or arXiv:2402.16726v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.16726

Submission history

From: Hiroki Furuta
[v1] Mon, 26 Feb 2024 16:48:12 UTC (19,645 KB)
[v2] Tue, 27 Feb 2024 04:58:24 UTC (19,645 KB)
[v3] Mon, 18 Nov 2024 02:56:27 UTC (36,682 KB)
[v4] Mon, 30 Dec 2024 11:00:27 UTC (36,683 KB)
