Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking

Gu, Zihan; Chen, Ruoyu; Zhang, Hua; Hu, Yue; Cao, Xiaochun

Computer Science > Machine Learning

arXiv:2504.03162 (cs)

[Submitted on 4 Apr 2025 (v1), last revised 14 Apr 2025 (this version, v2)]

Authors: Zihan Gu, Ruoyu Chen, Hua Zhang, Yue Hu, Xiaochun Cao

Abstract: Grokking, the abrupt improvement in test accuracy after extended overfitting, offers valuable insights into the mechanisms of model generalization. Existing research based on progress measures implies that understanding grokking relies on understanding the optimization dynamics when the loss function is dominated solely by the weight decay term. However, we find that this optimization merely leads to token uniformity, which is not a sufficient condition for grokking. In this work, we investigate the grokking mechanism underlying the Transformer in the task of prime number operations. Based on theoretical analysis and experimental validation, we present the following insights: (i) Minimizing the weight decay term encourages uniformity across all tokens in the embedding space. (ii) The occurrence of grokking is jointly determined by the uniformity of the embedding space and the distribution of the training dataset. Building on these insights, we provide a unified perspective for understanding various previously proposed progress measures and introduce a novel, concise, and effective progress measure that traces the changes in test loss more accurately. Finally, to demonstrate the versatility of our theoretical framework, we design a dedicated dataset to validate our theory on ResNet-18, successfully showcasing the occurrence of grokking. The code is released at this https URL.
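As a rough illustration of insight (i), the following toy sketch applies gradient steps on an L2 weight decay term alone to a random embedding table and tracks how far apart the token embeddings remain. Everything here is an assumption made for illustration and not taken from the paper: the helper names `mean_pairwise_distance` and `weight_decay_step`, the use of mean pairwise Euclidean distance as a stand-in for "uniformity", the plain gradient descent update, and the chosen sizes and hyperparameters. The paper's own definition of uniformity and its progress measure may differ.

```python
import math
import random


def mean_pairwise_distance(emb):
    """Average Euclidean distance over all distinct pairs of embedding rows.

    Hypothetical proxy for how distinguishable the tokens are; smaller means
    the tokens sit closer together in the embedding space.
    """
    n = len(emb)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(emb[i], emb[j])
    return total / (n * (n - 1) / 2)


def weight_decay_step(emb, lr, wd):
    """One gradient descent step on (wd / 2) * ||E||^2 only.

    The gradient is wd * E, so the update is E <- (1 - lr * wd) * E.
    The data-fitting loss is deliberately ignored (assumption of this sketch).
    """
    scale = 1.0 - lr * wd
    return [[scale * x for x in row] for row in emb]


random.seed(0)
# Toy embedding table: 16 tokens, 8 dimensions (arbitrary illustrative sizes).
emb = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
lr, wd, steps = 0.1, 1.0, 50

before = mean_pairwise_distance(emb)
for _ in range(steps):
    emb = weight_decay_step(emb, lr, wd)
after = mean_pairwise_distance(emb)

print(f"mean pairwise distance before: {before:.4f}")
print(f"mean pairwise distance after:  {after:.4f}")
print(f"ratio: {after / before:.3e}, predicted: {(1 - lr * wd) ** steps:.3e}")
```

Because every row is scaled by the same factor at each step, all pairwise distances shrink geometrically by `(1 - lr * wd)` per step in this toy setting, so the tokens become progressively harder to tell apart. This only makes the shrinking effect concrete; it says nothing about when grokking occurs, which the abstract attributes jointly to embedding uniformity and the training data distribution.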

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2504.03162 [cs.LG]
	(or arXiv:2504.03162v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2504.03162

Submission history

From: Ruoyu Chen
[v1] Fri, 4 Apr 2025 04:42:38 UTC (13,285 KB)
[v2] Mon, 14 Apr 2025 08:32:27 UTC (13,285 KB)
