Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition

Huang, Yufei; Hu, Shengding; Han, Xu; Liu, Zhiyuan; Sun, Maosong

arXiv:2402.15175 (cs)

[Submitted on 23 Feb 2024 (v1), last revised 26 Feb 2024 (this version, v2)]

Abstract: Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models, which challenge human intuition and are crucial for a deeper understanding of neural models. In this paper, we present a comprehensive framework that provides a unified view of these three phenomena, focusing on the competition between memorization and generalization circuits. This approach, initially employed to explain grokking, is extended in our work to encompass a wider range of model sizes and training data volumes. Our framework delineates four distinct training dynamics, each depending on a different combination of model size and training data quantity. Using this framework, we provide a detailed analysis of the double descent phenomenon and propose two verifiable predictions regarding its occurrence, both substantiated by our experimental results. Moreover, we extend our framework to the multi-task learning paradigm, demonstrating how algorithmic tasks can be turned into emergent abilities. This offers a novel perspective on emergent abilities in large language models.

Comments:	13 pages, 10 figures
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2402.15175 [cs.LG]
	(or arXiv:2402.15175v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.15175

Submission history

From: Yufei Huang
[v1] Fri, 23 Feb 2024 08:14:36 UTC (264 KB)
[v2] Mon, 26 Feb 2024 02:49:16 UTC (271 KB)
