Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise

Kothapalli, Vignesh; Pang, Tianyu; Deng, Shenyang; Liu, Zongmin; Yang, Yaoqing

arXiv:2406.04657 (cs)

[Submitted on 7 Jun 2024]

Abstract: Modern training strategies for deep neural networks (NNs) tend to induce heavy-tailed (HT) spectra in the layer weights. Extensive efforts to study this phenomenon have found that NNs with HT weight spectra tend to generalize well. A prevailing explanation for such HT spectra identifies gradient noise during training as a key contributing factor. Our work shows that gradient noise is unnecessary for generating HT weight spectra: two-layer NNs trained with full-batch Gradient Descent/Adam can exhibit HT spectra in their weights after a finite number of training steps. To this end, we first identify the scale of the learning rate at which one step of full-batch Adam can lead to feature learning in the shallow NN, particularly when learning a single-index teacher model. Next, we show that multiple optimizer steps with such (sufficiently) large learning rates can transition the bulk of the weight spectrum into an HT distribution. To understand this behavior, we present a novel perspective based on the singular vectors of the weight matrices and optimizer updates. We show that the HT weight spectrum originates from the 'spike', which is generated by feature learning and interacts with the main bulk to produce an HT spectrum. Finally, we analyze the correlations between the HT weight spectra and generalization after multiple optimizer updates with varying learning rates.
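
The setting described in the abstract can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it takes one full-batch step of plain gradient descent (not Adam) on the first-layer weights of a two-layer network fit to a single-index teacher, then returns the singular values of the weight matrix before and after the step. The `tanh` link and activation, the squared loss, the fixed second layer, the dimensions, the learning rate, and the function name `one_step_spectrum` are all assumptions chosen for illustration.

```python
import numpy as np

def one_step_spectrum(n=512, d=64, h=96, lr=1.0, seed=0):
    """One full-batch GD step on the first layer of a two-layer NN.

    Hypothetical setup (not taken from the paper): tanh activation,
    tanh single-index teacher, squared loss, second layer held fixed.
    Returns singular values before/after the step and the gradient.
    """
    rng = np.random.default_rng(seed)

    # Gaussian inputs and a unit-norm teacher direction (single-index model).
    X = rng.standard_normal((n, d))
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)
    y = np.tanh(X @ beta)

    # Random initialization of first layer W0 (h x d) and fixed second layer a.
    W0 = rng.standard_normal((h, d)) / np.sqrt(d)
    a = rng.standard_normal(h) / np.sqrt(h)

    # Forward pass on the full batch: no mini-batch sampling, so no gradient noise.
    Z = X @ W0.T                      # pre-activations, shape (n, h)
    resid = np.tanh(Z) @ a - y        # residuals, shape (n,)

    # Gradient of L = (1 / 2n) * sum(resid^2) with respect to W.
    G = ((resid[:, None] * a[None, :]) * (1.0 - np.tanh(Z) ** 2)).T @ X / n

    W1 = W0 - lr * G

    # Singular values are returned in descending order.
    s0 = np.linalg.svd(W0, compute_uv=False)
    s1 = np.linalg.svd(W1, compute_uv=False)
    return s0, s1, G
```

Comparing `s0` and `s1` across a range of `lr` values, or repeating the update for several steps, is one way to look for an outlier singular value separating from the bulk. Whether and when that happens depends on the scaling choices above, so this sketch should not be read as reproducing the paper's results.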

Comments: 31 pages, 37 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as: arXiv:2406.04657 [cs.LG] (or arXiv:2406.04657v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2406.04657

Submission history

From: Vignesh Kothapalli
[v1] Fri, 7 Jun 2024 05:51:57 UTC (4,907 KB)
