Gauss-Newton Dynamics for Neural Networks: A Riemannian Optimization Perspective

Cayci, Semih

arXiv:2412.14031 (math)

[Submitted on 18 Dec 2024 (v1), last revised 19 Dec 2024 (this version, v2)]

Abstract: We analyze the convergence of Gauss-Newton dynamics for training neural networks with smooth activation functions. In the underparameterized regime, the Gauss-Newton gradient flow induces a Riemannian gradient flow on a low-dimensional, smooth, embedded submanifold of the Euclidean output space. Using tools from Riemannian optimization, we prove last-iterate convergence of the Riemannian gradient flow to the optimal in-class predictor at an exponential rate that is independent of the conditioning of the Gram matrix, without requiring explicit regularization. We further characterize the critical impacts of the neural network scaling factor and the initialization on the convergence behavior. In the overparameterized regime, we show that the Levenberg-Marquardt dynamics with an appropriately chosen damping factor yields robustness to ill-conditioned kernels, analogous to the underparameterized regime. These findings demonstrate the potential of Gauss-Newton methods for efficiently optimizing neural networks, particularly in ill-conditioned problems where kernel and Gram matrices have small singular values.
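
For orientation, the following is a minimal sketch of a discretized, damped Gauss-Newton (Levenberg-Marquardt) update in its textbook form, not the paper's exact dynamics or analysis. Everything specific here is an assumption made for illustration: the one-hidden-layer tanh network with fixed output weights `c`, the `1/sqrt(m)` scaling, the synthetic targets, the damping value `lam`, the Euler step size, and all function names. The sketch shows the same damped direction computed in parameter space and in output (kernel) space.

```python
import numpy as np

def predict(theta, X, c):
    """Outputs of a one-hidden-layer tanh network on all n inputs (a vector in R^n)."""
    m = c.shape[0]
    W = theta.reshape(m, X.shape[1])           # hidden weights, one row per neuron
    return np.tanh(X @ W.T) @ c / np.sqrt(m)   # 1/sqrt(m) scaling is an assumption

def jacobian(theta, X, c):
    """Jacobian of the n outputs with respect to theta, shape (n, m*d)."""
    m, d = c.shape[0], X.shape[1]
    W = theta.reshape(m, d)
    act_grad = 1.0 - np.tanh(X @ W.T) ** 2     # tanh'(z), shape (n, m)
    # d f_i / d W[k, j] = c_k * tanh'(w_k . x_i) * x_ij / sqrt(m)
    J = (act_grad * c)[:, :, None] * X[:, None, :] / np.sqrt(m)
    return J.reshape(X.shape[0], m * d)

def damped_direction(J, r, lam):
    """Parameter-space form: (J^T J + lam*I)^{-1} J^T r."""
    p = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(p), J.T @ r)

def damped_direction_kernel(J, r, lam):
    """Output-space form: J^T (J J^T + lam*I)^{-1} r (same vector when lam > 0)."""
    n = J.shape[0]
    return J.T @ np.linalg.solve(J @ J.T + lam * np.eye(n), r)

def euler_flow(theta, X, y, c, lam, step, n_steps):
    """Forward-Euler discretization of the damped flow; records the squared loss."""
    losses = []
    for _ in range(n_steps):
        r = predict(theta, X, c) - y           # residual in output space
        losses.append(0.5 * float(r @ r))
        theta = theta - step * damped_direction(jacobian(theta, X, c), r, lam)
    return theta, losses

# Hypothetical toy problem: n = 40 samples, m*d = 6 parameters (underparameterized).
rng = np.random.default_rng(0)
n, d, m = 40, 2, 3
X = rng.normal(size=(n, d))
c = np.array([1.0, -1.0, 1.0])                 # fixed output weights (assumption)
y = np.sin(X[:, 0])                            # synthetic targets (assumption)
theta0 = rng.normal(size=m * d)
theta_final, losses = euler_flow(theta0, X, y, c, lam=1e-2, step=0.1, n_steps=100)
```

The parameter-space form solves a system of size `m*d`, while the kernel form solves one of size `n`, so which is cheaper depends on whether the model is under- or overparameterized. Setting `lam` to zero recovers the undamped Gauss-Newton direction only when the relevant matrix is invertible, which is why the sketch keeps a small positive damping.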

Subjects:	Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
Cite as:	arXiv:2412.14031 [math.OC]
	(or arXiv:2412.14031v2 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2412.14031

Submission history

From: Semih Cayci
[v1] Wed, 18 Dec 2024 16:51:47 UTC (276 KB)
[v2] Thu, 19 Dec 2024 08:21:15 UTC (276 KB)
