Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures

Eschenhagen, Runa; Immer, Alexander; Turner, Richard E.; Schneider, Frank; Hennig, Philipp

arXiv:2311.00636 (cs)

[Submitted on 1 Nov 2023 (v1), last revised 11 Jan 2024 (this version, v2)]

Abstract: The core components of many modern neural network architectures, such as transformers, convolutional, or graph neural networks, can be expressed as linear layers with $\textit{weight-sharing}$. Kronecker-Factored Approximate Curvature (K-FAC), a second-order optimisation method, has shown promise to speed up neural network training and thereby reduce computational costs. However, there is currently no framework to apply it to generic architectures, specifically ones with linear weight-sharing layers. In this work, we identify two different settings of linear weight-sharing layers which motivate two flavours of K-FAC -- $\textit{expand}$ and $\textit{reduce}$. We show that they are exact for deep linear networks with weight-sharing in their respective setting. Notably, K-FAC-reduce is generally faster than K-FAC-expand, which we leverage to speed up automatic hyperparameter selection via optimising the marginal likelihood for a Wide ResNet. Finally, we observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer. However, both variations are able to reach a fixed validation metric target in $50$-$75\%$ of the number of steps of a first-order reference run, which translates into a comparable improvement in wall-clock time. This highlights the potential of applying K-FAC to modern neural network architectures.
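As an illustration of the two flavours named in the abstract, the sketch below computes Kronecker factors for a single linear layer whose weights are shared across an extra dimension of size R (for example, tokens or patches). It is not code from the paper: the function names, the `(N, R, D)` tensor layout, and the normalisation constants are assumptions chosen for this example. The expand variant treats the shared dimension as additional batch entries, while the reduce variant aggregates over it before forming outer products.

```python
import numpy as np


def kfac_expand_factors(a, g):
    """Expand-style factors (illustrative; scaling is an assumption).

    a: layer inputs, shape (N, R, D_in)
    g: gradients w.r.t. layer outputs, shape (N, R, D_out)
    The shared dimension R is flattened into the batch dimension.
    """
    n, r, d_in = a.shape
    a_flat = a.reshape(n * r, d_in)
    g_flat = g.reshape(n * r, -1)
    A = a_flat.T @ a_flat / (n * r)  # input factor, (D_in, D_in)
    G = g_flat.T @ g_flat / n        # output-gradient factor, (D_out, D_out)
    return A, G


def kfac_reduce_factors(a, g):
    """Reduce-style factors (illustrative; scaling is an assumption).

    Inputs are averaged and gradients summed over the shared dimension R
    before the outer products, so only N vectors enter each product.
    """
    n = a.shape[0]
    a_red = a.mean(axis=1)  # (N, D_in)
    g_red = g.sum(axis=1)   # (N, D_out)
    A = a_red.T @ a_red / n
    G = g_red.T @ g_red / n
    return A, G


# Hypothetical sizes: batch N=8, shared dimension R=16, D_in=5, D_out=3.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 16, 5))
g = rng.standard_normal((8, 16, 3))
A_exp, G_exp = kfac_expand_factors(a, g)
A_red, G_red = kfac_reduce_factors(a, g)
```

In this sketch the reduce variant forms its outer products from N vectors rather than N·R, which is consistent with the abstract's statement that K-FAC-reduce is generally faster. With R = 1 (no weight-sharing) the two functions return the same factors.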

Comments:	NeurIPS 2023
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2311.00636 [cs.LG]
	(or arXiv:2311.00636v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.00636

Submission history

From: Runa Eschenhagen
[v1] Wed, 1 Nov 2023 16:37:00 UTC (2,058 KB)
[v2] Thu, 11 Jan 2024 17:32:26 UTC (1,664 KB)
