Two-Level K-FAC Preconditioning for Deep Learning

Tselepidis, Nikolaos; Kohler, Jonas; Orvieto, Antonio

Abstract: In the context of deep learning, many optimization methods use gradient covariance information to accelerate the convergence of Stochastic Gradient Descent. In particular, starting with Adagrad (Duchi et al., 2011), a seemingly endless line of research advocates the use of diagonal approximations of the so-called empirical Fisher matrix in stochastic gradient-based algorithms, the most prominent arguably being Adam. However, in recent years, several works have cast doubt on the theoretical basis for preconditioning with the empirical Fisher matrix, and it has been shown that more sophisticated approximations of the actual Fisher matrix more closely resemble the theoretically well-motivated Natural Gradient Descent. One particularly successful variant of such methods is the so-called K-FAC optimizer (Martens and Grosse, 2015), which uses a Kronecker-factored block-diagonal Fisher approximation as preconditioner. In this work, drawing inspiration from two-level domain decomposition methods used as preconditioners in scientific computing, we extend K-FAC by enriching it with off-diagonal (i.e., global) curvature information in a computationally efficient way. We achieve this by adding a coarse-space correction term to the preconditioner, which captures the global Fisher information matrix at a coarser scale. We present a small set of experimental results suggesting improved convergence behaviour of our proposed method.
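The two ingredients named in the abstract, a Kronecker-factored block-diagonal Fisher inverse and an additive coarse-space correction, can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The coarse space (one normalized indicator vector per layer), the factor-wise damping, and all function names (`kfac_precondition`, `coarse_restriction`, `coarse_correction`, `two_level_kfac`, `flatten`) are hypothetical choices made here. The coarse matrix F0 = R0 F R0^T is estimated from per-sample gradients. Ideally these would be computed with labels sampled from the model's predictive distribution, so that they estimate the actual rather than the empirical Fisher.

```python
import numpy as np


def kfac_precondition(grad_W, A, G, damping=1e-3):
    """Apply a damped K-FAC block inverse to one layer's weight gradient.

    The layer's Fisher block is approximated as kron(A, G), with
    A = E[a a^T] over input activations and G = E[g g^T] over
    back-propagated output gradients. With column-stacking vec(),
    kron(A, G)^{-1} vec(V) = vec(G^{-1} V A^{-1}), so only the two small
    factors are ever solved against. Damping each factor separately is an
    illustrative (assumed) choice.
    """
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    left = np.linalg.solve(G_d, grad_W)  # G_d^{-1} V
    # A_d is symmetric, so (G_d^{-1} V) A_d^{-1} = (A_d^{-1} (G_d^{-1} V)^T)^T.
    return np.linalg.solve(A_d, left.T).T


def coarse_restriction(layer_sizes):
    """Hypothetical coarse space: one normalized indicator vector per layer."""
    R0 = np.zeros((len(layer_sizes), sum(layer_sizes)))
    offset = 0
    for l, n_l in enumerate(layer_sizes):
        R0[l, offset:offset + n_l] = 1.0 / np.sqrt(n_l)
        offset += n_l
    return R0


def coarse_correction(v, per_sample_grads, layer_sizes, damping=1e-3):
    """Coarse-space term R0^T F0^{-1} R0 v with F0 = R0 F R0^T.

    F is estimated as mean(g g^T) over per-sample gradients, so
    F0 = mean((R0 g)(R0 g)^T) is only L x L and the full n x n Fisher
    is never formed.
    """
    R0 = coarse_restriction(layer_sizes)
    coarse_grads = per_sample_grads @ R0.T  # shape (N, L)
    F0 = coarse_grads.T @ coarse_grads / len(per_sample_grads)
    F0 += damping * np.eye(len(layer_sizes))
    return R0.T @ np.linalg.solve(F0, R0 @ v)


def flatten(mats):
    """Concatenate weight matrices into one column-major parameter vector."""
    return np.concatenate([M.flatten(order="F") for M in mats])


def two_level_kfac(grads_W, factors, per_sample_grads, damping=1e-3):
    """Additive two-level preconditioner: block-diagonal K-FAC + coarse term."""
    fine = [kfac_precondition(V, A, G, damping)
            for V, (A, G) in zip(grads_W, factors)]
    sizes = [V.size for V in grads_W]
    return flatten(fine) + coarse_correction(
        flatten(grads_W), per_sample_grads, sizes, damping)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shapes = [(3, 4), (2, 3)]  # (out, in) per layer
    grads_W = [rng.standard_normal(s) for s in shapes]
    factors = []
    for out_dim, in_dim in shapes:
        a = rng.standard_normal((in_dim, 16))
        g = rng.standard_normal((out_dim, 16))
        factors.append((a @ a.T / 16, g @ g.T / 16))
    # Random stand-ins for per-sample gradients (N = 8 samples).
    per_sample = rng.standard_normal((8, sum(r * c for r, c in shapes)))
    step = two_level_kfac(grads_W, factors, per_sample, damping=0.1)
    print(step.shape)  # (18,)
```

Because the coarse matrix is only L × L for L layers, the correction adds little cost on top of K-FAC. The sketch builds R0 densely only for readability; a practical version would compute the per-layer sums of each gradient directly.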

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2011.00573 [cs.LG]
	(or arXiv:2011.00573v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2011.00573
