Meta-Learning with Warped Gradient Descent

Flennerhag, Sebastian; Rusu, Andrei A.; Pascanu, Razvan; Yin, Hujun; Hadsell, Raia

Computer Science > Machine Learning

arXiv:1909.00025v1 (cs)

[Submitted on 30 Aug 2019 (this version), latest version 18 Feb 2020 (v2)]

Abstract: A versatile and effective approach to meta-learning is to infer a gradient-based update rule directly from data that promotes rapid learning of new tasks from the same distribution. Current methods rely on backpropagating through the learning process, limiting their scope to few-shot learning. In this work, we introduce Warped Gradient Descent (WarpGrad), a family of modular optimisers that can scale to arbitrary adaptation processes. WarpGrad methods meta-learn to warp task loss surfaces across the joint task-parameter distribution to facilitate gradient descent, which is achieved by a reparametrisation of neural networks that interleaves warp layers in the architecture. These layers are shared across task learners and fixed during adaptation; they represent a projection of task parameters into a meta-learned space that is conducive to task adaptation, and standard backpropagation through them induces a form of gradient preconditioning. WarpGrad methods are computationally efficient and easy to implement as they rely on parameter sharing and backpropagation. They are readily combined with other meta-learners and can scale both in terms of model size and length of adaptation trajectories, as meta-learning the warp parameters does not require differentiation through task adaptation processes. We show empirically that WarpGrad optimisers meta-learn a warped space where gradient descent is well behaved, with faster convergence and better performance in a variety of settings, including few-shot, standard supervised, continual, and reinforcement learning.
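To make the preconditioning claim concrete, here is a minimal sketch under our own assumptions, not the authors' code: a single linear task layer W followed by a fixed linear warp layer Omega (so y = Omega W x), a squared-error loss, and hypothetical helper names. Because Omega is held fixed during adaptation, one gradient step on W alone changes the composite map M = Omega W by -lr * Omega Omega^T dL/dM, so the warp layer acts as the preconditioner P = Omega Omega^T. The sketch checks this equivalence numerically.

```python
# Minimal sketch (our assumptions, not the authors' code): one linear task layer W,
# followed by a fixed linear warp layer Omega, so the model output is y = Omega @ W @ x.
# Loss: L = 0.5 * ||y - t||^2. Helper names are hypothetical; plain lists avoid dependencies.

def matmul(A, B):
    """Matrix product of nested lists A (n x k) and B (k x m)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def scaled_sub(A, B, scale):
    """Return A - scale * B, elementwise."""
    return [[a - scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def loss_grad_wrt_map(M, x, t):
    """dL/dM for L = 0.5 * ||M x - t||^2, i.e. (M x - t) x^T."""
    y = [sum(m * xj for m, xj in zip(row, x)) for row in M]
    residual = [yi - ti for yi, ti in zip(y, t)]
    return outer(residual, x)

def warped_step(W, Omega, x, t, lr):
    """One gradient step on the task parameters W; the warp Omega stays fixed."""
    grad_M = loss_grad_wrt_map(matmul(Omega, W), x, t)
    grad_W = matmul(transpose(Omega), grad_M)  # chain rule through the warp layer
    return scaled_sub(W, grad_W, lr)

def preconditioned_step(M, Omega, x, t, lr):
    """The same step written directly on M = Omega W, with P = Omega Omega^T."""
    P = matmul(Omega, transpose(Omega))
    return scaled_sub(M, matmul(P, loss_grad_wrt_map(M, x, t)), lr)

if __name__ == "__main__":
    W = [[0.5, -0.2], [0.1, 0.3]]
    Omega = [[1.0, 0.4], [0.0, 2.0]]
    x, t, lr = [1.0, -1.0], [0.2, 0.7], 0.1
    M_from_warped = matmul(Omega, warped_step(W, Omega, x, t, lr))
    M_direct = preconditioned_step(matmul(Omega, W), Omega, x, t, lr)
    # The two updates should agree up to floating-point rounding.
    print(max(abs(a - b) for ra, rb in zip(M_from_warped, M_direct)
              for a, b in zip(ra, rb)))
```

With Omega set to the identity, the warped step reduces to plain gradient descent on W. In WarpGrad the warp parameters are meta-learned and shared across tasks rather than hand-picked as they are here.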

Comments:	27 pages, 11 figures, 4 tables
Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:1909.00025 [cs.LG]
	(or arXiv:1909.00025v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1909.00025

Submission history

From: Sebastian Flennerhag
[v1] Fri, 30 Aug 2019 18:27:35 UTC (2,177 KB)
[v2] Tue, 18 Feb 2020 08:57:58 UTC (2,178 KB)