Lookaround Optimizer: $k$ steps around, 1 step average

Zhang, Jiangtao; Liu, Shunyu; Song, Jie; Zhu, Tongtian; Xu, Zhengqi; Song, Mingli

arXiv:2306.07684 (cs)

[Submitted on 13 Jun 2023 (v1), last revised 2 Nov 2023 (this version, v3)]

Abstract: Weight Average (WA) is an active research topic due to its simplicity in ensembling deep networks and its effectiveness in promoting generalization. Existing weight average approaches, however, are often carried out along only one training trajectory in a post-hoc manner (i.e., the weights are averaged after the entire training process is finished), which significantly degrades the diversity between networks and thus impairs the effectiveness. In this paper, inspired by weight average, we propose Lookaround, a straightforward yet effective SGD-based optimizer leading to flatter minima with better generalization. Specifically, Lookaround iterates two steps during the whole training period: the around step and the average step. In each iteration, 1) the around step starts from a common point and trains multiple networks simultaneously, each on data transformed by a different data augmentation, and 2) the average step averages these trained networks to get the averaged network, which serves as the starting point for the next iteration. The around step improves the functionality diversity while the average step guarantees the weight locality of these networks during the whole training, which is essential for WA to work. We theoretically explain the superiority of Lookaround by convergence analysis, and conduct extensive experiments to evaluate Lookaround on popular benchmarks including CIFAR and ImageNet with both CNNs and ViTs, demonstrating clear superiority over state-of-the-art methods. Our code is available at this https URL.
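The alternation of around step and average step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy one-dimensional linear model trained with plain SGD, uses hypothetical input-jitter functions as stand-ins for data augmentations, and assumes that all branches see the same mini-batch at each of the k around steps. The names `sgd_step`, `average`, `lookaround`, `toy_data`, and `augmentations` are illustrative only.

```python
import random


def sgd_step(weights, batch, lr):
    """One SGD step on mean squared error for the toy model y = w0 + w1 * x."""
    g0 = g1 = 0.0
    for x, y in batch:
        err = weights[0] + weights[1] * x - y
        g0 += 2.0 * err / len(batch)
        g1 += 2.0 * err * x / len(batch)
    return [weights[0] - lr * g0, weights[1] - lr * g1]


def average(weight_sets):
    """Average step: element-wise mean of the branch weights."""
    n = len(weight_sets)
    return [sum(ws[i] for ws in weight_sets) / n
            for i in range(len(weight_sets[0]))]


def lookaround(data, augmentations, k=5, rounds=200, lr=0.05,
               batch_size=8, seed=0):
    """Toy loop: k around steps per branch, then one average step."""
    rng = random.Random(seed)
    center = [0.0, 0.0]  # common starting point shared by all branches
    for _ in range(rounds):
        # Around step: every branch starts from the same point and trains
        # for k steps on data transformed by its own augmentation.
        branches = [list(center) for _ in augmentations]
        for _ in range(k):
            batch = rng.sample(data, batch_size)  # assumption: shared batch
            for i, augment in enumerate(augmentations):
                aug_batch = [augment(x, y) for x, y in batch]
                branches[i] = sgd_step(branches[i], aug_batch, lr)
        # Average step: the mean becomes the next common starting point.
        center = average(branches)
    return center


# Hypothetical toy setup: noiseless line y = 2x + 1 on a grid in [-1, 1],
# with label-preserving input jitter standing in for image augmentations.
toy_data = [(i / 32 - 1.0, 2.0 * (i / 32 - 1.0) + 1.0) for i in range(65)]
augmentations = [
    lambda x, y: (x, y),
    lambda x, y: (x + 0.05, y),
    lambda x, y: (x - 0.05, y),
]
center = lookaround(toy_data, augmentations)
print(center)
```

In this sketch the averaged weights are reset as the shared starting point after every k steps, which is what keeps the branches close to one another in weight space while the differing augmentations push them apart.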

Comments:	Accepted to NeurIPS 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2306.07684 [cs.CV]
	(or arXiv:2306.07684v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.07684

Submission history

From: Jiangtao Zhang
[v1] Tue, 13 Jun 2023 10:55:20 UTC (1,819 KB)
[v2] Sun, 8 Oct 2023 06:41:12 UTC (1,508 KB)
[v3] Thu, 2 Nov 2023 15:24:29 UTC (479 KB)
