Understanding Masked Autoencoders From a Local Contrastive Perspective

Yue, Xiaoyu; Bai, Lei; Wei, Meng; Pang, Jiangmiao; Liu, Xihui; Zhou, Luping; Ouyang, Wanli

Computer Science > Computer Vision and Pattern Recognition

arXiv:2310.01994 (cs)

[Submitted on 3 Oct 2023 (v1), last revised 8 Dec 2023 (this version, v2)]

Abstract: Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies. However, despite achieving state-of-the-art performance across various downstream vision tasks, the underlying mechanisms that drive MAE's efficacy are less well explored than those of the canonical contrastive learning paradigm. In this paper, we first propose a local perspective that explicitly extracts a local contrastive form from MAE's reconstructive objective at the patch level. We then introduce a new empirical framework, called Local Contrastive MAE (LC-MAE), to analyze both the reconstructive and contrastive aspects of MAE. LC-MAE reveals that MAE learns invariance to random masking and ensures distribution consistency between the learned token embeddings and the original images. Furthermore, we dissect the contributions of the decoder and random masking to MAE's success, revealing both the decoder's learning mechanism and the dual role of random masking as data augmentation and as a restriction on the effective receptive field. Our experimental analysis sheds light on the intricacies of MAE and summarizes some useful design methodologies, which can inspire more powerful visual self-supervised methods.
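
For readers unfamiliar with the masking and reconstruction strategy the abstract refers to, the following is a minimal sketch of a generic MAE-style objective: patches are randomly split into visible and masked sets, and a reconstruction error is computed on the masked patches only. It is not the LC-MAE framework, and it does not derive the local contrastive form, which the abstract does not spell out. The function names, the 0.75 mask ratio, the toy patch sizes, and the stand-in "decoder" (which simply predicts the mean of the visible patches) are all assumptions made for illustration.

```python
import random


def random_masking(num_patches, mask_ratio, rng):
    """Randomly split patch indices into (visible, masked) lists."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def masked_patch_mse(target, pred, masked):
    """Mean squared error, averaged over the masked patches only."""
    total = 0.0
    for i in masked:
        sq_err = [(p - t) ** 2 for p, t in zip(pred[i], target[i])]
        total += sum(sq_err) / len(sq_err)
    return total / len(masked)


rng = random.Random(0)
num_patches, patch_dim = 16, 4  # toy sizes, assumed for illustration

# Each "patch" is a short vector of pixel values.
patches = [[rng.random() for _ in range(patch_dim)] for _ in range(num_patches)]

# Assumed mask ratio of 0.75; the abstract itself does not state one.
visible, masked = random_masking(num_patches, 0.75, rng)

# Stand-in for an encoder/decoder: predict the mean visible patch everywhere.
mean_visible = [
    sum(patches[i][d] for i in visible) / len(visible) for d in range(patch_dim)
]
pred = [list(mean_visible) for _ in range(num_patches)]

loss = masked_patch_mse(patches, pred, masked)
```

Because the loss is computed per masked patch, it is naturally a patch-level quantity, which is the granularity at which the abstract says the local contrastive form is extracted.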

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.01994 [cs.CV]
	(or arXiv:2310.01994v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.01994

Submission history

From: Xiaoyu Yue
[v1] Tue, 3 Oct 2023 12:08:15 UTC (525 KB)
[v2] Fri, 8 Dec 2023 08:07:29 UTC (735 KB)
