Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks

Pan, Jiachun; Zhou, Pan; Yan, Shuicheng

arXiv:2206.03826 (cs)

[Submitted on 8 Jun 2022 (v1), last revised 11 Feb 2023 (this version, v5)]

Abstract: For unsupervised pretraining, mask-reconstruction pretraining (MRP) approaches, e.g. MAE and data2vec, randomly mask input patches and then reconstruct the pixels or semantic features of these masked patches via an auto-encoder. For a downstream task, supervised fine-tuning of the pretrained encoder then remarkably surpasses conventional "supervised learning" (SL) trained from scratch. However, it is still unclear 1) how MRP performs semantic feature learning in the pretraining phase and 2) why it helps in downstream tasks. To answer these questions, we first theoretically show that, for an auto-encoder with a two-layer convolutional encoder and a one-layer convolutional decoder, MRP can capture all discriminative features of each potential semantic class in the pretraining dataset. Next, since the pretraining dataset is huge and highly diverse and thus covers most features in the downstream dataset, in the fine-tuning phase the pretrained encoder can capture as many features as possible in the downstream dataset and, with theoretical guarantees, will not lose these features. In contrast, SL only randomly captures some features, due to the lottery ticket hypothesis. Hence MRP provably achieves better performance than SL on classification tasks. Experimental results support both our data assumptions and our theoretical implications.
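The mask-then-reconstruct step described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' model: it uses a flat 1-D input instead of image patches, and a placeholder `mean_fill_reconstructor` (our own stand-in) instead of the two-layer convolutional encoder and one-layer convolutional decoder analyzed in the paper. All function names and the `mask_ratio` parameter are assumptions chosen for illustration; following MAE-style pixel reconstruction, the loss is averaged only over the masked patches.

```python
import random
from typing import Callable, List, Optional, Sequence

Patch = List[float]


def split_into_patches(x: Sequence[float], patch_size: int) -> List[Patch]:
    """Split a flat input into non-overlapping, equal-sized patches."""
    if patch_size <= 0 or len(x) % patch_size != 0:
        raise ValueError("input length must be a positive multiple of patch_size")
    return [list(x[i:i + patch_size]) for i in range(0, len(x), patch_size)]


def random_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Randomly choose patches to hide; True marks a masked patch."""
    num_masked = int(round(mask_ratio * num_patches))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def mean_fill_reconstructor(visible: List[Optional[Patch]], patch_size: int) -> List[Patch]:
    """Hypothetical stand-in for encoder + decoder: fill every masked patch
    with the mean value of the visible patches (not the paper's network)."""
    values = [v for p in visible if p is not None for v in p]
    mean = sum(values) / len(values) if values else 0.0
    return [p if p is not None else [mean] * patch_size for p in visible]


def masked_reconstruction_loss(
    patches: List[Patch],
    mask: List[bool],
    reconstruct: Callable[[List[Optional[Patch]]], List[Patch]],
) -> float:
    """Mean squared error computed only over the masked patches."""
    # The model only sees visible patches; masked ones are replaced by None.
    visible = [None if m else p for p, m in zip(patches, mask)]
    recon = reconstruct(visible)
    # Collect squared errors element-wise, but only for masked patches.
    errors = [
        (a - b) ** 2
        for p, r, m in zip(patches, recon, mask)
        if m
        for a, b in zip(p, r)
    ]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    patches = split_into_patches([0, 0, 1, 1, 2, 2, 3, 3], patch_size=2)
    mask = random_mask(len(patches), mask_ratio=0.5, rng=random.Random(0))
    loss = masked_reconstruction_loss(
        patches, mask, lambda vis: mean_fill_reconstructor(vis, patch_size=2)
    )
    print(mask, loss)
```

In an actual MRP pipeline the placeholder reconstructor would be replaced by a trained auto-encoder, and the reconstruction target could be semantic features (as in data2vec) rather than raw values; only the pretrained encoder is kept for downstream fine-tuning.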

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:2206.03826 [cs.LG]
	(or arXiv:2206.03826v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2206.03826

Submission history

From: Jiachun Pan
[v1] Wed, 8 Jun 2022 11:49:26 UTC (15,368 KB)
[v2] Thu, 9 Jun 2022 01:46:19 UTC (1 KB) (withdrawn)
[v3] Fri, 10 Jun 2022 00:37:44 UTC (15,370 KB)
[v4] Tue, 14 Jun 2022 14:06:48 UTC (15,367 KB)
[v5] Sat, 11 Feb 2023 13:19:06 UTC (6,733 KB)