MixMask: Revisiting Masking Strategy for Siamese ConvNets

Vishniakov, Kirill; Xing, Eric; Shen, Zhiqiang

arXiv:2210.11456 (cs)

[Submitted on 20 Oct 2022 (v1), last revised 11 Nov 2024 (this version, v4)]

Abstract: Recent progress in self-supervised learning has successfully combined Masked Image Modeling (MIM) with Siamese Networks, harnessing the strengths of both methodologies. Nonetheless, certain challenges persist when integrating conventional erase-based masking within Siamese ConvNets. Two primary concerns are: (1) the continuous data processing nature of ConvNets, which does not allow for the exclusion of non-informative masked regions, leading to reduced training efficiency compared to the ViT architecture; (2) the misalignment between erase-based masking and the contrastive-based objective, which distinguishes it from the MIM technique. To address these challenges, this work introduces a novel filling-based masking approach, termed MixMask. The proposed method replaces erased areas with content from a different image, effectively countering the information depletion seen in traditional masking methods. Additionally, we introduce an adaptive loss function that captures the semantics of the newly patched views, ensuring seamless integration within the architectural framework. We empirically validate the effectiveness of our approach through comprehensive experiments across various datasets and application scenarios. The findings show our framework's improved performance in linear probing, semi-supervised and supervised finetuning, object detection, and segmentation. Notably, our method surpasses MSCN, establishing MixMask as a more advantageous masking solution for Siamese ConvNets. Our code and models are publicly available at this https URL.
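
The filling-based masking idea described in the abstract (replacing erased regions with content from a different image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `make_block_mask` and `fill_mask_mix`, the grid size, and the mask ratio are all hypothetical choices made for illustration, and the abstract does not specify how the mask is sampled.

```python
import numpy as np


def make_block_mask(height, width, grid, mask_ratio, rng):
    """Return a (height, width) float mask that is constant over grid cells.

    1.0 means "keep the pixel of the first image", 0.0 means "fill from the
    second image". The block-wise sampling scheme is an assumption.
    """
    assert height % grid == 0 and width % grid == 0, "image must tile evenly"
    # Sample one keep/fill decision per grid cell.
    cells = (rng.random((grid, grid)) >= mask_ratio).astype(np.float32)
    # Expand each cell decision into a block of pixels.
    block = np.ones((height // grid, width // grid), dtype=np.float32)
    return np.kron(cells, block)


def fill_mask_mix(img_a, img_b, mask):
    """Fill the masked-out regions of img_a with the matching pixels of img_b."""
    m = mask[..., None]  # add a channel axis so the mask broadcasts over C
    return m * img_a + (1.0 - m) * img_b


# Usage with two dummy (H, W, C) images.
rng = np.random.default_rng(0)
img_a = np.zeros((32, 32, 3), dtype=np.float32)
img_b = np.ones((32, 32, 3), dtype=np.float32)
mask = make_block_mask(32, 32, grid=4, mask_ratio=0.5, rng=rng)
mixed = fill_mask_mix(img_a, img_b, mask)
keep_fraction = float(mask.mean())  # share of pixels retained from img_a
```

Unlike erase-based masking, every pixel of `mixed` carries image content, so a ConvNet does not spend computation on blank regions. A quantity such as `keep_fraction` is one plausible input for weighting a loss on the mixed view, but that is an assumption here; the abstract states only that the loss adapts to the semantics of the patched views.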

Comments:	Technical report. Code is available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2210.11456 [cs.CV]
	(or arXiv:2210.11456v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2210.11456
Journal reference:	BMVC 2024

Submission history

From: Kirill Vishniakov
[v1] Thu, 20 Oct 2022 17:54:03 UTC (3,533 KB)
[v2] Thu, 24 Nov 2022 17:56:11 UTC (3,962 KB)
[v3] Tue, 21 Mar 2023 16:57:57 UTC (8,452 KB)
[v4] Mon, 11 Nov 2024 14:00:40 UTC (1,458 KB)
