CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture

Kalapos, András; Gyires-Tóth, Bálint

doi:10.1109/ICMLA61862.2024.00169

arXiv:2408.07514 (cs)

[Submitted on 14 Aug 2024 (v1), last revised 11 Mar 2025 (this version, v2)]

Abstract: Self-supervised learning (SSL) has become an important approach in pretraining large neural networks, enabling unprecedented scaling of model and dataset sizes. While recent advances like I-JEPA have shown promising results for Vision Transformers, adapting such methods to Convolutional Neural Networks (CNNs) presents unique challenges. In this paper, we introduce CNN-JEPA, a novel SSL method that successfully applies the joint embedding predictive architecture approach to CNNs. Our method incorporates a sparse CNN encoder to handle masked inputs, a fully convolutional predictor using depthwise separable convolutions, and an improved masking strategy. We demonstrate that CNN-JEPA outperforms I-JEPA with ViT architectures on ImageNet-100, achieving a 73.3% linear top-1 accuracy using a standard ResNet-50 encoder. Compared to other CNN-based SSL methods, CNN-JEPA requires 17-35% less training time for the same number of epochs and approaches the linear and k-NN top-1 accuracies of BYOL, SimCLR, and VICReg. Our approach offers a simpler, more efficient alternative to existing SSL methods for CNNs, requiring minimal augmentations and no separate projector network.
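
As a rough illustration of why a predictor built from depthwise separable convolutions can be lightweight, the sketch below counts weights for a standard convolution and for its depthwise separable counterpart. It is not the authors' predictor: the function names are hypothetical, the 3x3 kernel is an assumption, and using 2048 channels (the final feature width of ResNet-50) for the predictor is also an assumption.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 2048 -> 2048 channels (assumed, not from the paper).
standard = conv_params(2048, 2048, 3)
separable = depthwise_separable_params(2048, 2048, 3)
ratio = standard / separable

print(f"standard:  {standard:,} weights")
print(f"separable: {separable:,} weights")
print(f"reduction: {ratio:.2f}x")
```

Under these assumed sizes the separable layer needs roughly one ninth of the weights, since the pointwise term dominates once the channel count is much larger than the kernel area.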

Comments:	Preprint
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2408.07514 [cs.CV]
	(or arXiv:2408.07514v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.07514
Journal reference:	2024 International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 2024, pp. 1111-1114
Related DOI:	https://doi.org/10.1109/ICMLA61862.2024.00169

Submission history

From: András Kalapos
[v1] Wed, 14 Aug 2024 12:48:37 UTC (1,634 KB)
[v2] Tue, 11 Mar 2025 09:42:28 UTC (1,010 KB)
