Powerful Design of Small Vision Transformer on CIFAR10

Wu, Gent

Computer Science > Machine Learning

arXiv:2501.06220 (cs)

[Submitted on 7 Jan 2025]

Title: Powerful Design of Small Vision Transformer on CIFAR10

Authors: Gent Wu


Abstract: Vision Transformers (ViTs) have demonstrated remarkable success on large-scale datasets, but their performance on smaller datasets often falls short of convolutional neural networks (CNNs). This paper explores the design and optimization of Tiny ViTs for small datasets, using CIFAR-10 as a benchmark. We systematically evaluate the impact of data augmentation, patch token initialization, low-rank compression, and multi-class token strategies on model performance. Our experiments reveal that low-rank compression of queries in Multi-Head Latent Attention (MLA) incurs minimal performance loss, indicating redundancy in ViTs. Additionally, introducing multiple CLS tokens improves global representation capacity, boosting accuracy. These findings provide a comprehensive framework for optimizing Tiny ViTs, offering practical insights for efficient and effective designs. Code is available at this https URL.
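The abstract names two architectural ideas without showing them: low-rank compression of the query projection (as in MLA) and multiple CLS tokens. The following is a minimal pure-Python sketch of both under stated assumptions, not the paper's implementation. All function names (`query_params`, `low_rank_queries`, `prepend_cls_tokens`, `pool_cls_outputs`) are hypothetical. The width 192 and rank 48 are illustrative values, not numbers reported by the paper. Averaging the CLS outputs into one feature is an assumed pooling rule, since the abstract does not say how the multiple CLS tokens are combined.

```python
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows, cols, rng):
    """Small Gaussian init with std 0.02 (illustrative choice)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def query_params(dim, rank=None):
    """Weights in the query projection (biases ignored).

    Full projection: one dim x dim matrix.
    Low-rank projection: dim x rank down-projection plus rank x dim up-projection.
    """
    return dim * dim if rank is None else 2 * dim * rank


def low_rank_queries(tokens, w_down, w_up):
    """Compress each token to a rank-r latent, then expand it back to query width."""
    latent = matmul(tokens, w_down)  # (n x d) @ (d x r) -> (n x r)
    return matmul(latent, w_up)      # (n x r) @ (r x d) -> (n x d)


def prepend_cls_tokens(patch_tokens, cls_tokens):
    """Place k learnable CLS tokens in front of the patch-token sequence."""
    return cls_tokens + patch_tokens


def pool_cls_outputs(outputs, num_cls):
    """Average the outputs at the first num_cls positions (assumed pooling rule)."""
    cls_out = outputs[:num_cls]
    dim = len(cls_out[0])
    return [sum(tok[j] for tok in cls_out) / num_cls for j in range(dim)]


if __name__ == "__main__":
    # Hypothetical Tiny-ViT width and query rank; not values from the paper.
    d, r = 192, 48
    print(query_params(d), query_params(d, r))  # 36864 18432

    rng = random.Random(0)
    patches = rand_matrix(4, 8, rng)          # 4 patch tokens of width 8
    cls = rand_matrix(2, 8, rng)              # 2 CLS tokens
    seq = prepend_cls_tokens(patches, cls)    # 6 tokens in total
    q = low_rank_queries(seq, rand_matrix(8, 2, rng), rand_matrix(2, 8, rng))
    # A real encoder would run attention and MLP blocks here; we pool q directly.
    feature = pool_cls_outputs(q, num_cls=2)
    print(len(seq), len(q[0]), len(feature))  # 6 8 8
```

The factorization saves query weights only when the rank r is below d/2, since 2dr < d² exactly when r < d/2. At r = d/2 the two forms have the same parameter count.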

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.06220 [cs.LG]
	(or arXiv:2501.06220v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2501.06220

Submission history

From: Jiantao Wu
[v1] Tue, 7 Jan 2025 00:41:34 UTC (521 KB)
