Sparse-MLP: A Fully-MLP Architecture with Conditional Computation

Lou, Yuxuan; Xue, Fuzhao; Zheng, Zangwei; You, Yang

arXiv:2109.02008v2 (cs)

[Submitted on 5 Sep 2021 (v1), revised 8 Sep 2021 (this version, v2), latest version 14 Jan 2022 (v3)]

Abstract: Mixture-of-Experts (MoE) with sparse conditional computation has proven to be an effective architecture for scaling attention-based models to more parameters at comparable computation cost. In this paper, we propose Sparse-MLP, which scales the recent MLP-Mixer model with sparse MoE layers to achieve a more computation-efficient architecture. We replace a subset of the dense MLP blocks in the MLP-Mixer model with Sparse blocks. In each Sparse block, we apply two stages of MoE layers: one with MLP experts that mix information within channels along the image patch dimension, and one with MLP experts that mix information within patches along the channel dimension. In addition, to reduce the computational cost of routing and improve expert capacity, we design Re-represent layers in each Sparse block. These layers re-scale image representations using two simple but effective linear transformations. When pre-trained on ImageNet-1k with the MoCo v3 algorithm, our models outperform dense MLP models by 2.5% in ImageNet Top-1 accuracy with fewer parameters and lower computational cost. On small-scale downstream image classification tasks, namely CIFAR-10 and CIFAR-100, Sparse-MLP still achieves better performance than the baselines.
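The two-stage MoE structure described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. Everything beyond "two MoE stages, one along the patch dimension and one along the channel dimension" is an assumption made for illustration: top-1 routing with a softmax gate, two-layer ReLU experts, residual connections, random weights, and all function names. Layer normalization, the Re-represent layers, and any load-balancing loss are omitted because the abstract does not specify them.

```python
import math
import random

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def make_expert(dim, hidden, rng):
    # Hypothetical expert: a two-layer MLP mapping dim -> hidden -> dim.
    return (rand_matrix(hidden, dim, rng), rand_matrix(dim, hidden, rng))

def expert_forward(expert, x):
    W1, W2 = expert
    h = [max(0.0, v) for v in matvec(W1, x)]  # ReLU (assumed activation)
    return matvec(W2, h)

def moe_layer(rows, gate, experts):
    # Assumed top-1 routing: each input vector goes to the single expert
    # with the highest gate probability, so only one expert runs per vector.
    out, chosen = [], []
    for x in rows:
        probs = softmax(matvec(gate, x))
        k = max(range(len(probs)), key=probs.__getitem__)
        y = expert_forward(experts[k], x)
        out.append([probs[k] * v for v in y])  # scale by gate probability
        chosen.append(k)
    return out, chosen

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sparse_block(X, token_moe, channel_moe):
    # X has shape (patches, channels).
    # Stage 1: route each channel's vector over patches (mixing along patches).
    mixed, routes_tok = moe_layer(transpose(X), *token_moe)
    X = add(X, transpose(mixed))  # residual connection (assumed)
    # Stage 2: route each patch's vector over channels (mixing along channels).
    mixed, routes_ch = moe_layer(X, *channel_moe)
    X = add(X, mixed)  # residual connection (assumed)
    return X, routes_tok, routes_ch

rng = random.Random(0)
P, C, E, H = 4, 6, 3, 8  # patches, channels, experts, hidden width (toy sizes)
X = rand_matrix(P, C, rng)
token_moe = (rand_matrix(E, P, rng), [make_expert(P, H, rng) for _ in range(E)])
channel_moe = (rand_matrix(E, C, rng), [make_expert(C, H, rng) for _ in range(E)])
Y, routes_tok, routes_ch = sparse_block(X, token_moe, channel_moe)
print(len(Y), len(Y[0]), routes_tok, routes_ch)
```

In this sketch the output keeps the input shape (patches by channels), and the routing lists record which expert handled each channel in the first stage and each patch in the second. Adding experts increases the parameter count while each vector still passes through exactly one expert, which is the sense in which conditional computation keeps per-input cost roughly constant.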

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2109.02008 [cs.LG]
	(or arXiv:2109.02008v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.02008

Submission history

From: Yuxuan Lou
[v1] Sun, 5 Sep 2021 06:43:08 UTC (4,504 KB)
[v2] Wed, 8 Sep 2021 20:10:22 UTC (4,540 KB)
[v3] Fri, 14 Jan 2022 08:06:11 UTC (1,003 KB)
