VSSD: Vision Mamba with Non-Causal State Space Duality

Shi, Yuheng; Dong, Minjing; Li, Mingjia; Xu, Chang

Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.18559 (cs)

[Submitted on 26 Jul 2024 (v1), last revised 4 Aug 2024 (this version, v2)]

Abstract: Vision transformers have significantly advanced the field of computer vision, offering robust modeling capabilities and a global receptive field. However, their high computational demands limit their applicability to long sequences. To tackle this issue, State Space Models (SSMs) have gained prominence in vision tasks because they offer linear computational complexity. Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherently causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, which uses a non-causal formulation of SSD. Specifically, we propose to discard the magnitude of interactions between the hidden state and tokens while preserving their relative weights, which removes the dependence of each token's contribution on the preceding tokens. Combined with multi-scan strategies, we show that the scanning results can be integrated to achieve non-causality, which not only improves the performance of SSD in vision tasks but also enhances its efficiency. We conduct extensive experiments on various benchmarks, including image classification, detection, and segmentation, where VSSD surpasses existing state-of-the-art SSM-based models. Code and weights are available at this https URL.
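The causal versus non-causal distinction described in the abstract can be illustrated with a toy single-channel recurrence. This is a minimal sketch under stated assumptions, not the authors' VSSD implementation: it assumes a scalar per-token decay `a_t` (in the spirit of Mamba2's scalar-times-identity A), and for the non-causal variant it simply reuses each `a_t` as a per-token relative weight, so every token writes into one shared global hidden state without being attenuated by the decays of other tokens. The names `causal_ssd`, `noncausal_ssd` and `reverse_scan` are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def causal_ssd(x: Vector, a: Vector, B: List[Vector], C: List[Vector]) -> Vector:
    """Causal scalar-decay recurrence (single channel, toy sketch):
    h_t = a_t * h_{t-1} + B_t * x_t,  y_t = <C_t, h_t>.
    Token s reaches output t scaled by a_{s+1} * ... * a_t, so y_t sees only tokens s <= t."""
    h = [0.0] * len(B[0])
    ys = []
    for x_t, a_t, B_t, C_t in zip(x, a, B, C):
        h = [a_t * h_n + b_n * x_t for h_n, b_n in zip(h, B_t)]
        ys.append(sum(c_n * h_n for c_n, h_n in zip(C_t, h)))
    return ys


def noncausal_ssd(x: Vector, w: Vector, B: List[Vector], C: List[Vector]) -> Vector:
    """Hypothetical non-causal variant: drop the cumulative decay products and keep
    only a per-token relative weight w_t. All tokens write into one global state
    H = sum_t w_t * B_t * x_t, and every output reads the same H: y_t = <C_t, H>."""
    H = [0.0] * len(B[0])
    for x_t, w_t, B_t in zip(x, w, B):
        H = [h_n + w_t * b_n * x_t for h_n, b_n in zip(H, B_t)]
    return [sum(c_n * h_n for c_n, h_n in zip(C_t, H)) for C_t in C]


def reverse_scan(fn, x, a, B, C):
    """Run `fn` over the reversed sequence, then flip outputs back to the original order."""
    return fn(x[::-1], a[::-1], B[::-1], C[::-1])[::-1]


if __name__ == "__main__":
    random.seed(0)
    L, N = 6, 4  # sequence length and state size (arbitrary toy values)
    x = [random.uniform(-1, 1) for _ in range(L)]
    a = [random.uniform(0.5, 1.0) for _ in range(L)]
    B = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(L)]
    C = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(L)]

    for name, fn in (("non-causal", noncausal_ssd), ("causal", causal_ssd)):
        fwd, bwd = fn(x, a, B, C), reverse_scan(fn, x, a, B, C)
        same = all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(fwd, bwd))
        print(f"{name}: forward scan == backward scan? {same}")
```

Because the shared state in the non-causal sketch is an order-independent sum, a forward scan and a backward scan yield the same outputs (up to floating-point rounding), which is one way to read the claim that multi-scan results can be integrated. In the causal recurrence, by contrast, the first output depends only on the first token, so scan direction matters.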

Comments:	16 pages, 5 figures, 7 tables
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.18559 [cs.CV]
	(or arXiv:2407.18559v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.18559

Submission history

From: Yuheng Shi
[v1] Fri, 26 Jul 2024 07:16:52 UTC (2,483 KB)
[v2] Sun, 4 Aug 2024 04:08:59 UTC (2,481 KB)