Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection

Zhang, Guowen; Fan, Lue; He, Chenhang; Lei, Zhen; Zhang, Zhaoxiang; Zhang, Lei


arXiv:2406.10700 (cs)

[Submitted on 15 Jun 2024 (v1), last revised 18 Jun 2024 (this version, v2)]


Abstract: Serialization-based methods, which serialize 3D voxels and group them into multiple sequences before feeding them to Transformers, have proven effective in 3D object detection. However, serializing 3D voxels into 1D sequences inevitably sacrifices voxel spatial proximity. This issue is difficult to address by enlarging the group size in existing serialization-based methods, because the complexity of Transformers is quadratic in the feature size. Inspired by recent advances in state space models (SSMs), we present a Voxel SSM, termed Voxel Mamba, which employs a group-free strategy to serialize the whole space of voxels into a single sequence. The linear complexity of SSMs motivates our group-free design, which alleviates the loss of spatial proximity of voxels. To further enhance spatial proximity, we propose a Dual-scale SSM Block that establishes a hierarchical structure, enabling a larger receptive field along the 1D serialization curve as well as more complete local regions in 3D space. Moreover, we implicitly apply window partitioning under the group-free framework through positional encoding, which further enhances spatial proximity by encoding voxel positional information. Our experiments on the Waymo Open Dataset and the nuScenes dataset show that Voxel Mamba not only achieves higher accuracy than state-of-the-art methods but also demonstrates significant advantages in computational efficiency.
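
The loss of spatial proximity described in the abstract can be illustrated with a minimal sketch. The abstract does not name the serialization curve, so the Z-order (Morton) curve used here is an assumption chosen for brevity, and the helper names `morton3d`, `serialize_voxels`, and `max_neighbor_gap` are hypothetical and not taken from the paper. The sketch orders a small dense voxel grid into one sequence and measures how far apart face-adjacent voxels can end up in that sequence.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of (x, y, z) into one Z-order key.

    Assumption: Z-order is used only for illustration; the abstract
    does not specify which space-filling curve the method relies on.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def serialize_voxels(coords, bits=10):
    """Sort integer voxel coordinates into a single 1D sequence."""
    return sorted(coords, key=lambda c: morton3d(c[0], c[1], c[2], bits=bits))


def max_neighbor_gap(ordered):
    """Largest sequence distance between two face-adjacent voxels."""
    index = {c: i for i, c in enumerate(ordered)}
    worst = 0
    for (x, y, z), i in index.items():
        # Checking the +1 neighbor on each axis covers every adjacent pair once.
        for n in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            if n in index:
                worst = max(worst, abs(index[n] - i))
    return worst


# A dense 4 x 4 x 4 grid of voxel coordinates (hypothetical toy input).
grid = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
ordered = serialize_voxels(grid)
gap = max_neighbor_gap(ordered)
print(f"{len(ordered)} voxels, worst neighbor gap in sequence: {gap}")
```

Voxels that touch in 3D can sit many positions apart in the 1D sequence, so a model that only attends within short fixed-size groups may never see them together. Processing the whole sequence at once, as the group-free design does, avoids cutting such pairs apart at group boundaries.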

Comments:	10 pages, 4 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Cite as:	arXiv:2406.10700 [cs.CV]
	(or arXiv:2406.10700v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.10700

Submission history

From: Guowen Zhang
[v1] Sat, 15 Jun 2024 17:45:07 UTC (1,182 KB)
[v2] Tue, 18 Jun 2024 17:49:56 UTC (1,184 KB)
