X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation

Borse, Shubhankar; Klingner, Marvin; Kumar, Varun Ravi; Cai, Hong; Almuzairee, Abdulaziz; Yogamani, Senthil; Porikli, Fatih

arXiv:2210.06778 (cs)

[Submitted on 13 Oct 2022 (v1), last revised 31 Oct 2022 (this version, v2)]

Abstract: The bird's-eye-view (BEV) grid is a common representation for the perception of road components, e.g., drivable area, in autonomous driving. Most existing approaches rely on cameras only to perform segmentation in BEV space, which is fundamentally constrained by the absence of reliable depth information. Recent works leverage both camera and LiDAR modalities, but fuse their features sub-optimally using simple, concatenation-based mechanisms.
In this paper, we address these problems by enhancing the alignment of the unimodal features to aid feature fusion, and by enhancing the alignment between the cameras' perspective view (PV) and BEV representations. We propose X-Align, a novel end-to-end cross-modal and cross-view learning framework for BEV segmentation consisting of the following components: (i) a novel Cross-Modal Feature Alignment (X-FA) loss, (ii) an attention-based Cross-Modal Feature Fusion (X-FF) module to align multi-modal BEV features implicitly, and (iii) an auxiliary PV segmentation branch with Cross-View Segmentation Alignment (X-SA) losses to improve the PV-to-BEV transformation. We evaluate our proposed method on two commonly used benchmark datasets, nuScenes and KITTI-360. Notably, X-Align outperforms the state of the art by 3 absolute mIoU points on nuScenes. We also provide extensive ablation studies to demonstrate the effectiveness of the individual components.
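The abstract reports results in mean intersection-over-union (mIoU). As a rough illustration of that metric only, the sketch below computes per-class IoU on binary BEV grids and averages it over classes. The names `iou` and `mean_iou`, the list-of-lists grid layout, the class names, and the empty-union convention are assumptions made for this example; the evaluation protocol actually used in the paper is not described on this page and may differ.

```python
from typing import Dict, List

# Hypothetical layout: one binary BEV mask per class, 1 = class present in cell.
Grid = List[List[int]]


def iou(pred: Grid, target: Grid) -> float:
    """Intersection-over-union of two binary grids of equal shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: if neither grid marks any cell, count it as a match.
    return inter / union if union else 1.0


def mean_iou(pred: Dict[str, Grid], target: Dict[str, Grid]) -> float:
    """Average the per-class IoU over all classes in the target."""
    scores = [iou(pred[name], target[name]) for name in target]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2x2 grids with made-up class names, for illustration only.
    pred = {"drivable_area": [[1, 1], [0, 0]], "lane": [[0, 1], [0, 1]]}
    target = {"drivable_area": [[1, 0], [0, 0]], "lane": [[0, 1], [0, 1]]}
    print(mean_iou(pred, target))
```

On this scale an "absolute" gain is a difference between two mIoU values expressed in percentage points, not a relative improvement.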

Comments:	Accepted to WACV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2210.06778 [cs.CV]
	(or arXiv:2210.06778v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2210.06778

Submission history

From: Shubhankar Mangesh Borse
[v1] Thu, 13 Oct 2022 06:42:46 UTC (15,259 KB)
[v2] Mon, 31 Oct 2022 17:58:37 UTC (15,260 KB)
