Open-Vocabulary Octree-Graph for 3D Scene Understanding

Wang, Zhigang; Su, Yifei; Li, Chenhui; Wang, Dong; Huang, Yan; Zhao, Bin; Li, Xuelong

arXiv:2411.16253 (cs)

[Submitted on 25 Nov 2024]

Abstract: Open-vocabulary 3D scene understanding is indispensable for embodied agents. Recent works use pretrained vision-language models (VLMs) to segment objects and project the resulting segments onto point clouds to build 3D maps. Despite this progress, a point cloud is a set of unordered coordinates that requires substantial storage space and does not directly convey occupancy information or spatial relations, making existing methods inefficient for downstream tasks such as path planning and complex text-based object retrieval. To address these issues, we propose Octree-Graph, a novel scene representation for open-vocabulary 3D scene understanding. Specifically, a Chronological Group-wise Segment Merging (CGSM) strategy and an Instance Feature Aggregation (IFA) algorithm are first designed to obtain 3D instances and their corresponding semantic features. Subsequently, an adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape. Finally, the Octree-Graph is constructed, in which each adaptive-octree acts as a graph node and edges describe the spatial relations among nodes. Extensive experiments on various tasks are conducted on several widely used datasets, demonstrating the versatility and effectiveness of our method.
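
The representation described in the abstract (one octree per object as a graph node, with edges carrying spatial relations) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `OctreeNode`, `ObjectNode`, `build`, `is_occupied`, `make_object`, and `relation` are hypothetical, the octree only subdivides occupied cells down to a fixed depth rather than adapting to object shape as the paper describes, the semantic feature is a placeholder list, and the "above"/"below"/"beside" labels are an assumed simplification of the paper's spatial relations.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]

@dataclass
class OctreeNode:
    center: Point
    half: float                                  # half of the cube edge length
    occupied: bool = False
    children: Optional[List["OctreeNode"]] = None

def child_index(p: Point, center: Point) -> int:
    # One bit per axis: the bit is set when p lies on the upper side of center.
    return sum(1 << a for a in range(3) if p[a] >= center[a])

def build(points: List[Point], center: Point, half: float, depth: int) -> OctreeNode:
    node = OctreeNode(center, half, occupied=bool(points))
    if not points or depth == 0:
        return node                              # empty cells are never subdivided
    buckets: List[List[Point]] = [[] for _ in range(8)]
    for p in points:
        buckets[child_index(p, center)].append(p)
    q = half / 2
    node.children = []
    for idx in range(8):
        c = tuple(center[a] + (q if (idx >> a) & 1 else -q) for a in range(3))
        node.children.append(build(buckets[idx], c, q, depth - 1))
    return node

def is_occupied(node: OctreeNode, p: Point) -> bool:
    # Points outside the root cube are treated as free space.
    if any(abs(p[a] - node.center[a]) > node.half for a in range(3)):
        return False
    while node.children is not None:
        node = node.children[child_index(p, node.center)]
    return node.occupied

@dataclass
class ObjectNode:
    name: str
    feature: List[float]                         # placeholder semantic feature
    octree: OctreeNode

def make_object(name: str, feature: List[float], points: List[Point], depth: int = 3) -> ObjectNode:
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    center = tuple((lo[a] + hi[a]) / 2 for a in range(3))
    half = max(hi[a] - lo[a] for a in range(3)) / 2 or 1e-3
    return ObjectNode(name, feature, build(points, center, half, depth))

def relation(a: ObjectNode, b: ObjectNode) -> str:
    # Coarse label from root-cube centers only (an assumed heuristic).
    ca, cb = a.octree.center, b.octree.center
    dz = ca[2] - cb[2]
    horizontal = ((ca[0] - cb[0]) ** 2 + (ca[1] - cb[1]) ** 2) ** 0.5
    if abs(dz) > horizontal:
        return "above" if dz > 0 else "below"
    return "beside"

# Toy scene: a flat table and a cup resting over its middle.
table = make_object("table", [0.0], [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0.1)])
cup = make_object("cup", [1.0], [(0.4, 0.4, 0.5), (0.6, 0.6, 0.7)])
nodes = [table, cup]
edges: Dict[Tuple[str, str], str] = {
    (a.name, b.name): relation(a, b) for a in nodes for b in nodes if a is not b
}
```

In this sketch, an occupancy query such as `is_occupied(table.octree, p)` touches at most `depth` cells, and a relational query such as "the object above the table" reduces to a lookup in `edges`, which is the kind of access an unordered point cloud does not offer directly.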

Comments:	11 pages, 7 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.16253 [cs.CV]
	(or arXiv:2411.16253v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.16253

Submission history

From: Yifei Su
[v1] Mon, 25 Nov 2024 10:14:10 UTC (23,742 KB)
