InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception

Li, Haijie; Wu, Yanmin; Meng, Jiarui; Gao, Qiankun; Zhang, Zhiyao; Wang, Ronggang; Zhang, Jian

Abstract:3D scene understanding has become an essential area of research with applications in autonomous driving, robotics, and augmented reality. Recently, 3D Gaussian Splatting (3DGS) has emerged as a powerful approach, combining explicit modeling with neural adaptability to provide efficient and detailed scene representations. However, three major challenges remain in leveraging 3DGS for scene understanding: 1) an imbalance between appearance and semantics, where dense Gaussian usage for fine-grained texture modeling does not align with the minimal requirements for semantic attributes; 2) inconsistencies between appearance and semantics, as purely appearance-based Gaussians often misrepresent object boundaries; and 3) reliance on top-down instance segmentation methods, which struggle with uneven category distributions, leading to over- or under-segmentation. In this work, we propose InstanceGaussian, a method that jointly learns appearance and semantic features while adaptively aggregating instances. Our contributions include: i) a novel Semantic-Scaffold-GS representation balancing appearance and semantics to improve feature representations and boundary delineation; ii) a progressive appearance-semantic joint training strategy to enhance stability and segmentation accuracy; and iii) a bottom-up, category-agnostic instance aggregation approach that addresses segmentation challenges through farthest point sampling and connected component analysis. Our approach achieves state-of-the-art performance in category-agnostic, open-vocabulary 3D point-level segmentation, highlighting the effectiveness of the proposed representation and training strategies. Project page: this https URL

Comments:	technical report, 13 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.19235 [cs.CV]
	(or arXiv:2411.19235v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.19235

Computer Science > Computer Vision and Pattern Recognition

Title:InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators