ObPose: Leveraging Canonical Pose for Object-Centric Scene Inference in 3D

Wu, Yizhe; Jones, Oiwi Parker; Posner, Ingmar

Computer Science > Computer Vision and Pattern Recognition

arXiv:2206.03591v1 (cs)

[Submitted on 7 Jun 2022 (this version), latest version 9 Jun 2023 (v3)]

Abstract: We present ObPose, an unsupervised object-centric generative model that learns to segment 3D objects from RGB-D video. Inspired by prior art in 2D representation learning, ObPose considers a factorised latent space, separately encoding object-wise location (where) and appearance (what) information. In particular, ObPose leverages an object's canonical pose, defined via a minimum volume principle, as a novel inductive bias for learning the where component. To achieve this, we propose an efficient, voxelised approximation approach to recover the object shape directly from a neural radiance field (NeRF). As a consequence, ObPose models scenes as compositions of NeRFs representing individual objects. When evaluated on the YCB dataset for unsupervised scene segmentation, ObPose outperforms the current state-of-the-art in 3D scene inference (ObSuRF) by a significant margin in terms of segmentation quality, both for video inputs and for multi-view static scenes. In addition, the design choices made in the ObPose encoder are validated with relevant ablations.

Comments:	16 pages, 6 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
MSC classes:	68T07
Cite as:	arXiv:2206.03591 [cs.CV]
	(or arXiv:2206.03591v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2206.03591

Submission history

From: Yizhe Wu
[v1] Tue, 7 Jun 2022 21:15:18 UTC (10,394 KB)
[v2] Mon, 3 Oct 2022 18:49:17 UTC (6,740 KB)
[v3] Fri, 9 Jun 2023 20:18:14 UTC (21,561 KB)
