Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling

Wu, Qirui; Iliash, Denys; Ritchie, Daniel; Savva, Manolis; Chang, Angel X.

arXiv:2411.19492 (cs)

[Submitted on 29 Nov 2024 (v1), last revised 14 Mar 2025 (this version, v2)]

Abstract: Reconstructing structured 3D scenes from RGB images using CAD objects unlocks efficient and compact scene representations that maintain compositionality and interactability. Existing works propose training-heavy methods relying on either expensive yet inaccurate real-world annotations or controllable yet monotonous synthetic data that do not generalize well to unseen objects or domains. We present Diorama, the first zero-shot open-world system that holistically models 3D scenes from single-view RGB observations without requiring end-to-end training or human annotations. We show the feasibility of our approach by decomposing the problem into subtasks and introduce robust, generalizable solutions to each: architecture reconstruction, 3D shape retrieval, object pose estimation, and scene layout optimization. We evaluate our system on both synthetic and real-world data to show we significantly outperform baselines from prior work. We also demonstrate generalization to internet images and the text-to-scene task.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2411.19492 [cs.CV]
	(or arXiv:2411.19492v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.19492

Submission history

From: Qirui Wu
[v1] Fri, 29 Nov 2024 06:19:04 UTC (47,960 KB)
[v2] Fri, 14 Mar 2025 22:54:30 UTC (49,121 KB)
