Wonderland: Navigating 3D Scenes from a Single Image

Liang, Hanwen; Cao, Junli; Goel, Vidit; Qian, Guocheng; Korolev, Sergei; Terzopoulos, Demetri; Plataniotis, Konstantinos N.; Tulyakov, Sergey; Ren, Jian

Computer Science > Computer Vision and Pattern Recognition

arXiv:2412.12091 (cs)

[Submitted on 16 Dec 2024]

Abstract: This paper addresses a challenging question: How can we efficiently create high-quality, wide-scope 3D scenes from a single arbitrary image? Existing methods face several constraints, such as requiring multi-view data, time-consuming per-scene optimization, low visual quality in backgrounds, and distorted reconstructions in unseen areas. We propose a novel pipeline to overcome these limitations. Specifically, we introduce a large-scale reconstruction model that uses latents from a video diffusion model to predict 3D Gaussian Splattings for the scenes in a feed-forward manner. The video diffusion model is designed to create videos precisely following specified camera trajectories, allowing it to generate compressed video latents that contain multi-view information while maintaining 3D consistency. We train the 3D reconstruction model to operate on the video latent space with a progressive training strategy, enabling the efficient generation of high-quality, wide-scope, and generic 3D scenes. Extensive evaluations across various datasets demonstrate that our model significantly outperforms existing methods for single-view 3D scene generation, particularly with out-of-domain images. For the first time, we demonstrate that a 3D reconstruction model can be effectively built upon the latent space of a diffusion model to realize efficient 3D scene generation.
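The abstract says the reconstruction model predicts 3D Gaussians in a feed-forward manner but does not specify how raw network outputs become Gaussian parameters. As an illustration only, here is a minimal sketch of one common pixel-aligned parameterization used by feed-forward Gaussian reconstruction models in general: each pixel's raw prediction is decoded into a depth along that pixel's camera ray, log-scales, a quaternion, and opacity and color logits. Everything here is an assumption and not taken from the paper: the 12-channel layout, the sigmoid-bounded depth with `near`/`far` limits, and the hypothetical helpers `decode_pixel_gaussian`, `pixel_ray`, and `NUM_CHANNELS`. The sketch also ignores the video-latent input and transformer backbone that the paper describes.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]

# Hypothetical layout of one per-pixel raw prediction (not from the paper):
# [depth logit, 3 log-scales, 4 quaternion components, opacity logit, 3 color logits]
NUM_CHANNELS = 12


@dataclass(frozen=True)
class Gaussian3D:
    """Parameters of one 3D Gaussian primitive."""
    mean: Vec3                                   # world-space center
    scale: Vec3                                  # per-axis extent, always positive
    rotation: tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    opacity: float                               # in [0, 1]
    color: Vec3                                  # RGB in [0, 1]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (no overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pixel_ray(u: int, v: int, fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Camera-frame direction through the center of pixel (u, v) of a pinhole camera."""
    return ((u + 0.5 - cx) / fx, (v + 0.5 - cy) / fy, 1.0)


def decode_pixel_gaussian(raw, ray_origin, ray_dir, near=0.1, far=100.0) -> Gaussian3D:
    """Turn one pixel's raw prediction into a Gaussian placed along its camera ray."""
    if len(raw) != NUM_CHANNELS:
        raise ValueError(f"expected {NUM_CHANNELS} channels, got {len(raw)}")
    if not 0.0 < near < far:
        raise ValueError("need 0 < near < far")
    ray_len = math.sqrt(sum(d * d for d in ray_dir))
    if ray_len == 0.0:
        raise ValueError("ray direction must be non-zero")
    # Depth squashed into [near, far] so every Gaussian lands in front of the camera.
    depth = near + (far - near) * sigmoid(raw[0])
    mean = tuple(o + depth * d / ray_len for o, d in zip(ray_origin, ray_dir))
    # exp keeps the scales strictly positive.
    scale = tuple(math.exp(s) for s in raw[1:4])
    # Normalize the quaternion; fall back to the identity rotation if it is degenerate.
    q = raw[4:8]
    q_len = math.sqrt(sum(c * c for c in q))
    rotation = tuple(c / q_len for c in q) if q_len > 1e-8 else (1.0, 0.0, 0.0, 0.0)
    opacity = sigmoid(raw[8])
    color = tuple(sigmoid(c) for c in raw[9:12])
    return Gaussian3D(mean, scale, rotation, opacity, color)


if __name__ == "__main__":
    # Toy 2x2 "image" with all-zero raw predictions and the camera at the origin.
    gaussians = [
        decode_pixel_gaussian([0.0] * NUM_CHANNELS, (0.0, 0.0, 0.0),
                              pixel_ray(u, v, fx=2.0, fy=2.0, cx=1.0, cy=1.0))
        for v in range(2) for u in range(2)
    ]
    for g in gaussians:
        print(g.mean, g.opacity)
```

Bounding depth with a sigmoid is only one choice; other feed-forward designs predict unbounded depth or offsets directly. The point of the sketch is that a single forward pass yields one Gaussian per pixel with no per-scene optimization, which is the property the abstract emphasizes.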

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.12091 [cs.CV]
	(or arXiv:2412.12091v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.12091

Submission history

From: Hanwen Liang
[v1] Mon, 16 Dec 2024 18:58:17 UTC (8,567 KB)