Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation

Meng, Xuyi; Wang, Chen; Lei, Jiahui; Daniilidis, Kostas; Gu, Jiatao; Liu, Lingjie

arXiv:2501.05427 (cs)

[Submitted on 9 Jan 2025]

Abstract: Recent advances in 2D image generation have achieved remarkable quality, largely driven by the capacity of diffusion models and the availability of large-scale datasets. However, direct 3D generation is still constrained by the scarcity and lower fidelity of 3D datasets. In this paper, we introduce Zero-1-to-G, a novel approach that addresses this problem by enabling direct single-view generation on Gaussian splats using pretrained 2D diffusion models. Our key insight is that Gaussian splats, a 3D representation, can be decomposed into multi-view images encoding different attributes. This reframes the challenging task of direct 3D generation within a 2D diffusion framework, allowing us to leverage the rich priors of pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view and cross-attribute attention layers, which capture complex correlations and enforce 3D consistency across generated splats. This makes Zero-1-to-G the first direct image-to-3D generative model to effectively utilize pretrained 2D diffusion priors, enabling efficient training and improved generalization to unseen objects. Extensive experiments on both synthetic and in-the-wild datasets demonstrate superior performance in 3D object generation, offering a new approach to high-quality 3D generation.
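
To make the cross-view and cross-attribute attention idea more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes the splat attribute images have already been encoded into a token grid of shape (views, attributes, tokens, channels). The sizes used (4 views, 5 attributes), the function names, and the projection-free single-head attention are all hypothetical simplifications chosen for illustration; the abstract does not specify any of them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head self-attention with no learned projections
    # (an assumption made to keep the sketch short).
    # tokens: (batch, length, channels)
    d = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens

def cross_view_attention(x):
    # x: (views, attrs, tokens, channels).
    # For each attribute, tokens from all views attend to one another.
    V, A, N, C = x.shape
    grouped = x.transpose(1, 0, 2, 3).reshape(A, V * N, C)
    out = self_attention(grouped)
    return out.reshape(A, V, N, C).transpose(1, 0, 2, 3)

def cross_attribute_attention(x):
    # For each view, tokens from all attribute images attend to one another.
    V, A, N, C = x.shape
    grouped = x.reshape(V, A * N, C)
    out = self_attention(grouped)
    return out.reshape(V, A, N, C)

# Hypothetical sizes: 4 views, 5 attribute images, 16 tokens, 8 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 16, 8))
y = cross_attribute_attention(cross_view_attention(x))
print(y.shape)
```

The only difference between the two layers in this sketch is which axis is folded into the attention sequence: folding the view axis mixes information across viewpoints for one attribute, while folding the attribute axis mixes information across attributes within one viewpoint.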

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.05427 [cs.CV]
	(or arXiv:2501.05427v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.05427

Submission history

From: Xuyi Meng [view email]
[v1] Thu, 9 Jan 2025 18:37:35 UTC (12,475 KB)
