Generic 3D Diffusion Adapter Using Controlled Multi-View Editing

Chen, Hansheng; Shi, Ruoxi; Liu, Yulin; Shen, Bokui; Gu, Jiayuan; Wetzstein, Gordon; Su, Hao; Guibas, Leonidas

arXiv:2403.12032 (cs)

[Submitted on 18 Mar 2024 (v1), last revised 19 Mar 2024 (this version, v2)]

Abstract: Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denoise multi-view images and output high-quality textured meshes. Built on off-the-shelf 2D diffusion models, MVEdit achieves 3D consistency through a training-free 3D Adapter, which lifts the 2D views of the last timestep into a coherent 3D representation, then conditions the 2D views of the next timestep using rendered views, without compromising visual quality. With an inference time of only 2-5 minutes, this framework achieves a better trade-off between quality and speed than score distillation. MVEdit is highly versatile and extendable, with a wide range of applications including text/image-to-3D generation, 3D-to-3D editing, and high-quality texture synthesis. In particular, evaluations demonstrate state-of-the-art performance in both image-to-3D and text-guided texture generation tasks. Additionally, we introduce a method for fine-tuning 2D latent diffusion models on small 3D datasets with limited resources, enabling fast low-resolution text-to-3D initialization.
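
The sampling loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the functions `denoise_2d`, `lift_to_3d`, `render`, and `sample` are hypothetical names, views are flat lists of pixel values, the "3D representation" is simply the per-pixel mean over views, and the denoiser is a trivial stand-in for a pretrained 2D diffusion model. Only the control flow is meant to mirror the text: start from noised input views as in SDEdit, denoise each view, lift the denoised views into a shared representation, render it back, and use the renders to condition the next timestep.

```python
import random

def denoise_2d(noisy_view, rendered_view, noise_level):
    # Hypothetical stand-in for a pretrained 2D denoiser (not a real model):
    # shrink the noisy view, then blend with the rendered condition if present.
    pred = [p * (1.0 - noise_level) for p in noisy_view]
    if rendered_view is not None:
        pred = [0.5 * p + 0.5 * r for p, r in zip(pred, rendered_view)]
    return pred

def lift_to_3d(views):
    # Toy "3D representation": the per-pixel mean over all views.
    n = len(views)
    return [sum(pixels) / n for pixels in zip(*views)]

def render(representation, n_views):
    # Toy renderer: every camera sees the same shared representation.
    return [list(representation) for _ in range(n_views)]

def sample(init_views, noise_levels, seed=0):
    rng = random.Random(seed)
    n = len(init_views)
    # SDEdit-style start: perturb the input views at the first noise level.
    x_t = [[p + noise_levels[0] * rng.gauss(0.0, 1.0) for p in v]
           for v in init_views]
    rendered = [None] * n  # no 3D condition exists before the first step
    representation = None
    for i, level in enumerate(noise_levels):
        # Denoise every view, conditioned on renders from the previous step.
        preds = [denoise_2d(x_t[v], rendered[v], level) for v in range(n)]
        # Lift this step's views to 3D, then render conditions for the next.
        representation = lift_to_3d(preds)
        rendered = render(representation, n)
        # Ancestral-style step: re-noise the prediction to the next level.
        next_level = noise_levels[i + 1] if i + 1 < len(noise_levels) else 0.0
        x_t = [[p + next_level * rng.gauss(0.0, 1.0) for p in v] for v in preds]
    return x_t, representation

views, representation = sample(
    [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]], noise_levels=[0.8, 0.5, 0.2]
)
```

In this sketch the renders produced at one noise level only influence the denoiser at the following level, which is the ordering the abstract describes; everything else (the noise schedule, the blending weight of 0.5, the mean as a 3D proxy) is an assumption made for illustration.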

Comments:	V2 note: Fix missing acknowledgements. Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Cite as:	arXiv:2403.12032 [cs.CV]
	(or arXiv:2403.12032v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.12032

Submission history

From: Hansheng Chen
[v1] Mon, 18 Mar 2024 17:59:09 UTC (29,138 KB)
[v2] Tue, 19 Mar 2024 16:45:22 UTC (29,138 KB)
