Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

Yan, Han; Li, Yang; Wu, Zhennan; Chen, Shenzhou; Sun, Weixuan; Shang, Taizhang; Liu, Weizhe; Chen, Tian; Dai, Xiaqiang; Ma, Chao; Li, Hongdong; Ji, Pan

doi:10.1145/3680528.3687672

arXiv:2403.16210 (cs)

[Submitted on 24 Mar 2024 (v1), last revised 30 Aug 2024 (this version, v2)]

Abstract: We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part. The 3D scene information is encoded in a single tri-plane tensor, from which multiple Signed Distance Function (SDF) fields can be decoded to represent the compositional shapes. During training, an auto-encoder compresses tri-planes into a latent space, and a denoising diffusion process is then employed to approximate the distribution of the compositional scenes. Frankenstein demonstrates promising results in generating room interiors as well as human avatars with automatically separated parts. The generated scenes facilitate many downstream applications, such as part-wise re-texturing, object rearrangement in the room, or avatar cloth re-targeting. Our project page is available at: this https URL.
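The abstract states that several SDF fields are decoded from one tri-plane. The sketch below illustrates only that query step, under assumptions the abstract does not state: features sampled from the three axis-aligned planes are aggregated by summation (a common tri-plane convention), a single linear head stands in for the paper's decoder and outputs one SDF value per semantic part, and the whole scene is taken as the union (pointwise minimum) of the part SDFs. All names, sizes, and the random tri-plane are hypothetical; the auto-encoder and diffusion stages are not shown.

```python
import random
from typing import List, Sequence, Tuple

# A tri-plane is three axis-aligned feature planes (XY, XZ, YZ),
# each stored as [channel][row][col]. Sizes here are toy values.
Plane = List[List[List[float]]]
TriPlane = Tuple[Plane, Plane, Plane]


def make_triplane(channels: int, res: int, seed: int = 0) -> TriPlane:
    """Random tri-plane standing in for a generated one (hypothetical)."""
    rng = random.Random(seed)

    def plane() -> Plane:
        return [[[rng.uniform(-1.0, 1.0) for _ in range(res)]
                 for _ in range(res)] for _ in range(channels)]

    return (plane(), plane(), plane())


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample every channel at (u, v) in [-1, 1]^2 (u -> column, v -> row)."""
    res = len(plane[0])  # assumes res >= 2
    u = max(-1.0, min(1.0, u))
    v = max(-1.0, min(1.0, v))
    x = (u + 1.0) * 0.5 * (res - 1)  # continuous column index
    y = (v + 1.0) * 0.5 * (res - 1)  # continuous row index
    x0 = min(int(x), res - 2)
    y0 = min(int(y), res - 2)
    tx, ty = x - x0, y - y0
    out = []
    for ch in plane:
        top = ch[y0][x0] * (1.0 - tx) + ch[y0][x0 + 1] * tx
        bot = ch[y0 + 1][x0] * (1.0 - tx) + ch[y0 + 1][x0 + 1] * tx
        out.append(top * (1.0 - ty) + bot * ty)
    return out


def query_features(tri: TriPlane, p: Sequence[float]) -> List[float]:
    """Project p = (x, y, z) onto each plane and sum the sampled features."""
    x, y, z = p
    f_xy = bilinear(tri[0], x, y)
    f_xz = bilinear(tri[1], x, z)
    f_yz = bilinear(tri[2], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


def decode_part_sdfs(feat: Sequence[float],
                     weights: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> List[float]:
    """Hypothetical linear head: one SDF value per semantic part (K outputs)."""
    return [sum(w * f for w, f in zip(row, feat)) + b
            for row, b in zip(weights, bias)]


def compose(sdfs: Sequence[float]) -> Tuple[float, int]:
    """Scene SDF as the union (min) of part SDFs; the argmin is the part label."""
    label = min(range(len(sdfs)), key=lambda k: sdfs[k])
    return sdfs[label], label


if __name__ == "__main__":
    C, R, K = 8, 16, 3  # channels, plane resolution, number of parts (toy values)
    tri = make_triplane(C, R)
    rng = random.Random(1)
    W = [[rng.gauss(0.0, 0.3) for _ in range(C)] for _ in range(K)]
    b = [0.0] * K
    sdfs = decode_part_sdfs(query_features(tri, (0.1, -0.4, 0.7)), W, b)
    scene_sdf, part = compose(sdfs)
    print(len(sdfs), part, round(scene_sdf, 4))
```

In this sketch, one sampling pass over the shared tri-plane yields every part's SDF at a point, so isolating a single part (for example, to re-texture or move it) reduces to extracting the zero level set of one output channel rather than segmenting a fused mesh.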

Comments:	SIGGRAPH Asia 2024 Conference Paper
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Cite as:	arXiv:2403.16210 [cs.CV]
	(or arXiv:2403.16210v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.16210
Related DOI:	https://doi.org/10.1145/3680528.3687672

Submission history

From: Han Yan
[v1] Sun, 24 Mar 2024 16:09:21 UTC (7,287 KB)
[v2] Fri, 30 Aug 2024 17:39:50 UTC (7,413 KB)