Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Fan, Wan-Cyuan; Chen, Yen-Chun; Chen, Dongdong; Cheng, Yu; Yuan, Lu; Wang, Yu-Chiang Frank

Computer Science > Computer Vision and Pattern Recognition

arXiv:2208.13753 (cs)

[Submitted on 29 Aug 2022 (v1), last revised 1 Dec 2022 (this version, v2)]

Authors: Wan-Cyuan Fan, Yen-Chun Chen, Dongdong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang

Abstract: Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images of complex scenes, properly describing both the global image structure and the object details remains challenging. In this paper, we present Frido, a Feature Pyramid Diffusion model that performs a multi-scale, coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector-quantized features, followed by coarse-to-fine gating to produce the image output. During this multi-scale representation learning stage, additional input conditions such as text, scene graphs, or image layouts can be further exploited. Thus, Frido can also be applied to conditional or cross-modality image synthesis. We conduct extensive experiments on various unconditional and conditional image generation tasks, ranging from text-to-image and layout-to-image to scene-graph-to-image and label-to-image synthesis. Specifically, we achieve state-of-the-art FID scores on five benchmarks: layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO. Code is available at this https URL.
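
The idea of decomposing an image into scale-dependent vector-quantized features can be illustrated with a toy, non-learned sketch. It assumes a precomputed grid of feature vectors, 2x2 average pooling between scales, and fixed per-scale codebooks. The helper names `avg_pool2x`, `nearest_code`, and `quantize_pyramid` are hypothetical. This is not Frido's implementation: in the paper, the multi-scale representation is learned and a diffusion model is then run coarse-to-fine over it. The sketch only shows what "one code map per scale, ordered coarse to fine" means.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Grid = List[List[Vector]]  # H x W grid of feature vectors (toy stand-in for encoder output)


def avg_pool2x(grid: Grid) -> Grid:
    """Downsample by 2 in each spatial dimension by averaging 2x2 blocks.

    Assumes H and W are even (hypothetical simplification).
    """
    h, w = len(grid), len(grid[0])
    out: Grid = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            block = [grid[i + di][j + dj] for di in (0, 1) for dj in (0, 1)]
            dim = len(block[0])
            row.append(tuple(sum(v[k] for v in block) / 4.0 for k in range(dim)))
        out.append(row)
    return out


def nearest_code(v: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda idx: sum((a - b) ** 2 for a, b in zip(v, codebook[idx])),
    )


def quantize_pyramid(grid: Grid, codebooks: Sequence[Sequence[Vector]]) -> List[List[List[int]]]:
    """Quantize each pyramid level with its own codebook.

    codebooks[0] is used at the finest scale and codebooks[-1] at the coarsest.
    The returned code-index maps are ordered coarse-to-fine.
    """
    levels = [grid]
    for _ in range(len(codebooks) - 1):
        levels.append(avg_pool2x(levels[-1]))
    maps = [
        [[nearest_code(v, book) for v in row] for row in level]
        for level, book in zip(levels, codebooks)
    ]
    return maps[::-1]  # coarsest scale first
```

For example, a 4x4 grid with a fine and a coarse codebook yields a 2x2 code map followed by a 4x4 code map. Ordering the maps coarse-to-fine mirrors the abstract's description, in which global structure is settled before finer detail.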

Comments:	AAAI 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2208.13753 [cs.CV]
	(or arXiv:2208.13753v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.13753

Submission history

From: Wan-Cyuan Fan
[v1] Mon, 29 Aug 2022 17:37:29 UTC (23,563 KB)
[v2] Thu, 1 Dec 2022 06:29:07 UTC (17,872 KB)