DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis

Duan, Zhongjie; You, Lizhou; Wang, Chengyu; Chen, Cen; Wu, Ziheng; Qian, Weining; Huang, Jun

arXiv:2308.03463 (cs)

[Submitted on 7 Aug 2023 (v1), last revised 10 Aug 2023 (this version, v3)]

Abstract: In recent years, diffusion models have emerged as the most powerful approach in image synthesis. However, applying these models directly to video synthesis presents challenges, as it often leads to noticeable flickering content. Although recently proposed zero-shot methods can alleviate flicker to some extent, generating coherent videos remains difficult. In this paper, we propose DiffSynth, a novel approach that aims to convert image synthesis pipelines to video synthesis pipelines. DiffSynth consists of two key components: a latent in-iteration deflickering framework and a video deflickering algorithm. The latent in-iteration deflickering framework applies video deflickering to the latent space of diffusion models, effectively preventing flicker accumulation in intermediate steps. Additionally, we propose a video deflickering algorithm, named the patch blending algorithm, that remaps objects in different frames and blends them together to enhance video consistency. One of the notable advantages of DiffSynth is its general applicability to various video synthesis tasks, including text-guided video stylization, fashion video synthesis, image-guided video stylization, video restoration, and 3D rendering. In the task of text-guided video stylization, we make it possible to synthesize high-quality videos without cherry-picking. The experimental results demonstrate the effectiveness of DiffSynth. All videos can be viewed on our project page. Source code will also be released.
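
The in-iteration idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the denoising step is a placeholder that nudges each frame's latent toward a fixed target with frame-independent jitter, and the deflickering operator is a plain temporal moving average standing in for the patch blending algorithm, whose details the abstract does not give. All function names and parameters here (`temporal_smooth`, `toy_denoise_step`, `synthesize`, `flicker`, `radius`, `rate`) are hypothetical. The sketch only shows where the deflickering call sits: inside the sampling loop, after every step, so that frame-to-frame differences are not left to accumulate.

```python
import random


def temporal_smooth(frames, radius=1):
    """Stand-in deflicker (assumption, NOT patch blending): average each
    frame's latent with its temporal neighbours within `radius` frames."""
    n = len(frames)
    out = []
    for t in range(n):
        window = frames[max(0, t - radius):min(n, t + radius + 1)]
        out.append([sum(vals) / len(window) for vals in zip(*window)])
    return out


def toy_denoise_step(latent, target, rate, rng):
    """Placeholder for one per-frame diffusion step: move toward `target`
    and add jitter that is independent across frames (the flicker source)."""
    return [x + rate * (g - x) + rng.gauss(0.0, 0.05)
            for x, g in zip(latent, target)]


def synthesize(num_frames, dim, steps, deflicker, seed=0):
    """Run the toy sampling loop, optionally deflickering after every step."""
    rng = random.Random(seed)
    target = [1.0] * dim
    latents = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(num_frames)]
    for _ in range(steps):
        latents = [toy_denoise_step(z, target, 0.3, rng) for z in latents]
        if deflicker:
            # In-iteration: smooth across frames at every intermediate step.
            latents = temporal_smooth(latents)
    return latents


def flicker(latents):
    """Mean absolute difference between consecutive frames' latents."""
    diffs = [abs(a - b)
             for prev, cur in zip(latents, latents[1:])
             for a, b in zip(prev, cur)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    plain = synthesize(num_frames=8, dim=16, steps=20, deflicker=False)
    smooth = synthesize(num_frames=8, dim=16, steps=20, deflicker=True)
    print("flicker without in-iteration deflickering:", flicker(plain))
    print("flicker with in-iteration deflickering:   ", flicker(smooth))
```

In this toy setting the deflickered run ends with smaller frame-to-frame differences than the plain run. A moving average also blurs genuine motion, which is presumably one reason the paper proposes a patch-based remapping and blending step instead.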

Comments:	9 pages, 6 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2308.03463 [cs.CV]
	(or arXiv:2308.03463v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.03463

Submission history

From: Zhongjie Duan
[v1] Mon, 7 Aug 2023 10:41:52 UTC (2,774 KB)
[v2] Tue, 8 Aug 2023 07:54:55 UTC (2,774 KB)
[v3] Thu, 10 Aug 2023 02:26:16 UTC (2,774 KB)
