TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps

Xie, Qingsong; Liao, Zhenyi; Deng, Zhijie; Chen, Chen; Lu, Haonan

arXiv:2406.05768 (cs)

[Submitted on 9 Jun 2024 (v1), last revised 7 Nov 2024 (this version, v6)]

Abstract: Distilling latent diffusion models (LDMs) into models that are fast to sample from is attracting growing research interest. However, most existing methods face two critical challenges: (1) they depend on long training with a huge volume of real data, and (2) they routinely degrade generation quality, especially text-image alignment. This paper proposes a novel training-efficient Latent Consistency Model (TLCM) to overcome these challenges. Our method first accelerates LDMs via data-free multistep latent consistency distillation (MLCD), and then applies data-free latent consistency distillation to efficiently guarantee inter-segment consistency in MLCD. Furthermore, we introduce a bag of techniques, e.g., distribution matching, adversarial learning, and preference learning, to enhance TLCM's performance in few-step inference without any real data. TLCM is highly flexible: the number of sampling steps can be adjusted anywhere from 2 to 8 while still producing outputs competitive with full-step approaches. Notably, TLCM is data-free, using synthetic data generated by the teacher for distillation. With just 70 training hours on an A100 GPU, a 3-step TLCM distilled from SDXL achieves a CLIP Score of 33.68 and an Aesthetic Score of 5.97 on the MSCOCO-2017 5K benchmark, surpassing various accelerated models and even outperforming the teacher model on human preference metrics. We also demonstrate the versatility of TLCMs in applications including image style transfer, controllable generation, and Chinese-to-image generation.
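The adjustable 2 to 8 step range rests on how consistency models are sampled: a single network call maps a noisy input straight to a clean estimate, and extra steps re-noise that estimate and map it again. The sketch below illustrates only this generic multistep consistency sampling loop on a toy 1-D problem; it is not TLCM's sampler and does not model MLCD's segment structure or any of the other techniques named above. Everything in it is an assumption for illustration: Gaussian data N(MU, S²), a cosine noise schedule, and a closed-form exact consistency function standing in for a distilled network. The names `consistency_fn`, `multistep_consistency_sample`, `alpha`, `sigma`, `MU`, and `S` are hypothetical.

```python
import math
import random

# Toy assumptions (not from the paper): 1-D data x0 ~ N(MU, S**2) and the
# forward process x_t = alpha(t) * x0 + sigma(t) * eps on t in [0, 1].
MU, S = 2.0, 0.5


def alpha(t: float) -> float:
    """Signal scale of a cosine schedule; alpha(0) = 1 means clean data."""
    return math.cos(0.5 * math.pi * t)


def sigma(t: float) -> float:
    """Noise scale of the same schedule; sigma(1) = 1 means pure noise."""
    return math.sin(0.5 * math.pi * t)


def consistency_fn(x_t: float, t: float) -> float:
    """Hypothetical stand-in for a distilled student network.

    For Gaussian data the probability-flow ODE preserves the standardized
    value (x_t - alpha(t) * MU) / std_t, so the exact map from any point on a
    trajectory to its t = 0 endpoint has this closed form. It satisfies the
    consistency boundary condition consistency_fn(x, 0) == x.
    """
    std_t = math.sqrt(alpha(t) ** 2 * S ** 2 + sigma(t) ** 2)
    return MU + S * (x_t - alpha(t) * MU) / std_t


def multistep_consistency_sample(num_steps: int, rng: random.Random) -> float:
    """Generic multistep consistency sampling with num_steps model calls.

    Start from pure noise at t = 1, jump to a clean estimate with one
    consistency-function call, re-noise that estimate to the next smaller
    timestep with fresh Gaussian noise, and repeat.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    # Evenly spaced timesteps 1, 1 - 1/k, ..., 1/k (t = 0 is never evaluated).
    timesteps = [1.0 - i / num_steps for i in range(num_steps)]
    x = rng.gauss(0.0, 1.0)  # approximately a sample of p_1 = N(0, 1)
    x0_hat = consistency_fn(x, timesteps[0])
    for t in timesteps[1:]:
        # Re-noise the current clean estimate to timestep t, then map it back.
        x = alpha(t) * x0_hat + sigma(t) * rng.gauss(0.0, 1.0)
        x0_hat = consistency_fn(x, t)
    return x0_hat


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (2, 4, 8):
        xs = [multistep_consistency_sample(k, rng) for _ in range(20_000)]
        mean = sum(xs) / len(xs)
        std = math.sqrt(sum((v - mean) ** 2 for v in xs) / len(xs))
        print(f"{k} steps: mean={mean:.3f}, std={std:.3f}")
```

Because the toy consistency function is exact, every step count should yield samples whose mean and standard deviation are close to `MU` and `S`. A learned network is only approximate, and additional steps are commonly used to trade sampling speed for output quality.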

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.05768 [cs.CV]
	(or arXiv:2406.05768v6 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.05768

Submission history

From: Qingsong Xie
[v1] Sun, 9 Jun 2024 12:55:50 UTC (15,861 KB)
[v2] Tue, 11 Jun 2024 06:22:53 UTC (15,861 KB)
[v3] Wed, 12 Jun 2024 02:57:00 UTC (15,861 KB)
[v4] Wed, 30 Oct 2024 06:49:52 UTC (13,220 KB)
[v5] Thu, 31 Oct 2024 02:16:04 UTC (13,220 KB)
[v6] Thu, 7 Nov 2024 01:29:26 UTC (13,220 KB)
