Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Liu, Feng; Zhang, Shiwei; Wang, Xiaofeng; Wei, Yujie; Qiu, Haonan; Zhao, Yuzhong; Zhang, Yingya; Ye, Qixiang; Wan, Fang


arXiv:2411.19108 (cs)

[Submitted on 28 Nov 2024 (v1), last revised 18 Mar 2025 (this version, v2)]




Abstract:As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly selected timesteps. However, such a strategy neglects the fact that differences among model outputs are not uniform across timesteps, which hinders selecting the appropriate model outputs to cache and leads to a poor balance between inference efficiency and visual quality. In this study, we introduce Timestep Embedding Aware Cache (TeaCache), a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps. Rather than directly using the time-consuming model outputs, TeaCache focuses on model inputs, which correlate strongly with the model outputs while incurring negligible computational cost. TeaCache first modulates the noisy inputs using the timestep embeddings so that their differences better approximate those of the model outputs. TeaCache then introduces a rescaling strategy to refine the estimated differences and uses them to decide when to reuse a cached output. Experiments show that TeaCache achieves up to 4.41x acceleration over Open-Sora-Plan with negligible (-0.07% VBench score) degradation of visual quality.
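The caching decision described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `modulate`, `relative_l1`, and `InputDifferenceCache` are hypothetical, and the scale-and-shift modulation, the relative L1 metric, the polynomial form of the rescaling, and the accumulate-then-reset threshold rule are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def modulate(noisy_input, scale, shift):
    """Assumed form of timestep modulation: a scale and shift that would be
    derived from the timestep embedding (hypothetical, not from the abstract)."""
    return noisy_input * (1.0 + scale) + shift


def relative_l1(current, previous):
    """Relative L1 difference between two arrays of the same shape."""
    return float(np.abs(current - previous).mean() / (np.abs(previous).mean() + 1e-8))


class InputDifferenceCache:
    """Decides at each timestep whether to reuse the cached model output.

    Illustrative only: the threshold, the polynomial coefficients, and the
    accumulation rule are assumptions, not values from the paper.
    """

    def __init__(self, threshold, rescale_coeffs=(1.0, 0.0)):
        self.threshold = threshold
        # Polynomial that maps an input difference to an estimated output
        # difference; (1.0, 0.0) is the identity mapping.
        self.rescale = np.poly1d(rescale_coeffs)
        self.accumulated = 0.0
        self.prev_input = None
        self.cached_output = None

    def step(self, modulated_input, compute_output):
        """Return (output, was_computed) for the current timestep.

        compute_output is a zero-argument callable that runs the expensive model.
        """
        if self.prev_input is not None:
            diff = relative_l1(modulated_input, self.prev_input)
            self.accumulated += float(self.rescale(diff))
        self.prev_input = modulated_input

        # Reuse the cache while the accumulated estimated change stays small.
        if self.cached_output is not None and self.accumulated < self.threshold:
            return self.cached_output, False

        # Otherwise run the model, refresh the cache, and reset the accumulator.
        self.cached_output = compute_output()
        self.accumulated = 0.0
        return self.cached_output, True
```

In a sampling loop, one would call `step` once per timestep with the modulated input and a closure around the denoising model. A larger threshold skips more model calls at the cost of fidelity, which is the efficiency versus quality trade-off the abstract refers to.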

Comments:	Accepted in CVPR 2025. Project: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.19108 [cs.CV]
	(or arXiv:2411.19108v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.19108

Submission history

From: Feng Liu [view email]
[v1] Thu, 28 Nov 2024 12:50:05 UTC (2,823 KB)
[v2] Tue, 18 Mar 2025 04:49:23 UTC (3,743 KB)
