DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training

Tan, Xin; Chen, Yuetao; Jiang, Yimin; Chen, Xing; Yan, Kun; Duan, Nan; Zhu, Yibo; Jiang, Daxin; Xu, Hong

arXiv:2502.07590 (cs)

[Submitted on 11 Feb 2025]

Abstract: Diffusion Transformers (DiTs) have shown remarkable performance in modeling and generating high-quality videos. However, the quadratic computational complexity of the 3D full attention mechanism makes video DiT training hard to scale, especially for high-definition and lengthy videos. For such inputs, attention can account for up to 95% of end-to-end time, and the large input sizes necessitate specialized communication paradigms.
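A FLOP count makes the quadratic-cost claim concrete. The following is a minimal sketch that assumes a standard DiT block (Q/K/V/output projections plus an MLP with 4x expansion). The hidden size and latent grids are hypothetical and illustrative, not taken from the paper.

```python
def video_tokens(latent_frames: int, latent_h: int, latent_w: int) -> int:
    """Sequence length seen by 3D full attention: every latent patch attends to every other."""
    return latent_frames * latent_h * latent_w


def attention_flop_share(num_tokens: int, hidden: int, mlp_ratio: int = 4) -> float:
    """Fraction of one DiT block's forward matmul FLOPs spent in the attention core.

    Counts a multiply-add as 2 FLOPs; ignores softmax, norms, modulation and elementwise ops.
    """
    n, d = num_tokens, hidden
    attn_core = 2 * n * n * d + 2 * n * n * d  # Q @ K^T scores, then weights @ V: quadratic in n
    qkvo = 4 * (2 * n * d * d)                  # Q, K, V and output projections: linear in n
    mlp = 2 * (2 * n * d * mlp_ratio * d)       # MLP up- and down-projection: linear in n
    return attn_core / (attn_core + qkvo + mlp)


if __name__ == "__main__":
    hidden = 3072  # hypothetical hidden size
    for grid in [(8, 32, 32), (32, 45, 80)]:  # hypothetical latent (frames, height, width)
        n = video_tokens(*grid)
        print(f"{grid}: {n} tokens, attention share {attention_flop_share(n, hidden):.1%}")
```

With the default 4x MLP, the share simplifies to N / (N + 6d). It therefore approaches 1 as the token count N grows: the attention core grows quadratically in N, while every other matmul grows only linearly. For the two hypothetical grids the share rises from about 31% to about 86%. Measured time shares also depend on kernel efficiency and memory traffic, which this count ignores.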
This paper introduces DSV, a novel framework designed to accelerate and scale the training of video DiTs by leveraging the inherent dynamic attention sparsity throughout the training process. DSV employs a two-stage training algorithm that exploits sparsity patterns, focusing computation on critical elements with the support of efficient, tailored kernels. To accommodate the new sparsity dimension, we develop a hybrid sparsity-aware context parallelism that effectively scales to large inputs by addressing the heterogeneity of sparsity across attention heads and blocks, resulting in optimized sparse computation and communication. Extensive evaluations demonstrate that DSV achieves up to a 3.02x gain in training throughput with nearly no quality degradation.
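One common way to focus attention on critical elements is top-k block-sparse attention. Block-level importance is first estimated cheaply, and exact attention is then computed only over the selected key blocks. The code below is a generic pure-Python illustration of that idea, not DSV's two-stage algorithm or its tailored kernels. The helper names, the mean-pooled scoring rule, and the `block`/`keep` parameters are all hypothetical.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_rows(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def dense_attention(q, k, v):
    """Exact softmax(Q K^T / sqrt(d)) V over all given keys."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        w = softmax([dot(qi, kj) * scale for kj in k])
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out


def topk_block_sparse_attention(q, k, v, block, keep):
    """Each query block attends only to the `keep` key blocks with the highest pooled score."""
    n = len(k)
    key_blocks = [list(range(s, min(s + block, n))) for s in range(0, n, block)]
    pooled_k = [mean_rows([k[j] for j in blk]) for blk in key_blocks]
    out = []
    for s in range(0, len(q), block):
        q_blk = q[s:s + block]
        # Cheap block-level importance estimate: mean-pooled query dot mean-pooled key.
        scores = [dot(mean_rows(q_blk), pk) for pk in pooled_k]
        chosen = sorted(range(len(key_blocks)), key=lambda b: scores[b], reverse=True)[:keep]
        cols = sorted(j for b in chosen for j in key_blocks[b])
        # Exact attention restricted to the selected key/value columns.
        out.extend(dense_attention(q_blk, [k[j] for j in cols], [v[j] for j in cols]))
    return out


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    n, d = 16, 8
    q, k, v = rand_matrix(n, d), rand_matrix(n, d), rand_matrix(n, d)
    dense = dense_attention(q, k, v)
    sparse = topk_block_sparse_attention(q, k, v, block=4, keep=2)  # 2 of 4 key blocks
    err = max(abs(a - b) for ra, rb in zip(dense, sparse) for a, b in zip(ra, rb))
    print(f"max abs deviation from dense attention: {err:.4f}")
```

With `keep` equal to the number of key blocks, the result matches dense attention. Smaller `keep` values trade accuracy for computation that scales with the number of selected blocks instead of the full sequence length. How well this works depends on how concentrated the real attention distribution is.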

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.07590 [cs.DC]
	(or arXiv:2502.07590v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2502.07590

Submission history

From: Xin Tan
[v1] Tue, 11 Feb 2025 14:39:59 UTC (7,358 KB)