Fast Autoregressive Video Generation with Diagonal Decoding

Ye, Yang; Guo, Junliang; Wu, Haoyu; He, Tianyu; Pearce, Tim; Rashid, Tabish; Hofmann, Katja; Bian, Jiang


arXiv:2503.14070 (cs)

[Submitted on 18 Mar 2025]

Abstract: Autoregressive Transformer models have demonstrated impressive performance in video generation, but their sequential token-by-token decoding process poses a major bottleneck, particularly for long videos represented by tens of thousands of tokens. In this paper, we propose Diagonal Decoding (DiagD), a training-free inference acceleration algorithm for autoregressively pre-trained models that exploits spatial and temporal correlations in videos. Our method generates tokens along diagonal paths in the spatial-temporal token grid, enabling parallel decoding within each frame as well as partially overlapping across consecutive frames. The proposed algorithm is versatile and adaptive to various generative models and tasks, while providing flexible control over the trade-off between inference speed and visual quality. Furthermore, we propose a cost-effective finetuning strategy that aligns the attention patterns of the model with our decoding order, further mitigating the training-inference gap on small-scale models. Experiments on multiple autoregressive video generation models and datasets demonstrate that DiagD achieves up to $10\times$ speedup compared to naive sequential decoding, while maintaining comparable visual fidelity.
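
The idea of decoding along diagonal paths can be illustrated with a small scheduling sketch. This is not the authors' implementation: the function name `diagonal_schedule`, the `frame_delay` parameter, and the rule "step = t * frame_delay + row + col" are assumptions made for illustration only. The sketch just shows how grouping tokens by anti-diagonal lets several tokens share one decoding step, and how a small delay lets consecutive frames overlap.

```python
from collections import defaultdict


def diagonal_schedule(num_frames, height, width, frame_delay):
    """Group token positions (t, row, col) by a hypothetical decoding step.

    Assumption (not taken from the paper): a token is decoded at step
    t * frame_delay + row + col, so all tokens on one anti-diagonal of a
    frame share a step, and frame t + 1 starts frame_delay steps after
    frame t.
    """
    if frame_delay < 1:
        raise ValueError("frame_delay must be at least 1")
    steps = defaultdict(list)
    for t in range(num_frames):
        for row in range(height):
            for col in range(width):
                steps[t * frame_delay + row + col].append((t, row, col))
    # Return the groups in step order; each group would be decoded in parallel.
    return [steps[s] for s in sorted(steps)]


# Example: 2 frames of a 3 x 4 token grid, next frame starting 3 steps later.
schedule = diagonal_schedule(num_frames=2, height=3, width=4, frame_delay=3)
num_tokens = sum(len(group) for group in schedule)
print(f"{num_tokens} tokens in {len(schedule)} parallel steps")
```

Under this illustrative rule, a single 3 x 4 frame needs 3 + 4 - 1 = 6 steps instead of 12 sequential ones, and a smaller `frame_delay` trades more cross-frame overlap for fewer total steps. The actual schedule and its effect on quality are defined by the paper, not by this sketch.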

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2503.14070 [cs.CV]
	(or arXiv:2503.14070v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.14070

Submission history

From: Junliang Guo
[v1] Tue, 18 Mar 2025 09:42:55 UTC (3,377 KB)
