FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute

Anagnostidis, Sotiris; Bachmann, Gregor; Kim, Yeongmin; Kohler, Jonas; Georgopoulos, Markos; Sanakoyeu, Artsiom; Du, Yuming; Pumarola, Albert; Thabet, Ali; Schönfeld, Edgar

arXiv:2502.20126 (cs)

[Submitted on 27 Feb 2025]

Abstract: Despite their remarkable performance, modern Diffusion Transformers are hindered by substantial resource requirements during inference, stemming from the fixed and large amount of compute needed for each denoising step. In this work, we revisit the conventional static paradigm that allocates a fixed compute budget per denoising iteration and propose a dynamic strategy instead. Our simple and sample-efficient framework enables pre-trained DiT models to be converted into flexible ones, dubbed FlexiDiT, allowing them to process inputs at varying compute budgets. We demonstrate how a single flexible model can generate images without any drop in quality, while reducing the required FLOPs by more than 40% compared to their static counterparts, for both class-conditioned and text-conditioned image generation. Our method is general and agnostic to input and conditioning modalities. We show how our approach can be readily extended for video generation, where FlexiDiT models generate samples with up to 75% less compute without compromising performance.
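To make the compute accounting concrete, here is a minimal sketch of how a per-step budget schedule translates into an overall FLOP saving. It is not the authors' method: the function names, the two-level schedule, the step count (50), and the relative cost of a cheap step (0.25) are all hypothetical values chosen only for illustration. The static baseline is assumed to spend a relative cost of 1.0 on every denoising step.

```python
def relative_compute(step_costs):
    """Mean per-step cost, relative to a static model that spends 1.0 on every step."""
    if not step_costs:
        raise ValueError("need at least one denoising step")
    return sum(step_costs) / len(step_costs)


def two_level_schedule(num_steps, cheap_steps, cheap_cost):
    """Hypothetical schedule: `cheap_steps` steps run at `cheap_cost`, the rest at 1.0.

    The order of the steps does not affect the total, so cheap steps are
    simply listed first here.
    """
    if not 0 <= cheap_steps <= num_steps:
        raise ValueError("cheap_steps must be between 0 and num_steps")
    return [cheap_cost] * cheap_steps + [1.0] * (num_steps - cheap_steps)


# Illustrative numbers only (assumptions, not results from the paper).
schedule = two_level_schedule(num_steps=50, cheap_steps=30, cheap_cost=0.25)
total = relative_compute(schedule)
print(f"relative compute: {total:.2f}, saving: {1.0 - total:.0%}")
```

Under these assumed numbers the schedule uses 0.55 of the static compute. The sketch only covers the arithmetic of the budget; whether quality is preserved at a given schedule is an empirical question that the abstract answers for the authors' own models.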

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.20126 [cs.LG]
	(or arXiv:2502.20126v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.20126

Submission history

From: Sotiris Anagnostidis
[v1] Thu, 27 Feb 2025 14:16:56 UTC (33,027 KB)
