UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer

Wang, Haoxuan; Peng, Jinlong; He, Qingdong; Yang, Hao; Jin, Ying; Wu, Jiafu; Hu, Xiaobin; Pan, Yanjie; Gan, Zhenye; Chi, Mingmin; Peng, Bo; Wang, Yabiao

arXiv:2503.09277 (cs)

[Submitted on 12 Mar 2025]

Abstract: With the rapid development of diffusion models in image generation, the demand for more powerful and flexible controllable frameworks is increasing. Although existing methods can guide generation beyond text prompts, the challenge of effectively combining multiple conditional inputs while maintaining consistency with all of them remains unsolved. To address this, we introduce UniCombine, a DiT-based multi-conditional controllable generative framework capable of handling any combination of conditions, including but not limited to text prompts, spatial maps, and subject images. Specifically, we introduce a novel Conditional MMDiT Attention mechanism and incorporate a trainable LoRA module to build both the training-free and training-based versions. Additionally, we propose a new pipeline to construct SubjectSpatial200K, the first dataset designed for multi-conditional generative tasks covering both the subject-driven and spatially-aligned conditions. Extensive experimental results on multi-conditional generation demonstrate the outstanding universality and powerful capability of our approach with state-of-the-art performance.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2503.09277 [cs.CV] (or arXiv:2503.09277v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2503.09277

Submission history

From: Haoxuan Wang
[v1] Wed, 12 Mar 2025 11:22:47 UTC (10,836 KB)
