Efficient Multi-Instance Generation with Janus-Pro-Driven Prompt Parsing

Qi, Fan; Duan, Yu; Xu, Changsheng

arXiv:2503.21069 (cs)

[Submitted on 27 Mar 2025]

Abstract: Recent advances in text-guided diffusion models have revolutionized conditional image generation, yet these models still struggle to synthesize complex scenes with multiple objects because of imprecise spatial grounding and limited scalability. We address these challenges with two modules: 1) Janus-Pro-driven Prompt Parsing, a prompt-layout parsing module that bridges text understanding and layout generation via a compact 1B-parameter architecture, and 2) MIGLoRA, a parameter-efficient plug-in that integrates Low-Rank Adaptation (LoRA) into UNet (SD1.5) and DiT (SD3) backbones. MIGLoRA preserves the base model's parameters and remains plug-and-play, minimizing architectural intrusion while enabling efficient fine-tuning. To support comprehensive evaluation, we create DescripBox and DescripBox-1024, benchmarks that span diverse scenes and resolutions. The proposed method achieves state-of-the-art performance on the COCO and LVIS benchmarks while maintaining parameter efficiency, demonstrating superior layout fidelity and scalability for open-world synthesis.
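The abstract does not spell out how MIGLoRA is wired into the backbones, so the following is only a minimal sketch of the generic LoRA idea it builds on: a frozen weight matrix W is augmented with a trainable low-rank product B·A, scaled by alpha/rank. The class name `LoRALinear`, the helper `matvec`, and all parameter names are hypothetical and are not taken from the paper.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Generic LoRA-style linear layer (illustrative, not the authors' code).

    The base weight W (out_dim x in_dim) is kept frozen. Only the low-rank
    factors A (rank x in_dim) and B (out_dim x rank) would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight          # frozen base parameters, never modified
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A starts small and random, B starts at zero, so the adapter
        # initially leaves the base model's output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.enabled = True           # plug-and-play: adapter can be switched off

    def forward(self, x):
        y = matvec(self.weight, x)
        if self.enabled:
            delta = matvec(self.B, matvec(self.A, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y

    def num_adapter_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy 4x4 base weight; a rank-1 adapter adds 8 parameters next to the 16 frozen ones.
W = [[1.0, 2.0, 3.0, 4.0],
     [0.5, -1.0, 0.0, 2.0],
     [0.0, 0.0, 1.0, 1.0],
     [2.0, 1.0, 0.0, -1.0]]
layer = LoRALinear(W, rank=1)
x = [1.0, 0.0, -1.0, 2.0]
base_out = matvec(W, x)
```

Because B is zero-initialized, `layer.forward(x)` matches `base_out` before any training, and setting `layer.enabled = False` recovers the base model at any time. This is one common way to obtain the parameter preservation and plug-and-play behavior the abstract describes.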

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.21069 [cs.CV]
	(or arXiv:2503.21069v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.21069

Submission history

From: Fan Qi
[v1] Thu, 27 Mar 2025 00:59:14 UTC (23,203 KB)
