OmniPrism: Learning Disentangled Visual Concept for Image Generation

Li, Yangyang; Liu, Daqing; Liu, Wu; He, Allen; Liu, Xinchen; Zhang, Yongdong; Jin, Guoqing


arXiv:2412.12242 (cs)

[Submitted on 16 Dec 2024]



Abstract: Creative visual concept generation often draws inspiration from specific concepts in a reference image to produce relevant outcomes. However, existing methods are typically constrained to single-aspect concept generation or are easily disrupted by irrelevant concepts in multi-aspect concept scenarios, leading to concept confusion and hindering creative generation. To address this, we propose OmniPrism, a visual concept disentangling approach for creative image generation. Our method learns disentangled concept representations guided by natural language and trains a diffusion model to incorporate these concepts. We utilize the rich semantic space of a multimodal extractor to achieve concept disentanglement from given images and concept guidance. To disentangle concepts with different semantics, we construct a paired concept disentangled dataset (PCD-200K), where each pair shares the same concept, such as content, style, or composition. We learn disentangled concept representations through our contrastive orthogonal disentangled (COD) training pipeline and then inject them into additional diffusion cross-attention layers for generation. A set of block embeddings is designed to adapt each block's concept domain in the diffusion model. Extensive experiments demonstrate that our method can generate high-quality, concept-disentangled results with high fidelity to text prompts and desired concepts.
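The abstract names a contrastive orthogonal disentangled (COD) training pipeline but does not state its loss functions. As a rough illustration only, the sketch below shows one generic way such an objective could be assembled: an InfoNCE-style contrastive term that pulls together embeddings of a pair sharing a concept, plus a penalty that pushes different concept embeddings of the same image toward orthogonality. The function names, the temperature value, and the array shapes are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(anchor, positive, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not taken from the paper).

    anchor, positive: arrays of shape (batch, dim). Row i of each array
    forms a positive pair; every other row in the batch acts as a negative.
    """
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    logits = a @ p.T / temperature
    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def orthogonality_penalty(concepts):
    """Mean squared cosine similarity between different concepts of one image.

    concepts: array of shape (batch, n_concepts, dim), for example one
    embedding each for content, style, and composition (hypothetical layout).
    Returns 0 when the concept embeddings are mutually orthogonal.
    """
    c = l2_normalize(concepts)
    gram = c @ np.swapaxes(c, 1, 2)          # (batch, n_concepts, n_concepts)
    n = gram.shape[1]
    off_diag = gram * (1.0 - np.eye(n))      # zero out self-similarity
    return float((off_diag ** 2).sum() / (gram.shape[0] * n * (n - 1)))
```

Under these assumptions, a combined objective could be a weighted sum such as `info_nce(a, p) + lam * orthogonality_penalty(c)`, where `lam` is a hypothetical weighting hyperparameter that the abstract does not specify.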

Comments:	WebPage available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2412.12242 [cs.CV]
	(or arXiv:2412.12242v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.12242

Submission history

From: Yangyang Li
[v1] Mon, 16 Dec 2024 18:59:52 UTC (35,085 KB)
