Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models

Zhao, Juntu; Deng, Junyu; Ye, Yixin; Li, Chongxuan; Deng, Zhijie; Wang, Dequan

arXiv:2408.00230 (cs)

[Submitted on 1 Aug 2024 (v1), last revised 5 Aug 2024 (this version, v2)]

Abstract: Advances in text-to-image diffusion models have enabled a wide range of practical downstream applications, but these models often produce images that are misaligned with the input text. Consider prompts that combine two disentangled concepts: given "a tea cup of iced coke", existing models usually generate a glass cup of iced coke, because iced coke usually co-occurs with a glass cup rather than a tea cup during model training. We attribute this misalignment to confusion in the latent semantic space of text-to-image diffusion models, and hence refer to the "a tea cup of iced coke" phenomenon as Latent Concept Misalignment (LC-Mis). We leverage large language models (LLMs) to thoroughly investigate the scope of LC-Mis, and develop an automated pipeline for aligning the latent semantics of diffusion models to text prompts. Empirical assessments confirm the effectiveness of our approach, which substantially reduces LC-Mis errors and enhances the robustness and versatility of text-to-image diffusion models. The code and dataset are here: this https URL.
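
The co-occurrence argument in the abstract can be illustrated with a toy count. The sketch below is not the authors' pipeline and does not use their dataset: the captions are invented for illustration, and the helper `cooccurrence_share` is a hypothetical name. It only shows how a concept such as "iced coke" can be paired far more often with one container than another in caption-like text.

```python
from collections import Counter

# Toy captions, invented for illustration only (not from the paper's dataset).
captions = [
    "a glass cup of iced coke on a table",
    "iced coke in a glass cup with a straw",
    "a glass cup of iced coke next to a burger",
    "a tea cup of hot green tea",
    "a tea cup of black tea on a saucer",
    "a tea cup of iced coke",
]


def cooccurrence_share(captions, anchor, candidates):
    """Among captions mentioning `anchor`, return the share mentioning each candidate."""
    with_anchor = [c for c in captions if anchor in c]
    counts = Counter()
    for caption in with_anchor:
        for candidate in candidates:
            if candidate in caption:
                counts[candidate] += 1
    total = len(with_anchor)
    # Guard against division by zero when no caption mentions the anchor.
    return {cand: (counts[cand] / total if total else 0.0) for cand in candidates}


shares = cooccurrence_share(captions, "iced coke", ["glass cup", "tea cup"])
print(shares)  # {'glass cup': 0.75, 'tea cup': 0.25}
```

In this toy corpus, three of the four "iced coke" captions mention a glass cup, which mirrors the kind of imbalance the abstract points to as the source of the "tea cup of iced coke" failure. Real training data and the model's latent space are far more complex than substring counts.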

Comments:	Accepted by the 18th European Conference on Computer Vision (ECCV 2024)
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2408.00230 [cs.AI]
	(or arXiv:2408.00230v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2408.00230

Submission history

From: Juntu Zhao
[v1] Thu, 1 Aug 2024 01:54:17 UTC (17,341 KB)
[v2] Mon, 5 Aug 2024 08:36:20 UTC (17,341 KB)
