Norm-guided latent space exploration for text-to-image generation

Samuel, Dvir; Ben-Ari, Rami; Darshan, Nir; Maron, Haggai; Chechik, Gal

arXiv:2306.08687 (cs)

[Submitted on 14 Jun 2023 (v1), last revised 5 Nov 2023 (this version, v3)]

Abstract: Text-to-image diffusion models show great potential in synthesizing a large variety of concepts in new compositions and scenarios. However, the latent space of initial seeds is still not well understood and its structure was shown to impact the generation of various concepts. Specifically, simple operations like interpolation and finding the centroid of a set of seeds perform poorly when using standard Euclidean or spherical metrics in the latent space. This paper makes the observation that, in current training procedures, diffusion models observed inputs with a narrow range of norm values. This has strong implications for methods that rely on seed manipulation for image generation, with applications to few-shot and long-tail learning tasks. To address this issue, we propose a novel method for interpolating between two seeds and demonstrate that it defines a new non-Euclidean metric that takes into account a norm-based prior on seeds. We describe a simple yet efficient algorithm for approximating this interpolation procedure and use it to further define centroids in the latent seed space. We show that our new interpolation and centroid techniques significantly enhance the generation of rare concept images. This further leads to state-of-the-art performance on few-shot and long-tail benchmarks, improving prior approaches in terms of generation speed, image quality, and semantic content.
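
The norm observation in the abstract can be illustrated with a small numerical sketch. It is not the paper's method. It assumes seeds are drawn from a standard normal distribution, and the dimensionality `dim` and the helper names (`sample_seed`, `lerp`, `centroid`) are hypothetical choices made only for this example. Under those assumptions, seed norms concentrate tightly around sqrt(dim), while a Euclidean midpoint or centroid of independent seeds has a much smaller norm, which would place it outside the narrow range of norms described above.

```python
import math
import random


def sample_seed(dim, rng):
    """Draw one seed from a standard normal distribution N(0, I) (assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def lerp(a, b, t):
    """Plain linear (Euclidean) interpolation between two seeds."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def centroid(vectors):
    """Plain Euclidean mean of a list of seeds."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


rng = random.Random(0)
dim = 4096  # hypothetical seed dimensionality, chosen only for illustration

seeds = [sample_seed(dim, rng) for _ in range(20)]
seed_norms = [norm(s) for s in seeds]
midpoint_norm = norm(lerp(seeds[0], seeds[1], 0.5))
centroid_norm = norm(centroid(seeds))

print(f"sqrt(dim)        = {math.sqrt(dim):.1f}")
print(f"seed norms range = [{min(seed_norms):.1f}, {max(seed_norms):.1f}]")
print(f"midpoint norm    = {midpoint_norm:.1f}")
print(f"centroid norm    = {centroid_norm:.1f}")
```

For independent Gaussian seeds the midpoint norm is expected to be close to sqrt(dim / 2) and the centroid of n seeds close to sqrt(dim / n), so both shrink well below the norms of the seeds themselves.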

Comments:	Accepted to NeurIPS 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.08687 [cs.CV]
	(or arXiv:2306.08687v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.08687

Submission history

From: Dvir Samuel
[v1] Wed, 14 Jun 2023 18:12:15 UTC (18,502 KB)
[v2] Tue, 31 Oct 2023 07:38:26 UTC (21,586 KB)
[v3] Sun, 5 Nov 2023 09:39:07 UTC (21,586 KB)
