Transparent Image Layer Diffusion using Latent Transparency

Zhang, Lvmin; Agrawala, Maneesh

Computer Science > Computer Vision and Pattern Recognition

arXiv:2402.17113 (cs)

[Submitted on 27 Feb 2024 (v1), last revised 23 Jun 2024 (this version, v4)]

Abstract: We present LayerDiffuse, an approach enabling large-scale pretrained latent diffusion models to generate transparent images. The method allows generation of single transparent images or of multiple transparent layers. The method learns a "latent transparency" that encodes alpha channel transparency into the latent manifold of a pretrained latent diffusion model. It preserves the production-ready quality of the large diffusion model by regulating the added transparency as a latent offset with minimal changes to the original latent distribution of the pretrained model. In this way, any latent diffusion model can be converted into a transparent image generator by finetuning it with the adjusted latent space. We train the model with 1M transparent image layer pairs collected using a human-in-the-loop collection scheme. We show that latent transparency can be applied to different open-source image generators, or be adapted to various conditional control systems to achieve applications like foreground/background-conditioned layer generation, joint layer generation, structural control of layer contents, etc. A user study finds that in most cases (97%) users prefer our natively generated transparent content over previous ad-hoc solutions such as generating and then matting. Users also report the quality of our generated transparent images is comparable to real commercial transparent assets like Adobe Stock.

Comments:	44 pages, 37 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Cite as:	arXiv:2402.17113 [cs.CV]
	(or arXiv:2402.17113v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.17113

Submission history

From: Lvmin Zhang
[v1] Tue, 27 Feb 2024 01:19:53 UTC (25,100 KB)
[v2] Wed, 28 Feb 2024 06:07:56 UTC (25,100 KB)
[v3] Fri, 1 Mar 2024 21:36:19 UTC (25,100 KB)
[v4] Sun, 23 Jun 2024 03:47:27 UTC (25,649 KB)