Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Chen, Yida; Viégas, Fernanda; Wattenberg, Martin

arXiv:2306.05720 (cs)

[Submitted on 9 Jun 2023 (v1), last revised 4 Nov 2023 (this version, v2)]

Abstract: Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process, well before a human can easily make sense of the noisy images. Intervention experiments further indicate these representations play a causal role in image synthesis, and may be used for simple high-level editing of an LDM's output. Project page: this https URL
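The abstract rests on two techniques: linear probes that read depth and a salient-object mask out of internal activations, and interventions that change those activations. The following is a minimal sketch of both ideas under loudly stated assumptions. The activations and labels are synthetic stand-ins (no real LDM is loaded). The depth probe is plain least-squares regression and the salient-object probe is logistic regression trained by gradient descent. `intervene` is a hypothetical minimal-norm edit along the probe's weight vector, not the authors' intervention procedure. All function names are illustrative only.

```python
import numpy as np

# ASSUMPTION: synthetic stand-in for per-pixel activations of one LDM layer,
# flattened to (num_pixels, channels). Real activations would be captured from
# the denoising network, which this sketch does not do.
rng = np.random.default_rng(0)
num_pixels, channels = 4096, 64
acts = rng.normal(size=(num_pixels, channels))

# ASSUMPTION: synthetic labels that are linear in the activations by construction.
depth = acts @ rng.normal(size=channels) + 0.1 * rng.normal(size=num_pixels)
salient = (acts @ rng.normal(size=channels) > 0).astype(float)

def add_bias(x):
    """Append a constant-1 column so probes learn a bias term."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def predict(x, w):
    """Linear probe output: regression value or classification logit."""
    return add_bias(x) @ w

def fit_depth_probe(x, y):
    """Linear regression probe (weights plus bias) fitted by least squares."""
    w, *_ = np.linalg.lstsq(add_bias(x), y, rcond=None)
    return w

def fit_salience_probe(x, y, lr=0.1, steps=500):
    """Logistic regression probe trained with plain gradient descent."""
    xb = add_bias(x)
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 0.5 * (1.0 + np.tanh(0.5 * (xb @ w)))  # numerically stable sigmoid
        w -= lr * xb.T @ (p - y) / len(y)
    return w

def intervene(x, w, target):
    """HYPOTHETICAL edit: smallest L2 change per row so the probe outputs `target`."""
    direction = w[:-1]
    shift = (target - predict(x, w)) / direction.dot(direction)
    return x + np.outer(shift, direction)

# Fit on one split of pixels and evaluate on held-out pixels.
train, test = slice(0, 3072), slice(3072, None)
w_depth = fit_depth_probe(acts[train], depth[train])
w_sal = fit_salience_probe(acts[train], salient[train])

resid = depth[test] - predict(acts[test], w_depth)
r2 = 1.0 - np.mean(resid ** 2) / np.var(depth[test])
acc = np.mean((predict(acts[test], w_sal) > 0) == salient[test])

# Push the probed depth of every held-out pixel to a constant value.
edited = intervene(acts[test], w_depth, target=np.full(num_pixels - 3072, 2.0))
print(f"depth R^2 = {r2:.3f}, salience accuracy = {acc:.3f}")
```

On real activations, a high held-out R² or accuracy would only show that the information is linearly decodable from that layer. The intervention step is what probes whether the model actually uses that information when generating the image, which is the causal question the abstract raises.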

Comments:	A short version of this paper was accepted to the NeurIPS 2023 Workshop on Diffusion Models: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2306.05720 [cs.CV]
	(or arXiv:2306.05720v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.05720

Submission history

From: Yida Chen
[v1] Fri, 9 Jun 2023 07:34:34 UTC (22,386 KB)
[v2] Sat, 4 Nov 2023 19:22:35 UTC (22,386 KB)