Enhancing Diffusion Models with 3D Perspective Geometry Constraints

Upadhyay, Rishi; Zhang, Howard; Ba, Yunhao; Yang, Ethan; Gella, Blake; Jiang, Sicheng; Wong, Alex; Kadambi, Achuta

arXiv:2312.00944 (cs)

[Submitted on 1 Dec 2023]

Abstract: While perspective is a well-studied topic in art, it is generally taken for granted in images. However, for the recent wave of high-quality image synthesis methods such as latent diffusion models, perspective accuracy is not an explicit requirement. Because these methods can output a wide gamut of possible images, it is difficult for the synthesized images to adhere to the principles of linear perspective. We introduce a novel geometric constraint in the training process of generative models to enforce perspective accuracy. We show that outputs of models trained with this constraint both appear more realistic and improve the performance of downstream models trained on generated images. Subjective human trials show that images generated with latent diffusion models trained with our constraint are preferred over images from the Stable Diffusion V2 model 70% of the time. State-of-the-art (SOTA) monocular depth estimation models such as DPT and PixelFormer, fine-tuned on our images, outperform the original models trained on real images by up to 7.03% in RMSE and 19.3% in SqRel on the KITTI test set for zero-shot transfer.
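The abstract does not spell out the form of the geometric constraint, so the sketch below is not the paper's loss. It only illustrates the principle of linear perspective the abstract refers to: under a pinhole camera, the images of parallel 3D lines with direction d = (dx, dy, dz), dz != 0, all meet at one vanishing point (f*dx/dz + cx, f*dy/dz + cy). The focal length F, principal point (CX, CY), and the example lines are hypothetical values chosen for illustration.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]

# Hypothetical pinhole intrinsics (not taken from the paper).
F, CX, CY = 500.0, 320.0, 240.0


def project(p: Vec3) -> Pixel:
    """Pinhole projection of a camera-frame point (Z > 0) to pixel coordinates."""
    x, y, z = p
    return (F * x / z + CX, F * y / z + CY)


def cross(a: Vec3, b: Vec3) -> Vec3:
    """3D cross product; in homogeneous 2D coordinates it joins points or meets lines."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def image_line(p: Vec3, q: Vec3) -> Vec3:
    """Homogeneous image line through the projections of 3D points p and q."""
    u1, v1 = project(p)
    u2, v2 = project(q)
    return cross((u1, v1, 1.0), (u2, v2, 1.0))


def intersect(l1: Vec3, l2: Vec3) -> Pixel:
    """Pixel where two homogeneous image lines meet (assumes they are not parallel in the image)."""
    x, y, w = cross(l1, l2)
    return (x / w, y / w)


def vanishing_point(d: Vec3) -> Pixel:
    """Predicted vanishing point for all 3D lines with direction d (requires d[2] != 0)."""
    dx, dy, dz = d
    return (F * dx / dz + CX, F * dy / dz + CY)


def along(p0: Vec3, d: Vec3, t: float) -> Vec3:
    """Point p0 + t * d on a 3D line."""
    return (p0[0] + t * d[0], p0[1] + t * d[1], p0[2] + t * d[2])


# Two distinct parallel 3D lines sharing the direction d.
d = (1.0, 0.5, 2.0)
a0 = (-1.0, 1.0, 4.0)
b0 = (2.0, -1.0, 5.0)

l1 = image_line(a0, along(a0, d, 3.0))
l2 = image_line(b0, along(b0, d, 3.0))

vp_measured = intersect(l1, l2)   # where the two projected lines actually cross
vp_predicted = vanishing_point(d)  # closed-form vanishing point from the direction alone
print(vp_measured, vp_predicted)
```

The two printed points agree up to floating-point error. An image that violates linear perspective would show projected parallel edges that fail to converge to a common point, which is the kind of inconsistency a perspective-aware training constraint is meant to discourage.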

Comments:	Project Webpage: this http URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Cite as:	arXiv:2312.00944 [cs.CV]
	(or arXiv:2312.00944v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.00944

Submission history

From: Rishi Upadhyay
[v1] Fri, 1 Dec 2023 21:56:43 UTC (25,179 KB)
