Computer Science > Computer Vision and Pattern Recognition
[Submitted on 22 Jan 2025]
Title:Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation
View PDF HTML (experimental)Abstract:Diffusion models are state-of-the-art for image generation. Trained on large datasets, they capture expressive image priors that have been used for tasks like inpainting, depth, and (surface) normal prediction. However, these models are typically trained for one specific task, e.g., a separate model for each of color, depth, and normal prediction. Such models do not leverage the intrinsic correlation between appearance and geometry, often leading to inconsistent predictions.
In this paper, we propose using a novel image diffusion prior that jointly encodes appearance and geometry. We introduce a diffusion model Orchid, comprising a Variational Autoencoder (VAE) to encode color, depth, and surface normals to a latent space, and a Latent Diffusion Model (LDM) for generating these joint latents. Orchid directly generates photo-realistic color images, relative depth, and surface normals from user-provided text, and can be used to create image-aligned partial 3D scenes seamlessly. It can also perform image-conditioned tasks like joint monocular depth and normal prediction and is competitive in accuracy to state-of-the-art methods designed for those tasks alone. Lastly, our model learns a joint prior that can be used zero-shot as a regularizer for many inverse problems that entangle appearance and geometry. For example, we demonstrate its effectiveness in color-depth-normal inpainting, showcasing its applicability to problems in 3D generation from sparse views.
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.