Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models

Nair, Nithin Gopalakrishnan; Bandara, Wele Gedara Chaminda; Patel, Vishal M

arXiv:2206.05039 (cs)

[Submitted on 10 Jun 2022]


Abstract: Image synthesis under multi-modal priors is a useful and challenging task that has received increasing attention in recent years. A major challenge in using generative models for this task is the lack of paired data containing all modalities (i.e., priors) and the corresponding outputs. In recent work, a variational auto-encoder (VAE) model was trained in a weakly supervised manner to address this challenge. Since the generative power of VAEs is usually limited, it is difficult for this method to synthesize images belonging to complex distributions. To this end, we propose a solution based on denoising diffusion probabilistic models to synthesize images under multi-modal priors. Based on the fact that the distribution over each time step in the diffusion model is Gaussian, we show that there exists a closed-form expression for generating the image corresponding to the given modalities. The proposed solution does not require explicit retraining for all modalities and can leverage the outputs of individual modalities to generate realistic images according to different constraints. We conduct studies on two real-world datasets to demonstrate the effectiveness of our approach.
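
The abstract does not state the closed-form expression itself. As a generic illustration of why Gaussian per-step distributions make a closed form plausible, the sketch below uses a standard fact: the renormalized product of Gaussian densities is again Gaussian, with a precision-weighted mean. This is not the paper's formula; the function name `fuse_gaussians` and the one-dimensional setting are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple


def fuse_gaussians(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Mean and variance of the renormalized product of 1-D Gaussian densities.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    if len(means) != len(variances) or len(means) == 0:
        raise ValueError("means and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # Precision is the inverse variance; precisions add under a product of Gaussians.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)

    fused_variance = 1.0 / total_precision
    # The fused mean is the precision-weighted average of the individual means.
    fused_mean = fused_variance * sum(p * m for p, m in zip(precisions, means))
    return fused_mean, fused_variance


if __name__ == "__main__":
    # Two equally confident estimates (one per hypothetical modality).
    print(fuse_gaussians([0.0, 2.0], [1.0, 1.0]))
```

With two unit-variance inputs centered at 0 and 2, the fused mean lies halfway between them and the fused variance is halved, which shows how independent per-modality estimates could be combined without retraining a joint model.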

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2206.05039 [cs.CV]
	(or arXiv:2206.05039v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2206.05039

Submission history

From: Nithin Gopalakrishnan Nair
[v1] Fri, 10 Jun 2022 12:23:05 UTC (11,171 KB)
