LoMOE: Localized Multi-Object Editing via Multi-Diffusion

Chakrabarty, Goirik; Chandrasekar, Aditya; Hebbalaguppe, Ramya; AP, Prathosh

arXiv:2403.00437 (cs)

[Submitted on 1 Mar 2024]

Abstract: Recent developments in the field of diffusion models have demonstrated an exceptional capacity to generate high-quality prompt-conditioned image edits. Nevertheless, previous approaches have primarily relied on textual prompts for image editing, which tend to be less effective when making precise edits to specific objects or fine-grained regions within a scene containing one or more objects. We introduce a novel framework for zero-shot localized multi-object editing through a multi-diffusion process to overcome this challenge. This framework empowers users to perform various operations on objects within an image, such as adding, replacing, or editing many objects in a complex scene in one pass. Our approach leverages foreground masks and corresponding simple text prompts that exert localized influences on the target regions, resulting in high-fidelity image editing. A combination of cross-attention and background preservation losses within the latent space ensures that the characteristics of the object being edited are preserved while simultaneously achieving a high-quality, seamless reconstruction of the background with fewer artifacts than current methods. We also curate and release a dataset dedicated to multi-object editing, named LoMOE-Bench. Our experiments against existing state-of-the-art methods demonstrate the improved effectiveness of our approach in terms of both image editing quality and inference speed.
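
The abstract describes two ingredients: per-object foreground masks that localize each prompt's influence, and a background preservation loss in latent space. The sketch below is only an illustration of those two ideas on plain arrays, not the paper's formulation. The function names `blend_region_latents` and `background_preservation_loss`, the mask-weighted averaging rule, and the masked mean-squared-error form of the loss are all assumptions made for this example.

```python
import numpy as np

def blend_region_latents(region_latents, masks, background_latent, eps=1e-8):
    """Mask-weighted average of per-region latents (hypothetical helper).

    region_latents: (K, C, H, W), one latent per masked prompt.
    masks:          (K, H, W), values in [0, 1].
    Locations covered by no mask keep the background latent.
    """
    region_latents = np.asarray(region_latents, dtype=np.float64)
    weights = np.asarray(masks, dtype=np.float64)[:, None, :, :]  # (K, 1, H, W)
    weighted_sum = (weights * region_latents).sum(axis=0)         # (C, H, W)
    coverage = weights.sum(axis=0)                                # (1, H, W)
    blended = weighted_sum / np.maximum(coverage, eps)
    return np.where(coverage > 0, blended, background_latent)

def background_preservation_loss(edited_latent, source_latent, masks):
    """Mean squared error restricted to the region outside all masks
    (an assumed form of the loss, chosen for illustration)."""
    masks = np.asarray(masks, dtype=np.float64)
    background = 1.0 - np.clip(masks.sum(axis=0), 0.0, 1.0)       # (H, W)
    sq_err = (np.asarray(edited_latent) - np.asarray(source_latent)) ** 2
    denom = max(background.sum() * sq_err.shape[0], 1.0)
    return float((background[None, :, :] * sq_err).sum() / denom)

# Toy usage: two non-overlapping square masks on a 4x4 latent with 3 channels.
masks = np.zeros((2, 4, 4))
masks[0, :2, :2] = 1.0
masks[1, 2:, 2:] = 1.0
region_latents = np.stack([np.full((3, 4, 4), 1.0), np.full((3, 4, 4), 2.0)])
source_latent = np.full((3, 4, 4), 5.0)

edited = blend_region_latents(region_latents, masks, source_latent)
loss = background_preservation_loss(edited, source_latent, masks)
```

In this toy setup the loss is zero because the blend leaves unmasked locations untouched. In an optimization-based editor, a term of this kind would penalize any drift of the background latent during denoising.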

Comments:	18 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Cite as:	arXiv:2403.00437 [cs.CV]
	(or arXiv:2403.00437v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.00437

Submission history

From: Goirik Chakrabarty
[v1] Fri, 1 Mar 2024 10:46:47 UTC (33,299 KB)
