Thinking Outside the BBox: Unconstrained Generative Object Compositing

Tarrés, Gemma Canet; Lin, Zhe; Zhang, Zhifei; Zhang, Jianming; Song, Yizhi; Ruta, Dan; Gilbert, Andrew; Collomosse, John; Kim, Soo Ye

arXiv:2409.04559 (cs)

[Submitted on 6 Sep 2024 (v1), last revised 11 Sep 2024 (this version, v2)]

Abstract: Compositing an object into an image involves multiple non-trivial sub-tasks such as object placement and scaling, color/lighting harmonization, viewpoint/geometry adjustment, and shadow/reflection generation. Recent generative image compositing methods leverage diffusion models to handle multiple sub-tasks at once. However, existing models face limitations due to their reliance on masking the original object during training, which constrains their generation to the input mask. Furthermore, obtaining an accurate input mask specifying the location and scale of the object in a new image can be highly challenging. To overcome such limitations, we define a novel problem of unconstrained generative object compositing, i.e., the generation is not bounded by the mask, and train a diffusion-based model on a synthesized paired dataset. Our first-of-its-kind model is able to generate object effects such as shadows and reflections that go beyond the mask, enhancing image realism. Additionally, if an empty mask is provided, our model automatically places the object in diverse natural locations and scales, accelerating the compositing workflow. Our model outperforms existing object placement and compositing models in various quality metrics and user studies.
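
The limitation described in the abstract can be illustrated with a toy sketch. This is not the paper's model: it assumes a common inpainting-style setup in which a generated result is blended back into the background using the input mask, and the function name `masked_composite` and all pixel values are hypothetical. It only shows why an effect that falls outside the mask, such as a shadow, may be lost when the output is bounded by the mask.

```python
def masked_composite(background, generated, mask):
    """Keep generated pixels inside the mask and background pixels outside it."""
    return [
        [g if m else b for b, g, m in zip(b_row, g_row, m_row)]
        for b_row, g_row, m_row in zip(background, generated, mask)
    ]


# Toy 1x4 grayscale image (values are illustrative only).
# The object lands at index 1; its shadow would fall on index 2,
# which lies just outside the mask.
background = [[200, 200, 200, 200]]
generated = [[200, 50, 120, 200]]  # hypothetical output: object (50) plus shadow (120)
mask = [[0, 1, 0, 0]]              # mask covers the object only

# Mask-bounded result: the shadow pixel is overwritten by the background.
constrained = masked_composite(background, generated, mask)

# Unbounded result: the generated image is used as-is, so the shadow survives.
unconstrained = generated

print(constrained)
print(unconstrained)
```

In this sketch the mask-bounded result drops the shadow pixel at index 2, while the unbounded result retains it, which is the kind of beyond-the-mask effect the abstract refers to.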

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2409.04559 [cs.CV]
	(or arXiv:2409.04559v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.04559

Submission history

From: Gemma Canet Tarrés
[v1] Fri, 6 Sep 2024 18:42:30 UTC (43,792 KB)
[v2] Wed, 11 Sep 2024 11:05:56 UTC (43,792 KB)
