BOOTPLACE: Bootstrapped Object Placement with Detection Transformers

Zhou, Hang; Zuo, Xinxin; Ma, Rui; Cheng, Li

arXiv:2503.21991 (cs)

[Submitted on 27 Mar 2025]

Abstract: In this paper, we tackle the copy-paste image-to-image composition problem with a focus on object placement learning. Prior methods have leveraged generative models to reduce the reliance on dense supervision. However, this often limits their capacity to model complex data distributions. Alternatively, transformer networks with a sparse contrastive loss have been explored, but their over-relaxed regularization often leads to imprecise object placement. We introduce BOOTPLACE, a novel paradigm that formulates object placement as a placement-by-detection problem. Our approach begins by identifying suitable regions of interest for object placement. This is achieved by training a specialized detection transformer on object-subtracted backgrounds, enhanced with multi-object supervision. It then semantically associates each target compositing object with detected regions based on their complementary characteristics. Through a bootstrapped training approach applied to randomly object-subtracted images, our model enforces meaningful placements through extensive paired data augmentation. Experimental results on established benchmarks demonstrate BOOTPLACE's superior performance in object repositioning, markedly surpassing state-of-the-art baselines on the Cityscapes and OPA datasets with notable improvements in IoU scores. Additional ablation studies further showcase the compositionality and generalizability of our approach, supported by user study evaluations.
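The abstract describes two steps that a small example can make concrete: associating each compositing object with a detected region, and scoring the resulting placement by IoU against a reference box. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`iou`, `cosine`, `associate`), the `(x1, y1, x2, y2)` box format, and the use of cosine similarity between embedding vectors as the association score are all hypothetical stand-ins; the paper's actual detection transformer and association module are not reproduced here.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def associate(object_embeddings, region_embeddings):
    """For each object, return the index of its most similar region.

    Assumption: similarity of embeddings stands in for the semantic
    association between objects and detected regions.
    """
    return [
        max(range(len(region_embeddings)),
            key=lambda j: cosine(obj, region_embeddings[j]))
        for obj in object_embeddings
    ]


# Hypothetical detected regions and their embeddings.
region_boxes = [(0, 0, 10, 10), (20, 20, 40, 40)]
region_embeddings = [[1.0, 0.0], [0.0, 1.0]]

# One object to place, and a reference box used only for evaluation.
object_embeddings = [[0.1, 0.9]]
reference_box = (25, 20, 45, 40)

chosen = associate(object_embeddings, region_embeddings)[0]
score = iou(region_boxes[chosen], reference_box)
```

Here the object is matched to the second region, and the IoU between that region and the reference box measures how closely the proposed placement agrees with the reference location.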

Comments:	CVPR 2025. Project page and code are available online.
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Cite as:	arXiv:2503.21991 [cs.CV]
	(or arXiv:2503.21991v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.21991

Submission history

From: Hang Zhou
[v1] Thu, 27 Mar 2025 21:21:20 UTC (46,676 KB)
