UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Dai, Zhigang; Cai, Bolun; Lin, Yugeng; Chen, Junying

Computer Science > Computer Vision and Pattern Recognition

arXiv:2011.09094v2 (cs)

[Submitted on 18 Nov 2020 (v1), revised 7 Apr 2021 (this version, v2), latest version 24 Jul 2023 (v3)]

Abstract: Object detection with transformers (DETR) achieves performance competitive with Faster R-CNN using a transformer encoder-decoder architecture. Inspired by the great success of pre-training transformers in natural language processing, we propose a pretext task named random query patch detection to Unsupervisedly Pre-train DETR (UP-DETR) for object detection. Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder. The model is pre-trained to detect these query patches in the original image. During pre-training, we address two critical issues: multi-task learning and multi-query localization. (1) To trade off the classification and localization preferences in the pretext task, we freeze the CNN backbone and propose a patch feature reconstruction branch that is jointly optimized with patch detection. (2) To perform multi-query localization, we first introduce UP-DETR with a single query patch and then extend it to multiple query patches with object query shuffle and an attention mask. In our experiments, UP-DETR significantly boosts the performance of DETR, with faster convergence and higher average precision on object detection, one-shot detection and panoptic segmentation. Code and pre-trained models: this https URL.
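The data side of random query patch detection can be sketched in a few lines. This is a minimal, hypothetical Python sketch, not the authors' code. It assumes DETR's normalized (cx, cy, w, h) box format, a uniform crop-size distribution chosen arbitrarily, and that the N object queries split evenly into one group per query patch. `random_query_patches`, `shuffled_query_groups` and `group_attention_mask` are illustrative names. Only three pieces are shown: the crop boxes that serve as detection targets, a shuffled assignment of object queries to patches, and an additive decoder self-attention mask that keeps the groups independent. The frozen backbone, patch feature pooling and the reconstruction branch are omitted.

```python
import random
from typing import List, Optional, Tuple

# Assumed box format: (cx, cy, w, h), normalized to [0, 1] as in DETR.
Box = Tuple[float, float, float, float]


def random_query_patches(img_w: int, img_h: int, num_patches: int,
                         min_size: int = 32,
                         rng: Optional[random.Random] = None) -> List[Box]:
    """Sample random crops; each crop's box in the original image is its target.

    Hypothetical helper: the uniform size range is an assumption.
    """
    rng = rng or random.Random()
    boxes = []
    for _ in range(num_patches):
        w = rng.randint(min_size, img_w)
        h = rng.randint(min_size, img_h)
        x0 = rng.randint(0, img_w - w)
        y0 = rng.randint(0, img_h - h)
        # Convert the pixel crop to a normalized center-size box.
        boxes.append(((x0 + w / 2) / img_w, (y0 + h / 2) / img_h,
                      w / img_w, h / img_h))
    return boxes


def shuffled_query_groups(num_queries: int, num_patches: int,
                          rng: random.Random) -> List[List[int]]:
    """Randomly assign object query indices to one equal-size group per patch."""
    if num_queries % num_patches:
        raise ValueError("num_queries must be divisible by num_patches")
    ids = list(range(num_queries))
    rng.shuffle(ids)  # shuffle so no query is tied to a fixed patch slot
    size = num_queries // num_patches
    return [ids[g * size:(g + 1) * size] for g in range(num_patches)]


def group_attention_mask(groups: List[List[int]],
                         num_queries: int) -> List[List[float]]:
    """mask[i][j] is 0.0 if queries i and j share a group, else -inf.

    Added to self-attention logits, it blocks attention across groups.
    """
    group_of = [0] * num_queries
    for g, members in enumerate(groups):
        for q in members:
            group_of[q] = g
    neg_inf = float("-inf")
    return [[0.0 if group_of[i] == group_of[j] else neg_inf
             for j in range(num_queries)]
            for i in range(num_queries)]


# Usage: 2 query patches from a 640x480 image, 6 object queries (3 per patch).
rng = random.Random(0)
boxes = random_query_patches(640, 480, num_patches=2, rng=rng)
groups = shuffled_query_groups(num_queries=6, num_patches=2, rng=rng)
mask = group_attention_mask(groups, num_queries=6)
```

Because the mask is additive with -inf entries, a softmax over any row places all attention weight inside that query's own group, so each group effectively solves a single-query detection problem for its patch under these assumptions.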

Comments:	Accepted by CVPR 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2011.09094 [cs.CV]
	(or arXiv:2011.09094v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2011.09094

Submission history

From: Bolun Cai
[v1] Wed, 18 Nov 2020 05:16:11 UTC (2,209 KB)
[v2] Wed, 7 Apr 2021 15:15:49 UTC (1,557 KB)
[v3] Mon, 24 Jul 2023 11:28:46 UTC (3,262 KB)
