Rethinking Transformer-based Set Prediction for Object Detection

Sun, Zhiqing; Cao, Shengcao; Yang, Yiming; Kitani, Kris

arXiv:2011.10881 (cs)

[Submitted on 21 Nov 2020 (v1), last revised 12 Oct 2021 (this version, v2)]

Abstract: DETR is a recently proposed Transformer-based method that views object detection as a set prediction problem and achieves state-of-the-art performance, but it demands an extra-long training time to converge. In this paper, we investigate the causes of the optimization difficulty in the training of DETR. Our examinations reveal several factors contributing to the slow convergence of DETR, primarily issues with the Hungarian loss and the Transformer cross-attention mechanism. To overcome these issues, we propose two solutions, namely TSP-FCOS (Transformer-based Set Prediction with FCOS) and TSP-RCNN (Transformer-based Set Prediction with RCNN). Experimental results show that the proposed methods not only converge much faster than the original DETR, but also significantly outperform DETR and other baselines in terms of detection accuracy.
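
For readers unfamiliar with set prediction, the sketch below illustrates the one-to-one matching between predictions and ground-truth boxes on which a Hungarian loss is built. It is not the authors' implementation. It assumes a simplified matching cost (L1 box distance only, with no class or IoU terms) and replaces the Hungarian algorithm with a brute-force search over assignments, which is practical only for tiny inputs. The names `l1_cost` and `match_predictions` are hypothetical.

```python
from itertools import permutations


def l1_cost(pred_box, gt_box):
    """L1 distance between two boxes given as (x1, y1, x2, y2)."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def match_predictions(pred_boxes, gt_boxes):
    """Find the cheapest one-to-one assignment of predictions to ground truths.

    Returns (assignment, total_cost), where assignment[j] is the index of the
    prediction matched to ground-truth box j.

    Assumption: len(pred_boxes) >= len(gt_boxes). Brute force is used here
    instead of the Hungarian algorithm, so this only scales to toy examples.
    """
    best_assignment, best_cost = None, float("inf")
    # Each permutation picks a distinct prediction for every ground-truth box.
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(l1_cost(pred_boxes[i], gt_boxes[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best_assignment, best_cost = perm, cost
    return best_assignment, best_cost


# Toy data (made up for illustration): three predictions, two ground truths.
preds = [(0, 0, 10, 10), (50, 50, 60, 60), (20, 20, 30, 30)]
gts = [(21, 19, 31, 29), (1, 0, 10, 11)]

assignment, total = match_predictions(preds, gts)
print(assignment, total)
```

In this toy example, prediction 1 is left unmatched. In a set prediction loss, such unmatched predictions are typically supervised as "no object", while the matched pairs receive the box and class losses.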

Comments:	Accepted to ICCV 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2011.10881 [cs.CV]
	(or arXiv:2011.10881v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2011.10881

Submission history

From: Zhiqing Sun
[v1] Sat, 21 Nov 2020 21:59:42 UTC (173 KB)
[v2] Tue, 12 Oct 2021 06:09:03 UTC (3,343 KB)
