Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Zhang, Gongjie; Luo, Zhipeng; Tian, Zichen; Zhang, Jingyi; Zhang, Xiaoqin; Lu, Shijian

arXiv:2208.11356 (cs)

[Submitted on 24 Aug 2022 (v1), last revised 24 Mar 2023 (this version, v2)]

Abstract: Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) -- a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.
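
The sparse sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`bilinear_sample`, `box_keypoints`, `sample_sparse_multiscale`) are hypothetical, feature maps are single-channel nested lists for brevity, and keypoints are assumed to be the box center plus the four edge midpoints of a prior box prediction, whereas the abstract only says that keypoints are chosen under the guidance of prior detection predictions. The point it shows is that only a handful of locations per predicted box are read from each scale, instead of every pixel of every feature map.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # H x W, single channel (assumption, for brevity)
Point = Tuple[float, float]     # normalized (x, y) in [0, 1]


def bilinear_sample(fmap: FeatureMap, x: float, y: float) -> float:
    """Bilinearly interpolate fmap at normalized coordinates (x, y)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp to the image, then map to pixel coordinates (corner-aligned).
    px = min(max(x, 0.0), 1.0) * (w - 1)
    py = min(max(y, 0.0), 1.0) * (h - 1)
    x0, y0 = int(px), int(py)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = px - x0, py - y0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def box_keypoints(cx: float, cy: float, w: float, h: float) -> List[Point]:
    """Hypothetical keypoint choice: box center plus four edge midpoints."""
    return [
        (cx, cy),
        (cx - w / 2, cy),
        (cx + w / 2, cy),
        (cx, cy - h / 2),
        (cx, cy + h / 2),
    ]


def sample_sparse_multiscale(
    pyramid: List[FeatureMap], keypoints: List[Point]
) -> List[List[float]]:
    """For each keypoint, return one sampled value per scale in the pyramid."""
    return [[bilinear_sample(fmap, x, y) for fmap in pyramid] for x, y in keypoints]


# Toy two-level pyramid: a 4x4 map whose value equals the column index,
# and a coarser 2x2 map.
fine = [[float(col) for col in range(4)] for _ in range(4)]
coarse = [[0.0, 10.0], [20.0, 30.0]]

# A prior box prediction (normalized center x, center y, width, height).
keypoints = box_keypoints(0.5, 0.5, 0.5, 0.5)
tokens = sample_sparse_multiscale([fine, coarse], keypoints)
```

Here five keypoints across two scales yield ten sampled values for the box, regardless of the feature-map resolution. In an iterative scheme like the one the abstract describes, such sampled features would be recomputed after each round of predictions; how they are weighted across scales and fed back into the detector is left out of this sketch.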

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
Cite as:	arXiv:2208.11356 [cs.CV]
	(or arXiv:2208.11356v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.11356

Submission history

From: Gongjie Zhang
[v1] Wed, 24 Aug 2022 08:09:25 UTC (5,575 KB)
[v2] Fri, 24 Mar 2023 02:06:36 UTC (5,577 KB)
