CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection

Sun, Zhichao; Hu, Huazhang; Ma, Yidong; Liu, Gang; Chen, Nemo; Tang, Xu; Hu, Yao; Xu, Yongchao

arXiv:2503.18430 (cs)

[Submitted on 24 Mar 2025 (v1), last revised 25 Mar 2025 (this version, v2)]

Abstract: With the exponential growth of data, traditional object detection methods are increasingly struggling to handle vast vocabulary object detection tasks effectively. We analyze two key limitations of classification-based detectors: positive gradient dilution, where rare positive categories receive insufficient learning signals, and hard negative gradient dilution, where discriminative gradients are overwhelmed by numerous easy negatives. To address these challenges, we propose CQ-DINO, a category query-based object detection framework that reformulates classification as a contrastive task between object queries and learnable category queries. Our method introduces image-guided query selection, which reduces the negative space by adaptively retrieving top-K relevant categories per image via cross-attention, thereby rebalancing gradient distributions and facilitating implicit hard example mining. Furthermore, CQ-DINO flexibly integrates explicit hierarchical category relationships in structured datasets (e.g., V3Det) or learns implicit category correlations via self-attention in generic datasets (e.g., COCO). Experiments demonstrate that CQ-DINO achieves superior performance on the challenging V3Det benchmark (surpassing previous methods by 2.1% AP) while remaining competitive on COCO. Our work provides a scalable solution for real-world detection systems requiring wide category coverage. The dataset and code will be made publicly available at this https URL.
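
To make the idea of image-guided query selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes single-head dot-product cross-attention with no learned projections, plain Python lists in place of tensors, and a sigmoid similarity as the contrastive score. The function names (select_top_k_categories, classify_object_queries) and the toy vectors are hypothetical. The point it illustrates is that each object query is compared only against the K retrieved categories, so the number of negatives per query drops from C - 1 to K - 1.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k_categories(category_queries, image_tokens, k):
    """Score every category query against the image and keep the k best.

    Assumption: each query attends over the image tokens (scaled dot-product
    attention), and its relevance is its similarity to the attended feature.
    """
    dim = len(category_queries[0])
    scale = 1.0 / math.sqrt(dim)
    scores = []
    for query in category_queries:
        attn = softmax([dot(query, tok) * scale for tok in image_tokens])
        attended = [
            sum(w * tok[i] for w, tok in zip(attn, image_tokens))
            for i in range(dim)
        ]
        scores.append(dot(query, attended) * scale)
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return ranked[:k]


def classify_object_queries(object_queries, category_queries, selected):
    """Sigmoid similarity between each object query and the selected categories only."""
    return [
        {c: 1.0 / (1.0 + math.exp(-dot(obj, category_queries[c]))) for c in selected}
        for obj in object_queries
    ]


# Toy vocabulary of C = 5 hypothetical category queries in 3 dimensions.
category_queries = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
]
# Two image tokens that align with categories 0 and 1.
image_tokens = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
# One object query that resembles category 0.
object_queries = [[3.0, 0.0, 0.0]]

selected = select_top_k_categories(category_queries, image_tokens, k=2)
probs = classify_object_queries(object_queries, category_queries, selected)

print(sorted(selected))  # [0, 1]
print("negatives per query without selection:", len(category_queries) - 1)
print("negatives per query with selection:", len(selected) - 1)
```

In this toy setting the object query faces one negative category instead of four. At the scale of a vast vocabulary, a reduction of this kind is what the abstract describes as shrinking the negative space.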

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.18430 [cs.CV]
	(or arXiv:2503.18430v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.18430

Submission history

From: Zhichao Sun
[v1] Mon, 24 Mar 2025 08:22:55 UTC (5,531 KB)
[v2] Tue, 25 Mar 2025 07:39:46 UTC (5,531 KB)
