SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector

Ma, Shuailei; Wang, Yuefeng; Wei, Ying; Fan, Jiaqi; Zhang, Enming; Sun, Xinyu; Chen, Peihao

Computer Science > Computer Vision and Pattern Recognition

arXiv:2312.08653 (cs)

[Submitted on 14 Dec 2023 (v1), last revised 30 Mar 2024 (this version, v2)]

Title:SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector

Authors:Shuailei Ma, Yuefeng Wang, Ying Wei, Jiaqi Fan, Enming Zhang, Xinyu Sun, Peihao Chen

View PDF HTML (experimental)

Abstract:In this paper, we attempt to specialize the VLM model for OWOD tasks by distilling its open-world knowledge into a language-agnostic detector. Surprisingly, we observe that the combination of a simple \textbf{knowledge distillation} approach and the automatic pseudo-labeling mechanism in OWOD can achieve better performance for unknown object detection, even with a small amount of data. Unfortunately, knowledge distillation for unknown objects severely affects the learning of detectors with conventional structures for known objects, leading to catastrophic forgetting. To alleviate these problems, we propose the \textbf{down-weight loss function} for knowledge distillation from vision-language to single vision modality. Meanwhile, we propose the \textbf{cascade decouple decoding structure} that decouples the learning of localization and recognition to reduce the impact of category interactions of known and unknown objects on the localization learning process. Ablation experiments demonstrate that both of them are effective in mitigating the impact of open-world knowledge distillation on the learning of known objects. Additionally, to alleviate the current lack of comprehensive benchmarks for evaluating the ability of the open-world detector to detect unknown objects in the open world, we propose two benchmarks, which we name "\textbf{StandardSet}$\heartsuit$" and "\textbf{IntensiveSet}$\spadesuit$" respectively, based on the complexity of their testing scenarios. Comprehensive experiments performed on OWOD, MS-COCO, and our proposed benchmarks demonstrate the effectiveness of our methods. The code and proposed dataset are available at \url{this https URL}.

Comments:	arXiv admin note: substantial text overlap with arXiv:2303.11623
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.08653 [cs.CV]
	(or arXiv:2312.08653v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.08653

Submission history

From: Ying Wei [view email]
[v1] Thu, 14 Dec 2023 04:47:20 UTC (10,972 KB)
[v2] Sat, 30 Mar 2024 06:05:40 UTC (41,474 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators