SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting

Zhao, Yiming; Li, Guorong; Qing, Laiyun; Beheshti, Amin; Yang, Jian; Sheng, Michael; Qi, Yuankai; Huang, Qingming

arXiv:2504.17395 (cs)

[Submitted on 24 Apr 2025]

Abstract: Open-world object counting leverages the robust text-image alignment of pre-trained vision-language models (VLMs) to enable counting of arbitrary categories in images specified by textual queries. However, widely adopted naive fine-tuning strategies concentrate exclusively on text-image consistency for categories contained in training, which leads to limited generalizability for unseen categories. In this work, we propose a plug-and-play Semantic-Driven Visual Prompt Tuning framework (SDVPT) that transfers knowledge from the training set to unseen categories with minimal overhead in parameters and inference time. First, we introduce a two-stage visual prompt learning strategy composed of Category-Specific Prompt Initialization (CSPI) and Topology-Guided Prompt Refinement (TGPR). The CSPI generates category-specific visual prompts, and then TGPR distills latent structural patterns from the VLM's text encoder to refine these prompts. During inference, we dynamically synthesize the visual prompts for unseen categories based on the semantic correlation between unseen and training categories, facilitating robust text-image alignment for unseen categories. Extensive experiments integrating SDVPT with all available open-world object counting models demonstrate its effectiveness and adaptability across three widely used datasets: FSC-147, CARPK, and PUCPR+.
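
The inference step described in the abstract, synthesizing a visual prompt for an unseen category from its semantic correlation with training categories, can be illustrated with a small sketch. The abstract does not state the exact formula, so everything here is an assumption for illustration only: cosine similarity between text embeddings as the "semantic correlation", a softmax with a hypothetical `temperature` parameter to turn similarities into weights, and a weighted sum of learned prompts. The function names and toy vectors are likewise hypothetical and are not taken from the paper or its code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; `temperature` is an assumed hyperparameter."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize_prompt(query_text_emb, train_text_embs, train_prompts, temperature=0.1):
    """Blend training-category prompts, weighted by text-embedding similarity.

    Assumed scheme (not confirmed by the abstract): weights are a softmax over
    cosine similarities between the unseen category's text embedding and each
    training category's text embedding.
    """
    sims = [cosine(query_text_emb, t) for t in train_text_embs]
    weights = softmax(sims, temperature)
    dim = len(train_prompts[0])
    prompt = [sum(w * p[i] for w, p in zip(weights, train_prompts)) for i in range(dim)]
    return prompt, weights


# Toy data: two training categories with 2-D text embeddings and 3-D prompts.
train_text_embs = [[1.0, 0.0], [0.0, 1.0]]
train_prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# An unseen category whose text embedding lies close to the first training category.
query_text_emb = [0.9, 0.1]

prompt, weights = synthesize_prompt(query_text_emb, train_text_embs, train_prompts)
```

In this toy setup the synthesized prompt is dominated by the prompt of the first training category, because the query embedding is far closer to it. A lower `temperature` sharpens the weighting toward the single nearest category, while a higher one averages more evenly across training categories.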

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.17395 [cs.CV]
	(or arXiv:2504.17395v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.17395

Submission history

From: Yiming Zhao [view email]
[v1] Thu, 24 Apr 2025 09:31:08 UTC (4,754 KB)
