Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition

Fu, Siming; He, Xiaoxuan; Ding, Xinpeng; Cao, Yuchen; Wang, Hualiang

arXiv:2308.12522 (cs)

[Submitted on 24 Aug 2023 (v1), last revised 6 Nov 2023 (this version, v2)]

Abstract: Recently, large-scale pre-trained vision-language models have shown benefits for alleviating class imbalance in long-tailed recognition. However, a long-tailed data distribution can corrupt the representation space: the distance between head and tail categories becomes much larger than the distance between two tail categories. This uneven feature space distribution leads to unclear, poorly separable decision boundaries on the uniformly distributed test set, which lowers performance. To address these challenges, we propose a uniformly distributed category prototype-guided vision-language framework that mitigates the feature space bias caused by data imbalance. Specifically, we generate a set of category prototypes uniformly distributed on a hypersphere. A category prototype-guided mechanism for image-text matching makes the features of different classes converge to these distinct, uniformly distributed prototypes, which maintains a uniform distribution in the feature space and improves class boundaries. Additionally, our proposed irrelevant text filtering and attribute enhancement module allows the model to ignore irrelevant noisy text and focus more on key attribute information, thereby enhancing the robustness of our framework. In the image recognition fine-tuning stage, to address the positive bias problem of the learnable classifier, we design a class feature prototype-guided classifier, which compensates for the performance of tail classes while maintaining the performance of head classes. Our method outperforms previous vision-language methods for long-tailed learning by a large margin and achieves state-of-the-art performance.
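
The abstract does not say how the uniformly distributed prototypes are generated. As a rough illustration only, one common way to spread points on a unit hypersphere is to minimize a pairwise repulsion energy with projected gradient descent. The sketch below assumes that approach; the function names, the exponential energy, and all hyperparameters are hypothetical choices and are not taken from the paper.

```python
import numpy as np

def pairwise_max_cosine(P):
    """Largest cosine similarity between two distinct unit-norm rows of P."""
    S = P @ P.T
    np.fill_diagonal(S, -np.inf)  # ignore self-similarity
    return float(S.max())

def uniform_prototypes(num_classes, dim, steps=500, lr=0.1,
                       temperature=2.0, seed=0):
    """Spread num_classes unit vectors over the hypersphere in R^dim.

    Assumed objective (not from the paper): minimize
    sum_{i != j} exp(temperature * p_i . p_j), so that similar
    prototypes repel each other most strongly.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(num_classes, dim))
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    for _ in range(steps):
        W = np.exp(temperature * (P @ P.T))
        np.fill_diagonal(W, 0.0)           # no self-repulsion
        grad = temperature * (W @ P)       # energy gradient, up to a constant factor
        P = P - lr * grad / num_classes    # gradient step
        P /= np.linalg.norm(P, axis=1, keepdims=True)  # project back onto the sphere
    return P

if __name__ == "__main__":
    prototypes = uniform_prototypes(num_classes=10, dim=64)
    print("max pairwise cosine:", pairwise_max_cosine(prototypes))
```

When `dim` is at least `num_classes - 1`, the most uniform arrangement is a regular simplex, whose pairwise cosine is `-1 / (num_classes - 1)`, so the printed value gives a quick check on how evenly the prototypes have spread. Fixed prototypes of this kind do not depend on class frequencies, which matches the abstract's motivation of removing the feature space bias introduced by imbalanced data.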

Comments:	11 pages, 5 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
MSC classes:	14J60 (Primary) 14F05, 14J26 (Secondary)
ACM classes:	I.4.10
Cite as:	arXiv:2308.12522 [cs.CV]
	(or arXiv:2308.12522v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.12522
Journal reference:	ACM MM2023

Submission history

From: Siming Fu
[v1] Thu, 24 Aug 2023 03:21:28 UTC (5,598 KB)
[v2] Mon, 6 Nov 2023 16:16:02 UTC (5,603 KB)
