Waffling around for Performance: Visual Classification with Random Words and Broad Concepts

Roth, Karsten; Kim, Jae Myung; Koepke, A. Sophia; Vinyals, Oriol; Schmid, Cordelia; Akata, Zeynep

arXiv:2306.07282 (cs)

[Submitted on 12 Jun 2023 (v1), last revised 17 Aug 2023 (this version, v2)]

Abstract: The visual classification performance of vision-language models such as CLIP has been shown to benefit from additional semantic knowledge from large language models (LLMs) such as GPT-3. In particular, averaging over LLM-generated class descriptors, e.g. "waffle, which has a round shape", can notably improve generalization performance. In this work, we critically study this behavior and propose WaffleCLIP, a framework for zero-shot visual classification which simply replaces LLM-generated descriptors with random character and word descriptors. Without querying external models, we achieve comparable performance gains on a large number of visual classification tasks. This allows WaffleCLIP to both serve as a low-cost alternative, as well as a sanity check for any future LLM-based vision-language model extensions. We conduct an extensive experimental study on the impact and shortcomings of additional semantics introduced with LLM-generated descriptors, and showcase how - if available - semantic context is better leveraged by querying LLMs for high-level concepts, which we show can be done to jointly resolve potential class name ambiguities. Code is available here: this https URL.
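The idea described in the abstract, averaging over prompts whose descriptors are random words and random characters rather than LLM output, can be illustrated with a minimal sketch. This is not the authors' implementation. The prompt template is modeled on the quoted "waffle, which has a round shape" example, the function names and the word pool are hypothetical, and `toy_text_embedding` is a hash-based placeholder standing in for a real text encoder such as CLIP's, so it carries no semantics. Reusing the same random descriptors for every class is also a choice made for this example.

```python
import hashlib
import math
import random
import string


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def toy_text_embedding(text, dim=16):
    """Placeholder for a real text encoder such as CLIP's (assumption).

    Maps a string to a deterministic pseudo-embedding via SHA-256 so the
    sketch runs without model weights. It carries no semantics.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return normalize([b / 255.0 - 0.5 for b in digest[:dim]])


def random_descriptors(num_pairs, rng, vocab, words_per_descriptor=2, chars_per_word=5):
    """Draw random word and random character descriptors; no LLM is queried."""
    descriptors = []
    for _ in range(num_pairs):
        words = [rng.choice(vocab) for _ in range(words_per_descriptor)]
        chars = [
            "".join(rng.choice(string.ascii_lowercase) for _ in range(chars_per_word))
            for _ in range(words_per_descriptor)
        ]
        descriptors.append(" ".join(words))
        descriptors.append(" ".join(chars))
    return descriptors


def class_embedding(class_name, descriptors, embed=toy_text_embedding):
    """Average the embeddings of all descriptor prompts for one class."""
    prompts = [f"A photo of a {class_name}, which has {d}." for d in descriptors]
    vectors = [embed(p) for p in prompts]
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return normalize(mean)


def classify(image_embedding, class_embeddings):
    """Return the class whose averaged text embedding is most similar.

    Both the image embedding and the class embeddings are assumed to be
    unit length, so the dot product equals the cosine similarity.
    """
    def cosine(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(class_embeddings, key=lambda name: cosine(image_embedding, class_embeddings[name]))


rng = random.Random(0)
vocab = ["river", "engine", "purple", "ladder", "quiet"]  # illustrative word pool
descriptors = random_descriptors(num_pairs=4, rng=rng, vocab=vocab)
class_embeddings = {name: class_embedding(name, descriptors) for name in ["waffle", "pancake"]}
```

In a real setup, `embed` would be swapped for the text tower of a vision-language model and `image_embedding` would come from the matching image encoder; the averaging and nearest-class steps would stay the same.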

Comments:	Accepted to ICCV 2023. Main paper with 9 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2306.07282 [cs.CV]
	(or arXiv:2306.07282v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.07282

Submission history

From: Karsten Roth
[v1] Mon, 12 Jun 2023 17:59:48 UTC (8,349 KB)
[v2] Thu, 17 Aug 2023 02:27:32 UTC (8,350 KB)
