Semantically-Prompted Language Models Improve Visual Descriptions

Ogezi, Michael; Hauer, Bradley; Kondrak, Grzegorz

doi:10.18653/v1/2024.findings-naacl.267

Computer Science > Computer Vision and Pattern Recognition

arXiv:2306.06077 (cs)

[Submitted on 5 Jun 2023 (v1), last revised 22 Nov 2024 (this version, v4)]

Abstract: Language-vision models like CLIP have made significant strides in vision tasks, such as zero-shot image classification (ZSIC). However, generating specific and expressive visual descriptions remains challenging; descriptions produced by current methods are often ambiguous and lack granularity. To tackle these issues, we propose V-GLOSS: Visual Glosses, a novel method built upon two key ideas. The first is Semantic Prompting, which conditions a language model on structured semantic knowledge. The second is a new contrastive algorithm that elicits fine-grained distinctions between similar concepts. With both ideas, we demonstrate that V-GLOSS improves visual descriptions and achieves strong results in the zero-shot setting on general and fine-grained image-classification datasets, including ImageNet, STL-10, FGVC Aircraft, and Flowers 102. Moreover, these descriptions also help improve image-generation performance. Finally, we introduce a quality-tested silver dataset of descriptions generated by V-GLOSS for all ImageNet classes.
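
The abstract evaluates descriptions through CLIP-based zero-shot image classification. The sketch below shows one common way class descriptions are used for ZSIC: embed each description, average the normalized embeddings per class, and pick the class with the highest cosine similarity to the image embedding. It assumes the embeddings were already produced by a CLIP-style image and text encoder (not shown). Per-class averaging is an assumption and not necessarily V-GLOSS's exact scoring procedure. The function names, class labels, and toy 3-d vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm, so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def class_embedding(description_embs: List[Vector]) -> List[float]:
    """Average the normalized embeddings of one class's descriptions, then renormalize.

    Assumption: averaging is one common aggregation; other schemes exist.
    """
    normed = [normalize(e) for e in description_embs]
    dim = len(normed[0])
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return normalize(mean)


def zero_shot_classify(image_emb: Vector, descriptions: Dict[str, List[Vector]]) -> str:
    """Return the label whose aggregated description embedding is most similar to the image."""
    img = normalize(image_emb)
    scores = {
        label: sum(a * b for a, b in zip(img, class_embedding(embs)))
        for label, embs in descriptions.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy embeddings standing in for encoded class descriptions.
descriptions = {
    "goldfish": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
    "tabby cat": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
print(zero_shot_classify([0.8, 0.3, 0.0], descriptions))  # prints "goldfish"
```

Under this scheme, more specific and less ambiguous descriptions move each class embedding away from its neighbours, which is the kind of improvement the abstract attributes to Semantic Prompting and the contrastive algorithm.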

Comments:	Published at NAACL 2024.
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2306.06077 [cs.CV]
	(or arXiv:2306.06077v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.06077
Journal reference:	In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4285-4302
Related DOI:	https://doi.org/10.18653/v1/2024.findings-naacl.267

Submission history

From: Michael Ogezi
[v1] Mon, 5 Jun 2023 17:22:54 UTC (8,717 KB)
[v2] Fri, 23 Jun 2023 16:29:51 UTC (1 KB) (withdrawn)
[v3] Tue, 2 Apr 2024 16:19:22 UTC (4,634 KB)
[v4] Fri, 22 Nov 2024 15:58:28 UTC (4,634 KB)