Generate, Transduct, Adapt: Iterative Transduction with VLMs

Saha, Oindrila; Lawrence, Logan; Van Horn, Grant; Maji, Subhransu

arXiv:2501.06031 (cs)

[Submitted on 10 Jan 2025]

Abstract: Transductive zero-shot learning with vision-language models leverages image-image similarities within the dataset to achieve higher classification accuracy than the inductive setting. However, little work explores the structure of the language space in this context. We propose GTA-CLIP, a novel technique that incorporates supervision from language models for joint transduction in language and vision spaces. Our approach is iterative and consists of three steps: (i) incrementally exploring the attribute space by querying language models, (ii) an attribute-augmented transductive inference procedure, and (iii) fine-tuning the language and vision encoders based on inferred labels within the dataset. Through experiments with CLIP encoders, we demonstrate that GTA-CLIP yields average performance improvements of 8.6% and 3.7% across 12 datasets and 3 encoders over CLIP and transductive CLIP, respectively, in the zero-shot setting. We also observe similar improvements in a few-shot setting. We present ablation studies that demonstrate the value of each step and visualize how the vision and language spaces evolve over iterations driven by the transductive learning.
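The three-step loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function name and parameter below is hypothetical, the language-model query is replaced by random perturbations of the class embeddings, the transductive inference is replaced by a simple label propagation over image-image similarities, and encoder fine-tuning is replaced by shifting attribute embeddings toward soft-label centroids. It only shows how the generate, transduct, and adapt steps might alternate.

```python
import numpy as np

def normalize(x):
    # Scale vectors along the last axis to unit length.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def generate_attributes(class_text, rng, n_new=2, noise=0.1):
    # ASSUMPTION: stand-in for querying a language model. Each new
    # "attribute" is just a perturbed copy of the class embedding.
    k, d = class_text.shape
    new = class_text[:, None, :] + noise * rng.normal(size=(k, n_new, d))
    return normalize(new)

def transduct(images, attrs, n_rounds=5, alpha=0.5, temp=0.1):
    # ASSUMPTION: simple label propagation, not the paper's procedure.
    protos = normalize(attrs.mean(axis=1))          # one prototype per class
    text_probs = softmax(images @ protos.T / temp)  # image-text evidence
    affinity = np.exp(images @ images.T / temp)     # image-image evidence
    np.fill_diagonal(affinity, 0.0)
    affinity /= affinity.sum(axis=1, keepdims=True)
    probs = text_probs
    for _ in range(n_rounds):
        probs = alpha * text_probs + (1 - alpha) * affinity @ probs
    return probs

def adapt(attrs, images, probs, lr=0.5):
    # ASSUMPTION: stand-in for fine-tuning. Attribute embeddings move
    # toward the soft-label-weighted mean image feature of their class.
    centroids = normalize(probs.T @ images)
    return normalize((1 - lr) * attrs + lr * centroids[:, None, :])

def gta_loop(images, class_text, n_iters=3, seed=0):
    rng = np.random.default_rng(seed)
    attrs = class_text[:, None, :]                  # start from class names only
    for _ in range(n_iters):
        new_attrs = generate_attributes(class_text, rng)
        attrs = np.concatenate([attrs, new_attrs], axis=1)  # (i) generate
        probs = transduct(images, attrs)                    # (ii) transduct
        attrs = adapt(attrs, images, probs)                 # (iii) adapt
    return probs
```

Calling `gta_loop(images, class_text)` with unit-normalized image features of shape `(n, d)` and class embeddings of shape `(k, d)` returns an `(n, k)` array of soft labels whose rows sum to one.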

Comments:	Code will be released at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.06031 [cs.CV]
	(or arXiv:2501.06031v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.06031

Submission history

From: Oindrila Saha
[v1] Fri, 10 Jan 2025 15:07:57 UTC (28,133 KB)
