CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages

Santos, Gabriel Oliveira dos; Moreira, Diego A. B.; Ferreira, Alef Iury; Silva, Jhessica; Pereira, Luiz; Bueno, Pedro; Sousa, Thiago; Maia, Helena; Da Silva, Nádia; Colombini, Esther; Pedrini, Helio; Avila, Sandra

arXiv:2310.13683 (cs)

[Submitted on 20 Oct 2023 (v1), last revised 23 Oct 2023 (this version, v2)]

Title: CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages

Authors: Gabriel Oliveira dos Santos, Diego A. B. Moreira, Alef Iury Ferreira, Jhessica Silva, Luiz Pereira, Pedro Bueno, Thiago Sousa, Helena Maia, Nádia Da Silva, Esther Colombini, Helio Pedrini, Sandra Avila

Abstract: This work introduces CAPIVARA, a cost-efficient framework designed to enhance the performance of multilingual CLIP models in low-resource languages. While CLIP has excelled in zero-shot vision-language tasks, the resource-intensive nature of model training remains challenging. Many datasets lack linguistic diversity, featuring solely English descriptions for images. CAPIVARA addresses this by augmenting text data using image captioning and machine translation to generate multiple synthetic captions in low-resource languages. We optimize the training pipeline with LiT, LoRA, and gradient checkpointing to alleviate the computational cost. Through extensive experiments, CAPIVARA emerges as state of the art in zero-shot tasks involving images and Portuguese texts. We show the potential for significant improvements in other low-resource languages, achieved by fine-tuning the pre-trained multilingual CLIP using CAPIVARA on a single GPU for 2 hours. Our model and code are available at this https URL.
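
The abstract names LoRA as one of the cost-saving techniques but does not give its configuration here. As a rough illustration of why low-rank adaptation reduces the number of trainable parameters, the following is a minimal pure-Python sketch of the general LoRA idea, not the authors' implementation. The class name `LoRALinear`, the rank, the `alpha` scaling, and the toy weight matrix are all assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pre-trained weight, shape d_out x d_in
        self.rank = rank
        self.scale = alpha / rank
        # A starts with small random values; B starts at zero, so the
        # adapted layer initially matches the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        col = [[v] for v in x]                         # d_in x 1 column vector
        base = matmul(self.weight, col)                # W x
        delta = matmul(self.B, matmul(self.A, col))    # B (A x)
        return [b[0] + self.scale * d[0] for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A (r x d_in) and B (d_out x r) would be updated.
        return self.rank * (len(self.weight) + len(self.weight[0]))

    def frozen_params(self):
        return len(self.weight) * len(self.weight[0])


# Toy example: a 4 x 6 frozen weight adapted with rank 2.
W = [[float(i + j) for j in range(6)] for i in range(4)]
layer = LoRALinear(W, rank=2)
x = [1.0, 0.5, -1.0, 2.0, 0.0, 3.0]
print(layer.forward(x))
print(layer.trainable_params(), layer.frozen_params())
```

In this toy case the saving is small, but for a d_out x d_in weight the trainable count grows as r * (d_out + d_in) rather than d_out * d_in, so the gap widens quickly for large layers and small ranks.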

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2310.13683 [cs.LG]
	(or arXiv:2310.13683v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.13683

Submission history

From: Gabriel Oliveira dos Santos
[v1] Fri, 20 Oct 2023 17:44:25 UTC (12,004 KB)
[v2] Mon, 23 Oct 2023 17:06:07 UTC (12,004 KB)
