Modular Prompt Learning Improves Vision-Language Models

Huang, Zhenhan; Pedapati, Tejaswini; Chen, Pin-Yu; Gao, Jianxi

Abstract:Pre-trained vision-language models are able to interpret visual concepts and language semantics. Prompt learning, a method of constructing prompts for text encoders or image encoders, elicits the potentials of pre-trained models and readily adapts them to new scenarios. Compared to fine-tuning, prompt learning enables the model to achieve comparable or better performance using fewer trainable parameters. Besides, prompt learning freezes the pre-trained model and avoids the catastrophic forgetting issue in the fine-tuning. Continuous prompts inserted into the input of every transformer layer (i.e. deep prompts) can improve the performances of pre-trained models on downstream tasks. For i-th transformer layer, the inserted prompts replace previously inserted prompts in the $(i-1)$-th layer. Although the self-attention mechanism contextualizes newly inserted prompts for the current layer and embeddings from the previous layer's output, removing all inserted prompts from the previous layer inevitably loses information contained in the continuous prompts. In this work, we propose Modular Prompt Learning (MPL) that is designed to promote the preservation of information contained in the inserted prompts. We evaluate the proposed method on base-to-new generalization and cross-dataset tasks. On average of 11 datasets, our method achieves 0.7% performance gain on the base-to-new generalization task compared to the state-of-the-art method. The largest improvement on the individual dataset is 10.7% (EuroSAT dataset).

Comments:	2025 IEEE International Conference on Acoustics, Speech, and Signal Processing
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.14125 [cs.CV]
	(or arXiv:2502.14125v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2502.14125

Computer Science > Computer Vision and Pattern Recognition

Title:Modular Prompt Learning Improves Vision-Language Models

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators