MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning

Ma, Yiwei; Xu, Guohai; Sun, Xiaoshuai; Ji, Jiayi; Lou, Jie; Zhang, Debing; Ji, Rongrong

Computer Science > Computer Vision and Pattern Recognition

arXiv:2503.20502 (cs)

[Submitted on 26 Mar 2025 (v1), last revised 30 Mar 2025 (this version, v2)]

Title:MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning

Authors:Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Jiayi Ji, Jie Lou, Debing Zhang, Rongrong Ji

View PDF HTML (experimental)

Abstract:Visual instruction tuning (VIT) has emerged as a crucial technique for enabling multi-modal large language models (MLLMs) to follow user instructions adeptly. Yet, a significant gap persists in understanding the attributes of high-quality instruction tuning data and frameworks for its automated selection. To address this, we introduce MLLM-Selector, an automated approach that identifies valuable data for VIT by weighing necessity and diversity. Our process starts by randomly sampling a subset from the VIT data pool to fine-tune a pretrained model, thus creating a seed model with an initial ability to follow instructions. Then, leveraging the seed model, we calculate necessity scores for each sample in the VIT data pool to identify samples pivotal for enhancing model performance. Our findings underscore the importance of mixing necessity and diversity in data choice, leading to the creation of MLLM-Selector, our methodology that fuses necessity scoring with strategic sampling for superior data refinement. Empirical results indicate that within identical experimental conditions, MLLM-Selector surpasses LLaVA-1.5 in some benchmarks with less than 1% of the data and consistently exceeds performance across all validated benchmarks when using less than 50%.

Comments:	Tech Report
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.20502 [cs.CV]
	(or arXiv:2503.20502v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.20502

Submission history

From: Yiwei Ma [view email]
[v1] Wed, 26 Mar 2025 12:42:37 UTC (2,370 KB)
[v2] Sun, 30 Mar 2025 03:54:36 UTC (2,371 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators