Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

Zhang, Zefeng; Tang, Hengzhu; Sheng, Jiawei; Zhang, Zhenyu; Ren, Yiming; Li, Zhenyang; Yin, Dawei; Ma, Duohe; Liu, Tingwen

arXiv:2503.17928 (cs)

[Submitted on 23 Mar 2025]

Abstract: Multimodal Large Language Models excel at a wide range of tasks, yet they often suffer from modality bias: the model relies heavily on a single modality and overlooks critical information in the others, which leads it to focus on the wrong content and generate irrelevant responses. In this paper, we propose addressing modality bias through preference optimization, contributing RLAIFVBias, a debiased preference optimization dataset, and a Noise-Aware Preference Optimization algorithm. Specifically, we first construct the dataset by perturbing certain modalities to reduce their informational content, which compels the model to rely on one specific modality when generating negative responses. To address the inevitable noise in automatically constructed data, we combine the noise-robust Mean Absolute Error with the Binary Cross-Entropy in Direct Preference Optimization through a negative Box-Cox transformation, and dynamically adjust the algorithm's noise robustness based on the noise levels evaluated in the data. Extensive experiments validate our approach, demonstrating not only its effectiveness in mitigating modality bias but also its significant role in minimizing hallucinations.
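
The abstract does not give the loss formula, so the following is only a minimal sketch of how a negative Box-Cox transformation can interpolate between Binary Cross-Entropy and Mean Absolute Error on a preference probability. It assumes the form (1 - p^q) / q familiar from the noisy-label literature, with p the sigmoid of a DPO-style reward margin. The function name `box_cox_preference_loss`, the `margin` argument, and the parameter `q` are hypothetical names chosen for illustration and are not taken from the paper; the rule for choosing `q` from the data's noise level is not described here and is left to the caller.

```python
import math


def box_cox_preference_loss(margin: float, q: float) -> float:
    """Negative Box-Cox loss (1 - p**q) / q with p = sigmoid(margin).

    margin: assumed DPO-style reward margin for one preference pair
            (scaled chosen log-ratio minus rejected log-ratio).
    q:      hypothetical robustness knob in [0, 1].
            q -> 0 recovers BCE, i.e. -log(p); q = 1 gives MAE, i.e. 1 - p.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")

    if q < 1e-8:
        # Limit q -> 0: -log(sigmoid(margin)) = softplus(-margin),
        # computed in a numerically stable way.
        if margin >= 0:
            return math.log1p(math.exp(-margin))
        return -margin + math.log1p(math.exp(margin))

    # Numerically stable sigmoid.
    if margin >= 0:
        p = 1.0 / (1.0 + math.exp(-margin))
    else:
        e = math.exp(margin)
        p = e / (1.0 + e)
    return (1.0 - p ** q) / q


# A badly mis-ranked pair (possibly a noisy label): the BCE end grows with
# the margin, while the MAE end stays bounded by 1.
bce_like = box_cox_preference_loss(-5.0, 0.0)
mae_like = box_cox_preference_loss(-5.0, 1.0)
```

Under these assumptions, a larger `q` limits how much a single mislabeled pair can contribute to the loss, while a smaller `q` keeps the sharper BCE behavior; this trade-off is what the abstract refers to as adjusting noise robustness.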

Comments:	CVPR 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2503.17928 [cs.CV]
	(or arXiv:2503.17928v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.17928

Submission history

From: Zefeng Zhang
[v1] Sun, 23 Mar 2025 04:00:11 UTC (1,526 KB)
