Object-Level Verbalized Confidence Calibration in Vision-Language Models via Semantic Perturbation

Zhao, Yunpu; Zhang, Rui; Xiao, Junbin; Hou, Ruibo; Guo, Jiaming; Zhang, Zihao; Hao, Yifan; Chen, Yunji

arXiv:2504.14848 (cs)

[Submitted on 21 Apr 2025]

Abstract: Vision-language models (VLMs) excel in various multimodal tasks but frequently suffer from poor calibration, resulting in misalignment between their verbalized confidence and response correctness. This miscalibration undermines user trust, especially when models confidently provide incorrect or fabricated information. In this work, we propose a novel Confidence Calibration through Semantic Perturbation (CSP) framework to improve the calibration of verbalized confidence for VLMs in response to object-centric queries. We first introduce a perturbed dataset where Gaussian noise is applied to the key object regions to simulate visual uncertainty at different confidence levels, establishing an explicit mapping between visual ambiguity and confidence levels. We further enhance calibration through a two-stage training process combining supervised fine-tuning on the perturbed dataset with subsequent preference optimization. Extensive experiments on popular benchmarks demonstrate that our method significantly improves the alignment between verbalized confidence and response correctness while maintaining or enhancing overall task performance. These results highlight the potential of semantic perturbation as a practical tool for improving the reliability and interpretability of VLMs.
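
The perturbation step described in the abstract (Gaussian noise applied to a key object region, with the noise strength tied to a confidence level) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `perturb_region`, the `max_sigma` parameter, the grayscale list-of-rows image format, and the linear mapping from confidence to noise level are all assumptions made here for illustration.

```python
import random


def perturb_region(image, box, confidence, max_sigma=64.0, seed=0):
    """Return a copy of a grayscale image with Gaussian noise added inside `box`.

    image: list of rows of pixel intensities in [0, 255].
    box: (top, left, bottom, right), with bottom and right exclusive.
    confidence: target confidence level in [0, 1].

    ASSUMPTION (not from the paper): the noise standard deviation is
    sigma = (1 - confidence) * max_sigma, so lower confidence means
    a more heavily corrupted object region.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sigma = (1.0 - confidence) * max_sigma
    top, left, bottom, right = box
    noisy = [row[:] for row in image]  # copy rows; the input is left untouched
    for r in range(top, bottom):
        for c in range(left, right):
            value = noisy[r][c] + rng.gauss(0.0, sigma)
            noisy[r][c] = min(255.0, max(0.0, value))  # clip to the valid range
    return noisy


# Usage: one flat gray image, perturbed at three hypothetical confidence levels.
image = [[128.0] * 8 for _ in range(8)]
box = (2, 2, 6, 6)  # hypothetical object bounding box
samples = {conf: perturb_region(image, box, conf) for conf in (1.0, 0.5, 0.0)}
```

Under this assumed mapping, a confidence of 1.0 leaves the image unchanged and a confidence of 0.0 applies the strongest noise, which gives each perturbed image a confidence label that a model could be fine-tuned to verbalize.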

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2504.14848 [cs.CV] (or arXiv:2504.14848v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2504.14848

Submission history

From: Yunpu Zhao
[v1] Mon, 21 Apr 2025 04:01:22 UTC (8,058 KB)
