Adopting Whisper for Confidence Estimation

Aggarwal, Vaibhav; Nair, Shabari S; Verma, Yash; Jogi, Yash

arXiv:2502.13446 (eess)

[Submitted on 19 Feb 2025]

Abstract: Recent research on word-level confidence estimation for speech recognition systems has primarily focused on lightweight models known as Confidence Estimation Modules (CEMs), which rely on hand-engineered features derived from Automatic Speech Recognition (ASR) outputs. In contrast, we propose a novel end-to-end approach that leverages the ASR model itself (Whisper) to generate word-level confidence scores. Specifically, we introduce a method in which the Whisper model is fine-tuned to produce scalar confidence scores given an audio input and its corresponding hypothesis transcript. Our experiments demonstrate that the fine-tuned Whisper-tiny model, comparable in size to a strong CEM baseline, achieves similar performance on the in-domain dataset and surpasses the CEM baseline on eight out-of-domain datasets, whereas the fine-tuned Whisper-large model consistently outperforms the CEM baseline by a substantial margin across all datasets.
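The abstract does not say how the fine-tuning targets are built. A common setup in word-level confidence estimation, assumed here rather than taken from the paper, labels each hypothesis word 1 if it agrees with the reference transcript and 0 otherwise. The sketch below illustrates that idea in Python; the helper name `word_correctness_labels` is hypothetical, and `difflib` alignment stands in for a proper edit-distance alignment.

```python
from difflib import SequenceMatcher


def word_correctness_labels(reference: str, hypothesis: str) -> list[tuple[str, int]]:
    """Pair each hypothesis word with 1 (aligned to an identical reference word) or 0.

    Assumption: binary per-word targets like these are what a confidence
    model would be trained against; the abstract does not confirm this.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    # autojunk=False turns off difflib's heuristic that ignores very frequent items.
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    labels = [0] * len(hyp_words)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Hypothesis words j1..j2-1 match the reference exactly.
            for j in range(j1, j2):
                labels[j] = 1
    return list(zip(hyp_words, labels))


pairs = word_correctness_labels(
    reference="the cat sat on the mat",
    hypothesis="the cat sat in the hat",
)
print(pairs)
```

In this toy pair the substituted words "in" and "hat" receive label 0 and the remaining words receive label 1. A confidence model would then be expected to output low scores for the former and high scores for the latter.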

Comments:	Accepted at IEEE ICASSP 2025
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
Cite as:	arXiv:2502.13446 [eess.AS]
	(or arXiv:2502.13446v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2502.13446

Submission history

From: Yash Jogi
[v1] Wed, 19 Feb 2025 05:45:28 UTC (373 KB)
