Correction Focused Language Model Training for Speech Recognition

Ma, Yingyi; Liu, Zhe; Kalinli, Ozlem

arXiv:2310.11003 (cs)

[Submitted on 17 Oct 2023]

Abstract: Language models (LMs) are commonly adopted to boost the performance of automatic speech recognition (ASR), particularly in domain adaptation tasks. The conventional approach to LM training treats all words in a corpus equally, resulting in suboptimal improvements in ASR performance. In this work, we introduce a novel correction focused LM training approach that prioritizes ASR fallible words. The word-level ASR fallibility score, representing the likelihood of ASR misrecognition, is defined and shaped as a prior word distribution to guide LM training. To enable correction focused training with text-only corpora, large language models (LLMs) are employed as fallibility score predictors and text generators through multi-task fine-tuning. Experimental results on domain adaptation tasks demonstrate the effectiveness of the proposed method. Compared with conventional LMs, correction focused training achieves up to a 5.5% relative word error rate (WER) reduction in sufficient text scenarios. In insufficient text scenarios, LM training with LLM-generated text achieves up to a 13% relative WER reduction, while correction focused training obtains up to a further 6% relative WER reduction.
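The idea of letting a word-level fallibility score guide LM training can be illustrated with a small weighted-loss sketch. This is not the paper's formulation, which the abstract does not spell out. It assumes that each word already has a fallibility score in [0, 1], that the scores are shaped into per-word weights by adding a floor and normalizing, and that the LM loss is a weighted negative log-likelihood. The names `shape_weights`, `weighted_nll`, and the `floor` parameter are hypothetical.

```python
def shape_weights(fallibility, floor=0.1):
    """Map per-word fallibility scores to loss weights (hypothetical shaping).

    Each score is clipped at zero and offset by `floor`, so that words the
    recognizer rarely gets wrong still contribute to training. The weights
    are rescaled to sum to the number of words, which makes uniform scores
    reduce to ordinary unweighted training.
    """
    raw = [max(score, 0.0) + floor for score in fallibility]
    total = sum(raw)
    n = len(raw)
    return [n * r / total for r in raw]


def weighted_nll(log_probs, weights):
    """Weighted average negative log-likelihood over one sentence.

    `log_probs` holds the LM log-probability of each reference word.
    """
    if len(log_probs) != len(weights):
        raise ValueError("log_probs and weights must have the same length")
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / sum(weights)


# Assumed toy sentence: the third word is the one ASR tends to misrecognize.
log_probs = [-0.2, -0.3, -4.0]
uniform_loss = weighted_nll(log_probs, [1.0, 1.0, 1.0])
focused_loss = weighted_nll(log_probs, shape_weights([0.0, 0.0, 1.0]))
```

With these toy numbers the focused loss is larger than the uniform one, because the poorly predicted, highly fallible word dominates the objective. That is the intended effect of prioritizing such words, under the assumptions stated above.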

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2310.11003 [cs.CL]
	(or arXiv:2310.11003v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.11003

Submission history

From: Yingyi Ma
[v1] Tue, 17 Oct 2023 05:10:39 UTC (251 KB)
