Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding

Zhao, Jiahui; Shi, Hao; Cui, Chenrui; Wang, Tianrui; Liu, Hexin; Ni, Zhaoheng; Ye, Lingxuan; Wang, Longbiao

arXiv:2412.16507 (cs)

[Submitted on 21 Dec 2024 (v1), last revised 24 Dec 2024 (this version, v2)]

Abstract: Code-switching (CS) automatic speech recognition (ASR) is challenging because accents, auditory similarity between languages, and seamless language switches lead to language confusion. Adapting pre-trained multilingual models has shown promising performance for CS-ASR. In this paper, we adapt Whisper, a large-scale multilingual pre-trained speech recognition model, to CS on both the encoder and decoder sides. First, we propose an encoder refiner to enhance the encoder's capacity to model intra-sentence switching. Second, we propose using two sets of language-aware adapters with different language prompt embeddings to obtain language-specific decoding information in each decoder layer. Then, a fusion module is added to fuse the language-aware decoding information. Experimental results on the SEAME dataset show that, compared with the baseline model, the proposed approach achieves relative mixed error rate (MER) reductions of 4.1% and 7.2% on the dev_man and dev_sge test sets, respectively, surpassing state-of-the-art methods. Our experiments further show that the proposed method significantly improves performance on the non-native language in CS speech, indicating that our approach enables Whisper to better distinguish between the two languages.

Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2412.16507 [cs.CL]
	(or arXiv:2412.16507v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.16507
Journal reference:	ICASSP 2025

Submission history

From: Hao Shi
[v1] Sat, 21 Dec 2024 07:06:44 UTC (243 KB)
[v2] Tue, 24 Dec 2024 04:08:22 UTC (344 KB)
