Joint unsupervised and supervised learning for context-aware language identification

Park, Jinseok; Kim, Hyung Yong; Park, Jihwan; Kim, Byeong-Yeol; Choi, Shukjae; Lim, Yunkyu

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2303.16511 (eess)

[Submitted on 29 Mar 2023 (v1), last revised 14 Apr 2023 (this version, v2)]

Abstract: Language identification (LID) automatically recognizes the language of a spoken utterance. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with an LID task only. However, training a model to recognize speech requires additional text labels, which are costly to acquire. To overcome this problem, we propose context-aware language identification that combines unsupervised and supervised learning without any text labels. The proposed method learns the context of speech through a masked language modeling (MLM) loss while simultaneously learning to determine the language of the utterance through a supervised loss. On a subset of the VoxLingua107 dataset consisting of sub-three-second utterances in 11 languages, the proposed joint learning reduced the error rate by 15.6% compared with a model of the same structure trained with supervised learning only.
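The abstract does not specify the encoder, the masking scheme, the MLM targets, or how the two losses are weighted. As a rough illustration only, the sketch below assumes that the joint objective is a weighted sum of an utterance-level cross-entropy LID loss and a frame-level masked-prediction loss over hypothetical discrete unit targets, computed only at masked frames. The names `sample_mask`, `mlm_loss`, `joint_loss`, and `mlm_weight` are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -log_softmax(logits)[target]


def sample_mask(num_frames: int, mask_prob: float, span: int,
                rng: random.Random) -> List[bool]:
    """Hypothetical span masking: each frame starts a masked span of
    `span` frames with probability `mask_prob` (spans may overlap)."""
    mask = [False] * num_frames
    for t in range(num_frames):
        if rng.random() < mask_prob:
            for k in range(t, min(t + span, num_frames)):
                mask[k] = True
    return mask


def mlm_loss(frame_logits: Sequence[Sequence[float]],
             frame_targets: Sequence[int],
             mask: Sequence[bool]) -> float:
    """Unsupervised term: mean cross-entropy over masked frames only.
    The discrete per-frame targets are an assumption; returns 0.0 if
    no frame is masked."""
    losses = [cross_entropy(lg, tgt)
              for lg, tgt, m in zip(frame_logits, frame_targets, mask) if m]
    return sum(losses) / len(losses) if losses else 0.0


def joint_loss(lid_logits: Sequence[float], lid_label: int,
               frame_logits: Sequence[Sequence[float]],
               frame_targets: Sequence[int], mask: Sequence[bool],
               mlm_weight: float = 1.0) -> Tuple[float, float, float]:
    """Supervised LID loss plus a weighted unsupervised MLM loss.
    The weighted-sum combination and `mlm_weight` are assumptions.
    Returns (total, lid_term, mlm_term)."""
    l_lid = cross_entropy(lid_logits, lid_label)
    l_mlm = mlm_loss(frame_logits, frame_targets, mask)
    return l_lid + mlm_weight * l_mlm, l_lid, l_mlm
```

For example, with uniform logits over 11 languages, the LID term equals ln 11 ≈ 2.398 whatever the label. Averaging the MLM term over masked frames only means that unmasked frames, whose content the encoder can see directly, add nothing to the unsupervised objective.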

Comments:	Accepted by ICASSP 2023
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2303.16511 [eess.AS]
	(or arXiv:2303.16511v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2303.16511

Submission history

From: Jinseok Park
[v1] Wed, 29 Mar 2023 07:39:11 UTC (1,170 KB)
[v2] Fri, 14 Apr 2023 07:27:36 UTC (1,166 KB)