Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech

Yan, Bi-Cheng; Jiang, Shao-Wei Fan; Chao, Fu-An; Chen, Berlin

arXiv:2108.13816 (eess)

[Submitted on 31 Aug 2021 (v1), last revised 9 Jul 2022 (this version, v4)]

Abstract: End-to-end (E2E) neural models are attracting increasing attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to improving the log-likelihood of the training data. However, there is a discrepancy between the objective of model training and that of MDD evaluation, since the performance of an MDD model is commonly evaluated in terms of F1-score rather than phone or word error rate (PER/WER). In view of this, this paper explores the use of a discriminative objective function for training E2E MDD models, which aims to maximize the expected F1-score directly. A series of experiments conducted on the L2-ARCTIC dataset shows that the proposed method can yield considerable performance improvements over some state-of-the-art E2E MDD approaches and the celebrated goodness of pronunciation (GOP) method.
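To make the mismatch between cross-entropy training and F1-based evaluation concrete, here is a minimal Python sketch of F1 computed from detection counts, next to a "soft" F1 in which hard decisions are replaced by probabilities so that the score varies smoothly with the model output. This is a generic surrogate chosen for illustration and is not taken from the paper, whose exact expected-F1 formulation is not given in the abstract. The function names, the per-phone probability input, and the convention that label 1 means "mispronounced" are all assumptions.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts.

    Returns 0.0 when the denominator is zero (no positives at all).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def soft_f1(probs, labels):
    """Soft F1 over per-phone mispronunciation probabilities (assumed setup).

    probs  -- hypothetical model probabilities that each phone is mispronounced
    labels -- 1 if the phone is annotated as mispronounced, else 0
    Counts are accumulated from probabilities instead of thresholded decisions.
    """
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    return f1_score(tp, fp, fn)


def soft_f1_loss(probs, labels):
    """Loss to minimize: one minus the soft F1."""
    return 1.0 - soft_f1(probs, labels)


if __name__ == "__main__":
    # Toy utterance with four phones, two of them annotated as mispronounced.
    labels = [1, 0, 1, 0]
    probs = [0.9, 0.2, 0.6, 0.1]
    print(soft_f1(probs, labels))
    print(soft_f1_loss(probs, labels))
```

In a real training setup the same quantity would be written with a tensor library so that gradients flow through it; the pure-Python version only shows the arithmetic.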

Comments:	Accepted by IEEE International Conference on Multimedia and Expo (ICME 2022)
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2108.13816 [eess.AS]
	(or arXiv:2108.13816v4 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2108.13816

Submission history

From: Bi-Cheng Yan
[v1] Tue, 31 Aug 2021 13:19:13 UTC (392 KB)
[v2] Thu, 23 Dec 2021 16:07:43 UTC (1 KB) (withdrawn)
[v3] Fri, 15 Apr 2022 16:17:06 UTC (264 KB)
[v4] Sat, 9 Jul 2022 16:45:54 UTC (499 KB)
